Ruby Daemons and Vendoring

April 8th, 2009

In the past, in order to gather server-side usage stats, we have had a procession of different methods to track and record this data outside the main app. The reason for this separation is to ensure that search results on TouchLocal remain as quick as possible while still recording the details. The first attempt was to use Stomp, which was interesting but had stability problems. After that, and for a long time, we had a backend Merb application that was sent the tracking information from the webserver and returned immediately. While this was quick, it had the downside that the HTTP request from the main site could time out and grind the site to a halt if the backend processes were not responding for whatever reason.

As a result, we went on a third rewrite effort. This time we went back to basics, and decided to use log4r to output a special log format that would be parsed and inserted into the tracking tables asynchronously. This would imply a delay, but only in the vicinity of a few minutes. The main design problems were ensuring stability of the processing platform and the ability to retry files that might have been missed or errored. Repeatability was achieved by using a combination of a hash of the tracking data, the timestamp, and some random information, plus a (large) table that tracks these timestamp and hash combinations (given that hashes can sometimes collide, it makes sense to add the timestamp as a factor).
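As a sketch of that repeatability scheme (the names here are illustrative, not the production schema), the dedup key can be built from a digest of the raw tracking line plus its timestamp:

```ruby
require 'digest/sha1'

# Hypothetical sketch: build a dedup key from the SHA1 of the raw tracking
# line plus the event timestamp. Re-parsing the same file reproduces the
# same key (so duplicates can be skipped), while identical lines logged at
# different times get different keys.
def tracking_key(line, timestamp)
  "#{timestamp.to_i}-#{Digest::SHA1.hexdigest(line)}"
end
```

A table keyed on that string, checked before insert, is then enough to make re-processing a file safe.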

The backend process to parse and record the information from this log file format was written in Ruby, of course. To achieve stability I decided to use the Ruby Daemons gem. This handles PID file management and lots of other neat things for writing a daemon, so the basics of long-running processes were not my ‘problem’ as such. So that these Ruby processes could scale up, I ensured as I was writing it that it would be aware of the potential error conditions of multiple processes, such as one process moving a file when another was looking for it (race conditions, etc). While the Daemons gem uses the fork() implementation on *nix for the standard headless run mode, it also supports a non-background run mode which works on Windows. I also chose to use ActiveRecord, reusing the AR models from the Merb application.

One of my personal goals was to ensure the daemon was as self-contained as possible. For me, this meant that I wanted the Sysadmin to be able to check it out and run the start command and have it work on a standard base Ruby installation.

Here’s the initialisation code for the Daemon:

################
### REQUIRES ###
################

# This loads gems in vendor/gems (abstracted out so it can be used in rake tasks that haven't loaded ENV yet)
require "lib/local_gem_loader"

require 'rubygems'
require 'daemons'
require 'activerecord'
require 'json/pure'

require 'erb'
require 'cgi'

# +require+ all the models
model_path = File.expand_path(File.join(File.dirname(__FILE__), 'models'))
$LOAD_PATH.unshift model_path
Dir.glob(File.join(model_path, '*.rb')) do |file|
  require file
end


# Log4r does not work because the Daemons gem closes all open file descriptors.
#require 'log4r'
#require 'config/logger'
#logger = ::DEFAULT_LOGGER

#################
### CONSTANTS ###
#################

SLEEP_TIME_SECONDS = 5
FILE_NAME = 'tracking_daemon.rb' # name to report as the process

#####################
### CONFIGURATION ###
#####################

database_configuration_file = File.expand_path(File.join(File.dirname(__FILE__), 'config', 'database.yml'))
database_configuration = YAML::load(ERB.new(IO.read(database_configuration_file)).result)

######################
### INITIALISATION ###
######################

# Parse out the RAILS_ENV=production setting
ARGV.each do |arg|
  if arg.include?('=')
    key, val = arg.split('=', 2)
    ENV[key] ||= val
  elsif database_configuration.keys.include?(arg)
    ENV['RAILS_ENV'] ||= arg
  end
end
RAILS_ENV = (ENV['RAILS_ENV'] || "development").dup
puts "Starting #{FILE_NAME} daemon in #{RAILS_ENV} mode"


ActiveRecord::Base.configurations = database_configuration
ActiveRecord::Base.establish_connection RAILS_ENV

root_files_path       = File.expand_path(File.join(File.dirname(__FILE__), 'files'))
incoming_files_path   = File.expand_path(File.join(root_files_path, 'incoming_files'))

options = {
             :multiple   => true,
             :ontop      => false,
             :backtrace  => true,
             :log_output => true,
             :monitor    => true
           }

##############
### DAEMON ###
##############

Daemons.run_proc(FILE_NAME, options) do
  loop do
    # 1. Get the next file to process
    Dir.glob(File.join(incoming_files_path, '*')) do |incoming_file|
    end
    
    #...

    puts "Sleep #{SLEEP_TIME_SECONDS} sec" if RAILS_ENV == "development"
    sleep(SLEEP_TIME_SECONDS)
  end
end

The local_gem_loader is something of my own invention, based on the vendor/gems loader that was introduced in Rails 2. I wrote it initially to enable our (then) Rails 1.2.6 app to have vendored gems. It was very useful here, meeting my goal of having this thing checked out from SVN and simply started. Here it is – it’s pretty simple really:

# Load the gems in /vendor/gems
standard_dirs = ['rails', 'plugins']
gems          = Dir[File.join(File.dirname(__FILE__), '..', 'vendor', '*', '**')]
if gems.any?
  gems.each do |dir|
    next if standard_dirs.include?(File.basename(dir))
    lib = File.join(dir, 'lib')
    $LOAD_PATH.unshift(lib) if File.directory?(lib)
    src = File.join(dir, 'src')
    $LOAD_PATH.unshift(src) if File.directory?(src)
  end
end

After including that line, I was able to vendor all the gems I needed (activerecord, json-pure, and even daemons) in the vendor/gems directory I created. After that, the ./models directory is loaded with these lines:

# +require+ all the models
model_path = File.expand_path(File.join(File.dirname(__FILE__), 'models'))
$LOAD_PATH.unshift model_path
Dir.glob(File.join(model_path, '*.rb')) do |file|
  require file
end

Also note that, by design, Daemons closes all open file descriptors as it starts. While I had read this in the documentation, I still tried to integrate Log4r, and spent a very confused hour wondering why all my log files were erroring on write… anyhoo…

After ensuring the models are in the load path and ready to go, ActiveRecord needs to be initialised. I added the ability to choose the database environment at runtime, just like Rails does. This is achieved here:

# First load the Rails config/database.yml
database_configuration_file = File.expand_path(File.join(File.dirname(__FILE__), 'config', 'database.yml'))
database_configuration = YAML::load(ERB.new(IO.read(database_configuration_file)).result)

# Parse out the RAILS_ENV=production setting, which can be either in the form
# ruby tracking_daemon.rb start RAILS_ENV=production or
# ruby tracking_daemon.rb start production
# Note that the environments allowed are validated against the ones available in the database.yml
ARGV.each do |arg|
  if arg.include?('=')
    key, val = arg.split('=', 2)
    ENV[key] ||= val
  elsif database_configuration.keys.include?(arg)
    ENV['RAILS_ENV'] ||= arg
  end
end

RAILS_ENV = (ENV['RAILS_ENV'] || "development").dup
puts "Starting #{FILE_NAME} daemon in #{RAILS_ENV} mode"

ActiveRecord::Base.configurations = database_configuration
ActiveRecord::Base.establish_connection RAILS_ENV

At the end of this we have loaded the database config and connected to the database. Happy days. The only thing left is to start the daemon, which I chose to do in an inline fashion. Note that Daemons allows you to have the process report whatever name you like in the process lists, but I went with the name of the file itself for clarity:

root_files_path       = File.expand_path(File.join(File.dirname(__FILE__), 'files'))
incoming_files_path   = File.expand_path(File.join(root_files_path, 'incoming_files'))

options = {
             :multiple   => true, # allow multiple concurrent instances of the same daemon
             :ontop      => false, # daemonise
             :backtrace  => true, # show full failure info
             :log_output => true, 
             :monitor    => true # instantiate a monitor to restart as required
           }

Daemons.run_proc(FILE_NAME, options) do
  loop do
    # 1. Get the next file to process
    Dir.glob(File.join(incoming_files_path, '*')) do |incoming_file|
    end
    
    #...

    puts "Sleep #{SLEEP_TIME_SECONDS} sec" if RAILS_ENV == "development"
    sleep(SLEEP_TIME_SECONDS)
  end
end

All in all it’s been a great success – what previously needed a few backend servers running at full whack to process all the incoming information now uses 2 daemons on a single server in each datacentre. They hardly even show up on the top list. Excellent stuff.

Boolean.parse

April 6th, 2009

So often in Ruby on Rails I find myself needing to convert a variable to a Boolean, and there is no built-in way of doing it. So, how about this:

class Boolean
  def self.parse(obj)
    %w(true t 1 y).include?(obj.to_s.downcase.strip)
  end
end

We use this at work and it’s a nice, neat way of keeping the code DRY :)
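A few usage examples (repeating the class here so the snippet stands alone):

```ruby
class Boolean
  def self.parse(obj)
    %w(true t 1 y).include?(obj.to_s.downcase.strip)
  end
end

Boolean.parse('True')  # => true
Boolean.parse(1)       # => true
Boolean.parse(' y ')   # => true
Boolean.parse('no')    # => false
Boolean.parse(nil)     # => false
```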

Howto get capistrano-ext to run migrations as the correct environment

March 18th, 2009

If, like me, you use the Capistrano Multistage extensions in the capistrano-ext plugin, you may have noticed to your horror at some point that if you deploy to an environment other than production, your migrations will still run against the production database.

As the extension has no native support for this change in behaviour, the following method override will do the trick:

namespace :deploy do
  desc "Invoke the db migration using the correct stage"
  task :migrate, :roles => :app do
    send(run_method, "cd #{release_path} && rake db:migrate RAILS_ENV=#{stage} ")
  end
end

While I’m here, here’s a simple trick I came up with: at work we use SVN branches heavily, one for each Sprint and one for each Release. As each release has its own branch, we need to change the Capistrano repository URL for each release, which is tedious and easy to forget.

Instead of the manual change, why not just ask SVN for the current URL of the current path, which in our case is the root of the app? During the cap deploy, we can run some Ruby code to call ‘svn info’, like so (the first line is not necessary – it’s just there for context):

set :scm, :subversion
lines = %x[svn info]

The ‘svn info’ command obviously relies on the svn executable being in your path, but it is required for the rest of deployment so I’ll assume that’s already taken care of. The output of the command is something like:

Path: .
URL: https://svn.example.com/svn/project/branches/release_12345
Repository Root: https://svn.example.com/svn
Repository UUID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Revision: 1359
Node Kind: directory
Schedule: normal
Last Changed Author: dan
Last Changed Rev: 13582
Last Changed Date: 2009-03-17 22:34:44 +0000 (Tue, 17 Mar 2009)

The %x[] command captures this output as a single string with line breaks. Then, because the output is well formed, we can use a little regex-fu to pull out the URL of the repository, like so (full commands):

lines = %x[svn info]
set :repository, /URL\: (.*)\n/.match(lines)[1]

The regular expression we use here is looking for a string between “URL: ” and “\n”, which, as you can see from the output above, is:

https://svn.example.com/svn/project/branches/release_12345

Just what we were after!

Running one migration by hand

March 25th, 2008

Sometimes, you want to run a single migration by hand (to test it maybe…)

You can do this from the command line:


ruby script/runner 'require "db/migrate/005_create_blogs"; CreateBlogs.migrate(:down)'
ruby script/runner 'require "db/migrate/005_create_blogs"; CreateBlogs.migrate(:up)'

DRYing up Actions with a page_errorhandler

March 9th, 2008

When I am developing page actions, I quite often find myself approaching a standard get/post action in the same way. This is particularly because I want to validate a number of ActiveRecord models, only progressing if they are ALL valid, and showing all errors at once.

Of course, that means that I need to have a standard way of doing this, and a standard way of reporting it. It also implies the use of transactions, so that all saves are rolled back on failure.

So to begin with, a (hypothetical) example of my approach is:


def edit_user
  @user = User.find(session[:user_id])
  @address = @user.address
  if request.post?
    begin
      # note that I do not want to reference the objects in the
      # method call - so that the changes are available to render
      @user.transaction do
        results = []
        results << @user.save
        results << @address.save
        raise "Validation failed" if results.include?(false)
      end
      redirect_to :action => :show_user and return
    rescue Exception => e
      logger.warn{"Transaction terminated : #{e.message}"}
      logger.warn{"@user : #{(@user.errors.full_messages.join('; ') rescue nil)}"}
      logger.warn{"@address : #{(@address.errors.full_messages.join('; ') rescue nil)}"}
      logger.debug{e.backtrace}
    end
  end
end

The first useful thing about this is that all the objects are saved together, and the transaction is terminated if even ONE of them is invalid; in that case we do not redirect, so the edit_user template is rendered with all the validation errors available. The errors are also written to the log “just in case”.

Also, note that I am using the Log4r syntax of passing the message as a block in braces rather than as an argument in parentheses. In Log4r, if the logger is not logging at the specified level, the code inside the braces is never run. In the example above, if we are in production mode and not logging DEBUG, the e.backtrace call is never actually executed.
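The same lazy evaluation exists in Ruby’s standard Logger, which makes it easy to demonstrate without Log4r installed:

```ruby
require 'logger'
require 'stringio'

output = StringIO.new
logger = Logger.new(output)
logger.level = Logger::WARN

evaluations = 0
logger.debug { evaluations += 1; 'expensive debug message' } # below WARN: block never runs
logger.warn  { evaluations += 1; 'cheap warning' }           # at WARN: block runs

# evaluations is 1 here - only the WARN block was ever evaluated
```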

From here, let’s DRY it up.

First, there are those validation messages – there’s quite a lot of code there. I’ve approached this from two angles. In environment.rb, I put this code:


class ActiveRecord::Base
  def full_messages
    self.errors.full_messages.join('; ') rescue nil
  end
end

and in application.rb

def validation_message(*objects)
  str = "Validation error :"
  objects.each{ |obj| str << "\n#{obj.class.name} => #{obj.full_messages}" }
  str
end

This means that the validation message can now be handled by simply doing this in the rescue block:


logger.warn{validation_message(@user, @address)}

Next, because I use this structure regularly, I created a page_errorhandler method to contain the exception block that does this:


def page_errorhandler(*objects)
  begin
    yield
  rescue Exception => e
    logger.warn{"Transaction terminated : #{e.message}"}
    logger.warn{validation_message(*objects)}
    logger.debug{e.backtrace}
  end
end

Combining these with our method above:


def edit_user
  @user = User.find(session[:user_id])
  @address = @user.address
  if request.post?
    page_errorhandler(@user, @address) do
      # note that I do not want to reference the objects in the
      # method call - so that the changes are available to render
      @user.transaction do
        results = []
        results << @user.save
        results << @address.save
        raise "Validation failed" if results.include?(false)
      end
      redirect_to :action => :show_user and return
    end
  end
end

Neatens things up nicely!

Master Pages for Rails (or, Hierarchical Layouts)

March 7th, 2008

One of the things that I like about the .net web application setup is the concept of Master Pages. The most common usage of this is if you have a Home Page that has a slightly different layout from the rest of the site, but you want to share the base DIV structure and assets without having to copy them into another layout… Keep it DRY!

Although there is no “out of the box” solution for sharing common layout setups in Rails except for using Partials, there is in fact a way of replicating this functionality. First, you need to be familiar with the ActionView::Helpers::CaptureHelper class and the methods in it.

So, what you do is create a container rhtml layout that contains the common layout elements, such as the base HTML and div structure.

container.rhtml
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
</head>
<body>
<div id='container'>
<%= yield :layout -%>
</div>
</body>
</html>

From there, you can create as many other layouts that reference that container, such as one called application.rhtml:
<% @content_for_layout = capture do %>
<div id='a_new_div' />
<%= yield :layout %>
<% end %>
<%= render 'layouts/container', { 'content_for_layout' => @content_for_layout } %>

Note what happens there – the capture method is used to catch the rendering of this layout file. Then, instead of just allowing it to render, this output is injected into the container.rhtml file. In this way, we have chained the inner layout (application.rhtml) into the outer (container.rhtml) layout.

In our example, the “<div id=’a_new_div’ />” will be injected, along with the data from the view, into the container. There is nothing stopping you from doing this to another layer, although I doubt how often you’d need to do that!

From our example above, we could also easily create a home.rhtml file that had slightly different layout from our application.rhtml file and use that from our HomeController without concern – all of the pages would then render in the correct layout while giving us the benefit of a shared outer container!
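As a sketch, such a home.rhtml chaining into the same container might look like this (the home_banner div is made up purely for illustration):

<% @content_for_layout = capture do %>
<div id='home_banner' />
<%= yield :layout %>
<% end %>
<%= render 'layouts/container', { 'content_for_layout' => @content_for_layout } %>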

I have used this in production, and it works a treat both in terms of performance and code maintainability.

Guide to Electronic Music

August 11th, 2007

I found this Guide to Electronic Music – it’s so comprehensive it makes my head spin… but now when someone asks me what I think of weird music variants I’ll be able to give them an answer ;)

ZFS Boot with NexentaOS

August 9th, 2007

Little bit behind the 8-ball on posting this, but I came across this post at AspiringSysadmin.com – ZFS boot without having to manually set it up like you currently have to do with OpenSolaris.

NexentaOS is a GNU operating system that uses apt on top of the OpenSolaris kernel and runtime. It looks to me as though the www.gnusolaris.org site has been abandoned for the GenUnix NexentaOS site, which is a shame because it’s damn hard to find on Google. The GenUnix site also hosts the Belenix and Schillix sites, with Belenix a LiveCD and Schillix a rebuilt OpenSolaris distro.

The AspiringSysadmin post went into how to set up NexentaOS in VMware – my quest now is to combine VMware, NexentaOS ZFS boot and 2x750GB drives in RAID-Z configuration… I hope it works!

JRuby-OpenSSL and HTTP request timeouts (in ActiveSalesforce)

July 20th, 2007

So, JRuby looks pretty cool, and I’ve got some code that uses a weird BouncyCastle encryption implementation going in my Rails app. It works well, the WAR deployment works (with goldspike), and everything’s going swimmingly.

Until, of course, I tried to integrate with Another Project that uses ActiveSalesforce. Behind the scenes, ActiveSalesforce makes SSL-encrypted connections to Salesforce API endpoints and performs operations. This was working fine until I started the app under JRuby. All of a sudden, the Salesforce connections stopped working.

A bit of digging later, and I discovered that the JRuby-OpenSSL implementation is causing reads of the HTTP responses to stall. JRuby HTTP is fine, Ruby HTTPS is fine, but JRuby HTTPS is not. Boo.

It’s been filed as a bug now. I hope it’s fixed soon!

Integrating Log4r and Ruby on Rails

June 16th, 2007

Aaaaaaaages ago, I wrote a message on the mailing list (before it moved to Google Groups!) about how to integrate Rails and Log4r. A little has changed since then, and that method may not entirely work any more. Jason Rimmer asked me aaaaages ago to update it so that it’s all new and fresh, but I completely forgot in the move to the UK (very sorry Jason!). So here it is.

I’ve got a few outputters. One that acts like the default outputter, that writes “development.log” and so on. Then another that outputs to standard error for console lovin’ (in dev mode). Then another that uses a date file outputter to automatically roll over logs every day (for production mode), and finally an Email outputter that only runs in production and sends an email of the log for ERROR and FATAL log levels.

Log4r Rails configuration files

The first bit is the configuration YAML file, which is used to configure the loggers. Then there is logger.rb, which turns on and off the outputters as required. The final part is to include this logger.rb into the application configuration.

It is VERY IMPORTANT that you include the file before the Rails::Initializer.run block. This is because in this section of code the RAILS_DEFAULT_LOGGER is initialised, and if we don’t get in before that, we won’t get our logger injected into the Rails framework stack. So, configure it like this:

require File.join(File.dirname(__FILE__), 'boot')

require File.expand_path(File.dirname(__FILE__) + "/logger")

Rails::Initializer.run do |config|
...

Just drop the require line in there and it will load logger.rb, which loads log4r.yaml, and everything is up and going. You’ll see friendly [DEBUG] lines in your console and everything! Of course, I prefer verbose logging on the console in development; you may not, so customise it by reading the Log4r manual. Also, if you expect your error mails to be delivered, change the SMTP server settings at the bottom of the yaml file.

Sorry for the delay Jason!