h1

Using DataFabric to replace Masochism

November 25th, 2009

When we upgraded from Rails 1.x to 2.x, we also had to migrate from Masochism to DataFabric. Unfortunately, the DataFabric way of doing things requires adding a declaration to each of the model classes. For a site like TouchLocal, where there are literally hundreds of model classes, this was a daunting task.

I’ve just finished extracting the magic that makes it possible to do this programmatically, and released it as DataFabric::Initializer. I hope it helps someone other than me!
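To give a flavour of the idea, here is a hypothetical sketch (it is not the actual DataFabric::Initializer code): rather than editing every model file, loop over the model classes and apply the declaration to each one. `FakeBase`, its `data_fabric` stand-in, the two model names and the `:replicated => true` option are all assumptions, chosen so the example runs without Rails or DataFabric installed.

```ruby
# Stand-in for ActiveRecord::Base so the sketch runs without Rails.
class FakeBase
  class << self
    attr_reader :fabric_options

    # Stand-in for the per-model declaration; it only records its options.
    def data_fabric(options = {})
      @fabric_options = options
    end

    # Every class that has inherited from FakeBase, in definition order.
    def descendants
      @descendants ||= []
    end

    def inherited(klass)
      super
      FakeBase.descendants << klass
    end
  end
end

# Two hypothetical models with no declaration of their own.
class Business < FakeBase; end
class Review < FakeBase; end

# Apply the declaration to every model in one place.
FakeBase.descendants.each do |model|
  model.data_fabric(:replicated => true)
end
```

In a real app the list of models would more likely come from the files in app/models, but the principle is the same: one loop instead of hundreds of edits.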

h1

Howto recover lost commits from a git rebase

November 23rd, 2009

Yesterday, I was working on a branch of a branch in a git repository, and I wanted to merge the last branch back to master. Following chapter 3.6 of ProGit, I ran

git rebase --onto master branch1 branch2

Unfortunately for me, I seem to have either run it from the wrong place or done the wrong thing for my situation, because git rewound branch2 to the revision where it was branched from master, and all the later commits were inaccessible!

However, I was not going to give up quite so easily, and I figured that once a revision was committed, it had to be somewhere. Fortunately for me, I managed to find this post on how to recover lost commits in a git repository – I subsequently ran:

git reflog show
# find the commit with the last set of changes in your branch -
# good commit comments are so useful!
# ("git co" is my alias for "git checkout")
git checkout b6654bd
git checkout -b branch2_2

With my branch recovered, I did a simple rebase of the new (old) branch onto master and merged it, and all was well.

h1

Workling support for Synchronous AMQP RabbitMQ Clients and Amazon SQS Queues

November 16th, 2009

As part of the contracting work I have been doing for TouchLocal, I have just open-sourced some code I wrote to support new Workling clients. As you may know, Workling is a Rails-oriented system for performing asynchronous processing and optionally returning data from these background workers. However, because of the implementation of the original AMQP client, you could not use RabbitMQ queues from non-evented Mongrel or Phusion Passenger servers (only evented Mongrel or Thin).

Building on the work of celdee-bunny and famoseagle-carrot, I implemented a RabbitMQ workling client that could be used from within Phusion Passenger and Mongrel. The Synchronous AMQP Workling client allows RabbitMQ to be used from Workling without requiring complicated changes to deployment scenarios. Also, I implemented the Return Store functionality, so that RabbitMQ users can get data back from the workers, just like when using Starling.

Additionally, it was useful at the time to add support for an Amazon SQS Workling client, more as an exercise in testing its performance than anything else. As with the SyncAmqpClient, support for the Return Store is present. One thing I discovered while working with Amazon SQS (via the RightAws gem) is that the default key structure for Workling (which uses colon characters as segment delimiters) is not supported by Amazon. As a result, if you define the keys used for AWS configuration, even if you don’t use them, they will change the Workling key structure. This is not a problem for new implementations, but for existing deployments adding this will mean that Workling cannot see the old queues, and you may not be able to access them without removing the AWS configuration… not a deal breaker, but something to be aware of.

So, the TouchLocal github account holds the version of Workling that has these two implementations for now, at least until my pull request to the main branch is accepted :)

h1

touchlocal-openx gem released

November 4th, 2009

After the last post, I spent a bit of time integrating my work (along with other upgrades) into a fork of the openx code on github and have released a gem version of it on GemCutter.org.

As you may or may not know, GemCutter.org will be the new default gem source by becoming rubygems.org. This is pretty exciting, because previously you had to have your project registered on rubygems.org in order to publish to it, or use the non-standard github gem host. Good stuff.

In any case, the new OpenX gem can be installed now by executing

sudo gem install touchlocal-openx --source "http://gemcutter.org"

# Load it using
require 'rubygems'
gem 'touchlocal-openx'
require 'openx'

In Rails, include it like this in your Rails::Initializer block:

  config.gem "touchlocal-openx", :lib => "openx", :source => "http://gemcutter.org"

I’m a proud parent!

h1

Minimum OpenX XMLRPC Ruby Client

September 29th, 2009

2009-11-04 Update: Gem version released

For TouchLocal I am currently reworking some of the internal advertising systems to use OpenX. That way we get better reporting and better reliability, but we can use things like OpenX Direct Selection to make the best use of our existing infrastructure.

While there are 2 Ruby projects (1 a more recent fork of the other), both are oriented around the API for administering OpenX rather than serving ads via the API. So, here is the minimum you need to do from Ruby to get a banner served:

(Based on http://www.openx.org/en/docs/tutorials/Advanced+XML-RPC)

require 'xmlrpc/client'
# The settings are the HTTP headers - the PHP client sets many,
# but this is the minimum requirement
settings = {:cookies => [], :remote_addr => 'localhost'}

# 'what' in our case is a Direct Selection
params = [
  what = 'Plumber',
  campaignId = 0,
  target = '',
  source = '',
  withText = false,
  xmlContext = [{'!=' => 'campaignid:2'}]
]

b = nil
begin
  server = XMLRPC::Client.new2("http://www.example.com/www/delivery/axmlrpc.php")
  b = server.call('openads.view', settings, *params)
  b['html']
rescue Exception => e
  puts e.message
end

In particular, note the content of the xmlContext parameter – that took me quite a few hours of digging through the PHP to work out. It’s an array of hashes that use the keys “==” and “!=” to include or exclude banners based on the values. The values must be of the form “type:id”, with type being one of “campaignid”, “clientid”, “companionid”, and “bannerid” (although technically, anything other than the first 3 is treated as a banner ID as of version 2.8.1 of OpenX).
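If you build these context arrays in more than one place, a small helper keeps the “type:id” strings consistent. This is only a sketch: `CONTEXT_TYPES`, `context_entries` and `build_xml_context` are names I have made up for illustration and are not part of OpenX or any OpenX gem, and rejecting unknown types is my own choice (OpenX itself is more lenient, as noted above).

```ruby
# Hypothetical helper, not part of OpenX or any OpenX gem.
CONTEXT_TYPES = %w(campaignid clientid companionid bannerid)

# Turns [[type, id], ...] pairs into xmlContext entries for one operator.
def context_entries(operator, pairs)
  pairs.map do |type, id|
    unless CONTEXT_TYPES.include?(type.to_s)
      raise ArgumentError, "unknown context type: #{type}"
    end
    { operator => "#{type}:#{id}" }
  end
end

# :include pairs use "==", :exclude pairs use "!=".
def build_xml_context(options = {})
  context_entries('==', options[:include] || []) +
    context_entries('!=', options[:exclude] || [])
end

# Builds the same structure as the xmlContext value in the example above.
xml_context = build_xml_context(:exclude => [[:campaignid, 2]])
p xml_context
```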

I hope that saves someone the pain that I just went through to discover it!

h1

can’t dup NilClass

May 11th, 2009

I’m in the middle of upgrading an old Rails 1.2.6 app to Rails 2.3, and all of a sudden when logging in as a user, I get this:

TypeError in BusinessController#view

can't dup NilClass

C:/development/InstantRails-2.0-win/ruby/lib/ruby/gems/1.8/gems/activerecord-2.3.2/lib/active_record/base.rb:2189:in `dup'
C:/development/InstantRails-2.0-win/ruby/lib/ruby/gems/1.8/gems/activerecord-2.3.2/lib/active_record/base.rb:2189:in `scoped_methods'
C:/development/InstantRails-2.0-win/ruby/lib/ruby/gems/1.8/gems/activerecord-2.3.2/lib/active_record/base.rb:2193:in `current_scoped_methods'
C:/development/InstantRails-2.0-win/ruby/lib/ruby/gems/1.8/gems/activerecord-2.3.2/lib/active_record/base.rb:2183:in `scope'
C:/development/InstantRails-2.0-win/ruby/lib/ruby/gems/1.8/gems/activerecord-2.3.2/lib/active_record/base.rb:1548:in `find_every'
C:/development/InstantRails-2.0-win/ruby/lib/ruby/gems/1.8/gems/activerecord-2.3.2/lib/active_record/base.rb:1588:in `find_one'
C:/development/InstantRails-2.0-win/ruby/lib/ruby/gems/1.8/gems/activerecord-2.3.2/lib/active_record/base.rb:1574:in `find_from_ids'
C:/development/InstantRails-2.0-win/ruby/lib/ruby/gems/1.8/gems/activerecord-2.3.2/lib/active_record/base.rb:616:in `find'
C:/development/InstantRails-2.0-win/ruby/lib/ruby/gems/1.8/gems/activerecord-2.3.2/lib/active_record/associations/belongs_to_association.rb:44:in `find_target'
C:/development/InstantRails-2.0-win/ruby/lib/ruby/gems/1.8/gems/activerecord-2.3.2/lib/active_record/associations/association_proxy.rb:240:in `load_target'
C:/development/InstantRails-2.0-win/ruby/lib/ruby/gems/1.8/gems/activerecord-2.3.2/lib/active_record/associations/association_proxy.rb:112:in `reload'
C:/development/InstantRails-2.0-win/ruby/lib/ruby/gems/1.8/gems/activerecord-2.3.2/lib/active_record/associations.rb:1231:in `user'
C:/development/community_s0024_upgrade_upgrade/webapps/touchlocal/app/controllers/application_controller.rb:178:in `check_ticket_and_session'

Of course, that error means nothing on its own. The underlying cause is that the class reloader is trying to unload a class but can’t, because of included modules that can’t be unloaded. The fix is to use the keyword unloadable in the model to tell the reloader to force a reload on each request:

class User < ActiveRecord::Base
  unloadable

  ...
end

All fixed!

h1

On comments

April 15th, 2009

Blog comments. I used to like them. Some of my blog posts even have threads of helpful and useful feedback left by you, dear reader. Or should I say, some of you.

More recently I have started receiving a new sort of comment. The specious or outright false, hit-and-run attacks from anonymous morons who have somehow decided that the access to a keyboard and a place to exercise it allows them to write whatever they want. Well not here.

Should have listened to Alex Payne earlier…

If you want to reply to me, trackbacks are still enabled (for now)…

h1

Ruby Daemons and Vendoring

April 8th, 2009

In the past, in order to gather server-side usage stats, we have had a procession of different methods to track and record this data outside the main app. The reason for this separation is to ensure that search results on TouchLocal are as quick as possible while still recording the details. The first attempt was to use Stomp, which was interesting, but we had problems with stability. After that, and for a long time, we had a backend Merb application that was sent the tracking information from the webserver and returned immediately. While this was pretty fast, it had the downside that the HTTP request from the main site would time out and make the site crawl to a halt if the backend processes were not responding for whatever reason.

As a result, we went on a third rewrite effort. This time we went back to basics, and decided to use log4r to output a special log format that would be parsed and inserted into the tracking tables asynchronously. This would imply a delay, but only in the vicinity of a few minutes. The main design problems were ensuring stability of the processing platform and also the ability to retry files that might have been missed or errored. The repeatability was achieved by using a combination of a hash of the tracking data + the timestamp + some random information, and a (large) table that tracks these timestamp and hash combinations (given that hashes can sometimes collide, it makes sense to add the timestamp as a factor).
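A rough sketch of that repeatability check, under stated assumptions: the choice of SHA1, the field names, and the in-memory `Set` and `Array` (standing in for the real timestamp + hash table and the tracking tables) are all mine for illustration, not the production code.

```ruby
require 'digest/sha1'
require 'securerandom'
require 'set'

# Built once, by the web app, when it writes the log line. The nonce keeps two
# identical events in the same second from looking like one event.
def tracking_line(payload, timestamp = Time.now.to_i, nonce = SecureRandom.hex(8))
  digest = Digest::SHA1.hexdigest("#{payload}|#{timestamp}|#{nonce}")
  { :timestamp => timestamp, :digest => digest, :payload => payload }
end

class TrackingImporter
  attr_reader :rows

  def initialize
    @seen = Set.new # stand-in for the (large) timestamp + hash table
    @rows = []      # stand-in for the tracking tables
  end

  # Returns true if the line was recorded, false if it had been seen before.
  def import(line)
    key = [line[:timestamp], line[:digest]]
    return false if @seen.include?(key)
    @seen << key
    @rows << line[:payload]
    true
  end
end

importer = TrackingImporter.new
line = tracking_line('search:plumber:london')
importer.import(line) # recorded
importer.import(line) # same line seen again on a retry: skipped
puts importer.rows.size
```

Because the key is written into the log line itself, reprocessing a whole file is harmless: lines that already made it into the tables are skipped, and only the missed ones are inserted.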

The backend process to parse and record the information from this log file format was written in Ruby, of course. In order to achieve stability I decided to use the Ruby Daemons gem. This handles PID file management and lots of other neat things for writing a daemon, so the basics of long-running processes were not my ‘problem’ as such. So that these Ruby processes could scale up, I ensured as I was writing it that it would be aware of the potential error conditions of multiple processes, such as one process moving a file when another was looking for it (race conditions, etc). While the Daemons gem uses the fork() implementation on *nix for the standard headless run mode, it also supports a non-background run mode which works on Windows. I also chose to use ActiveRecord, reusing the AR models from the Merb application.
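One common way to handle the “another process moved my file” case is to have each process claim a file by renaming it, and to treat a missing file as “someone else got there first”. The sketch below assumes a POSIX filesystem, where a rename within one filesystem is atomic; `claim_file` and the directory names are hypothetical and not taken from the real daemon.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: try to claim +path+ by renaming it into +processing_dir+.
# Returns the new path, or nil if the file had already gone.
def claim_file(path, processing_dir)
  target = File.join(processing_dir, File.basename(path))
  File.rename(path, target)
  target
rescue Errno::ENOENT
  nil
end

# Usage: two attempts on the same file; only the first one wins.
Dir.mktmpdir do |root|
  incoming   = File.join(root, 'incoming_files')
  processing = File.join(root, 'processing')
  FileUtils.mkdir_p([incoming, processing])
  log = File.join(incoming, 'tracking.log')
  File.open(log, 'w') { |f| f.puts 'sample line' }

  first  = claim_file(log, processing)
  second = claim_file(log, processing)
  puts "first claim:  #{first ? 'won' : 'lost'}"
  puts "second claim: #{second ? 'won' : 'lost'}"
end
```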

One of my personal goals was to ensure the daemon was as self-contained as possible. For me, this meant that I wanted the Sysadmin to be able to check it out and run the start command and have it work on a standard base Ruby installation.

Here’s the initialisation code for the Daemon:

################
### REQUIRES ###
################

# This loads gems in vendor/gems (abstracted out so it can be used in rake tasks that haven't loaded ENV yet)
require "lib/local_gem_loader"

require 'rubygems'
require 'daemons'
require 'activerecord'
require 'json/pure'

require 'erb'
require 'cgi'

# +require+ all the models
model_path = File.expand_path(File.join(File.dirname(__FILE__), 'models'))
$LOAD_PATH.unshift model_path
Dir.glob(File.join(model_path, '*.rb')) do |file|
  require file
end

# Log4r does not work because the Daemons gem closes all open file descriptors.
#require 'log4r'
#require 'config/logger'
#logger = ::DEFAULT_LOGGER

#################
### CONSTANTS ###
#################

SLEEP_TIME_SECONDS = 5
FILE_NAME = 'tracking_daemon.rb' # name to report as the process

#####################
### CONFIGURATION ###
#####################

database_configuration_file = File.expand_path(File.join(File.dirname(__FILE__), 'config', 'database.yml'))
database_configuration = YAML::load(ERB.new(IO.read(database_configuration_file)).result)

######################
### INITIALISATION ###
######################

# Parse out the RAILS_ENV=production setting
ARGV.each do |arg|
  if arg.include?('=')
    key, val = arg.split('=', 2)
    ENV[key] ||= val
  elsif database_configuration.keys.include?(arg)
    ENV['RAILS_ENV'] ||= arg
  end
end
RAILS_ENV = (ENV['RAILS_ENV'] || "development").dup
puts "Starting #{FILE_NAME} daemon in #{RAILS_ENV} mode"

ActiveRecord::Base.configurations = database_configuration
ActiveRecord::Base.establish_connection RAILS_ENV

root_files_path       = File.expand_path(File.join(File.dirname(__FILE__), 'files'))
incoming_files_path   = File.expand_path(File.join(root_files_path, 'incoming_files'))

options = {
             :multiple   => true,
             :ontop      => false,
             :backtrace  => true,
             :log_output => true,
             :monitor    => true
           }

##############
### DAEMON ###
##############

Daemons.run_proc(FILE_NAME, options) do
  loop do
    # 1. Get the next file to process
    Dir.glob(File.join(incoming_files_path, '*')) do |incoming_file|
    end

    #...

    puts "Sleep #{SLEEP_TIME_SECONDS} sec" if RAILS_ENV == "development"
    sleep(SLEEP_TIME_SECONDS)
  end
end

The local_gem_loader is something of my own invention, based on the vendor/gems loader that was introduced in Rails 2. I wrote it initially to enable our (then) Rails 1.2.6 app to have vendored gems. It was very useful here to allow me to meet my desire to have this thing be checked out from SVN and started. Here it is – it’s pretty simple really:

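# Load the gems in /vendor/gems
# (assumes this file lives in lib/, one level below the project root)
standard_dirs = ['rails', 'plugins']
root          = File.expand_path(File.join(File.dirname(__FILE__), '..'))
gems          = Dir[File.join(root, "vendor/*/**")]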
if gems.any?
  gems.each do |dir|
    next if standard_dirs.include?(File.basename(dir))
    lib = File.join(dir, 'lib')
    $LOAD_PATH.unshift(lib) if File.directory?(lib)
    src = File.join(dir, 'src')
    $LOAD_PATH.unshift(src) if File.directory?(src)
  end
end

After requiring that file, I was able to vendor all the gems I needed (activerecord, json-pure, and even daemons) in the vendor/gems directory I created. After that, the ./models directory is loaded with the lines

# +require+ all the models
model_path = File.expand_path(File.join(File.dirname(__FILE__), 'models'))
$LOAD_PATH.unshift model_path
Dir.glob(File.join(model_path, '*.rb')) do |file|
  require file
end

Also note that, by design, Daemons closes all open file descriptors as it starts. Although I had read this in the documentation, I still tried to integrate Log4r, and spent a very confused hour wondering why all my log files were erroring on write… anyhoo…

After ensuring the models are in the load path and are ready to go, ActiveRecord needs to be initialised. I added the ability to choose at runtime which database environment to write to, just like Rails does. This is achieved here:

# First load the Rails config/database.yml
database_configuration_file = File.expand_path(File.join(File.dirname(__FILE__), 'config', 'database.yml'))
database_configuration = YAML::load(ERB.new(IO.read(database_configuration_file)).result)

# Parse out the RAILS_ENV=production setting, which can be either in the form
# ruby tracking_daemon.rb start RAILS_ENV=production or
# ruby tracking_daemon.rb start production
# Note that the environments allowed are validated against the ones available in the database.yml
ARGV.each do |arg|
  if arg.include?('=')
    key, val = arg.split('=', 2)
    ENV[key] ||= val
  elsif database_configuration.keys.include?(arg)
    ENV['RAILS_ENV'] ||= arg
  end
end

RAILS_ENV = (ENV['RAILS_ENV'] || "development").dup
puts "Starting #{FILE_NAME} daemon in #{RAILS_ENV} mode"

ActiveRecord::Base.configurations = database_configuration
ActiveRecord::Base.establish_connection RAILS_ENV
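The argument handling above can also be pulled into a small function, which makes it easy to check in isolation. This is a sketch of the same logic rather than the code the daemon runs: `resolve_environment` is a name I have invented, and it writes to a plain hash passed in (instead of the real `ENV`) so that trying it out has no side effects.

```ruby
# Same rules as the ARGV loop above: KEY=value pairs are stored, and a bare
# argument only counts if it names an environment from database.yml.
def resolve_environment(argv, known_environments, env = {})
  argv.each do |arg|
    if arg.include?('=')
      key, val = arg.split('=', 2)
      env[key] ||= val
    elsif known_environments.include?(arg)
      env['RAILS_ENV'] ||= arg
    end
  end
  env['RAILS_ENV'] || 'development'
end

known = %w(development test production)
resolve_environment(%w(start RAILS_ENV=production), known) # => "production"
resolve_environment(%w(start production), known)           # => "production"
resolve_environment(%w(start), known)                      # => "development"
resolve_environment(%w(start staging), known)              # => "development"
```

The last call shows the validation at work: “staging” is not in the list of known environments, so it is ignored and the default applies.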

At the end of this we have loaded the database config and connected to the database. Happy days. The only thing left is to start the daemon, which I chose to do in an inline fashion. Note that Daemons allows you to have the process report whatever name you like in the process lists, but I went with the name of the file itself for clarity:

root_files_path       = File.expand_path(File.join(File.dirname(__FILE__), 'files'))
incoming_files_path   = File.expand_path(File.join(root_files_path, 'incoming_files'))

options = {
             :multiple   => true, # allow multiple concurrent instances of the same daemon
             :ontop      => false, # daemonise
             :backtrace  => true, # show full failure info
             :log_output => true,
             :monitor    => true # instantiate a monitor to restart as required
           }

Daemons.run_proc(FILE_NAME, options) do
  loop do
    # 1. Get the next file to process
    Dir.glob(File.join(incoming_files_path, '*')) do |incoming_file|
    end

    #...

    puts "Sleep #{SLEEP_TIME_SECONDS} sec" if RAILS_ENV == "development"
    sleep(SLEEP_TIME_SECONDS)
  end
end

All in all it’s been a great success – what previously took a few backend servers running full whack to process all the incoming information now takes 2 daemons on a single server in each datacentre. They hardly even show up in the top list. Excellent stuff.

h1

Boolean.parse

April 6th, 2009

So often in Ruby on Rails I find myself needing to convert a variable to a Boolean, and there is no built in way of doing it. So, how about this:

class Boolean
  def self.parse(obj)
    %w(true t 1 y).include?(obj.to_s.downcase.strip)
  end
end

We use this at work and it’s a nice neat way of ensuring non-repetitive code reuse :)
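A few sample calls show what it accepts. The inputs are my own examples, and the class is repeated so the snippet runs on its own:

```ruby
class Boolean
  def self.parse(obj)
    %w(true t 1 y).include?(obj.to_s.downcase.strip)
  end
end

Boolean.parse('TRUE')  # => true
Boolean.parse(' y ')   # => true
Boolean.parse(1)       # => true
Boolean.parse('yes')   # => false ("yes" is not in the list, only "y")
Boolean.parse(nil)     # => false (nil.to_s is "")
Boolean.parse('0')     # => false
```

Anything not in the list comes back false, so add “yes” or “on” to the array if your forms send those.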

h1

Howto get capistrano-ext to run migrations as the correct environment

March 18th, 2009

If, like me, you use the Capistrano Multistage extensions in the capistrano-ext plugin, you’ll have noticed to your horror at some point that if you deploy to an environment other than production, your migrations will still run against the production database.

As the extension has no native support for this change in behaviour, the following method override will do the trick:

namespace :deploy do
  desc "Invoke the db migration using the correct stage"
  task :migrate, :roles => :app do
    send(run_method, "cd #{release_path} && rake db:migrate RAILS_ENV=#{stage} ")
  end
end

While I’m here, here’s a simple trick I came up with: at work we use SVN branches heavily, some for each Sprint and then for Releases. As each release has its own branch, we need to change the Capistrano repository URL for each release, which is tedious and easy to forget.

Instead of the manual change, why not just ask SVN to tell you the URL for the current path, which is in our case the root of the app? During the cap deploy, we can run some Ruby code to call ‘svn info’, like so (the first line is not necessary – it’s just there for context):

set :scm, :subversion
lines = %x[svn info]

The ‘svn info’ command obviously relies on the svn executable being in your path, but it is required for the rest of deployment so I’ll assume that’s already taken care of. The output of the command is something like:

Path: .
URL: https://svn.example.com/svn/project/branches/release_12345
Repository Root: https://svn.example.com/svn
Repository UUID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Revision: 1359
Node Kind: directory
Schedule: normal
Last Changed Author: dan
Last Changed Rev: 13582
Last Changed Date: 2009-03-17 22:34:44 +0000 (Thu, 19 Feb 2009)

The %x[] command takes this output and puts it in a long string with line breaks. Then, because this output is well formed, we can use a little regex-fu to get out the path of the repository, like so (full commands):

lines = %x[svn info]
set :repository, /URL\: (.*)\n/.match(lines)[1]

The regular expression we use here is looking for a string that is between “URL: ” and “\n”, which as you would see from the output above, is:


https://svn.example.com/svn/project/branches/release_12345

Just what we were after!
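If you would rather not have the deploy blow up with a confusing nil error when ‘svn info’ prints nothing useful (for example, when run outside a working copy), the extraction can be wrapped in a small method. This is a sketch only: `repository_url` is a hypothetical name, and the sample text is a shortened copy of the output shown above.

```ruby
# Returns the URL from `svn info` output, or nil when there is no URL line.
# Anchoring on ^URL: avoids matching the "Repository Root:" line by accident.
def repository_url(svn_info_output)
  match = /^URL: (.+)$/.match(svn_info_output)
  match && match[1].strip
end

sample = <<-INFO
Path: .
URL: https://svn.example.com/svn/project/branches/release_12345
Repository Root: https://svn.example.com/svn
Revision: 1359
INFO

url = repository_url(sample)
raise "Could not work out the repository URL from svn info" if url.nil?
puts url
```

In the deploy file itself, the result would then be passed to `set :repository` in place of the inline regex.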



