Ruby append (<<) versus concat (+)

A lot of developers know the dangers of concatenation with Strings and objects. In Java we had the StringBuffer.append vs. + (and now StringBuilder) knowledge transfer. Ruby has the same issue, and people have talked about it before. We ran into this issue in one of our projects, and I remember Dave Thomas talking about a problem that was fixed by moving from string concatenation to putting the contents on an array. I think this benchmark says it all:

require 'benchmark'

Benchmark.bm do |x|
  x.report do
    a = 'foo'
    100000.times { a += ' foo' }
  end
  x.report do
    a = 'foo'
    100000.times { a << ' foo' }
  end
end

Output

dion@stewart [~]$ ruby t.rb
      user     system      total        real
 13.790000  25.180000  38.970000 ( 40.102451)
  0.060000   0.000000   0.060000 (  0.064342)

So, favour << unless you really want to copy strings around.
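The reason for the gap is that += builds a new String on every iteration, while << appends to the existing one. A minimal sketch that checks this via object_id (the variable names are just for illustration, and String.new is used so the strings are mutable regardless of frozen-literal settings):

```ruby
# += allocates a brand-new String; << mutates the receiver in place.
a = String.new('foo')
id_before_plus = a.object_id
a += ' foo'
plus_made_new_object = (a.object_id != id_before_plus)

b = String.new('foo')
id_before_append = b.object_id
b << ' foo'
append_kept_same_object = (b.object_id == id_before_append)

puts "+= allocated a new string: #{plus_made_new_object}"
puts "<< reused the same string: #{append_kept_same_object}"
```

Both variables end up holding the same text; only the amount of copying differs.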

Deploying to Multiple Rails Environments

On one Rails project, we have two deployment environments: production and UAT. Using the default Capistrano configuration makes deploying to these two environments rather difficult, so I thought I'd share our deploy.rb with a bit of explanation along the way. OK, here goes:

For a start, we deploy to a directory that includes the environment as part of the path:

set :deploy_to, lambda { "/home/#{user}/www/#{rails_env}" }
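The lambda matters here: the value is only computed when it is first needed, by which time rails_env has been set by one of the environment tasks. The following is a plain-Ruby sketch of that idea, not Capistrano's actual implementation; LazySettings and its methods are hypothetical names used only for illustration.

```ruby
# Hypothetical stand-in for a deployment variable store (illustration only).
class LazySettings
  def initialize
    @values = {}
  end

  # Store either a plain value or something callable (a lambda).
  def set(name, value)
    @values[name] = value
  end

  # Callables are evaluated on access, so they see settings made later.
  def fetch(name)
    value = @values.fetch(name)
    value.respond_to?(:call) ? value.call : value
  end
end

settings = LazySettings.new
settings.set(:user, 'deploy')
settings.set(:deploy_to, lambda { "/home/#{settings.fetch(:user)}/www/#{settings.fetch(:rails_env)}" })
settings.set(:rails_env, :uat) # set after :deploy_to was defined

deploy_to = settings.fetch(:deploy_to)
puts deploy_to
```

Had :deploy_to been a plain interpolated string, it would have been evaluated before :rails_env existed.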

For Subversion, we check out the code as the user who is running the deployment, making sure not to cache authentication details on the server:

set :svn_user, ENV['USER']
set :svn_password, lambda { Capistrano::CLI.password_prompt('SVN Password: ') }
set :repository, lambda { "--username #{svn_user} --password #{svn_password} --no-auth-cache svnurl/trunk/#{application}" }

In both cases, we run a mongrel cluster. Because the mongrel configuration files share a lot in common and because they largely duplicate information contained within the deployment script, we generate an appropriate configuration on deployment. More on that in a bit, but for now the common bits look like:

set :mongrel_address, "127.0.0.1"
set :mongrel_environment, lambda { rails_env }
set :mongrel_conf, lambda { "#{current_path}/config/mongrel_cluster.yml" }

Now, for the environment specific portions. For each environment we have a task that simply sets variables appropriately—I toyed with using an environment variable such as RAILS_ENV rather than the pseudo-tasks but it was more typing and I'm allergic to typing :).

For production, we want 3 mongrel instances in the cluster, listening on ports 8000-8002:

desc "Production specific setup"
task :production do
  set :rails_env, :production
  set :mongrel_servers, 3
  set :mongrel_port, 8000
end

For UAT, we want 2 mongrel instances in the cluster, listening on ports 8010-8011:

desc "UAT specific setup"
task :uat do
  set :rails_env, :uat
  set :mongrel_servers, 2
  set :mongrel_port, 8010
end

And finally, a custom deployment script based almost entirely on the built-in deploy_with_migrations with the major difference being the configuration of the mongrel cluster just prior to restart:

desc "Generic deployment"
task :deploy do
  update_code

  begin
    old_migrate_target = migrate_target
    set :migrate_target, :latest
    migrate
  ensure
    set :migrate_target, old_migrate_target
  end

  symlink
  
  configure_mongrel_cluster

  restart
end
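The configure_mongrel_cluster task itself isn't shown here. As a rough sketch of what generating the configuration could involve, the plain-Ruby helpers below build the YAML from the same values the deployment script already holds. The helper names, the example cwd path, and the exact set of keys are assumptions for illustration, not the original task.

```ruby
require 'yaml'

# Hypothetical helper: render cluster settings as YAML text.
def mongrel_cluster_yaml(address:, environment:, port:, servers:, cwd:)
  {
    'cwd'         => cwd,
    'address'     => address,
    'environment' => environment.to_s,
    'port'        => port.to_s,
    'servers'     => servers
  }.to_yaml
end

# Ports used by a cluster: one per server, counting up from the first port.
def mongrel_ports(port, servers)
  (port...(port + servers)).to_a
end

config = mongrel_cluster_yaml(
  address: '127.0.0.1',
  environment: :production,
  port: 8000,
  servers: 3,
  cwd: '/home/deploy/www/production/current' # assumed example path
)

puts config
puts mongrel_ports(8000, 3).inspect
```

In a real deploy.rb the resulting text would still need to be written to the path held in mongrel_conf on the server.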

That's it really. Now whenever we need to deploy to a particular environment, say for example UAT, we do something like:

cap uat deploy

Update: By request, here is our database.yml file :

common: &common
  adapter: postgresql
  username: <%= ENV['USER'] %>

development:
  database: foo_development
  <<: *common

test:
  database: foo_test
  <<: *common

uat:
  database: foo_uat
  <<: *common

production:
  database: foo_production
  <<: *common

As you can probably tell, we're lucky enough that the database user is always the same as the user under which the application will be run, and that the database itself is named according to the environment. That makes it very easy to wrap up most of the common parts. Thanks go to Jon Tirsen for that YAML tip.

This could also easily be generated. I guess it just hasn't needed any attention since it was created so YAGNI overrode DRY ;-)
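For anyone unfamiliar with the &common anchor and the <<: *common merge key, here is a small sketch that renders a cut-down copy of the file through ERB and loads it, so you can see what each environment ends up with. It assumes a reasonably recent Ruby whose YAML.safe_load accepts aliases: true, and it sets a fallback for USER purely so the sketch runs anywhere.

```ruby
require 'erb'
require 'yaml'

ENV['USER'] ||= 'app' # assumption: fallback so the sketch runs where USER is unset

template = <<~'YAML'
  common: &common
    adapter: postgresql
    username: <%= ENV['USER'] %>

  uat:
    database: foo_uat
    <<: *common

  production:
    database: foo_production
    <<: *common
YAML

rendered = ERB.new(template).result                 # expand the <%= %> tag first
config   = YAML.safe_load(rendered, aliases: true)  # then resolve anchors and merges

p config['uat']
p config['production']
```

Each environment hash contains its own database name plus the adapter and username merged in from common.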

100% Pure Ruby(tm)

Recently I’ve been doing a fair amount of work in Ruby. And yes, I’ve felt super-productive. Particularly compared to Java.

The downside of working in Java is the 100% Pure Java(tm) mentality. In the search for a clean and cohesive system, we take the attitude that if it’s not pure Java, it’s crap. In Java, if we need something to happen periodically, we might examine TimerTask, decide it’s insufficient and move on to Quartz. So we add it to our build, figure out the API, realize it conflicts with some other dependency. Well, damn.

With Ruby, it’s scripty enough to not feel the need to have a 100% Pure Ruby(tm) mentality. When a Ruby system needs something to occur periodically, we just open a pipe to crontab and hand that bit off to cron.

“But Windows doesn’t have cron!”

Too bad.

Use a better operating system.

The majority of systems deploy to Linux or some other Unix-alike. Developing on a Unix-ish system only makes sense. You wouldn’t prepare to drive an RV by tooling around in a Kia Sportage, now would you?

When you break free of the JVM mentality and assume a sensible host operating system, you realize that the OS itself is your virtual machine to play in. If it’s in your $PATH and can be expected to behave reasonably well on any sane Unix-like OS, by all means, use it.

Back to the premise… Since Ruby is indeed “scripty” you can accomplish a crapload just using a pair of backticks, effectively not even using Ruby at  all.

And you can do it without guilt or complication. Completely unlike punting to Runtime.exec(…). That always makes you feel dirty.

Perhaps Groovy and JRuby will help break the never-escape-the-JVM attitude. Give a developer backticks and easy pipes to subprocesses, and no telling what sort of nefarious things he might could do.
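For the curious, here is roughly what backticks and a pipe look like in practice. This is a harmless sketch that assumes echo and sort are on the $PATH; it deliberately avoids touching a real crontab. Pointing the same kind of pipe at crontab - (which, on the crons I know of, reads a schedule from standard input) would be the cron hand-off described above.

```ruby
# Backticks run a command through the shell and return its standard output.
greeting = `echo hello from the shell`.chomp

# IO.popen opens a two-way pipe to a subprocess; here we feed lines to sort.
sorted = IO.popen('sort', 'r+') do |pipe|
  pipe.puts 'pear', 'apple', 'fig'
  pipe.close_write # signal end of input so sort can produce its output
  pipe.read.split("\n")
end

puts greeting
puts sorted.inspect
```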

Ferret diverges from Lucene

I am a long-time Lucene fan, and was excited about being able to use Ferret in Ruby land to work on the same files. That dream just died, as David Balmain (Mr. Ferret) has jumped away from the Lucene file format:

"This is the first Ferret announcement I've put up for a while, the reason being, the most recent releases of Ferret have been alpha releases. I completely rewrote Ferret from the ground up so that it no-longer uses Lucene's file format and I was able to gain so great performance improvements in the process."

Do I need Ferret to use the same file format? Oftentimes no, as the app is in pure Rubyland. However, I know of a few projects in which being able to access the index from both worlds is a definite plus... and not through a web service ;) I guess we will have to use one of the other Lucene ports for that.

ActiveRecord Identity Map for Rails Transactions

I happened to be reading a blog entry last night that mentioned some "shortcomings" in Rails' ActiveRecord and its handling of record loading. Specifically, AR will load the same record twice, into two different instances, within the same transaction. That is, the following test fails:

Customer.transaction do
  c = Customer.find_by_name('RedHill Consulting, Pty. Ltd.')
  assert_same c, Customer.find(c.id)
end

To be honest, I've not yet been burned by this, but it may just catch some people out, so I quickly whipped up a very basic plugin to see how difficult it would be to solve:

module RedHillConsulting
  module IdentityMap
    class Cache
      def initialize
        @objects = {}
      end

      def put(object)
        objects = @objects[object.class] ||= {}
        objects[object.id] ||= object
      end
    end

    module Base
      def self.included(base)
        base.extend(ClassMethods)

        base.class_eval do
          alias_method_chain :create, :identity_map
        end
      end

      module ClassMethods
        def self.extended(base)
          class << base
            [:instantiate, :increment_open_transactions, :decrement_open_transactions].each do |method|
              alias_method_chain method, :identity_map
            end
          end
        end

        def instantiate_with_identity_map(record)
          enlist_in_transaction(instantiate_without_identity_map(record))
        end

        def enlist_in_transaction(object)
          identity_map = Thread.current['identity_map']
          return object unless identity_map
          identity_map.put(object)
        end

        private
          def increment_open_transactions_with_identity_map
            increment_open_transactions_without_identity_map
            Thread.current['identity_map'] ||= Cache.new
          end

          def decrement_open_transactions_with_identity_map
            Thread.current['identity_map'] = nil if decrement_open_transactions_without_identity_map < 1
          end
      end

      def create_with_identity_map()
        create_without_identity_map
        self.class.enlist_in_transaction(self)
        id
      end
    end
  end
end

The code essentially interferes with create and instantiate (called from find) and ensures that, within a transaction, the same record will always be returned for the same id (IdentityMap).
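Stripped of the Rails hooks, the heart of it is the cache's put method: the first object registered for a given class and id wins, and every later lookup gets that same instance back. A standalone sketch, where the Customer struct and the IdentityCache name are hypothetical stand-ins rather than the plugin's own classes:

```ruby
# Hypothetical record type standing in for an ActiveRecord model.
Customer = Struct.new(:id, :name)

class IdentityCache
  def initialize
    @objects = {}
  end

  # Returns the first object registered for this class and id.
  def put(object)
    by_id = (@objects[object.class] ||= {})
    by_id[object.id] ||= object
  end
end

cache  = IdentityCache.new
first  = cache.put(Customer.new(1, 'RedHill Consulting'))
second = cache.put(Customer.new(1, 'RedHill Consulting')) # simulates a second load of the same row

puts first.equal?(second) # same instance, not merely an equal copy
```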

As I mentioned, unlike all my other plugins, I've never used nor needed to use this one—and I'm not sure I will unless it proves to be a problem for me—but it's yet another example of how easy it is to extend Rails to do pretty much whatever you might imagine.

Automatically Validate Uniqueness of Columns with Scope

The first cut at Schema Validations only applied validates_uniqueness_of for single-column unique indexes. This removed 80% of the cases in my code base, but the cases that specified a scope still lingered. Not any more.

The plugin now automatically generates validates_uniqueness_of with scope for multi-column unique indexes as well.

As always, there are some assumed conventions—which I believe will handle close to 99% of cases—around how to decide which column to validate versus which columns to consider part of the scope. The column to validate is chosen to be either:

  1. The last column in the index definition not ending in ‘_id’; or simply
  2. The last column in the index definition.

All remaining columns are considered part of the scope, following what I believe to be a typical composite unique index column ordering.

So, for example, given either of the following two statements in your schema migration:

add_index :states, [:country_id, :name], :unique => true
add_index :states, [:name, :country_id], :unique => true

The plugin will generate:

validates_uniqueness_of :name, :scope => [:country_id]
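The selection rule is simple enough to sketch in a few lines of plain Ruby. This is an illustration of the convention described above, not the plugin's actual code, and the method name is made up:

```ruby
# Pick the column to validate and the columns that form the scope.
def uniqueness_validation_for(index_columns)
  # Prefer the last column that does not end in _id; otherwise take the last column.
  column = index_columns.reverse.find { |c| !c.to_s.end_with?('_id') } || index_columns.last
  scope  = index_columns - [column]
  [column, scope]
end

p uniqueness_validation_for([:country_id, :name])
p uniqueness_validation_for([:name, :country_id])
```

Both index definitions lead to the same column and scope, which is why the two add_index statements produce an identical validation.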

My next stop is to have a look at simple column constraints such as IN('male', 'female') and turn them into validates_inclusion_of :gender, :in => ['male', 'female'].

Perhaps tomorrow :)

validates_presence_of association Gotcha

The more I use Rails (and the more plugins I create) the more quirks I find.

Imagine I have a one-to-many relationship between Country and State:

State.belongs_to :country
Country.has_many :states

We then issue the following sequence of statements (I've interleaved the output of tailing the development log):

c = Country.find_by_name('Australia')
  Country Load (0.006506)   SELECT * FROM countries WHERE (countries."name" = 'Australia' ) LIMIT 1
s = c.states.build(:name => 'Victoria', :abbreviation => 'VIC')
s.country
  Country Load (0.009738)   SELECT * FROM countries WHERE (countries.id = 1)

Notice the SELECT to find the country? Now why would that be necessary? I just used .states.build on the country. I would have thought that would set the association but that doesn't appear to be the case.

Looking at the code, my suspicions were confirmed: only the parent's id is set. That seems decidedly odd given that we know for a fact the parent exists—we just used it to create the child.

So anyway, I'm pretty sure this is considered a "feature" but to be honest, I can't see why it is desired behaviour over and above the fact that doing otherwise would be more work and why would you need this if you already have the parent yada, yada, yada.

Well, for a start, I'd like this behaviour because I'd like to use validates_presence_of on foreign-keys and have it work for newly constructed graphs. Usually this barfs no matter what but I concocted a work-around last night and committed it to my Foreign Key Associations plugin which, if done manually, would look something like this:

class State < ActiveRecord::Base
  validates_presence_of :country_id, :if => lambda { |record| record.country.nil? }
  ...
end

Essentially this says to validate the presence of country_id but only if there isn't an associated country. This means that for cases where the parent record is also new, the validation checks for the presence of the associated object rather than the foreign-key column. If you had simply used validates_presence_of :country_id then save would fail because country_id was still nil.

OK that's all very well and good but it still doesn't help because, as shown above, the association isn't set anyway. So, I'm now back to manually setting the association; at least the validation works hehe

I'm sure someone far smarter than I will point out why the behaviour as it stands is obviously the most appropriate and that no one in their right mind would want to do anything else, of course ;-)

Procrastinating in Ruby is Delicious

As I was bookmarking something on del.icio.us today, I noticed the dates on which I had bookmarked the last couple of times and wondered if there was any correlation between frequency and day of the week. So, I downloaded a summary using https://api.del.icio.us/v1/posts/all? and whipped up a little ruby script to compile some statistics:

Wednesday = 41
Tuesday = 39
Thursday = 37
Friday = 32
Monday = 26
Saturday = 24
Sunday = 12

Looks like Wednesday is the biggest day for bookmarking—also known as procrastinating—and what do you know? Today is...Wednesday!

So then I thought I'd see if there was anything interesting in the time of day:

12 = 26
13 = 20
4 = 17
22 = 15
0 = 14
23 = 12
5 = 12
2 = 12
20 = 10
11 = 10
1 = 10
7 = 10
3 = 9
6 = 7
21 = 7
9 = 6
15 = 4
14 = 3
8 = 3
10 = 2
19 = 2

Phew! Most of my bookmarking is done around lunchtime although an awful lot were done at 4am!
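The script itself was a throwaway, but the tallying boils down to something like the sketch below. The timestamps are made-up sample data rather than my real bookmarks, the times are treated as UTC, and pulling the time values out of the downloaded summary is left out.

```ruby
require 'time'

# Made-up sample timestamps (UTC), standing in for the downloaded bookmark times.
timestamps = %w[
  2006-10-24T04:10:00Z
  2006-10-25T12:05:00Z
  2006-10-25T12:40:00Z
  2006-10-26T13:15:00Z
]

# Count timestamps by whatever key the block extracts, most frequent first.
def tally_by(timestamps)
  counts = Hash.new(0)
  timestamps.each { |stamp| counts[yield(Time.iso8601(stamp))] += 1 }
  counts.sort_by { |_, count| -count }
end

by_day  = tally_by(timestamps) { |t| t.strftime('%A') }
by_hour = tally_by(timestamps) { |t| t.hour }

by_day.each  { |day, count|  puts "#{day} = #{count}" }
by_hour.each { |hour, count| puts "#{hour} = #{count}" }
```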

RedHill on Rails Plugin Refactoring

I mentioned in my previous entry that I'd done quite a bit of refactoring of the plugins. Among the various changes that will affect developers using them are:

  • Schema Defining (schema_defining) has been deleted;
  • Foreign Key Support (foreign_key_support) has been deleted; and
  • RedHill on Rails Core (redhillonrails_core) has been added to replace the previous two as well as subsuming some of the more generic functionality from other plugins.

So, why all these changes?

The main reason is manageability. We're actually eating our own dog food and using these plugins in production applications and we're adding functionality at quite a surprising rate. Each time we add something, we first put it into the plugin that needs it directly. That works great for a while but then, someday, we decide we need that functionality in two or more plugins. What to do?

Our original idea had been to create new plugins and this worked for us up to a point. Unfortunately, of late, the number of extra plugins—with very specific functionality mind you—was just getting out of hand and needed to be simplified.

In the end, we decided on a two-tiered approach to plugins: those which add functionality but no (or at least minimal) behaviour; and those that add behavioural magic.

As an example, the new core plugin adds functionality to manage foreign keys, lookup indexes, add unique column meta-data, etc. but doesn't do anything particularly magic that will affect the running of your application.

On the other hand, the foreign key migrations, foreign key associations, schema validations, etc. plugins—which all rely on core—add funky rails magic to automatically generate foreign keys, associations, model validation, etc.

Another change we made was in the way documentation is generated. We used to manually generate a nice HTML file containing all the plugins. This was becoming rather tedious and meant that the documentation was often quite out of date. We've now remedied this with a nice ruby script using Erb and RDoc to generate the online documentation directly from the README files.

I also mentioned previously that we've added "lots" of tests. I say lots because we're still playing catchup so relatively, there are lots but we still need lots more. As a group of developers that are ardent TDD evangelists, the conspicuous lack of tests was somewhat embarrassing to say the least. Unfortunately, testing plugins (especially those related to schema and database) is pretty difficult so we opted to bypass the whole problem and just create a standard rails app with standard rails tests and all is well again.

And lastly, besides all the extra features we've added (see the CHANGELOGs for the specific plugins), you'll notice that the subversion URL has changed slightly: it used to contain an extra slash (/), which was not only unnecessary but caused SVN to regularly crap out.

My apologies to all those that have been trying to keep up but we hope that's the last of it. From now on, we'll continue to beef up core as we need and then add plugins only when we need new behaviour.

Of course we'll always reserve the right to change our minds ;-)

Foreign Key Associations Plugin

I've done quite a bit of refactoring of my Ruby on Rails plugins lately which, unfortunately, broke some stuff (thanks to all those that let me know) but the upshot is a much cleaner division of responsibility between plugins, and some sorely needed unit tests.

Another of the benefits from all of this was yet another plugin, this time to automatically generate associations based on foreign-keys.

For example, given a foreign-key from a customer_id column in an orders table to an id column in a customers table, the plugin generates:

  • Order.belongs_to :customer; and
  • Customer.has_many :orders.

(In the near future we intend to support has_one associations for foreign-key columns having a unique index.)

If there is a uniqueness constraint (e.g. a unique index) on a foreign-key column, then the plugin will generate a has_one instead of a has_many.

For example, given a foreign-key from an order_id column with a uniqueness constraint in an invoices table to an id column in an orders table, the plugin generates:

  • Invoice.belongs_to :order; and
  • Order.has_one :invoice.
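To make the rule concrete, here is a plain-Ruby sketch that turns a foreign-key description into the pair of association declarations. It is an illustration only, not the plugin's code: the method name and its parameters are invented, and the singularization is deliberately naive (it just drops a trailing "s"), whereas a real implementation would lean on Rails' inflector.

```ruby
# Hypothetical helper: derive association declarations from one foreign key.
def associations_for(table:, column:, references:, unique: false)
  singular = ->(name) { name.chomp('s') }                 # naive singularization
  klass    = ->(name) { singular.call(name).capitalize }  # single-word names only

  # A unique foreign-key column means at most one child per parent.
  parent_macro = unique ? "has_one :#{singular.call(table)}" : "has_many :#{table}"

  [
    "#{klass.call(table)}.belongs_to :#{column.sub(/_id\z/, '')}",
    "#{klass.call(references)}.#{parent_macro}"
  ]
end

puts associations_for(table: 'orders', column: 'customer_id', references: 'customers')
puts associations_for(table: 'invoices', column: 'order_id', references: 'orders', unique: true)
```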

You can download the latest version directly from svn://rubyforge.org//var/svn/redhillonrails/trunk/vendor/plugins/foreign_key_associations

For all those that have asked for pure HTTP access, I hear you and I'm working on it. (It seems ./script/plugin install doesn't understand the format of the browse repository pages on RubyForge. DOH!)