Rails Housekeeping

Since moving from lighttpd+FastCGI to Apache 2.2+mongrel, our production Rails application has been rock solid, one unexplained Ruby core dump notwithstanding.

To keep everything humming along, we run a few cron jobs which I thought I'd share.

The first is to ensure the application starts on boot. There is an rc-script to do this but I never bothered to get it running on FreeBSD. Instead, we use the @reboot keyword built into vixie-cron:

cd ~/www/production/current; mongrel_rails cluster::stop; mongrel_rails cluster::start

Next, session expiration. Even though plenty have argued against them in favour of memcached, we've found file-system sessions to be just fine for our relatively low-traffic application. To time out sessions after one hour (with a margin of error of an extra hour), we run an hourly cron job to delete session files that haven't been accessed in over an hour:

cd ~/www/production/current; find tmp/sessions -name 'ruby_sess.*' -amin +60 -exec rm -rf {} \;
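For anyone who would rather keep the sweep in Ruby (say, from a rake task), here is a minimal sketch of roughly the same logic. The method name and its arguments are hypothetical, and unlike find it only looks at the top level of the directory:

```ruby
require 'fileutils'

# Delete session files in +dir+ whose last access time is more than
# +max_age+ seconds before +now+. Returns the sorted list of removed paths.
# Hypothetical helper, not part of Rails.
def expire_sessions(dir, max_age = 60 * 60, now = Time.now)
  cutoff = now - max_age
  stale = Dir.glob(File.join(dir, 'ruby_sess.*')).select do |path|
    File.file?(path) && File.atime(path) < cutoff
  end
  stale.each { |path| FileUtils.rm_f(path) }
  stale.sort
end
```

Calling expire_sessions('tmp/sessions') from the application root would then stand in for the find command above.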

Next, to keep log file sizes manageable, we run a cron job once a day to rotate the log files using logrotate, followed by a re-cycle of the mongrel cluster:

cd ~/www/production/current; logrotate -s log/logrotate.status config/logrotate.conf; mongrel_rails cluster::restart

And here's config/logrotate.conf:

"log/*.log" {
  compress
  daily
  delaycompress
  missingok
  notifempty
  rotate 7
}

And finally, just because, we run another daily cron job to vacuum the PostgreSQL database:

cd ~/www/production/current; psql cjp_production -c 'vacuum full'

Naked Active Records

I had a weird dream last night (literally a dream, not in the MLK sense). In the dream I was building a Rails app (I know, couldn't I think about more interesting / racy things?) and the designer for the project got hit by a bus. Instead of finding another designer, I delivered the UI that was already built. No, not even Streamlined, but instead:

% ruby script/console

I didn't JUST give them the command line. I also had a piece of paper that listed the domain models and documentation for them to grok, but even that told them how to look at methods on the objects to delve deeper.

This is Naked Objects taken to the extreme, but a lot of us probably use this "UI", don't we? This is because the developers of the application put the power in power users. We have to know everything about the system, and often have to dig deep into the system to fix bugs and such.

script/console is my best friend. It means that I do not have to drop to SQL to poke around my DB. Instead I have a more powerful tool to munge the data, in a language that I prefer for day to day tasks. Also, it means that I do not bypass any validation / business logic when playing with the data. I have known many a project that has SQL loading scripts that end up being invalid when business logic changes (and this logic wasn't constrained in the DB).

Writing scripts that use the bootstrapping is great too. Now instead of doing a bunch of work in Ruby and then dropping to SQL, you can stay in Ruby land and it will do the hard work for you.

So, here is to script/console. The power user's tool.

First Steps with Haskell for Web Applications

As I blogged yesterday, I'm planning to build a simplified personal publishing system to host this blog, partially to get around resource consumption issues with the current platform and partially to get some exercise with a new language or two. I thought about Smalltalk, Erlang, and Io, but Haskell gets the initial nod if for no other reason than it's a third side of the coin that Ruby and Java are two sides of — rigorously defined, "purely" functional, lazy, "typeful", and compiles to native code via GHC. (And, of course, the syntax warms the cockles of my mathematician's heart.) Like Ruby with gems, the GHC runtime also has excellent modularity, with a minimal and standard core and good package management via Cabal. (Hello? Java?)

The first question is how to integrate an application written in Haskell into a web container, preferably a web server like lighttpd or Apache via FastCGI. (CGI would be a consideration, too, but that's just too retro for me.) Thankfully, as of the forthcoming 6.6 version, GHC has good CGI support via the Network.CGI module, and Björn Bringert has a FastCGI binding that built on the GHC 6.5 tip with only a little tinkering. (I wanted to use the core Network.CGI module in place of Björn's cgi-compat module.)

A "Hello, World" implementation using the FastCGI binding and then compiled to native code performed well on a basic smoke benchmark. Here's the relevant line from top for an instance of the handler:

  PID COMMAND      %CPU   TIME   #TH #PRTS #MREGS RPRVT  RSHRD  RSIZE  VSIZE
[...]
  234 hello.fcgi   0.0%  0:06.83   1    13    21   692K  1.63M  1.69M  29.0M
[...]

Benchmarking with ab shows that 5 handlers can happily crank through around 4000 requests/second with 99% of the requests requiring <2ms.

For comparison purposes and with an identical FastCGI configuration, the simplest possible Ruby on Rails "Hello, World" implementation (create test controller, edit the .rhtml to return content, wire-up FastCGI) consumes considerably more memory:

  PID COMMAND      %CPU   TIME   #TH #PRTS #MREGS RPRVT  RSHRD  RSIZE  VSIZE
[...]
  537 ruby1.8     12.1%  0:26.49   1    14    94  22.5M  3.35M  24.5M  54.5M
[...]

and only manages around 100 requests/second with ~50ms response time for the 50th percentile and ~400ms at the 99th percentile. (I recognize that I should probably put a "(sic)" after the "only", since 100 requests/second is significantly in excess of the peak throughput that my blog sees on a good day.)

This is far from apples-to-apples, as the RoR version is doing a lot more work under the covers, but it does give me the expectation that I can probably get a Haskell blog implementation that will have a memory footprint smaller than a base irb and provide Slashdottable performance.

Next up, deciding on how to store/represent an entry and how to implement Atom for syndication.

Abstract ActiveRecord Classes by Convention

Ruby on Rails provides a very simple mechanism for specifying that a model class is an abstract base class and therefore has no corresponding database table:

class MyAbstractClass < ActiveRecord::Base
  self.abstract_class = true
  ...
end

Code can then interrogate a model class to see if it is abstract:

puts "it's abstract" if MyAbstractClass.abstract_class?

Not so hard. However, I pretty much always prefix the names of my abstract classes with, you guessed it, 'Abstract'. So, I added some code to the RedHill on Rails Core Plugin the other day to extend the definition of an abstract class to include the name:

def abstract_class?
  @@abstract_class || !(name =~ /^Abstract/).nil?
end

With that simple change, I no longer need to explicitly set self.abstract_class = true; it just works by magic, or rather, convention.
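To see the convention in isolation, here is a small sketch that runs without Rails. FakeBase is a hypothetical stand-in for ActiveRecord::Base, and this is not the plugin's actual code. It keeps the flag in a class-level instance variable (a Ruby class variable would be shared by the whole hierarchy) and checks only the unqualified class name, so namespaced classes are covered as well:

```ruby
# Hypothetical stand-in for ActiveRecord::Base so the sketch runs without Rails.
class FakeBase
  class << self
    # Explicit flag, mirroring `self.abstract_class = true`.
    # Stored per class, so subclasses do not inherit it.
    attr_accessor :abstract_class

    # Abstract if flagged explicitly, or if the unqualified class name
    # starts with "Abstract".
    def abstract_class?
      abstract_class == true ||
        name.to_s.split('::').last.to_s.start_with?('Abstract')
    end
  end
end

class AbstractVehicle < FakeBase; end   # abstract by name alone
class Car < AbstractVehicle; end        # concrete subclass

class Shape < FakeBase                  # abstract by explicit flag
  self.abstract_class = true
end
```

Here AbstractVehicle.abstract_class? and Shape.abstract_class? are true, while Car.abstract_class? is false.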

I suppose I could/should have created a plugin for it but I was feeling lazy :)

Typo + TextDrive != Happy

The logs say that mult.ifario.us throws a fair number of HTTP 500 response codes back at visitors, and that's sad. It is certainly not the impression I want to make on visitors and readers (although subscribers are insulated from failures by FeedBurner's excellent service). In a perfect world, something as simple as a weblog wouldn't throw any 500s, ever. The problems come from running Typo on TextDrive. There isn't anything intrinsically wrong with the Typo engine, with Ruby, or even with TextDrive, as a similar setup runs like a top in my test environment, but TextDrive's resource limits make Typo's design impractical.

This got me thinking about the design of the simplest possible weblog publishing software, a design that would eschew the use of a database and all runtime configuration in favor of a system that is ultra-lightweight and quick to “boot”. Almost all of the content in the blog is relatively static — display of an entry, feeds, archives, various paginations and groupings only require lightweight decoration of the XHTML for a given entry. Paginations and groups, e.g., by multiple tags or by tags plus date, require some dynamic behavior on the server, but not that much. A complexity-ectomy doesn't have to come at the expense of chrome and eye candy, as modern browsers make it possible to inject dynamic content (images from Flickr, links from del.icio.us, free-associations from Google AdSense, etc.) into the browser directly in the form of JavaScript.
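The "not that much" dynamic behavior can be quite small. As a rough sketch in Ruby, assuming entries are already loaded into memory (the Entry struct and both helper names are made up for illustration), filtering by several tags and paginating might look like this:

```ruby
# Hypothetical in-memory entry: a slug plus a list of tags.
Entry = Struct.new(:slug, :tags)

# Entries that carry every one of the requested tags.
def tagged_with(entries, *tags)
  entries.select { |entry| (tags - entry.tags).empty? }
end

# One page of results; pages are numbered from 1.
# Out-of-range page numbers yield an empty array.
def page(entries, number, per_page = 10)
  return [] if number < 1
  entries[(number - 1) * per_page, per_page] || []
end
```

Something like page(tagged_with(entries, 'ruby', 'rails'), 1) would then produce the first page of entries tagged with both.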

The one difficult bit (and the only thing that would require a POST) would be comments. Comments don't need a database or use of dynamic content, either, and using email for comment workflow would solve multiple problems. Here's a sketch:

  • Comment is made on the weblog by submitting a form.
  • Server-side executable wraps the comment as an email and sends it to the blog's author.
  • Normal email filtering machinery is applied to the comment, i.e., spam filtering, and the blog content author either chooses to reply to the message, in which case the comment is added to the relevant entry (e.g., via a procmail recipe), or simply ignores it.
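The second step of the sketch, wrapping a comment as an email, could look something like the following in Ruby. Everything named here (the method, the X-Blog-Entry header, the default sender address) is an assumption for illustration. It only builds the message text and leaves delivery to whatever mailer is at hand:

```ruby
require 'time'

# Collapse CR/LF so submitted values cannot inject extra headers.
def header_safe(value)
  value.to_s.gsub(/[\r\n]+/, ' ').strip
end

# Build a plain-text email for a submitted comment.
# Hypothetical helper: the header names and addresses are placeholders.
def comment_to_email(author:, body:, entry_id:, to:,
                     from: 'comments@example.org', at: Time.now)
  headers = [
    "From: #{header_safe(from)}",
    "To: #{header_safe(to)}",
    "Subject: Comment on #{header_safe(entry_id)} from #{header_safe(author)}",
    "Date: #{at.rfc2822}",
    "X-Blog-Entry: #{header_safe(entry_id)}"
  ]
  # Headers and body are separated by a blank line; normalize line endings.
  headers.join("\r\n") + "\r\n\r\n" + body.to_s.gsub(/\r?\n/, "\r\n")
end
```

A header like X-Blog-Entry would give a procmail recipe something to match on when the author replies to accept the comment.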

Akismet is apparently effective (if, at the same time, a statement about the sad state of the signal-to-noise ratio of the present-day internet), but it makes sense to leverage the filtering technology and massive corpus (~10^7 messages) of spam and ham that I already use for email.

I've experimented with different publishing platforms (Radio Userland, SnipSnap, MT, WordPress, Typo), and they all fell short for me in one way or another.

As the saying goes, if you want something done right... I'm going to embark on a project to replace Typo with something simple, dense/terse, and home-grown. It's also a chance to experiment with a new language or two, so it should be both fun and educational. Java's out due to footprint, but my mind is open otherwise: Smalltalk, Haskell, Lisp, Io, ...?