Transactional Migrations Plugin

A while ago I wrote about utilising transactional DDL in Ruby on Rails migration scripts, so I decided to create a plugin.

In a nutshell:

Transactional Migrations is a plugin that ensures your migration scripts, both up and down, run within a transaction. When used with a database that supports transactional Data Definition Language (DDL), such as PostgreSQL, this ensures that if any statement within your migration script fails, the entire script is rolled back.
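To make the idea concrete, here is a self-contained Ruby sketch of wrapping a migration's up and down in a transaction. It is not the plugin's code: `FakeConnection`, `TransactionalMigration` and `AddWidgets` are hypothetical names, and the fake connection only emulates what a database with transactional DDL (such as PostgreSQL) does on rollback. In a real Rails app the wrapper would delegate to ActiveRecord's `transaction` block instead.

```ruby
# Hypothetical stand-in for a database connection with transactional DDL.
# It records executed statements and restores the previous state on error.
class FakeConnection
  attr_reader :schema

  def initialize
    @schema = []
  end

  def execute(sql)
    raise "syntax error in: #{sql}" if sql.include?("BOGUS")
    @schema << sql
  end

  # Snapshot the schema, run the block, and roll back if anything raises.
  def transaction
    snapshot = @schema.dup
    yield
  rescue StandardError
    @schema = snapshot
    raise
  end
end

# Hypothetical wrapper: prepended so it runs around the migration's own up/down.
module TransactionalMigration
  def up
    connection.transaction { super }
  end

  def down
    connection.transaction { super }
  end
end

# Hypothetical migration whose second statement fails.
class AddWidgets
  prepend TransactionalMigration
  attr_reader :connection

  def initialize(connection)
    @connection = connection
  end

  def up
    connection.execute("CREATE TABLE widgets (id integer)")
    connection.execute("CREATE INDEX BOGUS") # fails mid-migration
  end

  def down
    connection.execute("DROP TABLE widgets")
  end
end

conn = FakeConnection.new
begin
  AddWidgets.new(conn).up
rescue RuntimeError => e
  puts "migration failed: #{e.message}"
end
p conn.schema # prints [] because the CREATE TABLE was rolled back too
```

Using `prepend` keeps the migration class itself unaware of the transaction; without transactional DDL (for example MySQL at the time) the same wrapper would still run, but the CREATE TABLE would survive the failure.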

Lo-Fi Profiling of Typo

For anyone who's been wondering why this blog has been up and down over the past week: it's a slow-motion battle between the memory police at TextDrive, who kill the Typo instance that hosts the blog, and either a FastCGI dispatcher or a nanny cron job that starts it back up. The onus is clearly on me to figure out what's burning memory, and my first inclination was to naively google for Ruby profilers. Here's a rambling account of how I concluded that there's probably no quick cure, and of what I then did to address the issues.

There are a couple of speed-oriented Ruby profilers, the built-in one and ruby-prof, but no space-oriented ones. An old mailing list post from Michael Garniss described a brute-force approach based on ObjectSpace.each_object that looked suitable, so I integrated it into the main Typo controller as an after_filter and fired up several concurrent wget commands to walk around a production configuration on my development box at home:
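For readers who haven't seen the ObjectSpace trick, here is a minimal sketch of a brute-force heap census, assuming MRI (where `ObjectSpace.each_object` is always enabled). It is not the code from that mailing list post; `object_census` is a hypothetical name, and it counts objects per class rather than measuring bytes.

```ruby
# Hypothetical brute-force heap census: walk every object in the heap and
# count instances per class, returning the most numerous classes first.
def object_census(top = 20)
  GC.start # collect garbage first so the counts reflect objects still alive
  counts = Hash.new(0)
  ObjectSpace.each_object(Object) { |obj| counts[obj.class] += 1 }
  counts.sort_by { |_klass, n| -n }.first(top)
end

# Example: print the ten most common classes, e.g. from an after_filter.
object_census(10).each { |klass, n| puts format("%8d %s", n, klass) }
```

Walking the whole heap on every request is slow, which is why a dump like this has to be rate-limited in practice.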

while true; \
do wget -nv -r --delete-after http://localhost:3000; \
done

(There's no reason to try to set it on fire with something like ab.) That won't catch any issues with the vanilla two-dispatcher lighttpd/FastCGI configuration that I use on TextDrive, but it should catch any issues with Typo internals, badly behaved sidebars, etc.

With the profiling code integrated, a request that includes the dump takes several seconds to complete, and each page view makes several hits, so I added a class variable (@@no_sooner_than) and a little logic so that profiling would run only once a minute or so. With several wget walkers working, top reports the server running along at a happy 80-90MB, and eyeballing the profiling output shows memory usage oscillating between under 7MB and about 20MB without any perceptible upward trend over the course of an hour and a half. (That said, that's all the data I captured, as WEBrick locked up completely after that hour and a half.)
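The once-a-minute guard can be as small as the following sketch. Only the `@@no_sooner_than` name comes from the text above; the `ProfilingThrottle` class, the 60-second interval and the `due?` method are assumptions for illustration.

```ruby
# Hypothetical throttle: allow at most one expensive profiling dump per
# INTERVAL seconds, no matter how many requests arrive in between.
class ProfilingThrottle
  INTERVAL = 60 # seconds between dumps ("once a minute or so")

  # Start at the epoch so the very first request is always due.
  @@no_sooner_than = Time.at(0)

  # Returns true, and pushes the next allowed time forward, when a dump is due.
  def self.due?(now = Time.now)
    return false if now < @@no_sooner_than
    @@no_sooner_than = now + INTERVAL
    true
  end
end
```

An after_filter would then run the dump only when `ProfilingThrottle.due?` returns true. There is no locking here, on the assumption that each dispatcher process handles one request at a time.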

Armed with the knowledge that there wasn't an easy fix for the memory issues, I switched the FastCGI configuration for the production instance from two dispatchers to one, pointed a couple of wget walkers at it, and tracked memory usage and process ID at the command line, like so:

while true; \
do ps mux | grep ruby | grep -v grep; \
read -t 30; done

I also changed the wget walker command to provide more useful information:

wget -S -r -b -l 4 --delete-after http://mult.ifario.us \
-a /tmp/log_id

where id is a unique number for each walker. So far, so good. Crunching the wget output through shell commands (awk, grep, cut, sort, uniq -c, etc.), e.g.:

cat log* | grep HTTP/1.1 | cut -f 4 -d ' ' | sort | uniq -c

shows that mult.ifario.us is consistently returning snappy HTTP/1.1 200 responses about two nines (99.x%) of the time, which isn't great but isn't awful. (Really it's more like 2.5 nines, i.e., −log10(0.003) for a 0.3% failure rate, but who's counting?)
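The same tally, plus the "nines" arithmetic, can be sketched in Ruby. It assumes wget -S log lines of the form `  HTTP/1.1 200 OK` and, like the grep above, counts only HTTP/1.1 status lines; `status_tally` and `nines` are hypothetical helper names.

```ruby
# Count status codes from wget -S output, mirroring grep | cut | sort | uniq -c.
def status_tally(lines)
  lines.each_with_object(Hash.new(0)) do |line, tally|
    m = line.match(%r{HTTP/1\.1 (\d{3})})
    tally[m[1]] += 1 if m
  end
end

# "Nines" of availability: -log10 of the fraction of non-OK responses,
# so a 0.3% failure rate gives -log10(0.003), roughly 2.5 nines.
def nines(tally, ok: "200")
  total = tally.values.sum
  raise ArgumentError, "no responses counted" if total.zero?
  failures = total - tally.fetch(ok, 0)
  return Float::INFINITY if failures.zero?
  -Math.log10(failures.to_f / total)
end
```

To run it against the walker logs, read them with something like `Dir.glob("/tmp/log_*").flat_map { |f| File.readlines(f) }` and pass the lines to `status_tally`.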

This is one time when I've missed the Java runtime's capabilities (e.g., JVMTI) in another language runtime, but no rocket science was required to get Typo under control.

HTTP, Mongrel, and Pipelining

Mongrel is getting a lot of good (and, in my opinion, deserved) attention lately as an app server for Ruby. One of the things that bothered me about it for a good while was this decision, explained in a comment in mongrel.rb:

  # A design decision was made to force the client to not pipeline requests.  HTTP/1.1
  # pipelining really kills the performance due to how it has to be handled and how
  # unclear the standard is.  To fix this the HttpResponse gives a "Connection: close"
  # header which forces the client to close right away.  The bonus for this is that it
  # gives a pretty nice speed boost to most clients since they can close their connection
  # immediately.

Interestingly, the HTTP status line and the first couple of headers are a single constant, frozen string; short of patching Mongrel or handling the TCP connection yourself in your Handler, it *will* close the connection, à la HTTP/1.0.
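For reference, the HTTP/1.x persistence defaults (RFC 2616, section 8.1) can be sketched as a small predicate: an HTTP/1.1 connection stays open unless a `Connection: close` token is present, while an HTTP/1.0 one closes unless `keep-alive` is requested. `persistent_connection?` is a hypothetical helper, not part of Mongrel.

```ruby
# Decide whether the connection stays open after a response.
# headers: a Hash or an array of [name, value] pairs; names are case-insensitive.
def persistent_connection?(http_version, headers)
  pair = headers.find { |name, _| name.casecmp?("Connection") }
  tokens = pair ? pair[1].split(",").map { |t| t.strip.downcase } : []
  case http_version
  when "HTTP/1.1" then !tokens.include?("close")     # persistent by default
  when "HTTP/1.0" then tokens.include?("keep-alive") # closes by default
  else false
  end
end
```

Because Mongrel's frozen prefix always includes `Connection: close`, every one of its responses falls into the "close" case, whatever the client asked for.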

I know Zed is an awfully good programmer, so this decision really irked me. I recently asked why it was made, and the answer amounted, roughly, to "because it fits the use case for which Mongrel is intended, and makes life easier," which is valid. So, how does it fit this use case?

If you think of Mongrel as being designed to run fairly big sites where each page has one dynamic element and otherwise mostly static elements, then this decision works. Basically, you have Mongrel serve the dynamic page (possibly from Rails) and close the connection right away, because you know the same server isn't going to receive a follow-up resource request immediately: those requests are handled by servers optimized for static content, or by a content distribution network. In this case the Connection: close on the initial request makes sense, since the browser is going to open additional connections to a different host (or hosts, for a CDN or a round-robined static setup), and those connections will pipeline the requests for resources.

Yahoo! is a good example of this: the initial response headers for the front page, requested from www.yahoo.com, include the Connection: close header:

http://www.yahoo.com/

HTTP/1.x 200 OK
Date: Thu, 27 Jul 2006 16:53:56 GMT
P3P: policyref="http://p3p.yahoo.com/w3c/p3p.xml", ...
Vary: User-Agent
Cache-Control: private
Set-Cookie: FPB=3r0o6jmqh12chrt4; expires=Thu, 01 ....
Set-Cookie: D=_ylh=X3oDMTFmdWZsNGY1BF9TAzI ...
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html
Content-Encoding: gzip

but subsequent image loads are made against their CDN, with hosts such as us.i1.yimg.com, and those responses keep the connection alive so it can be reused (and pipelined):

http://us.i1.yimg.com/us.yimg.com/i/ww/t6/tp_top_bg.png

HTTP/1.x 200 OK
Last-Modified: Thu, 11 May 2006 20:46:13 GMT
Accept-Ranges: bytes
Content-Length: 7857
Content-Type: image/png
Cache-Control: max-age=2345779
Date: Thu, 27 Jul 2006 16:53:56 GMT
Connection: keep-alive
Expires: Thu, 12 May 2016 20:29:52 GMT

http://us.i1.yimg.com/us.yimg.com/i/us/nt/bn/cta/yel_tr.gif

HTTP/1.x 200 OK
Last-Modified: Wed, 27 Jul 2005 00:18:07 GMT
Etag: "29990d3-122-42e6d2bf"
Accept-Ranges: bytes
Content-Length: 290
Content-Type: image/gif
Cache-Control: max-age=979924
Date: Thu, 27 Jul 2006 16:53:56 GMT
Connection: keep-alive
Expires: Sat, 01 Aug 2015 00:46:23 GMT

And so on...

Mongrel is not designed to be a general HTTP server. However, put Apache 2.2 with the worker MPM and mod_proxy in front of it (making sure to strip out the Connection: close header) and you have a pretty decent setup for a high-load system. Just make sure static resources (including page-cached files) are served up by Apache, not Mongrel :-) This works best when Apache and Mongrel are on the same machine, which reduces the overhead of mod_proxy's connection establishment, but given a fast network, the proxy connection will be far from the bottleneck for dynamic pages (and Apache is serving the statics directly).
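The two jobs the front-end server takes on here, dropping Mongrel's connection-level headers and keeping static or page-cached files away from Mongrel, can be sketched in Ruby. This only illustrates the logic; Apache does it through its own configuration, and `client_headers` and `route` are hypothetical names. `route` also skips the path normalization (e.g. of `..` segments) a real server must do.

```ruby
# Hop-by-hop headers (RFC 2616, section 13.5.1) describe a single connection,
# not the resource, so a proxy must not pass them on. Dropping
# Transfer-Encoding means the proxy has to re-frame the body itself.
HOP_BY_HOP = %w[connection keep-alive proxy-authenticate proxy-authorization
                te trailers transfer-encoding upgrade].freeze

# Remove Mongrel's "Connection: close" (and any header it names) before
# relaying, so the front end decides persistence with the browser on its own.
def client_headers(backend_headers)
  named = backend_headers
          .select { |name, _| name.casecmp?("Connection") }
          .flat_map { |_, value| value.split(",").map { |t| t.strip.downcase } }
  dropped = HOP_BY_HOP + named
  backend_headers.reject { |name, _| dropped.include?(name.downcase) }
end

# Serve from disk when a static file or a Rails page-cache file (".html")
# exists under the document root; otherwise proxy the request to Mongrel.
def route(path, docroot, file_exists: File.method(:file?))
  candidate = File.join(docroot, path)
  if file_exists.call(candidate) || file_exists.call("#{candidate}.html")
    :static
  else
    :mongrel
  end
end
```

Headers are passed as [name, value] pairs rather than a Hash so that repeated headers such as Set-Cookie survive the filtering.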

Anyway, nice stuff, all told.