Squishr Alpha Released!

Not much time to blog lately, but here's a short update:

We rolled out the Squishr alpha release to a few selected people a couple of weeks ago and things haven't crashed and burned on us yet.

A couple of things learned so far:

  • As soon as you go live you realize that most of your features don't matter unless your big bang features are spot on. With that in mind we've rolled back our ambitions on a lot of cool things and we're working on making the most important things much, much better. Looking back, we probably should have launched the site earlier, and with fewer features in order to start the feedback loop. Apparently I've already forgotten everything I learned while working at Thoughtworks. :-)

  • Having real users is great because it really shows you what parts of the site people are using. It also reminds you that not everyone knows how to use the site the way that you do after building it for months, so putting in a bit of handholding on the front page is probably a good idea.

  • If you're using Rails and you're not using Capistrano, then begin immediately. We're able to deploy dozens of times a day right now, into production, with tweaks and small features. It makes life so much easier.

Overall though, things are going well. We've still got a way to go before we open things up to the public for a real beta, but we're deploying new stuff every day and things are still exciting. If I know you, and you want to check out what we've built so far, then just pop me an e-mail.

The Rails Development Pattern

The team that I am on is kicking into gear with Rails now. I am finding an interesting pattern in how development is going... it has happened before with new technology, but seems even more so now:

  • Get new requirement

  • Start hacking on new code to fulfill requirement (Solution: X lines of code)

  • Chat with another teammate who knows of a plugin that does half of this (Solution: X / 2 lines of code)

  • Look at Rails Recipes and realise there is a better way to go about it (Solution: X / 4 lines of code)

  • Generalize the problem and use MOP to simplify its usage (Solution: X / 10 lines of code)

At the end of the day you sit there and realise that you spent the entire day writing 100 lines of code and then deleting 90 of them. :)

No Need for First-Class Continuations in Java

Gilad Bracha posted a longish entry about continuations:

[...] I’ve thought about this a bit, and here’s my take on why we really shouldn’t add continuations to the JVM. It’s bound to stir up controversy and annoy people, which is a good reason to post it. By far [t]he most compelling use case for continuations are continuation-based web servers. [...]

At least from my point of view, I don't miss continuations in Java, and considering that I program competently in languages that do offer continuations, I will claim that it's not out of ignorance. (I do frequently miss Java not being a functional language.) Every time I've had a real use for continuations, it was worth implementing something specific that wouldn't have been served by a language feature, but then I've only occasionally felt an urge to compact my code down into the smallest and least comprehensible form. (For what it's worth, I think I've used Java's weak goto, i.e., labeled statements, exactly once in seven years of writing Java code...) I've had customers ask for something to use XML or J2EE or web services, but I've never had a customer ask if I could make sure that continuations get used under the covers. (At least for my money, the coolest thing about Seaside being implemented in Smalltalk is the debugging functionality.)

Continuations are a valid architectural approach to building a participant in one or more stateful conversations, but I don't necessarily see that aligning to a language-level feature. A web server is one example, where the next request is handled by a continuation of whatever handled the previous request, but you'd have to argue with me that snapshotting the stack is the best way to snapshot a session. Process and workflow engines are another example of a system amenable to implementation with continuation-style programming, where a continuation handles the next message or event (e.g., a timeout) to an instance. In the engine case, the execution state is the state of the engine and not necessarily the state of the underlying programming language runtime (e.g., the call stack) and has properties (e.g., durability) not normally provided by the execution state of a traditional programming language.

Many situations in Java where an anonymously created Runnable is passed (e.g., in Swing GUI programming with invokeLater) to be run sight-unseen are essentially instances of a continuation. (Yes, this is more of a closure, since it's local variables that are getting snapshotted and not really the stack...) Other instances where a continuation might be used, e.g., in implementing a generator for a sequence, are a little awkward in Java because it isn't a functional language, but still possible by wrapping up an Iterator the right way. Also, jumping out of nested loops can be accomplished with labeled statements.

Also, from a purely pragmatic perspective, if a given feature of a system really demands an approach that uses continuations at the language level, do I need Java? For one thing, I can get all sorts of fancy language features from other JVM languages like Scala (which is functional), Groovy (which has closures), Jython (which has generators), and JRuby (which is slated to have continuations in v0.9). For another, I could just implement a simple service (SOAP, POX, REST, XML-RPC, etc.) to encapsulate the required functionality or (gasp) write in a language that compiles to native code and get at it using JNI.

(btw, here's a thread at LtU on the topic of Gilad's post.)

Scaling out 37signals-style applications is convenient

I had someone telling me that: Ruby can scale. Basecamp proves that.

Now, you all know that I do not think that Ruby has ANY problems with scaling. However, applications such as Basecamp have a huge advantage for scaling: minimal shared data.

The key to scalability is minimizing access to the same data. The less anal you are about how "correct" the data is, the better off you will be. Microsoft Word scales very well because when a million people are writing documents, they are not editing the same one (I know, they could be on a shared drive, blah blah).

Caching and all of the tricks are ways in which we can cheat the system. We can make copies of our REAL data and access those instead. This works for a lot of applications, and a lot of data. This is why you see "this stock data may be up to 15 minutes old". Imagine if everyone needed access to the stock price right NOW. I mean NOW. I mean.....

To scale properly, you want to minimize any locking. How stale can your data be in different aspects? The more staleness you can deal with, the better.

Where do Basecamp and company fit in here? One of the great advantages of those applications is that there is little shared data. If I sign up for an account for my company I can have a large amount of data on fooinc.grouphub.com. Someone else can have barinc.grouphub.com, and no data is shared. I do not need (nor should I be allowed) access to their data. This means that if they wanted to, we could be on two different machines, each with our own MySQL instance. One per user doesn't make sense of course (unless the users are GE and Ford), but how about splitting up: A-M and N-Z. If load keeps going up you split again, and again, and again. If you are in this situation you are a lucky man, and should be able to scale up anything :)

If you have an application where there is more shared data then you may need to get more creative. A lot of web apps are heavy on reads, so you can scale up nicely via MySQL replication and putting out more slaves, but at some point the master will not be able to handle the writes. This is when you need to Give the DB a break! and use caching, and split your architecture into pieces. We do this with our Rails apps by making parts and pieces web services that we can scale up separately, but there is always the bottleneck on some part of the darn data!
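
To make the A-M / N-Z idea concrete, here is a minimal sketch of routing an account to a database by the first letter of its subdomain. The shard names (db1, db2), the Shard struct, and the shard_for helper are all made up for illustration; nothing here reflects how Basecamp is actually deployed.

```ruby
# Hypothetical shard map: each shard owns a contiguous range of first letters.
Shard = Struct.new(:name, :range)

SHARDS = [
  Shard.new('db1', 'a'..'m'),
  Shard.new('db2', 'n'..'z')
]

# Pick the shard for an account subdomain such as "fooinc".
# Accounts that start with a digit or other character fall back to the
# first shard (an arbitrary choice made for this sketch).
def shard_for(account, shards = SHARDS)
  first = account.to_s.downcase[0] || ''
  shard = shards.find { |s| s.range.cover?(first) }
  (shard || shards.first).name
end

puts shard_for('fooinc')
puts shard_for('nextco')
```

Splitting again when load goes up amounts to replacing one entry with two narrower ranges (say 'a'..'f' and 'g'..'m') and moving the affected accounts. Moving that data is the hard part, and the sketch does not cover it.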

More on Meeting-Making for Google Calendar

After having posted about how it would be possible to take the Atom feeds from Google Calendar and make a collaborative appointment scheduler (meeting time picker for multiple people), I decided to give it a shot using the Atom parsing library for Ruby from Martin Traverso and Brian McCallister.

The Atom library is slick, and doing some simple extensions to the basic binding to support the Google Data elements is straightforward. For example, here's a Ruby snippet that will read the start time, end time, and reminder settings from the feed:

require 'atom'
require 'xmlmapping'
require 'time'
require 'date'
require 'net/http'
require 'uri'

module GoogleData

  NAMESPACE = 'http://schemas.google.com/g/2005'
  
  def GoogleData.int_or_nil(s)
    if s.nil?
      nil
    else
      s.to_i
    end 
  end
  
  def GoogleData.date_or_datetime(s)
    if s.length == 10
      Date.parse(s)
    else
      Time.iso8601(s)
    end  
  end
  
  class Reminder
    include XMLMapping
    
    namespace NAMESPACE
    
    has_attribute :absolute_time, :name => 'absoluteTime',
      :transform => lambda { |t| Time.iso8601(t) }
    has_attribute :days, :name => 'days',
      :transform => lambda { |s| GoogleData.int_or_nil(s) }
    has_attribute :hours, :name => 'hours',
      :transform => lambda { |s| GoogleData.int_or_nil(s) }
    has_attribute :minutes, :name => 'minutes',
      :transform => lambda { |s| GoogleData.int_or_nil(s) }
  end
  
  class When 
    include XMLMapping

    namespace NAMESPACE
    
    # The following little hack is required because the
    # datatype switches between xs:date for all-day
    # appointments and xs:dateTime for non-all-day
    # appointments.

    has_attribute :start_time, :name => 'startTime',
      :transform => lambda { |s| GoogleData.date_or_datetime(s) }
    has_attribute :end_time, :name => 'endTime',
      :transform => lambda { |s| GoogleData.date_or_datetime(s) }
    has_attribute :valueString
    
    has_many :reminders, :name => 'reminder', :type => Reminder
  end

  class Entry < Atom::Entry
    namespace NAMESPACE
    has_one :when, :name => 'when', :type => When
  end
  
  class Feed < Atom::Feed
    has_many :entries, :name => 'entry', :type => Entry
  end
end

The two key tricks above are extending Atom::Feed and Atom::Entry to add explicit handling for the extension elements that we're after. (Without any changes, Atom::Entry does capture an array of extension elements, but I'd prefer to work with objects.) Similar approaches can be applied to the other “kinds” of things in the feed. As an editorial comment, I'm lukewarm about the datatype of an attribute value determining its semantics; normally the semantics would determine the datatype.

To grab the data from the Atom feed of the calendar:

# GCAL_FULL_URL is assumed to be defined elsewhere as the URL of the
# calendar's full Atom feed.
response = Net::HTTP.get_response(URI.parse(GCAL_FULL_URL))

# TODO: Limit the number of redirects to follow.
# TODO: Gracefully handle other non-200's here, too.
while response.kind_of? Net::HTTPRedirection
  response = Net::HTTP.get_response(URI.parse(response['location']))
end

feed = GoogleData::Feed.new(response.body)

feed.entries.each { |event|
  puts '---'
  puts event.title
  puts event.when.start_time.to_s + ' -- ' + event.when.end_time.to_s
}
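
The two TODOs in the fetch loop could be handled along these lines. This is only a sketch: fetch_with_redirects, its limit default of 5, and the fetcher parameter are names invented here (the fetcher exists so the loop can be exercised without a network), and it assumes, like the loop above, that the Location header holds an absolute URL.

```ruby
require 'net/http'
require 'uri'

# Follow redirects for at most `limit` hops and raise on any final
# response that is not a 2xx. `fetcher` is a hypothetical injection
# point; by default it performs a real GET with Net::HTTP.
def fetch_with_redirects(url, limit: 5, fetcher: nil)
  fetcher ||= lambda { |u| Net::HTTP.get_response(URI.parse(u)) }
  response = fetcher.call(url)
  while response.kind_of?(Net::HTTPRedirection)
    raise 'too many redirects' if limit <= 0
    limit -= 1
    # Assumes an absolute URL in the Location header.
    response = fetcher.call(response['location'])
  end
  unless response.kind_of?(Net::HTTPSuccess)
    raise "unexpected response: #{response.code}"
  end
  response
end
```

With that in place, the feed could be built from fetch_with_redirects(GCAL_FULL_URL).body instead of the bare loop.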

Back to the original goal of building a “meeting maker” for Google Calendar based on the Atom feeds for participants' calendars, the additional work to properly handle recurrence and recurrenceException makes the problem look quite a bit more complicated (and interesting). (Fortunately, there does appear to be an iCalendar (RFC2445) library available as well...) So this is turning into more than a one-evening project.

With the added complexity of supporting recurring events and exceptions, there is probably a tidy approach that augments the list merge I suggested before with generators and sequence comprehensions for the recurring events — just enumerate possible meeting times from the complement of the merged list of “busy” times for non-recurring meetings and test for overlaps in the union (i.e., “or”) of the sequences for each participant. (If I recall correctly, the meeting makers in the usual Exchange clients don't support optimal scheduling of recurring meetings, so that would be a nice feature as well, i.e., schedule the recurring meeting at the time with the fewest conflicts or at least minimize the conflicts for some subset of the participants.)
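
For the non-recurring part, the list merge and its complement might look roughly like the sketch below. The names merge_busy and free_slots are invented for this example, busy times are plain [start, end] pairs of Time values pooled from all participants, and recurrence and recurrenceException are not handled at all.

```ruby
# Merge overlapping or touching [start, end] pairs into a sorted list of
# disjoint busy blocks. The input pairs are not modified.
def merge_busy(intervals)
  intervals.sort_by { |s, _| s }.each_with_object([]) do |(s, e), merged|
    if merged.any? && s <= merged.last[1]
      merged.last[1] = [merged.last[1], e].max
    else
      merged << [s, e]
    end
  end
end

# Complement of the merged busy blocks inside a window, keeping only the
# gaps that are at least min_length seconds long.
def free_slots(busy, window_start, window_end, min_length)
  slots = []
  cursor = window_start
  merge_busy(busy).each do |s, e|
    break if s >= window_end
    slots << [cursor, s] if s - cursor >= min_length
    cursor = [cursor, e].max
  end
  slots << [cursor, window_end] if window_end - cursor >= min_length
  slots
end
```

Calling free_slots with the concatenated busy lists of every participant, the working-day window, and the meeting length in seconds yields the candidate meeting times. Expanded recurring events could be appended to the busy list before the call, at the cost of enumerating them up front.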