Generating Rails Fixtures From Large Datasets

I’ve been working with Tomaso Minelli on some Rails work, and one thing we needed to do was set up some tests. We already have a large set of data to work with in our development environment, but getting that data into .yml files is no easy task. Since we have a fairly large number of interdependent tables, dumping one of them alone is pretty useless unless you also get the tables it links to, and so on. Dumping all the data would make testing quite slow, and really is too much to be useful. Here’s the rough solution we arrived at:

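# NOTE: parts of this listing were lost from the archived post. Lines marked
# "reconstructed" are assumptions, including the values filled into the
# empty #{} interpolations of the log messages.
@fixtures = {}
@seen = {}
@todo = []

# Do something with a fixture.  In this case, we store it for later.
def handle_fixture(record)
  @fixtures[record.class.table_name] ||= []
  @fixtures[record.class.table_name] << record
end

# Reconstructed: store a record once, then queue everything it links to.
def process_record(record)
  key = [record.class.table_name, record.id]
  return if @seen[key]
  @seen[key] = true
  handle_fixture(record)
  record.class.reflections.each do |name, reflection|
    begin
      # Reconstructed test: stop once the target table has enough records.
      next unless (@fixtures[reflection.klass.table_name] || []).size < STARTING_LIMIT
      related = record.send(name)
      related = [related] unless related.is_a?(Array)
      related.each do |related_record|
        if related_record
          puts "    New Record #{related_record.class} #{related_record.id}"
          @todo << related_record
        end
      end
    rescue => e
      puts "    ERROR: #{e}:\n#{e.backtrace.inspect}"
    end
  end
end

namespace :db do
  namespace :fixtures do
    desc 'Create YAML test fixtures starting from a single model using reflections'
    task :modelbase => :environment do
      STARTING_CLASS = Foobar
      STARTING_FIND  = :all
      STARTING_LIMIT = 10 # assumed value; the original definition did not survive

      tmp = 0
      totale = STARTING_CLASS.find(STARTING_FIND, :limit => STARTING_LIMIT).size
      STARTING_CLASS.find(STARTING_FIND, :limit => STARTING_LIMIT).each do |m|
        tmp += 1
        puts "#{tmp} of #{totale}: #{m.class} #{m.id}"
        @todo << m
        # Reconstructed: drain the queue, which grows as records are explored.
        process_record(@todo.shift) while @todo.size > 0
      end

      @fixtures.each do |name, fixture|
        if fixture.size > 0
          i = "000"
          filename = "#{RAILS_ROOT}/test/fixtures/#{name}.yml"
          puts "Create #{filename} for class #{fixture.first.class}"
          FileUtils.mkdir(File.dirname(filename)) unless Dir[File.dirname(filename)].size > 0
          File.open(filename, 'w') do |file|
            file.write fixture.inject({}) { |hash, record|
              hash["#{record.class.table_name}_#{i.succ!}"] = record.attributes
              hash
            }.to_yaml
          end
        else
          puts "Fixture for #{name} is EMPTY"
        end
      end
    end
  end
end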

It’s written in a somewhat quick and dirty way, but it gets the job done. What it does:

It takes a single class – hopefully a well connected one – fetches a few of them, and goes from there, exploring all the other objects that the initial ones are connected to, and then dumping them in .yml files in the fixtures/ directory. It could certainly use more work:

  • Additional starter classes, for ones that aren’t connected to others.
  • Intelligent examination of data and data sets…
  • For example you could check to see what classes aren’t connected to the others and use them as starter classes.
  • If you wanted to waste a lot of cycles, you could also try and aim for a representative set of examples. For an object that has_many something elses, you could try and find one with zero, one with one, and one with a lot, to cover various cases. That could quickly get out of hand for largish data sets though.
  • I’m sure there are various other tweaks, knobs, and levers that could be added to work more efficiently. I’d love to see them!
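
The “representative set” idea from the list above can be sketched in plain Ruby. This is only an illustration, not part of the rake task: pick_representatives is a hypothetical helper, the hashes stand in for real records, and the block stands in for whatever counts the associated records (on a real model, something like record.comments.size, which costs a query per record).

```ruby
# Hypothetical helper: from a list of records, pick one with no associated
# records, one with exactly one, and the one with the most (if more than one).
# The block must return the number of associated records for a record.
def pick_representatives(records)
  counted = records.map { |record| [record, yield(record)] }
  none = counted.find { |_, count| count == 0 }
  one  = counted.find { |_, count| count == 1 }
  many = counted.select { |_, count| count > 1 }.max_by { |_, count| count }
  # Drop the cases that do not occur in the data, keep only the records.
  [none, one, many].compact.map(&:first)
end

# Stand-in data: each "post" is a hash holding an array of comments.
posts = [
  { :id => 1, :comments => %w[a b] },
  { :id => 2, :comments => [] },
  { :id => 3, :comments => %w[a] },
  { :id => 4, :comments => %w[a b c d] },
  { :id => 5, :comments => [] }
]

picked = pick_representatives(posts) { |post| post[:comments].size }
puts picked.map { |post| post[:id] }.inspect
```

With the sample data this picks posts 2, 3 and 4: the first post with no comments, the first with one, and the one with the most.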


For those who use emacs with Ruby on Rails, and perhaps even use rhtml-minor-mode, I have published an update:

It now handles .html.erb extensions for layouts/ files and layouts/application.html.erb files (or .rhtml).

Something that would still be nice to see is some hacking to better integrate partials, which currently don’t really know exactly where they fit in their parent document. That would take some work though, as you’d have to scan for use of the partial in question and perhaps do some other parsing of Ruby code. If you’re clever though, you could probably get the most common cases by looking for something like :partial => ...
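
As a rough sketch of that scan, written in Ruby rather than Emacs Lisp purely to illustrate the pattern: the helper names, the regular expression and the sample templates below are all hypothetical, and only the literal form :partial => 'name' is recognized. Partial names held in variables would be missed.

```ruby
# Matches the common literal form  :partial => 'name'  or  :partial => "name".
PARTIAL_PATTERN = /:partial\s*=>\s*['"]([^'"]+)['"]/

# Hypothetical helper: names of the partials referenced in one template.
def partials_used(source)
  source.scan(PARTIAL_PATTERN).flatten.uniq
end

# Hypothetical helper: map each partial name to the templates that render it.
def parents_by_partial(templates)
  index = Hash.new { |hash, key| hash[key] = [] }
  templates.each do |filename, source|
    partials_used(source).each { |partial| index[partial] << filename }
  end
  index
end

# Stand-in template sources, keyed by file name.
templates = {
  'posts/index.html.erb' =>
    "<%= render :partial => 'post', :collection => @posts %>\n" +
    "<%= render :partial => \"shared/footer\" %>\n",
  'posts/show.html.erb' =>
    "<%= render :partial => 'post', :object => @post %>\n"
}

index = parents_by_partial(templates)
index.each { |partial, files| puts "#{partial}: #{files.join(', ')}" }
```

An editor mode could use an index like this to guess which parent document a partial belongs to, at least when exactly one template renders it.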

Cache it all

I recently redid my personal web site. Wanting to be quick about it, and to make the look and feel a bit more uniform than it has been in the past, I hacked together some pages in Rails. Despite this being sort of a “killing a fly with a bazooka” situation, I’ve been doing lots with Rails, so it was quick to use. Here’s the thing, though: Rails is definitely overkill, as the site is basically static. I don’t need to calculate anything or fetch stuff from a database – I just wanted a reasonably good template system, and I am quite comfortable with Rails these days.

But the idea of leaving Rails running for a static site was of course no good: I basically need to cache the entire thing, so that Rails is simply not involved. How to do this as quickly as possible (in between diaper changing and other baby duties!)? Ideally, it would be possible to introspect Rails in order to know exactly which pages are present, then cache those, and avoid Rails on the server entirely (just generate them locally and put them in subversion), but that proved to be fairly hacky, so I settled for this code, which simply caches every page it comes across when caches_pages is placed in application.rb:

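class CacheFileName
  include ActionController::Caching::Pages
  include ActionController::Caching::Pages::ClassMethods

  # Reconstructed body (assumption): reuse Rails' own page_cache_file helper
  # to get the cached file's name, relative to page_cache_directory.
  def cachedname(path)
    page_cache_file(path)
  end
end

def caches_pages()
  return unless perform_caching
  after_filter do |c|
    res = c.cache_page
    # Reconstructed (assumption): record the cached file for later cleanup.
    cfn = CacheFileName.new
    cf = Cachedfile.new :filename => cfn.cachedname(c.request.path)
    cf.save!
  end
end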

It simply caches everything. To be able to easily clear out the cache if there are any changes to the site, we record the changes in the Cachedfile model, which is defined like this:

create_table "cachedfiles", :force => true do |t|
  t.string   "filename"
  t.datetime "created_at"
  t.datetime "updated_at"
end

with this model:

class Cachedfile < ActiveRecord::Base

  def Cachedfile.clean_cache
    Cachedfile.find(:all).each do |cf|
      begin
        fn = ActionController::Base.page_cache_directory + cf.filename
        File.delete fn
        cf.destroy # assumption: forget the entry once its file is gone
      rescue => e
        logger.error "Error deleting #{fn}: #{e.inspect}"
      end
    end
  end
end


which has a class method to go through and clean out all the cached files. I call it manually from ./script/clean_cache:

#!/usr/bin/env ruby

ENV['RAILS_ENV'] = ARGV.first || ENV['RAILS_ENV'] || 'development'

require File.dirname(__FILE__) + '/../config/boot'
require "#{RAILS_ROOT}/config/environment"
require 'console_app'

Cachedfile.clean_cache

It’s not a beautiful system, but it gets the job done.
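
To make the filename bookkeeping a little more concrete, here is a plain-Ruby sketch of how a request path turns into the name that ends up in the cachedfiles table. It approximates, from memory, how Rails page caching of that era named its files, so treat it as an assumption rather than the framework’s actual code. cached_name, cached_file and the cache directory are hypothetical, and URL unescaping is left out.

```ruby
# Hypothetical helper: "/" (or an empty path) becomes "/index", and the
# extension is appended only when the last path segment has none.
def cached_name(path, extension = '.html')
  name = (path.empty? || path == '/') ? '/index' : path.chomp('/')
  last_segment = name.split('/').last || name
  last_segment.include?('.') ? name : name + extension
end

# Hypothetical helper: full location of the cached file under an assumed
# cache directory, built the same way clean_cache builds its file names.
def cached_file(cache_directory, path)
  cache_directory + cached_name(path)
end

puts cached_name('/')
puts cached_name('/about/')
puts cached_name('/feed.xml')
puts cached_file('/var/www/site/public', '/projects')
```

The stored name is relative to the cache directory, which is why clean_cache can simply prepend page_cache_directory to it before deleting the file.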

“Open Source” Akismet or Defensio?

I have a bad habit of getting caught up in side projects – I love to build stuff, and often see opportunities to engage in this hobby. A site I’ve run for a while is the Linux Incompatibility List, which also led me to create my own Rails-based wiki for it, DedaWiki, which is of course open source (and could use some more attention).

However, things are getting bad on the spam front, and it’s time to find a better solution than hacking at my own anti-spam code. Two systems that look good are Akismet, which I use successfully for this journal, and Defensio. However, being the impractical guy that I am, I am also curious if there are what we might call ‘open source’ efforts in this area. I’m not sure what that would look like, since there’s probably strength in numbers, and a hosted solution is certainly easier. It also seems the incentives in an arms race like anti-spam favor the quick turnaround that a small, smart, dedicated team can provide. And a bit of security through obscurity in an arms-race situation probably doesn’t hurt, since there’s often going to be a way around anything that’s not locked down (and part of the value of a journal like this is in the comments), and letting people see exactly what numbers and ratios and metrics are being used is going to help the bad guys.

Anyone able to point me to something that proves me wrong?

Rails and Hype

Yoav Shapira wonders about the “rails backlash”:

I’m a big fan of Rails, and of course I realize that it has some flaws and is certainly not the optimal choice for all situations, but I don’t buy into the bashing.

Sure, Rails was hyped, but there is no way that it could have possibly reached the level of popularity it has on hype alone. We’re not talking about multi-million dollar marketing budgets commanded by the likes of Sun or Microsoft, but about a video and some nice dressings by DHH and 37signals. A small group like that just doesn’t have the resources to push something like Rails as far as it’s gone without some substance to back it up.

Look at all the imitation. People realize that the basic idea is good, and even though the basic idea is nothing phenomenal, it hadn’t really been done before. What is this idea? Take a bunch of components, and good (if not always the absolute best) practices, and integrate them nicely. Database, templating, testing, a clean application structure, ajax/javascript integration, with some code generation, all done with a language that’s flexible enough to do all of the above pretty well. All those things existed, but Rails did a great job of tying them all together. And that’s been imitated far and wide. That seems to indicate that, even though the imitators may change and improve bits and pieces, they’re quite smitten with the idea itself. So much that they’ve dedicated a lot of time and effort to reproduce it. That’s quite a tribute.

Look at the adoption. Moving to a new language/platform is a big step, not one to be taken lightly, and for people with lots of existing code in some other language, perhaps simply not possible. Hype will take things only so far, but Rails has been taken up by a large number of people who certainly didn’t do so because it was the safe, easy path to take. Perhaps some of those people have found that Rails really wasn’t for them, but that’s pretty normal. No one said it was perfect. However, when you look at what it takes to make people jump ship from what they know, and are comfortable with, you realize that it really must have been that much better to make people go out on a limb.

Furthermore, it’s easy to point out the failings and drawbacks of something after the fact, but much more difficult to do creative, innovative work of your own. Compounding that is the fact that criticizing others’ work is often an easy and effective way of looking smart; there’s an interesting study referenced here that talks about this effect.

It’s something I’ve come to dislike over the years. Our field is so new, so dynamic, and so immature, that it’s usually pretty easy to take a look at something and say that it “sucks”. It is much harder to write something of your own and put it out there, knowing full well that lots of people who have never done much of anything will jump on it and bemoan its “copious failings”. That’s not to say that it’s impolite to criticize, but – do it constructively please. Imagine that you were there in the same room, talking to the person who wrote the code in question. You would still say you don’t like it, but you’d probably be much more polite and helpful in your commentary. I hope so, at least.

Interest in a simple open source newsletter system for Rails?

I’m putting together a very simple newsletter system for a client, based on Rails. It’s meant to handle around 1000 addresses, and be simple to use: you create a newsletter with fckeditor, look at it, then send it if everything looks good. It also adds links so that people can unsubscribe, or subscribe from the site (with confirmation provided via a link in an email).

It’s not nearly as extensive as a lot of what I’ve seen out there in PHP, but I really needed something in Rails and didn’t find anything that worked for me. I’m considering open sourcing the code, but am only likely to do so if there is interest from people willing to work on the code. I’m just not in the mood to put code out there that just sits there. If you are interested, email me and we can work something out.

One reason to think Rails is “all that”

The economics of programming languages point to Rails being significantly better than what went before it.

I got to thinking about this when reading a comment on a site I like to read, which said:

Rails in itself is, to me, not that impressive. It does a lot of things right, but it does probably just as many wrong. Not the least of which is scaling.

It seems that these sorts of “after the fact” “I know better” comments are a dime a dozen in the world of programming discussions. It’s easy to come along after something’s been built and puff yourself up by pointing to defects in existing systems and show that, therefore, by comparison, you’re a clever fellow.

That’s not my point, though – what I wish to explain is that yes, Rails really was that much better than what was around before it came onto the scene:

“Switching costs” between languages are high. Less so for really sharp programmers, but for the masses that use one or two languages, learning a new language, tools, deployment, etc… is a big step to take, with potentially high risks. Even most A-list programmers I know use a few languages at a time – it’s simply easier if you’re not tripping over your own feet by switching to a different system every day. “Flow” is easier to attain when you’re ensconced in the thinking of one language. For companies, this effect is magnified, and switching to something new is not done lightly.

Since companies are beginning to explore Rails, successfully, I might add, you have to conclude that the big step into the unknown was worth it for some reason. Especially considering that a number of other languages rushed to copy various nice aspects of Rails, lessening the need for users of those systems to consider taking the leap.

Of course, that’s not to say it’s a perfect system, without reproach, or has no negative aspects, but in the spirit of honesty, and credit where credit is due, Rails really did move things a step forward, and the willingness of people to incur high switching costs to obtain its benefits is strong evidence of that.