Generating Rails Fixtures From Large Datasets

I’ve been working with Tomaso Minelli doing some Rails work, and one thing we needed to do was set up some tests. The situation we are in is that we already have a large set of data to work with in our development environment, but getting that data into .yml files is no easy task. Since we have a fairly large number of interdependent tables, dumping one of them alone is pretty useless, unless you manage to get the other tables it links to, and so on and so forth. Dumping all the data would make testing quite slow, and really is too much to be useful. Here’s the rough solution we arrived at:

@fixtures = {}
@seen = {}

# Do something with a fixture.  In this case, we store it for later.
def handle_fixture(record)
  @fixtures[record.class.table_name] ||= []
  @fixtures[record.class.table_name] < STARTING_LIMIT)
        related = record.send(
      related = [related] unless related.is_a?(Array)
      related.each do |related_record|
        if related_record
          puts "    New Record #{} #{}"
          @todo < e
      puts "    ERROR: #{e}:n#{e.backtrace.inspect}"

@todo = []

namespace :db do
  namespace :fixtures do
    desc 'Create YAML test fixtures starting from a single model using reflections'
    task :modelbase => :environment do
      STARTING_CLASS = Foobar
      STARTING_FIND  = :all

      tmp = 0
      totale = STARTING_CLASS.find(STARTING_FIND, :limit => STARTING_LIMIT).size
      STARTING_CLASS.find(STARTING_FIND, :limit => STARTING_LIMIT).each do |m|
        tmp += 1
        puts "#{tmp} su #{totale}: #{}"
        @todo < 0
          i = "000"
          filename = "#{RAILS_ROOT}/test/fixtures/#{name}.yml"
          puts "Create #{filename} for class #{}"
          FileUtils.mkdir(File.dirname(filename)) unless Dir[File.dirname(filename)].size > 0
"#{RAILS_ROOT}/test/fixtures/#{name}.yml", 'w') do |file|
            file.write fixture.inject({}) { |hash, record|
              hash["#{record.class.table_name}_#{i.succ!}"] = record.attributes
          puts "Fixture for #{name} is EMPTY"

It’s written in a somewhat quick and dirty way, but it gets the job done. What it does:

It takes a single class – hopefully a well connected one – fetches a few of them, and goes from there, exploring all the other objects that the initial ones are connected to, and then dumping them in .yml files in the fixtures/ directory. It could certainly use more work:

  • Additional starter classes, for ones that aren’t connected to others.
  • Intelligent examination of data and data sets…
  • For example you could check to see what classes aren’t connected to the others and use them as starter classes.
  • If you wanted to waste a lot of cycles, you could also try and aim for a representative set of examples. For an object that has_many something elses, you could try and find one with zero, one with one, and one with a lot, to cover various cases. That could quickly get out of hand for largish data sets though.
  • I’m sure there are various other tweaks, knobs, and levers that could be added to work more efficiently. I’d love to see them!

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s