Generating Rails Fixtures From Large Datasets

Posted by David N. Welton Mon, 10 Nov 2008 16:16:00 GMT

I've been working with Tomaso Minelli on some Rails work, and one thing we needed to do was set up some tests. We already have a large set of data in our development environment, but getting that data into .yml fixture files is no easy task. We have a fairly large number of interdependent tables, so dumping one of them alone is pretty useless unless you also get the tables it links to, the tables those link to, and so on. Dumping all the data would make testing quite slow, and is really too much to be useful. Here's the rough solution we arrived at:


@fixtures = {}
@seen = {}

# Do something with a fixture.  In this case, we store it for later.
def handle_fixture(record)
  @fixtures[record.class.table_name] ||= []
  @fixtures[record.class.table_name] << record
end

# Get some sort of unique way of identifying a record - class name and id.
def unique_record(record)
  return record.class.to_s + " " + record.id.to_s
end

# Store a record as a fixture, then grab all of its relations and add them to the queue.
def add_fixture(record)
  ur = unique_record(record)
  if @seen[ur]
    puts "LOOP!"
    return
  end
  @seen[ur] = true

  puts "Fixture for #{record.class.name} - ID #{record.id}"
  handle_fixture(record)
  record.class.reflections.each do |k, r|
    begin
      if r.macro == :has_many ||
          r.macro == :has_and_belongs_to_many
        related = record.send(r.name).find(:all, :limit => STARTING_LIMIT)
      else
        related = record.send(r.name)
      end
      related = [related] unless related.is_a?(Array)
      related.each do |related_record|
        if related_record
          puts "    New Record #{related_record.class.name} #{related_record.id}"
          @todo << related_record
        end
      end
    rescue => e
      puts "    ERROR: #{e}:\n#{e.backtrace.inspect}"
    end
  end
end

@todo = []

namespace :db do
  namespace :fixtures do
    desc 'Create YAML test fixtures starting from a single model using reflections'
    task :modelbase => :environment do
      STARTING_CLASS = Foobar
      STARTING_FIND  = :all
      STARTING_LIMIT = 1

      # Run the starting query once and queue up every record it returns.
      starters = STARTING_CLASS.find(STARTING_FIND, :limit => STARTING_LIMIT)
      starters.each_with_index do |m, index|
        puts "#{index + 1} of #{starters.size}: #{m.id}"
        @todo << m
      end

      # Keep processing until no unvisited records are left in the queue.
      add_fixture(@todo.pop) until @todo.empty?

      puts "\n\n#{'='*80}\nScan Finished\n#{'='*80}\n\n"

      @fixtures.each do |name, fixture|
        if fixture.size > 0
          i = "000"
          filename = "#{RAILS_ROOT}/test/fixtures/#{name}.yml"
          puts "Create #{filename} for class #{fixture.first.class.name}"
          FileUtils.mkdir_p(File.dirname(filename))
          File.open(filename, 'w') do |file|
            file.write fixture.inject({}) { |hash, record|
              hash["#{record.class.table_name}_#{i.succ!}"] = record.attributes
              hash
            }.to_yaml
          end
        else
          puts "Fixture for #{name} is EMPTY"
        end
      end
    end
  end
end

It's written in a somewhat quick and dirty way, but it gets the job done. What it does:

It takes a single class, hopefully a well connected one, fetches a few records of that class, and goes from there: it explores all the other objects that the initial ones are connected to, then dumps everything it found into .yml files in the test/fixtures/ directory. It could certainly use more work:

  • Additional starter classes, for ones that aren't connected to others.
  • Intelligent examination of data and data sets. For example, you could check which classes aren't connected to the others and use them as starter classes.
  • If you wanted to waste a lot of cycles, you could also try and aim for a representative set of examples. For an object that has_many something elses, you could try and find one with zero, one with one, and one with a lot, to cover various cases. That could quickly get out of hand for largish data sets though.
  • I'm sure there are various other tweaks, knobs, and levers that could be added to work more efficiently. I'd love to see them!
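
As a rough illustration of the "starter classes" idea, here is a minimal sketch in plain Ruby. It assumes you have already built a map from each model name to the names of the models it links to (for instance from each model's reflections). The ASSOCIATIONS hash, the model names in it, and both helper methods are hypothetical and not part of the rake task above. It walks the map with the same todo-list approach as the task and reports the models it never reaches.

```ruby
require 'set'

# Hypothetical association map: model name => names of the models it links to.
# In a real application this would be derived from the models' reflections.
ASSOCIATIONS = {
  'Order'    => ['Customer', 'LineItem'],
  'Customer' => ['Address'],
  'LineItem' => ['Product'],
  'Address'  => [],
  'Product'  => [],
  'AuditLog' => [],
  'Setting'  => []
}

# Walk the map from a starter class, using a todo list and a "seen" set
# just like the rake task does, and return every model name reached.
def reachable_from(start, associations)
  seen = Set.new
  todo = [start]
  until todo.empty?
    name = todo.pop
    next if seen.include?(name)
    seen << name
    todo.concat(associations.fetch(name, []))
  end
  seen
end

# Models the walk never reaches are candidates for additional starter classes.
def unreached_models(start, associations)
  (associations.keys - reachable_from(start, associations).to_a).sort
end

puts unreached_models('Order', ASSOCIATIONS).inspect
```

With this made-up map, starting from Order never reaches AuditLog or Setting, so those two would be candidates for extra starter classes. Note that the map here is followed in one direction only; if your models declare both sides of each association, the walk effectively treats the links as two-way.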
