Seed Data In Ruby On Rails

by Matthias Marschall on May 28, 2009 · 6 comments

To run automated tests for your Ruby on Rails webapp, you not only need your latest database structure deployed to the test database (which rake db:test:prepare takes care of), you also need seed data for lookup tables, e.g. zip codes.

Common approaches like adding seed data through Rails migrations are discouraged, and plugins like seed_fu only work well for small amounts of seed data. In seed_fu, you define seed data by calling the seed method on your ActiveRecord models like so:

User.seed(:login, :email) do |s|
  s.login = "bob"
  s.email = "bob@bobson.com"
  s.first_name = "Bob"
  s.last_name = "Bobson"
end

Running the rake db:seed task provided by seed_fu will load all records defined this way into your test database.
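The arguments passed to seed (here :login and :email) act as constraints that identify a record. To make that concrete, here is a minimal pure-Ruby sketch that does not use seed_fu itself: InMemoryModel and its Record struct are hypothetical stand-ins for an ActiveRecord model, and the sketch assumes the behaviour seed_fu describes for its constraints, where a record matching the constraint columns is updated and otherwise a new one is created.

```ruby
# Hypothetical stand-in for an ActiveRecord model (not seed_fu itself):
# the attribute names passed to .seed act as constraints that identify a
# record, so re-running the same seed updates instead of duplicating.
class InMemoryModel
  Record = Struct.new(:login, :email, :first_name, :last_name)

  # All "rows" of this fake table
  def self.records
    @records ||= []
  end

  def self.seed(*constraints)
    draft = Record.new
    yield draft # the block fills in the attribute values
    existing = records.find do |r|
      constraints.all? { |attr| r[attr] == draft[attr] }
    end
    if existing
      # Copy only the values the block actually set onto the matching record
      draft.each_pair { |attr, value| existing[attr] = value unless value.nil? }
      existing
    else
      records << draft # no match: create a new record
      draft
    end
  end
end

# Seeding Bob twice still leaves a single record
2.times do
  InMemoryModel.seed(:login, :email) do |s|
    s.login = "bob"
    s.email = "bob@bobson.com"
    s.first_name = "Bob"
    s.last_name = "Bobson"
  end
end
puts InMemoryModel.records.size # prints 1
```

This idempotency is what makes it safe to run the seed task repeatedly against the same database.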

DHH has even standardized a way to load seed data for Rails 3, making the rake db:seed task part of Rails and setting up a file called db/seeds.rb for maintaining your seeding code. Using that file, you can load your seed data however you see fit, e.g. with seed_fu.

How to Deal With Big Amounts of Seed Data


So far, so good. There are ways to load seed data into your Rails test database using Ruby code. But what if, like in our case, you have to seed more than 60,000 Points of Interest and over 16,000 cars? We definitely don't want to write Ruby code for each of them. The only sane way of handling such amounts of data is database dumps. So I added my own rake db:seed:dump and rake db:seed:load tasks to our Rails 2.3.2 application. As soon as we move to Rails 3, we can call the load task from within db/seeds.rb.

Short and sweet (and completely MySQL-specific, relying on the mysql and mysqldump clients being in your PATH ;-) ), here are my two rake tasks:

namespace :db do
  namespace :seed do
    desc "Dump the tables holding seed data to db/RAILS_ENV_seed.sql. SEED_TABLES must be defined in db/seed_tables.rb!"
    task :dump => :environment do
      # Load the SEED_TABLES list; an absolute path avoids relying on '.' being in $LOAD_PATH
      require File.join(RAILS_ROOT, 'db', 'seed_tables')
      config = ActiveRecord::Base.configurations[RAILS_ENV]
      dump_cmd = "mysqldump --user=#{config['username']} --password=#{config['password']} " +
                 "#{config['database']} #{SEED_TABLES.join(' ')} > db/#{RAILS_ENV}_seed.sql"
      system(dump_cmd) or abort("mysqldump failed")
    end

    desc "Load the dumped seed data from db/RAILS_ENV_seed.sql (db/development_seed.sql by default) into the test database"
    task :load => :environment do
      # Always load into the test database, whatever RAILS_ENV is set to
      config = ActiveRecord::Base.configurations['test']
      load_cmd = "mysql --user=#{config['username']} --password=#{config['password']} " +
                 "#{config['database']} < db/#{RAILS_ENV}_seed.sql"
      system(load_cmd) or abort("loading seed data failed")
    end
  end
end
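Both tasks shell out to mysqldump and mysql, so they only work when the MySQL client tools are on your PATH; otherwise all you get is a shell error. A small guard like the following sketch could run before calling system. find_in_path is a hypothetical helper (not part of Rails or rake), and it assumes POSIX-style executable names, so Windows extensions such as .exe are not tried.

```ruby
# Hypothetical helper: returns the full path of an executable called +cmd+
# in one of the directories listed in +path+ (default: ENV['PATH']), or nil.
# Assumes POSIX-style names; Windows extensions such as .exe are not tried.
def find_in_path(cmd, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).reject(&:empty?).each do |dir|
    candidate = File.join(dir, cmd)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Example guard, e.g. at the top of the dump and load tasks
missing = %w[mysql mysqldump].reject { |tool| find_in_path(tool) }
warn "Not found in PATH: #{missing.join(', ')}" unless missing.empty?
```

Inside a rake task you would probably call abort instead of warn, so the task stops before running a command that cannot succeed.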

Note that I use a file called db/seed_tables.rb to define which tables should be dumped. It just holds an array of table names like so:

SEED_TABLES = [
  "auxilary_services",
  "background_informations",
  "pois"
]
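The SEED_TABLES names, together with the credentials from database.yml, end up interpolated into a shell command. That works for simple values, but breaks as soon as, say, the password contains a space or a shell metacharacter, and the tasks never pass --host. As a hedged sketch, the dump command could instead be assembled with Ruby's standard shellwords library. seed_dump_command is a hypothetical helper, not part of Rails or of the original tasks, and the sample config values are made up.

```ruby
require 'shellwords'

# Hypothetical helper: builds the mysqldump command for the seed tables from
# a database.yml-style config hash. Every argument is shell-escaped, and
# --password / --host are only passed when the config contains them.
def seed_dump_command(config, tables, output_file)
  args = ['mysqldump', "--user=#{config['username']}"]
  args << "--password=#{config['password']}" if config['password']
  args << "--host=#{config['host']}" if config['host']
  args << config['database']
  args.concat(tables)
  # Escape the arguments, then append the (unescaped) shell redirection
  "#{args.shelljoin} > #{output_file.shellescape}"
end

config = { 'username' => 'app', 'password' => 'p@ss word', 'database' => 'app_development' }
seed_tables = %w[auxilary_services background_informations pois]
puts seed_dump_command(config, seed_tables, 'db/development_seed.sql')
```

The same approach would work for the mysql load command, using < instead of > for the redirection.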

Using two basic rake tasks and database dumps eases the pain of handling test data for us. How do you manage your test data? Let us know in the comments!



Comments

1 guns July 18, 2009 at 12:39 pm

Hey thanks for the writeup! I’ve been using shell scripts to do the same, but this is a nicer way to handle it.

You should consider making this a Rails plugin.

2 Matthias Marschall July 21, 2009 at 11:30 pm

Great that you like it. Unfortunately it's still a little too rough to become a gem.

3 Kovalchuk Anatoliy January 12, 2010 at 10:57 am

Good solution.
I'd like to add --host=#{config['host']} to the command line.

4 Matthias Marschall August 4, 2010 at 4:21 pm

In Rails 2.3.8 you don't need the load task anymore. You can just drop

config = ActiveRecord::Base.configurations[RAILS_ENV]
system("mysql --user=#{config['username']} --password=#{config['password']} --host=#{config['host']} #{config['database']} < db/#{RAILS_ENV}_seed.sql")

into the given db/seeds.rb file and run:

rake db:seed

or use rake db:setup to create your database and load the seed data from your SQL file.

