Alberto Grespan

CSV file reading in Ruby

— October 26, 2014

In the recent days I’ve started using CSV files more and more for data manipulation, specifically for database seeding. CSV is a super easy to use and it doesn’t require a complex text editor… Google Spreadsheets or VIM will work great.

This post I will show how to read data from .csv file and dumping it in a Ruby Array of Hashes. It will also show how to seed a Rails database.

Reading a CSV file

Ruby provides CSV support in the Standard Library and it will help us do most of the job very quickly; although there are various ways to achieve the same results we will be focusing in just 2 of the possible ways, read the file entirely with the read method or line by line with the foreach method.

CSV File

The CSV file we are going to work with is this one:

Name,Lastname,Email,Birth Date,Hometown
Alberto,Grespan,[email protected],30/11/1986,Mérida
Pedro,Perez,[email protected],4/4/1984,Caracas
John,Doe,[email protected],,Kansas
José,González,[email protected],16/10/1984,Madrid
Andrés,Márquez,,18/3/1987,Caracas

As you see from the data above the first line is the Header line, which gives meaning to the rest of the data and the rest of the file is just the plain data. There are some missing values, e.g. we don’t have the birth date of John Doe or the email of Andrés Márquez.

Our goal here is to have an Array of hashes after reading the CSV file. It must contains all headers as keys for their values e.g:

[
  {:name=>"Alberto", :lastname=>"Grespan", :email=>"[email protected]", :birth_date=>"30/11/1986", :hometown=>"Mérida"},
  {:name=>"Pedro", :lastname=>"Perez", :email=>"[email protected]", :birth_date=>"4/4/1984", :hometown=>"Caracas"},
  {:name=>"John", :lastname=>"Doe", :email=>"[email protected]", :birth_date=>nil, :hometown=>"Kansas"},
  {:name=>"José", :lastname=>"González", :email=>"[email protected]", :birth_date=>"16/10/1984", :hometown=>"Madrid"},
  {:name=>"Andrés", :lastname=>"Márquez", :email=>nil, :birth_date=>"18/3/1987", :hometown=>"Caracas"}
]

Reading all the file at once

How do we go from the CSV file to an Array of hashes?

Well pretty easily, we just need to open irb, require the CSV library from the Standard Library and use the read method passing it the right options and that’s it!

require 'csv'

> data = CSV.read("csv_data.csv", { encoding: "UTF-8", headers: true, header_converters: :symbol, converters: :all})

> hashed_data = data.map { |d| d.to_hash }
=> [
{:name=>"Alberto", :lastname=>"Grespan", :email=>"[email protected]", :birth_date=>"30/11/1986", :hometown=>"Mérida"},
{:name=>"Pedro", :lastname=>"Perez", :email=>"[email protected]", :birth_date=>"4/4/1984", :hometown=>"Caracas"},
{:name=>"John", :lastname=>"Doe", :email=>"[email protected]", :birth_date=>nil, :hometown=>"Kansas"},
{:name=>"José", :lastname=>"González", :email=>"[email protected]", :birth_date=>"16/10/1984", :hometown=>"Madrid"},
{:name=>"Andrés", :lastname=>"Márquez", :email=>nil, :birth_date=>"18/3/1987", :hometown=>"Caracas"}
]

We are passing a couple of options to the read method:

As we first converted the CSV file into an Array of CSV::Row's we need to iterate through and convert each line to a hash, saving it in a new hashed_data variable that will achieve our desired format.

How about doing this but line by line?

Reading line by line

In a similar way as before, we are going to read the file, only that this time using the foreach method.

require 'csv'

data = Array.new

> CSV.foreach("csv_data.csv", { encoding: "UTF-8", headers: true, header_converters: :symbol, converters: :all}) do |row|
  data << row.to_hash
end

> data
=> [
{:name=>"Alberto", :lastname=>"Grespan", :email=>"[email protected]", :birth_date=>"30/11/1986", :hometown=>"Mérida"},
{:name=>"Pedro", :lastname=>"Perez", :email=>"[email protected]", :birth_date=>"4/4/1984", :hometown=>"Caracas"},
{:name=>"John", :lastname=>"Doe", :email=>"[email protected]", :birth_date=>nil, :hometown=>"Kansas"},
{:name=>"José", :lastname=>"González", :email=>"[email protected]", :birth_date=>"16/10/1984", :hometown=>"Madrid"},
{:name=>"Andrés", :lastname=>"Márquez", :email=>nil, :birth_date=>"18/3/1987", :hometown=>"Caracas"}
]

Now that we now how to read and print the CSV file in Ruby we can use this to seed a database, for example within Rails.

Seeding a Rails database

Seeding the database through a CSV file is in no way any different from what we’ve been doing. For this example let’s try seeding a Users table that has the same structure as the above CSV file (name, lastname, email, birth_date and hometown).

Let’s open the db/seeds.rb file and use the read line by line foreach method to accomplish our seeding task.

# seeds.rb
require 'csv'

CSV.foreach("db/csv/csv_data.csv", { encoding: "UTF-8", headers: true, header_converters: :symbol, converters: :all}) do |row|
  User.create(row.to_hash)
end

Let’s run the seeding:

$ bin/rake db:seed

And if there are no errors we can go and check in the Rails console to check if all the users in the CSV file where inserted correctly:

$ bin/rails c
Loading development environment (Rails 4.1.5)
irb(main):001:0> users = User.all
  User Load (1.4ms)  SELECT "users".* FROM "users"
=> #<ActiveRecord::Relation [
#<User id: 1, name: "Alberto", lastname: "Grespan", email: "[email protected]", birth_date: "1986-11-30", hometown: "Mérida", created_at: "2014-10-27 00:54:44", updated_at: "2014-10-27 00:54:44">,
#<User id: 2, name: "Pedro", lastname: "Perez", email: "[email protected]", birth_date: "1984-04-04", hometown: "Caracas", created_at: "2014-10-27 00:54:44", updated_at: "2014-10-27 00:54:44">,
#<User id: 3, name: "John", lastname: "Doe", email: "[email protected]", birth_date: nil, hometown: "Kansas", created_at: "2014-10-27 00:54:44", updated_at: "2014-10-27 00:54:44">,
#<User id: 4, name: "José", lastname: "González", email: "[email protected]", birth_date: "1984-10-16", hometown: "Madrid", created_at: "2014-10-27 00:54:44", updated_at: "2014-10-27 00:54:44">,
#<User id: 5, name: "Andrés", lastname: "Márquez", email: nil, birth_date: "1987-03-18", hometown: "Caracas", created_at: "2014-10-27 00:54:44", updated_at: "2014-10-27 00:54:44">
]>

Pretty cool, eh?


I believe this is a great way to seed a database for testing and development. Using a CSV file can help us add, remove or modify data quickly, clean and easy.

Keep in mind that you can also use the read method for seeding the database and that will be fine for small amount of data, but if it’s lots and lots of data it will probably be a slower process that using foreach because it’s keeping all in memory and then reading and inserting it into the db instead of doing it all at once.

Thanks for reading!