Processing CSV data in batches (chunks) allows you to parallelize the workload of importing data. This comes in handy when you don't want the import of large CSV files to slow you down.
Setting the option :chunk_size sets the maximum batch size.
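The examples below read /tmp/pets.csv, which is not shown in this section. Judging from the output, a file along these lines would reproduce the results; this is a hedged reconstruction, not the actual fixture (SmarterCSV's defaults drop empty cells from the resulting hashes):

```ruby
# a guess at the input file, reconstructed from the output shown below;
# empty cells produce no key in the result hashes under SmarterCSV's defaults
File.write('/tmp/pets.csv', <<~CSV)
  first_name,last_name,dogs,cats,birds,fish
  Dan,McAllister,2,,,
  Lucy,Laweless,,5,,
  Miles,O'Brian,,,,21
  Nancy,Homes,2,,1,
CSV
```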
Example 1: How SmarterCSV processes CSV files as chunks, returning arrays of hashes:
Please note how the returned array contains two sub-arrays (the chunks which were read), each chunk containing 2 hashes. If the number of rows is not evenly divisible by :chunk_size, the last chunk contains fewer hashes.
> pets_by_owner = SmarterCSV.process('/tmp/pets.csv', {:chunk_size => 2, :key_mapping => {:first_name => :first, :last_name => :last}})
=> [ [ {:first=>"Dan", :last=>"McAllister", :dogs=>"2"}, {:first=>"Lucy", :last=>"Laweless", :cats=>"5"} ],
[ {:first=>"Miles", :last=>"O'Brian", :fish=>"21"}, {:first=>"Nancy", :last=>"Homes", :dogs=>"2", :birds=>"1"} ]
]
Example 2: How SmarterCSV processes CSV files as chunks, and passes arrays of hashes to a given block:
Please note how the given block is passed the data for each chunk as its parameter (an array of hashes), and how the process method returns the number of chunks when called with a block.
> total_chunks = SmarterCSV.process('/tmp/pets.csv', {:chunk_size => 2, :key_mapping => {:first_name => :first, :last_name => :last}}) do |chunk|
    chunk.each do |h|    # you can post-process the data from each row to your heart's content, and also create virtual attributes:
      h[:full_name] = [h[:first], h[:last]].join(' ')   # create a virtual attribute
      h.delete(:first) ; h.delete(:last)                # remove two keys
    end
    puts chunk.inspect   # we could at this point pass the chunk to a Resque worker..
  end
[{:dogs=>"2", :full_name=>"Dan McAllister"}, {:cats=>"5", :full_name=>"Lucy Laweless"}]
[{:fish=>"21", :full_name=>"Miles O'Brian"}, {:dogs=>"2", :birds=>"1", :full_name=>"Nancy Homes"}]
=> 2
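The comment in Example 2 about passing each chunk to a Resque worker could look roughly like the following sketch. Resque itself and the PetsImportJob class are assumptions here, not part of SmarterCSV:

```ruby
# a minimal sketch, assuming the resque gem and a hypothetical PetsImportJob worker class
require 'resque'
require 'smarter_csv'

class PetsImportJob
  @queue = :csv_import

  # Resque runs this in a background worker process with the enqueued arguments;
  # note: Resque serializes arguments as JSON, so hash keys arrive as strings here
  def self.perform(rows)
    rows.each { |row| puts row.inspect }   # replace with your real persistence logic
  end
end

SmarterCSV.process('/tmp/pets.csv', {:chunk_size => 2}) do |chunk|
  Resque.enqueue(PetsImportJob, chunk)   # each chunk becomes one background job
end
```

Each chunk is then imported asynchronously by however many Resque workers you run against the csv_import queue.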
# using chunks:
filename = '/tmp/some.csv'
options = {:chunk_size => 100, :key_mapping => {:unwanted_row => nil, :old_row_name => :new_name}}
n = SmarterCSV.process(filename, options) do |chunk|
  # we're passing a block in, to process each resulting hash / row (block takes an array of hashes)
  # when chunking is enabled, there are up to :chunk_size hashes in each chunk
  MyModel.insert_all( chunk )   # insert up to 100 records at a time
end
# => returns the number of chunks we processed
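Because each chunk arrives in the block as a self-contained array of hashes, the per-chunk work can also be spread across threads. A minimal sketch; whether MyModel.insert_all and your database connection pool are thread-safe is an assumption you need to verify for your setup:

```ruby
# a minimal sketch: inserting chunks concurrently with plain Ruby threads
threads = []
SmarterCSV.process('/tmp/some.csv', {:chunk_size => 100}) do |chunk|
  threads << Thread.new(chunk) do |rows|
    MyModel.insert_all(rows)   # each chunk is written by its own thread
  end
end
threads.each(&:join)           # wait for all inserts to finish
```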