Tilo Sloboda edited this page Dec 28, 2023 · 26 revisions

SmarterCSV Wiki

SmarterCSV is a Ruby Gem for smarter importing of CSV Files as Arrays of Hashes, suitable for parallel processing, e.g. with Sidekiq, as well as direct processing of the resulting hashes with Rails, e.g. ActiveRecord.

The current stable versions are 1.x on the main branch.

Why was SmarterCSV created?

The API of Ruby's CSV library is pretty old, and its way of processing CSV files, which returns Arrays of Arrays, feels 'very close to the metal'. The output is not easy to use, especially if you want to create database records from it. Another shortcoming is that Ruby's CSV library does not have good support for huge CSV files: for example, there is no support for 'chunking' and/or parallel processing of the CSV content (e.g. with Sidekiq).
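To make the difference concrete, here is a minimal sketch that uses only Ruby's standard CSV library. It is not SmarterCSV's implementation: it only illustrates the default Array-of-Arrays output, and what it takes to build hashes and batches ("chunks") by hand. The sample data and the variable names (`csv_text`, `rows`, `records`, `chunks`) are made up for this example.

```ruby
require "csv"

# Hypothetical sample data, used only for illustration.
csv_text = <<~EOS
  first_name,last_name,age
  Ada,Lovelace,36
  Alan,Turing,41
  Grace,Hopper,85
EOS

# Default behaviour of Ruby's CSV library: an Array of Arrays,
# with the header line returned as just another row.
rows = CSV.parse(csv_text)

# Building an Array of Hashes by hand: treat the first line as headers,
# convert the header names to symbols, and turn each row into a Hash.
# All values remain Strings because no value converters are used.
records = CSV.parse(csv_text, headers: true, header_converters: :symbol).map(&:to_h)

# Manual "chunking": group the hashes into batches of two, e.g. so that
# each batch could be handed to a separate background job.
chunks = records.each_slice(2).to_a

puts rows.first.inspect
puts records.first.inspect
puts chunks.map(&:size).inspect
```

Each hash in `records` has the shape an ORM's create() method typically accepts, and each element of `chunks` is a small array of such hashes that could be processed independently.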

As the existing CSV libraries didn't fit my needs, and I had to support nightly imports of huge CSV files, I wrote my own CSV processing, specifically for use with Rails ORMs like ActiveRecord, Mongoid, or MongoMapper. In those ORMs you can easily pass a hash of attribute/value pairs to the create() method. The lower-level Mongo driver and Moped also accept arrays of such hashes, which lets you create a large number of records quickly with a single call.

Experimental Work

A few years back, there was an effort to create a version 2.0 that would rethink how the CSV processing is done. This would have been a major breaking change, and unfortunately the effort was never completed. If you want to look at the experimental work, it is on the 2.0-develop branch and is documented in the Experimental part of the Wiki. I do not recommend using it in production, as it is unfinished. If you want to provide feedback on it, open an Issue.
