
Chaos CSV Writer - Writing CSVs Responsibly

Motivation

Have you ever had a production dependency on data delivered via CSV? How many times have your production processes broken because you were using a half-baked CSV reader that could not handle commas or escaped double quotes inside data cells? For more gotchas, check this article.

Chaos CSV has been created to address this problem by generating valid CSV files which are unreadable with half-baked CSV readers.

Would you not be happy if all of the CSVs that you read were written with this approach?

For those of you who think "why would anyone parse a CSV themselves or use a crappy CSV reader?": as of 2016-12-19, the most popular Scala CSV reader (at least according to Google search), tototoshi/scala-csv, cannot handle delimiters inside quotes (issue).

Assumptions

It is assumed that you want to write CSV files with headers and that all lines contain the same number of data cells.

Implementation

Ensuring Ability to Handle Quotes, Commas, New Lines in Data Cells

The writer ensures that various special characters inside data cells are handled correctly by creating a dummy column containing a random number of these characters, ensuring that at least one of each special character is present per file.
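The dummy-column idea can be sketched roughly as follows. This is an illustration in Python using the standard `csv` module, not this project's actual code or API: the function names, the column label `chaos`, the exact set of special characters, and the 0 to 3 characters per cell are all assumptions made for the example.

```python
import csv
import io
import random

# Assumed set of special characters, taken from the section heading:
# commas, double quotes, and new lines.
SPECIAL_CHARS = [",", '"', "\n"]


def chaos_cells(n_rows, rng):
    """Build one dummy cell per row from a random number of special characters."""
    cells = [
        "".join(rng.choice(SPECIAL_CHARS) for _ in range(rng.randint(0, 3)))
        for _ in range(n_rows)
    ]
    # Guarantee that each special character appears at least once per file
    # by appending all of them to one randomly chosen row.
    if n_rows > 0:
        cells[rng.randrange(n_rows)] += "".join(SPECIAL_CHARS)
    return cells


def write_with_dummy(header, rows, rng):
    """Return CSV text for header and rows with a hypothetical 'chaos' column appended."""
    dummy = chaos_cells(len(rows), rng)
    buf = io.StringIO(newline="")  # newline="" keeps embedded line breaks untouched
    writer = csv.writer(buf)       # quotes and escapes special characters as needed
    writer.writerow(header + ["chaos"])
    for row, cell in zip(rows, dummy):
        writer.writerow(row + [cell])
    return buf.getvalue()
```

A reader that simply splits on line breaks and commas will see the wrong number of rows or cells in such a file, while a conforming CSV parser recovers the original data unchanged.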

Ensuring Clients do not Hardcode Column Integer Index in their Code

The chaos writer randomly permutes columns when saving a CSV. This forces clients to rely on labels defined in the header to read the data correctly. This holds even for CSVs with a single data column because a dummy column is always added.
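As a rough illustration of why label-based reading survives column permutation, here is a minimal Python sketch built on the standard `csv` module. The helpers `write_permuted` and `read_column` are hypothetical names invented for this example, not part of this project, and the sketch assumes header labels are unique.

```python
import csv
import io
import random


def write_permuted(header, rows, rng):
    """Return CSV text with the columns written in a random order."""
    order = list(range(len(header)))
    rng.shuffle(order)  # the same permutation is applied to the header and every row
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)
    writer.writerow([header[i] for i in order])
    for row in rows:
        writer.writerow([row[i] for i in order])
    return buf.getvalue()


def read_column(text, label):
    """Read one column by its header label, independent of its position."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return [record[label] for record in reader]
```

A client that reads `row[0]` gets a different column from one file to the next, whereas `read_column(text, "id")` returns the same values regardless of the permutation chosen by the writer.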