Home
Welcome to the CSVComparer wiki!
The CSVComparer is a simple tool that checks two CSV-style files and reports the differences between them in a structured way.
Fix bug where row numbers were out by 1 in the report: https://github.com/jscott7/CSVComparer/issues/40
Replace the Queue guarded by lock statements with ConcurrentQueue. This reduced the benchmark time for the different-file comparison from 1.5 ms to 1.28 ms.
Created a NuGet package on NuGet.org https://www.nuget.org/packages/CSVComparer (version 1.0.0)
- Change naming convention from Reference/Candidate to LeftHandSide/RightHandSide
- Refactor SplitRowWithQuotes to use ReadOnlySpan
Benchmark. Before
Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|---|---|---|
StringSplit | 60.68 ns | 0.351 ns | 0.328 ns | 0.000 | 0.00 | 0.0178 | - | 224 B | 0.000 |
StringSplitWithQuotesControl | 339.66 ns | 0.935 ns | 0.875 ns | 0.000 | 0.00 | 0.0267 | - | 336 B | 0.000 |
StringSplitWithQuotes | 352.94 ns | 6.950 ns | 6.501 ns | 0.000 | 0.00 | 0.0267 | - | 336 B | 0.000 |
After
Method | Mean | Error | StdDev | Ratio | Gen0 | Gen1 | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|---|---|
StringSplit | 60.82 ns | 0.655 ns | 0.613 ns | 0.000 | 0.0178 | - | 224 B | 0.000 |
StringSplitWithQuotesControl | 141.54 ns | 1.322 ns | 1.172 ns | 0.000 | 0.0267 | - | 336 B | 0.000 |
StringSplitWithQuotes | 138.74 ns | 1.322 ns | 1.236 ns | 0.000 | 0.0267 | - | 336 B | 0.000 |
- Fix bug when splitting a string that contains quotes and a multi-character delimiter, for example with the "##" delimiter:
"A##\"B contains a quote##comma\"##\"Also contains a##comma\"##D"
- Change default branch from master to main
Run the following commands to update a local clone:
git branch -m master main
git fetch origin
git branch -u origin/main main
git remote set-head origin -a
- The column(s) used for the key are included in the results table, for example Key - ABC:DEF below:
Break Type,Key - ABC:DEF,Column Name,Reference Row, Reference Value, Candidate Row, Candidate Value
ValueMismatch,B:1,AnotherColumn,3,y,3,z
ValueMismatch,B:2,AValueColumn,4,1.2,4,1.0
RowInCandidateNotInReference,C:1,,-1,,5,
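As an illustration of how the key column in the table above could be produced, here is a minimal Python sketch. It assumes the key is formed by joining the values of the key columns with a colon, which is inferred from the sample output (`ABC:DEF` in the header, `B:1` in the rows). The functions `build_key` and `key_header` are hypothetical names and are not part of the CSVComparer API.

```python
def build_key(row: dict, key_columns: list, separator: str = ":") -> str:
    """Join the values of the key columns into a single key string."""
    return separator.join(row[column] for column in key_columns)


def key_header(key_columns: list, separator: str = ":") -> str:
    """Build the header label for the key column of the results table."""
    return "Key - " + separator.join(key_columns)


# Hypothetical row; the column names are taken from the sample output above.
row = {"ABC": "B", "DEF": "1", "AnotherColumn": "y"}
print(key_header(["ABC", "DEF"]))
print(build_key(row, ["ABC", "DEF"]))
```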
- Add CodeQL vulnerability scan to GitHub workflow
- Refactor Orphan handling
- Refactor Console Application into new project. Use .NET 6.0 Console template
- Create output folder if it doesn't exist
- Expand unit test coverage
- Updated to .NET 6.0
- Add support for excluding value breaks based on a regex pattern match of key
- Fix typo in xml
- Improve logging. Example output is now:
Searching for comparison definition for C:\temp\ReferenceDirectory\File.3456.csv
Found Comparison Definition. ID = Test3
Exact file match for reference: 'File.3456.csv' not found. Search using pattern: '^File.[a-zA-Z0-9]*.csv'
Comparing C:\temp\ReferenceDirectory\File.3456.csv with C:\temp\CandidateDirectory\File.1234.csv
Reference: C:\temp\ReferenceDirectory\File.3456.csv
Candidate: C:\temp\CandidateDirectory\File.1234.csv
No differences found.
Saving results to C:\temp\testresults\Reconciliation-Results-Test3.csv
Comparison took 25ms
Finished
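The fallback shown in the log (exact name first, then a pattern search) can be sketched roughly as follows. This is a Python illustration only, not the project's C# code. The function name `find_candidate` and the in-memory list of file names are assumptions, while the pattern is the one printed in the log.

```python
import re
from typing import List, Optional


def find_candidate(reference_name: str, candidate_names: List[str], pattern: str) -> Optional[str]:
    """Return the candidate file matching the reference file.

    Try an exact file name match first; if that fails, return the first
    candidate whose name matches the regular expression.
    """
    if reference_name in candidate_names:
        return reference_name
    regex = re.compile(pattern)
    for name in candidate_names:
        if regex.match(name):
            return name
    return None


candidates = ["Other.csv", "File.1234.csv"]
print(find_candidate("File.3456.csv", candidates, r"^File.[a-zA-Z0-9]*.csv"))
```

Note that the unescaped `.` in the logged pattern matches any character, so a stricter pattern would escape the dots as `\.`.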
- Delimit summary data by comma instead of colon, so the summary splits into separate cells when opened in a spreadsheet
- Add ability to exclude Orphans based on a Regex pattern matching the key
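A rough Python sketch of this kind of filtering is shown below. It assumes orphans are identified by their key string and that the pattern may match anywhere in the key; the function name `exclude_orphans` and the sample keys are hypothetical, and the real configuration may behave differently.

```python
import re
from typing import List


def exclude_orphans(orphan_keys: List[str], exclusion_pattern: str) -> List[str]:
    """Drop every orphan whose key matches the exclusion pattern."""
    regex = re.compile(exclusion_pattern)
    return [key for key in orphan_keys if not regex.search(key)]


# Hypothetical orphan keys; keys starting with "TEMP:" are excluded.
keys = ["C:1", "TEMP:7", "D:4"]
print(exclude_orphans(keys, r"^TEMP:"))
```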
- Added coverlet to CI
- Set up Azure Pipeline for CI
- Process output correctly reports filenames when they end with .BREAKS.csv
- Perform tolerance based comparison on numeric values when they are enclosed by quotes
COL A | COL B | COL C |
---|---|---|
"ROW 1" | "SOME VALUE" | "42.1" |
- Add support for quotes within column fields. This now follows CSV RFC-4180
If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:
"aaa","b""bb","ccc"
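To see how a parser that follows this rule reads the example, the line can be fed to Python's standard `csv` module, which treats a doubled quote inside a quoted field as one literal quote by default. This only illustrates the RFC 4180 rule and is not the CSVComparer implementation.

```python
import csv
import io

line = '"aaa","b""bb","ccc"'

# csv.reader uses doublequote=True by default, so "" inside a quoted
# field is read as a single literal double-quote character.
fields = next(csv.reader(io.StringIO(line)))
print(fields)  # ['aaa', 'b"bb', 'ccc']
```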
- In orphan report, missing row index is now -1
- Populate Date in Comparison Results
- Add support for empty CSV files. If one or both files are empty, the comparison will complete with meaningful results
- Add Column Name to break details, for example:
Break Type | Key | Column Name | Reference Row | Reference Value | Candidate Row | Candidate Value |
---|---|---|---|---|---|---|
ValueMismatch | 7 | COL B | 8 | 32.1 | 8 | 42.1 |
- Optionally exclude columns from the comparison. A list of columns to exclude can be defined in the configuration
- Output files for comparisons with differences are saved with Filename.BREAKS.csv
- The output path is now specified as a directory. If it doesn't exist it will be created
- Improve exception reporting when non-unique key columns are defined
- Cosmetic improvements to logging/output file.
- Add line counts for CSV files to output
- In directory comparison if a candidate file doesn't match exactly, use file pattern to search
- Use Regex to match file to configuration in directory comparisons
- In directory comparisons, if a folder is passed as the outputfile parameter, a results file is saved for each configuration key
Added support for simple directory comparisons
Change to API. ComparisonDefinition must now be applied in constructor
Fix bug where SplitStringWithQuotes does not exit when a quote is last character, for example: A,B,"C,D"
Add support for single-character delimiters to be enclosed within quotes. For example:
A,B,"This is a comma, in a quote", D
will resolve to:
- A
- B
- "This is a comma, in a quote"
- D
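The behaviour above can be approximated with a small quote-aware splitter. The sketch below is in Python and is not the project's C# code. It assumes that quote characters are kept in the output and that whitespace around each field is trimmed, both inferred from the listed result. The function name `split_row_with_quotes` is used here for illustration only.

```python
from typing import List


def split_row_with_quotes(row: str, delimiter: str) -> List[str]:
    """Split a row on the delimiter, ignoring delimiters inside double quotes.

    Quote characters are kept in the returned fields; surrounding
    whitespace is trimmed (an assumption based on the example above).
    """
    if not delimiter:
        raise ValueError("delimiter must not be empty")
    fields: List[str] = []
    current: List[str] = []
    in_quotes = False
    i = 0
    while i < len(row):
        if row[i] == '"':
            # Toggle the quoted state and keep the quote in the field.
            in_quotes = not in_quotes
            current.append(row[i])
            i += 1
        elif not in_quotes and row.startswith(delimiter, i):
            # A delimiter outside quotes ends the current field.
            fields.append("".join(current).strip())
            current = []
            i += len(delimiter)
        else:
            current.append(row[i])
            i += 1
    # The last field is emitted even when the row ends with a quote.
    fields.append("".join(current).strip())
    return fields


print(split_row_with_quotes('A,B,"This is a comma, in a quote", D', ","))
```

Because the delimiter is compared as a whole string, the same sketch also handles multi-character delimiters such as `##`.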
A single instance of the CSVComparer class can now be reused for multiple comparisons
Add support to optionally save output to file
Add IgnoreInvalidRows flag to configuration. If this is set to true, all rows that do not contain the same number of columns as the header row are excluded from the comparison. This typically happens when a footer row is present.
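The effect of the flag can be pictured with the following Python sketch, which assumes rows have already been split into lists of fields. The function name `filter_valid_rows` and the sample data are hypothetical; only the rule (column count must equal that of the header row) comes from the description above.

```python
from typing import List


def filter_valid_rows(rows: List[List[str]], ignore_invalid_rows: bool) -> List[List[str]]:
    """Keep only rows with the same number of columns as the header row.

    The first row is treated as the header. When the flag is False the
    rows are returned unchanged.
    """
    if not ignore_invalid_rows or not rows:
        return rows
    expected = len(rows[0])
    return [row for row in rows if len(row) == expected]


rows = [
    ["ID", "VALUE"],
    ["1", "10.5"],
    ["2", "1.5"],
    ["Total rows: 2"],  # footer row with a different column count
]
print(filter_valid_rows(rows, ignore_invalid_rows=True))
```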
Improve output. Example break now reported as:
Break Type: ValueMismatch. Description Key:XY, Reference Row:100, Value:10.5 != Candidate Row:110, Value:1.5
The BreakDetail objects can also now be accessed programmatically
If none of the key columns listed in the configuration exist in the CSV files, the comparison terminates with a break description added.
Added tool to create large random test CSV files
Breaks include the row number
A: Row: 1 Value: 1.0 != Row: 1 Value: 1.2
Renamed 'Target' to 'Candidate'. Naming convention change
Added Tolerance for comparison of numeric fields
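A tolerance-based comparison can be sketched in Python as follows. The sketch assumes an absolute tolerance and falls back to an exact text comparison for non-numeric values; it also strips enclosing double quotes before parsing, in line with the quoted-numeric support described earlier on this page. The function name `values_match` is hypothetical, and the actual tolerance semantics in CSVComparer may differ.

```python
def values_match(left: str, right: str, tolerance: float) -> bool:
    """Compare two field values, allowing a numeric difference up to tolerance."""
    # Remove surrounding whitespace and enclosing double quotes.
    left_text = left.strip().strip('"')
    right_text = right.strip().strip('"')
    try:
        return abs(float(left_text) - float(right_text)) <= tolerance
    except ValueError:
        # At least one value is not numeric, so compare as text.
        return left_text == right_text


print(values_match("1.0", "1.2", 0.5))
print(values_match('"42.1"', '"32.1"', 0.5))
```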