Feature request: Diff mode #210

Open
turion opened this issue Mar 6, 2020 · 7 comments
Comments

@turion

turion commented Mar 6, 2020

I frequently have to compare large csv files where only a few fields in moderately many rows have changed. It would be cool to have a diff mode that shows the cell-wise diff of two csv files.

@BurntSushi
Owner

Could you please provide more details? An example with inputs and outputs would help.

@turion
Author

turion commented Mar 6, 2020

> An example with inputs and outputs would help.

Good idea.

Here is a rough sketch:

$ cat a.csv
foo,bar
a,23
b,42
[...many lines]

$ cat b.csv
foo,bar
a,100
b,42
[...many lines]
c,0

$ xsv --diff a.csv b.csv
@@ -1,bar +1,bar @@ foo,bar
  a,-23+100
@@ -1234 +1234 @@ foo,bar
+ c,0

$ xsv table --diff a.csv b.csv
  foo bar
  a   -23+100
+ c   0
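
To make the idea concrete, here is a minimal Python sketch of such a cell-wise diff. It is not xsv code: the function names `read_rows` and `cell_diff` are made up for illustration, and it assumes that both files share the same header and that rows line up by position.

```python
import csv
import io
from itertools import zip_longest


def read_rows(text):
    """Parse CSV text into a header row and a list of data rows."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[0], rows[1:]


def cell_diff(old_text, new_text):
    """Compare two CSV texts row by row (assumption: rows align by position)."""
    old_header, old_rows = read_rows(old_text)
    new_header, new_rows = read_rows(new_text)
    if old_header != new_header:
        raise ValueError("headers differ; this sketch needs identical columns")

    out = []
    for old, new in zip_longest(old_rows, new_rows):
        if old is None:
            # Row exists only in the new file.
            out.append("+ " + ",".join(new))
        elif new is None:
            # Row exists only in the old file.
            out.append("- " + ",".join(old))
        elif old != new:
            # Mark each changed cell as -old+new, keep unchanged cells as-is.
            cells = [o if o == n else f"-{o}+{n}" for o, n in zip(old, new)]
            out.append("  " + ",".join(cells))
    return out


a_csv = "foo,bar\na,23\nb,42\n"
b_csv = "foo,bar\na,100\nb,42\nc,0\n"
for line in cell_diff(a_csv, b_csv):
    print(line)
```

With the two small inputs above, this prints `  a,-23+100` followed by `+ c,0`. Positional alignment breaks down as soon as a row is inserted in the middle of a file, so a real implementation would probably need row matching of some kind.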

@turion
Author

turion commented Mar 6, 2020

I guess there are other interesting interactions. E.g. xsv stats --diff could show the number of changed rows and cells, and xsv select --diff could limit the diff to certain columns.
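
A rough Python sketch of what such counts could look like, again not xsv code. The function `diff_stats` and its `columns` parameter are hypothetical, and rows are assumed to be paired by position.

```python
import csv
import io


def diff_stats(old_text, new_text, columns=None):
    """Count changed rows and cells, optionally only within `columns`."""
    old_rows = list(csv.DictReader(io.StringIO(old_text)))
    new_rows = list(csv.DictReader(io.StringIO(new_text)))
    changed_rows = 0
    changed_cells = 0
    # Assumption: rows are paired by position; rows that exist in only
    # one of the two inputs are ignored by this sketch.
    for old, new in zip(old_rows, new_rows):
        names = columns if columns is not None else list(old)
        differing = sum(1 for name in names if old.get(name) != new.get(name))
        changed_cells += differing
        if differing:
            changed_rows += 1
    return {"changed_rows": changed_rows, "changed_cells": changed_cells}


old_csv = "foo,bar,baz\na,23,x\nb,42,y\n"
new_csv = "foo,bar,baz\na,100,z\nb,42,y\n"
print(diff_stats(old_csv, new_csv))
print(diff_stats(old_csv, new_csv, columns=["bar"]))
```

Passing `columns` plays the role of the column selection: cells outside the selection never count as changed.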

@Yomguithereal
Contributor

Hello @turion, do you know daff? Reading your initial question, I remembered this tool, which seems to do the job you need. It can also be easily integrated with git, if I remember correctly.

@turion
Author

turion commented Mar 8, 2020

@Yomguithereal that sounds cool! Yes, that's sort of the feature set I'd like to see.

@nicoburns

I think a daff-style diff is the way to go for this feature. Daff actually has a spec: http://paulfitz.github.io/daff-doc/spec.html, and the codebase (written in Haxe) is MIT licensed.

@kevinji

kevinji commented Dec 31, 2020

The simplest version of this that would be useful for me would contain:

  • A way to select a subset of columns as a key. If there are multiple rows with the same key, raise an error.
  • In the diffed csv, some way to tell what was added/edited/removed/unchanged.

Something like this is useful if you have a job that snapshots state periodically and you need to figure out what changed. Here, the format rarely changes but the contents often do.
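
The two requirements above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not a proposed xsv interface: `index_by_key` and `keyed_diff` are hypothetical names, and the status labels are just the four words from the list.

```python
import csv
import io


def index_by_key(text, key_columns):
    """Map each row's key tuple to its row dict; reject duplicate keys."""
    index = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = tuple(row[c] for c in key_columns)
        if key in index:
            raise ValueError(f"duplicate key {key}")
        index[key] = row
    return index


def keyed_diff(old_text, new_text, key_columns):
    """Return (status, key) pairs for every key seen in either input."""
    old = index_by_key(old_text, key_columns)
    new = index_by_key(new_text, key_columns)
    result = []
    for key in sorted(old.keys() | new.keys()):
        if key not in old:
            status = "added"
        elif key not in new:
            status = "removed"
        elif old[key] != new[key]:
            status = "edited"
        else:
            status = "unchanged"
        result.append((status, key))
    return result


# Two hypothetical snapshots of the same job state, keyed by "id".
snapshot_1 = "id,state\n1,up\n2,up\n3,down\n"
snapshot_2 = "id,state\n1,up\n2,down\n4,up\n"
for status, key in keyed_diff(snapshot_1, snapshot_2, ["id"]):
    print(status, key)
```

Matching rows by key instead of by position means that reordered or inserted rows do not show up as spurious edits, which fits the snapshot use case.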


5 participants