Feature request: 'merge' (like 'cat' for rows, but with column interpolation) #4

Open
jeetsukumaran opened this issue Mar 5, 2015 · 9 comments

Comments

@jeetsukumaran

The final output would have the union of all field names across all input files.
Fields without data (e.g., because that column was missing from an input file) could be left blank or filled with a user-supplied value (e.g., "--nodata='NA'").

@BurntSushi
Owner

Could you give a small example please? Just so I make sure I understand.

@jeetsukumaran
Author

data1.csv:

f1,f2,f3,f4
 0, 1, 2, 3
 4, 5, 6, 7

data2.csv:

f1,f4,f5,f6,f7
 a, b, c, d, e
 f, g, h, i, j

Output of xsv cat rows data1.csv data2.csv --nodata="NA":

f1,f2,f3,f4,f5,f6,f7
 0, 1, 2, 3,NA,NA,NA
 4, 5, 6, 7,NA,NA,NA
 a,NA,NA, b, c, d, e
 f,NA,NA, g, h, i, j
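For anyone who needs this behavior today, the requested semantics can be sketched in a few lines of Python. This is only an illustration of the idea, not xsv code: the function name `stack_rows` and its `nodata` parameter are made up for this example and simply mirror the proposed `--nodata` flag.

```python
import csv
import io


def stack_rows(sources, nodata=""):
    """Stack CSV inputs row-wise, using the union of their headers.

    `sources` is an iterable of file-like objects. `nodata` fills the
    cells of columns that a given input does not have.
    (Hypothetical helper; the name and signature are assumptions.)
    """
    header = []  # union of field names, in first-seen order
    rows = []    # every input row, kept as a dict keyed by field name
    for src in sources:
        # skipinitialspace tolerates the padded cells used in the example above
        reader = csv.DictReader(src, skipinitialspace=True)
        for name in reader.fieldnames or []:
            if name not in header:
                header.append(name)
        rows.extend(reader)

    out = io.StringIO()
    # restval supplies the fill value for keys missing from a row
    writer = csv.DictWriter(out, fieldnames=header, restval=nodata,
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Called with the two files above and `nodata="NA"`, this yields the header `f1,f2,f3,f4,f5,f6,f7` followed by the four stacked rows. Note that the sketch holds all rows in memory; a streaming version would first read only the headers and then make a second pass over the data.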

@jeetsukumaran
Author

Maybe "missing data" or "missing fields" would be a more precise term. The flag could be "--interpolate-missing" or something like that.

@danielecook

4 years late to the party but...

@jeetsukumaran I have written a utility that does this:

https://github.com/danielecook/tut

tut stack data1.csv data2.csv

It's called stack, and I use it to merge all kinds of output files with heterogeneous columns... I also added an option to output the filename with --add-filename (full path) or basenames --add-basename of the files being merged. This makes it super easy to merge together a collection of related files for analysis.
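To show what a source-file column amounts to, here is a rough Python sketch of the idea. It is not how tut implements it: the helper `rows_with_source`, the column name `filename`, and the `basename` switch are assumptions made for illustration only.

```python
import csv
import io
import os


def rows_with_source(path, text, column="filename", basename=False):
    """Yield each CSV row in `text` as a dict with an extra source column.

    `path` is the name recorded in the extra column; with basename=True
    only the final path component is kept. (Hypothetical helper.)
    """
    label = os.path.basename(path) if basename else path
    for row in csv.DictReader(io.StringIO(text)):
        row[column] = label  # tag the row with the file it came from
        yield row
```

Rows tagged this way can then be stacked like any others, with the extra column becoming part of the header union.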

@BurntSushi - it would be really cool if you were able to add this as I'm sure the xsv implementation would be much faster.

@geekscrapy

Echoing @danielecook's last comment: it'd be amazing if this were incorporated as a subcommand 👍 Currently using csvstack. It works, but I think xsv would be faster!

@data-man

Maybe tsv-utils can help (written in D).

@ad-si

ad-si commented Jan 7, 2020

I was pretty surprised to find out that this is not the default behavior 😳. Addition of this would be highly appreciated 😊

@alexmarco

> Maybe tsv-utils can help (written in D).

Hi, tsv-utils has no option to stack and align input data.
+1 for this option in xsv.

@BurntSushi
Owner

@alexmarco Please don't post comments just to +1 a feature request.
