Feature Request: deduplicate columns/extract unique columns #84
Comments
Another workaround is But while I was looking into this, I saw this pending PR to add a I'm now looking to adapt it to qsv as well. :) |
I had spotted BurntSushi/xsv#114 too, and the My real use case is merging several "join" tables from another tool, which all share the first dozen columns (and values). |
Got it. You're not just deduping duplicate headers. |
Not tested yet - I got sidetracked by conda-forge (see #85), but could try your pre-compiled binaries instead. |
The transpose/dedup/transpose workaround isn't quite what I wanted as it has also sorted the columns (and I wanted to preserve the order keeping the first occurrence only). I wonder how often people would want a row-based dedup which preserves order? Using |
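For the row-based case, an order-preserving dedup can be sketched with standard awk rather than qsv. This is only an illustration using plain Unix tools: it treats each line as an opaque string, so it assumes no CSV field contains an embedded newline, and the function name `dedup_rows_keep_order` is made up for this example.

```shell
# Hypothetical helper (not a qsv command): drop repeated lines,
# keeping the first occurrence and the original order.
dedup_rows_keep_order() {
  # seen[$0]++ evaluates to 0 the first time a line appears,
  # so the negation is true exactly once per distinct line.
  awk '!seen[$0]++'
}

printf 'b,2\na,1\nb,2\nc,3\na,1\n' | dedup_rows_keep_order
```

Unlike `sort | uniq`, this never reorders rows, at the cost of holding every distinct line in memory.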
And don't forget that
to select the second column named 'Foo'. Also, maybe you can use the Regardless, if you come up with a useful recipe, please do share it in the Cookbook. |
I can redo the PR here if it helps. |
@eddy-geek Please do! |
Hi @eddy-geek , just wanted to give you a heads-up that I modified join to have As is, they only take columns from the left relation, so it shouldn't affect your PR for deduping column names... |
Looking at #89 and #90, while If that was working, it would solve my use case fairly well. Here I merge on column 3 of
That looks to be working nicely, other than the duplication of the join column. (My original request of a column deduplication command would make this even easier) |
@peterjc , can you add that to the Cookbook? Hopefully, @eddy-geek can redo his old PR and we can get the BTW, the performance regression may have been a false positive... I just installed WSL at the time and I have since uninstalled it. I ran the benchmarks on WSL and it was giving some bad numbers which I may have unnecessarily attributed to the PR. |
Ah, the wiki page https://github.com/jqnatividad/qsv/wiki/Cookbook#cookbook - I could do that. Maybe a simpler version with CSV files only. |
I don't see that I can edit the wiki (likely restricted to collaborators, which is fine), so suggested text:

Multi-table join avoiding repeated columns

This example was inspired by having to combine multiple tables exported from another system, which were themselves from multiple database joins. Suppose you have several tables (

cp table_A.csv combined.csv
for NEXT in table_B.csv table_C.csv table_D.csv; do
    qsv join --merge 2 combined.csv 1 <(qsv select 2,11- $NEXT) > new.csv
    mv new.csv combined.csv
done

We use a loop to perform multiple joins. Each time we use The |
@peterjc , this is awesome. Thanks! I just opened up the wiki, do you mind adding the article yourself? I really want the wiki to be a community resource, and being one of the early qsv adopters, I'd really appreciate it if you make the first community contribution to it! |
Done. |
Should somebody stumble into this - the polars powered |
Cross reference BurntSushi/xsv#283
We can use qsv dedup or the Unix command line tools sort and uniq to remove duplicate rows in a plain text table, but I find myself wanting to do something similar with duplicated columns.

For example, after doing qsv join ... there will be at least one pair of duplicated columns (the values used for the join).

I am hoping for something like a column-based version of the row-based qsv dedup command (see #26).

I suspect I could work around this via the qsv transpose command (see #3).
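To make the request concrete, here is a rough sketch of what column deduplication could mean, written with plain awk rather than any qsv command. It keeps the first occurrence of each column whose header and values are all identical, and preserves the original column order. The function name `dedup_columns` is hypothetical, and the sketch assumes simple CSV with no quoted fields containing commas or newlines; it also loads the whole table into memory.

```shell
# Hypothetical helper (not part of qsv): remove columns that exactly
# duplicate an earlier column, keeping the first one in place.
dedup_columns() {
  awk -F',' '
    {
      # Remember every field, and track the widest row.
      for (i = 1; i <= NF; i++) cell[NR, i] = $i
      if (NF > ncols) ncols = NF
    }
    END {
      nkeep = 0
      for (i = 1; i <= ncols; i++) {
        # A column signature is all of its cells, header included.
        sig = ""
        for (r = 1; r <= NR; r++) sig = sig cell[r, i] SUBSEP
        if (!(sig in seen)) { seen[sig] = 1; keep[++nkeep] = i }
      }
      # Print each row using only the columns that were kept.
      for (r = 1; r <= NR; r++) {
        line = ""
        for (k = 1; k <= nkeep; k++)
          line = line (k > 1 ? "," : "") cell[r, keep[k]]
        print line
      }
    }
  '
}

printf 'id,name,id,score\n1,a,1,9\n2,b,2,8\n' | dedup_columns
```

Here the second `id` column repeats the first, so only `id,name,score` survive. Columns that share a header but differ in any value are both kept, since the comparison covers the whole column.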