Auto-detect delimiter #199
Comments
@harrybiddle I agree that this would be a useful option. I would limit it though to single character ascii delimiters, and to sniff candidate delimiters from a provided list. Perhaps, the env var could be something like WDYT? A PR would be most welcome!
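The idea of sniffing from a provided list of single-character ASCII candidates can be sketched in bash as follows. This is only an illustration, not how qsv or csv_sniffer work: the function name `sniff_delimiter`, the default candidate list, and the 20-line sample size are all assumptions made here. A candidate qualifies when it occurs the same non-zero number of times on every sampled non-empty line, and the qualifying candidate with the most occurrences wins. Quoted fields that contain a candidate character will confuse this heuristic.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of qsv or csv_sniffer.
# Usage: sniff_delimiter FILE [CANDIDATES] [MAX_LINES]
# Prints the winning candidate, or returns 1 if none qualifies.
sniff_delimiter() {
  local file=$1
  local default_candidates=$',;\t|'   # assumed default list: , ; TAB |
  local candidates=${2:-$default_candidates}
  local max_lines=${3:-20}
  local i c line without n expected seen best='' best_count=0

  for ((i = 0; i < ${#candidates}; i++)); do
    c=${candidates:i:1}
    expected=-1   # occurrences per line, unknown until the first line is read
    seen=0
    while (( seen < max_lines )) && { IFS= read -r line || [[ -n $line ]]; }; do
      if [[ -z $line ]]; then continue; fi   # ignore blank lines
      seen=$((seen + 1))
      # Count occurrences of c: length before minus length after deleting c.
      without=${line//"$c"/}
      n=$(( ${#line} - ${#without} ))
      if (( expected == -1 )); then
        expected=$n
      elif (( n != expected )); then
        expected=0   # inconsistent across lines: disqualify this candidate
        break
      fi
    done < "$file"
    if (( expected > best_count )); then
      best=$c
      best_count=$expected
    fi
  done

  [[ -n $best ]] || return 1
  printf '%s' "$best"
}
```

For example, `sniff_delimiter my_file.csv ',;'` would restrict the guess to commas and semicolons. Restricting the candidates keeps the guess predictable, which is the point of supplying the list rather than scanning for arbitrary characters.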
That would make sense to me. I guess there are already sniffers out there (I see this one?) so we could depend on that, or re-implement the idea to minimise dependencies.
I would love to give this a go, but unfortunately my Rust knowledge is less than ϵ, where ϵ is an arbitrarily small number 😜.
The csv_sniffer crate is a good find, but it seems unmaintained, though we can certainly leverage techniques used there. As for Rust, I wouldn't consider myself a Rust veteran. I just picked it up while working on a project myself in 2020 using xsv, scratched a few itches, and ended up doing the qsv fork. It does have a relatively steep initial learning curve, but once you get over the hump, it's most enjoyable and makes you realize how inefficient interpreted, weakly typed, garbage-collected languages are. Anyway, I'll take a look at it over the weekend...
Hi @harrybiddle, When it gets merged and csv-sniffer is published, I will leverage it for this request using a Since it has auto-preamble detection, I'll also use it to automatically skip lines.
Hi, csv_sniffer owner here. It's definitely been unmaintained (or rather, I completely forgot I wrote it), but I'm happy to maintain it if it's being used :) . I've merged @jqnatividad's PR and republished.
Awesome @jblondin, will be sure to integrate it and let you know!
Thank you for the quick feature! I compiled master and tried this out. Unfortunately it failed on the first file I tried, but I think I was just unlucky to pick a file which csv-sniffer couldn't handle. @jblondin I've opened up an issue over there: jblondin/csv-sniffer#13. It seems to work great on other files :). @jqnatividad any reason why we aren't able to do sniffing when the file is piped in? The way I had this working in my noddy bash script was to read the sample into a buffer in memory, sniff it, and then pass the buffer on, followed by the rest of the input pipe. I wouldn't mind the challenge of trying to do this myself, but it might take me a few months to ramp up on Rust and also find the time.
@harrybiddle it's an encoding issue. I handled it with qsv by transcoding to UTF8 using encoding_rs_io. With I'll take another go at it for the next release to see if I can use
BTW @harrybiddle, is it OK to include the sample file in jblondin/csv-sniffer#13 and use it with qsv's integration tests?
Also @harrybiddle, I'm very interested in your use cases being that you use qsv "hundreds of times a day." And you don't have to learn Rust to contribute... it'd be awesome if you can share some recipes of how you've put qsv to work in the Cookbook.
@harrybiddle's test file works with csv-sniffer directly (jblondin/csv-sniffer#13 (comment)), so it may be an issue with how it's called? I'll investigate a bit here as well.
Yes, please feel free to use as a test :). It sounds like we might have to guess the encoding too? I use the tool hundreds of times a day, but purely for my own development cycle. I work a lot with CSVs due to the work environment I'm in, and I very frequently need to:
I found Happy to throw a few of my more common commands in the cookbook 👍 . I don't do anything advanced; it's 70%
That's awesome @harrybiddle! I look forward to your recipes, and do feel free to include third-party CLI tools with 'em. Speaking of which, you may want to check out csvlens - it's like less for CSVs. 😄
Ooh, thanks, I didn't know about that. Combined with csv-sniffer I've got myself auto-delimiter detection ;)
#!/usr/bin/env bash
# Sniff the delimiter of the given file, then open the file in csvlens with it.
file="$1"
delimiter=$(sniff "$file" | grep Delimiter | awk '{print $2}')
csvlens --delimiter "$delimiter" "$file"
Still encountering an error on Windows 11, Ubuntu Linux 20.04 LTS, and macOS Monterey with the csv-sniffer binary parsing the sample file in jblondin/csv-sniffer#13...
Now that qsv REQUIRES and even scans for utf8 encoding when it starts, closing this issue.
Is your feature request related to a problem? Please describe.
In my daily work, a .csv file is comma-separated 50% of the time, and semi-colon-separated the other 50%. I use this command-line tool hundreds of times a day. It's incredibly frustrating to have to first figure out which separator character is being used, and then adjust my command appropriately.
Describe the solution you'd like
Ideally qsv would auto-detect a delimiter by default. However, this would break backwards compatibility, so I suggest having an environment variable to turn this on.
$ export QSV_AUTO_DETECT_DELIMITER=1
$ qsv table my_file.csv
When this environment variable is set, any value of QSV_DELIMITER would be ignored.
Describe alternatives you've considered
I tried to achieve this using a bash wrapper, but it was a bit fiddly because I need to do different things depending on whether qsv is being passed a file (in which case I sniff the file and then pass the delimiter to the qsv command) or a stream (in which case I sniff the stream, and then pass the amount I've already sniffed plus the rest of the stream to qsv).
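A rough sketch of such a wrapper, covering both the file case and the stream case, might look like the following. Everything named here is hypothetical: `comma_or_semicolon` is a deliberately naive stand-in for a real sniffer, `sniff_then_run` is an invented function name, and `SNIFFED_DELIMITER` is an invented variable that the wrapped command would have to read itself. For a stream, the first chunk is captured with a single `dd` read so nothing past the sample is consumed, and `cat` then replays the sample followed by the rest of the pipe.

```bash
#!/usr/bin/env bash
# Naive stand-in for a real sniffer: compares commas and semicolons
# on the first line of the sample file and prints the more frequent one.
comma_or_semicolon() {
  local first='' commas semis
  IFS= read -r first < "$1" || true
  commas=${first//[^,]/}   # keep only the commas
  semis=${first//[^;]/}    # keep only the semicolons
  if (( ${#semis} > ${#commas} )); then printf ';'; else printf ','; fi
}

# Usage: sniff_then_run FILE|- COMMAND [ARGS...]
# Runs COMMAND with the data on stdin and SNIFFED_DELIMITER (hypothetical
# variable name) set in its environment.
sniff_then_run() {
  local input=$1; shift
  local sample delim status=0

  if [[ $input != - ]]; then
    # File case: sniff the file in place, then feed it to the command.
    delim=$(comma_or_semicolon "$input")
    SNIFFED_DELIMITER=$delim "$@" < "$input"
    return
  fi

  # Stream case: capture the first chunk (one read of at most 64 KiB).
  sample=$(mktemp) || return 1
  dd bs=65536 count=1 of="$sample" 2>/dev/null
  delim=$(comma_or_semicolon "$sample")
  # Replay the sampled bytes, then pass through the rest of the stream.
  cat "$sample" - | SNIFFED_DELIMITER=$delim "$@" || status=$?
  rm -f "$sample"
  return "$status"
}
```

One caveat of this design is that a single read from a slow pipe may return only part of the first line, in which case the guess is based on a truncated sample. Translating `SNIFFED_DELIMITER` into whatever delimiter option the wrapped tool accepts is left to the caller.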
Additional context
Duplicate of BurntSushi/xsv#294