Add helper functions for CSV processing #125
If you're thinking about helper functions for CSV processing, it would be worthwhile to look at "csvkit" https://csvkit.readthedocs.io/en/latest/, which is a set of command-line tools for processing CSV data. I'm pretty sure the simple tools all have direct equivalents in goawk, but some of the others don't, and those might be inspiration. (Thanks for goawk; it's always nice to see a favorite old language get a modern implementation.)
@vielmetti Thanks for that. Yeah, I've looked at csvkit a bit when thinking about this (see https://github.com/benhoyt/goawk/blob/master/csv.md#examples-based-on-csvkit). Select, cut, and reorder are fairly straightforward with the […]. Some things that csvkit can do probably aren't going to be included, though: for example, converting to JSON, or sorting, which just doesn't fit the row-by-row AWK model very well.
You could also look at Miller for inspiration. Miller is heavily inspired by awk and the Unix toolbox, but adds support for formats like CSV, JSON, etc. Miller is also written in Go, so you could even borrow some code for other parts of your project, like buffers and such.
PS: there is also csvtk in Go!
It'd be good to add a library of various functions to help with processing CSV files (or other tabular data; it wouldn't be limited to CSV). For example:

- `printrow()` function mentioned in csv.md.
- `printheader()` function that prints the names in `OFIELDS` (or just use `print`?).
- `delfield(n)` to delete a single field, or maybe `delfield(n[, c])` to delete c fields starting at field n (c defaults to 1). For one implementation, see the `rmcol` definition in this StackOverflow answer.
- `delfieldbyname()`? Though with a better name.
- `insfield(n, val)`. With standard AWK you can kind of cheat with something like `{$n=val FS $n;}`, but that doesn't work for CSV escaping.

We could start by making this a simple AWK library that you include, e.g. `goawk -f lib.awk -f prog.awk` (prepend/append the library to the source when using the Go API).

When we want to add them as builtins to GoAWK, we should do it in a backwards-compatible way (i.e., not make them keywords like the other builtins; if the user defines a function or variable with the same name, the user's definition takes precedence).
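The backwards-compatibility rule above boils down to a lookup order: a name the program defines itself (as a function or a variable) wins, and only otherwise does it refer to a CSV helper. A minimal Go sketch of that resolution step, assuming hypothetical names (`resolveName`, `helperBuiltins`, `Kind`) and toy lookup tables; this is not how GoAWK's parser is actually structured:

```go
package main

import "fmt"

// helperBuiltins lists the proposed CSV helpers. Unlike real keywords
// (length, substr, ...), they are only used if the program doesn't claim the name.
var helperBuiltins = map[string]bool{
	"printrow": true, "printheader": true, "delfield": true, "insfield": true,
}

// Kind says what a name refers to after resolution.
type Kind int

const (
	UserFunc Kind = iota
	UserVar
	HelperBuiltin
	Unknown
)

func (k Kind) String() string {
	switch k {
	case UserFunc:
		return "user function"
	case UserVar:
		return "user variable"
	case HelperBuiltin:
		return "helper builtin"
	}
	return "unknown"
}

// resolveName checks the program's own functions and variables before
// falling back to the helper builtins, so existing scripts keep working.
func resolveName(name string, userFuncs, userVars map[string]bool) Kind {
	switch {
	case userFuncs[name]:
		return UserFunc
	case userVars[name]:
		return UserVar
	case helperBuiltins[name]:
		return HelperBuiltin
	default:
		return Unknown
	}
}

func main() {
	userFuncs := map[string]bool{"delfield": true}   // program defines its own delfield()
	userVars := map[string]bool{"printheader": true} // program uses printheader as a variable
	for _, name := range []string{"delfield", "printheader", "insfield", "foo"} {
		fmt.Println(name, "->", resolveName(name, userFuncs, userVars))
	}
}
```

The point of the design is that adding a helper can never break an existing script: an older program that already uses one of these names keeps its own meaning.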