Add helper functions for CSV processing #125

Open
benhoyt opened this issue May 24, 2022 · 4 comments · May be fixed by #127

Comments

@benhoyt
Owner

benhoyt commented May 24, 2022

It'd be good to add a library of various functions to help with processing CSV files (or other tabular data, it wouldn't be limited to CSV). For example:

  • The printrow() function mentioned in csv.md.
  • If we add the above, may also want a printheader() function that prints the names in OFIELDS (or just use print?).
  • A function to delete a field or fields, for example delfield(n) to delete a single field, or maybe delfield(n[, c]) to delete c fields starting at field n (c defaults to 1). For one implementation, see the rmcol definition in this StackOverflow answer.
    • Do we also need a delfieldbyname()? Though with a better name.
  • A function to insert a field or fields, e.g., insfield(n, val). With standard AWK you can kind of cheat with something like {$n=val FS $n;}, but that doesn't work for CSV escaping.
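
To make the delfield(n[, c]) idea concrete, here is a minimal plain-AWK sketch in the spirit of the rmcol approach mentioned above. The name and signature follow the proposal in this issue, but the body is only an illustration, not the implementation in #127. It assumes an awk that rebuilds the record when NF is assigned (gawk, mawk, and GoAWK do), and it splits on FS, so it does not handle quoted CSV fields.

```shell
# Hypothetical delfield(n, c): delete c fields starting at field n (c defaults to 1).
# Illustration only: fields are split on FS, so quoted CSV fields are not handled.
out=$(printf 'a,b,c,d\n1,2,3,4\n' | awk -F, -v OFS=, '
function delfield(n, c,    i) {
    if (c == "") c = 1                   # default: delete a single field
    if (n < 1 || n > NF) return          # nothing to delete
    if (n + c - 1 > NF) c = NF - n + 1   # clamp to the end of the record
    for (i = n; i <= NF - c; i++)        # shift later fields left
        $i = $(i + c)
    NF -= c                              # drop the now-duplicate trailing fields
}
{ delfield(2, 2); print }
')
printf '%s\n' "$out"
```

With the sample input this removes the second and third columns from every row, leaving `a,d` and `1,4`.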

We could start by making this a simple AWK library that you include, e.g., goawk -f lib.awk -f prog.awk (or prepend/append the library to the source when using the Go API).
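
As a rough sketch of that include-a-library workflow: the file names come from the command above, while the insfield body and the sample data are assumptions made up for illustration. Plain awk is used so the example runs anywhere; the same repeated -f options are what the goawk command above relies on.

```shell
# Sketch: keep helpers in lib.awk and load them before the main program.
dir=$(mktemp -d)

cat > "$dir/lib.awk" <<'EOF'
# Hypothetical insfield(n, val): insert val as new field n, shifting later fields right.
# Does no CSV quoting, so val must not contain the separator.
function insfield(n, val,    i) {
    for (i = NF; i >= n; i--)
        $(i + 1) = $i
    $n = val
}
EOF

cat > "$dir/prog.awk" <<'EOF'
BEGIN { FS = OFS = "," }
# Add an "id" column in position 2: a header on the first row, then a row counter.
{ insfield(2, (NR == 1 ? "id" : NR - 1)); print }
EOF

result=$(printf 'name,age\nBob,42\n' | awk -f "$dir/lib.awk" -f "$dir/prog.awk")
printf '%s\n' "$result"
rm -rf "$dir"
```

Because awk concatenates all -f sources into one program, prog.awk can call insfield without any explicit import, which is what makes the library approach cheap to try before adding builtins.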

When we want to add them as builtins to GoAWK, we should do it in a backwards-compatible way (i.e., not make them keywords like the other builtins, but if the user redefines a function or variable with that same name, that takes precedence).

@vielmetti

If you're thinking about helper functions for CSV processing, it would be worthwhile to look at "csvkit"

https://csvkit.readthedocs.io/en/latest/

which is a set of command line tools for processing CSV data. I'm pretty sure that most of the simple tools have direct implementations in goawk, but some don't, and those might be inspiration.

(thanks for goawk, always nice to see a favorite old language get a modern implementation)

@benhoyt
Owner Author

benhoyt commented Jun 24, 2022

@vielmetti Thanks for that. Yeah, I've looked at csvkit some when thinking about this (see https://github.com/benhoyt/goawk/blob/master/csv.md#examples-based-on-csvkit). Select, cut, and reorder are fairly straightforward with the @ operator, and the functions in #127 augment that with field insertion/deletion when you need it.

Some things that csvkit can do probably aren't going to be included though, for example, converting to JSON. Or sorting -- that just doesn't fit the row-by-row AWK model very well.

@janxkoci

You could also look at Miller for inspiration. Miller is heavily inspired by awk and the unix toolbox, but adds support for formats like CSV, JSON, etc. Miller is also written in Go, so you could even borrow some code for other parts of your project, like buffers and such.

@janxkoci

PS: there is also csvtk in Go!
