Add JSON Lines support #152
Thanks! I do intend to do a deep-dive into this, but here are a few initial thoughts.
I hadn't considered your first example of "JSON array per line", just because "JSON object per line" is much more common. But that's perfectly valid and reasonable as a strongly typed CSV (well, really more like "slightly typed CSV"). I think JSON […]
What would it do with non-scalar values? In other words, what if an array or object was nested inside? Error? Just yield the JSON string? Ignore it? Replace it with some placeholder like ""? I suppose for v1 we could say non-scalar values are undefined and yield "" for now, with the possibility of extending it later.
I don't think Unicode causes problems. Everything's just UTF-8 in GoAWK.
And then "JSON object per line" maps very well to the GoAWK-specific […] What would […]
Not sure we need to escape double quotes in returned JSON strings, if we end up doing that. Just yield the JSON-encoded string. Escaping is only an issue for string literals.
Yeah, the AWK […]
Thanks for your thoughts on this. More another time!
Hey, https://github.com/tomnomnom/gron is related, in that it is a Go package that makes JSON work with grep. Could it be a basis for JSON support in GoAWK? Its examples use fgrep. There is a basic Go implementation of grep at https://github.com/u-root/u-root/blob/v0.10.0/cmds/core/grep/grep.go, and as I understand it fgrep is deprecated anyway.
Hey, just an FYI that Miller (written in Go) also supports JSON Lines (and JSON), so maybe you can check the code there. Its author notes that JSON parsing is generally slower than parsing the other supported formats.
The JSON Lines text format (aka JSONL or newline-delimited JSON) has one JSON value per line, usually an object. It's often used for structured log files or as a well-specified alternative to CSV.
Here are some ideas for how the JSON Lines format could be supported in GoAWK. To be honest, I'm not completely sure this is a good idea, but I've found it interesting to think about. This write-up captures some of my thoughts.
I can imagine different levels of sophistication. We could start simple and then in later versions support more complex input data and ways to interact with it.
One JSON array of scalars per line
Suggestions:
- `jsonl` input mode.
Questions:
One JSON object per line, with pairs of keys and scalar values
This is used by the Graylog Extended Log Format (GELF).
Users wanting to parse Logfmt messages (like myself, see #149) should be able to convert their data into this format quite easily.
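The conversion from logfmt can indeed be small. logfmt has no formal specification, so this Go sketch (hypothetical `logfmtToJSON`, not an existing library) assumes only the common shape: space-separated `key=value` pairs, values optionally double-quoted with backslash escapes, and bare keys, which it maps to `true` as an assumption. All other values stay JSON strings, and each input line yields one JSON object.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// logfmtToJSON converts one logfmt-style line into a JSON object.
// Simplified sketch: pairs are separated by spaces, quoted values may use
// backslash escapes, a bare key is mapped to true (an assumption), and every
// other value is kept as a string.
func logfmtToJSON(line string) (string, error) {
	obj := map[string]any{}
	i := 0
	for i < len(line) {
		if line[i] == ' ' { // skip spaces between pairs
			i++
			continue
		}
		start := i // the key runs up to '=' or a space
		for i < len(line) && line[i] != '=' && line[i] != ' ' {
			i++
		}
		key := line[start:i]
		if i == len(line) || line[i] == ' ' {
			obj[key] = true // bare key without a value
			continue
		}
		i++ // skip '='
		if i < len(line) && line[i] == '"' {
			var sb strings.Builder
			for i++; i < len(line) && line[i] != '"'; i++ {
				if line[i] == '\\' && i+1 < len(line) {
					i++ // a backslash makes the next byte literal
				}
				sb.WriteByte(line[i])
			}
			if i == len(line) {
				return "", fmt.Errorf("unterminated quoted value for key %q", key)
			}
			i++ // skip the closing quote
			obj[key] = sb.String()
		} else {
			start = i
			for i < len(line) && line[i] != ' ' {
				i++
			}
			obj[key] = line[start:i]
		}
	}
	b, err := json.Marshal(obj) // encoding/json writes map keys in sorted order
	return string(b), err
}

func main() {
	out, err := logfmtToJSON(`level=info msg="user logged in" user=alice`)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```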
Suggestions:
- Access values by key with GoAWK's named-field syntax, e.g. `@"short_message"`.
Nested data
Suggestions:
- `@"five.alpha[1]"` returns `"fum"`, `@"five.beta"` returns `"{"hey": "How's tricks?"}"` (quoting issue, see below).
- `getjsonarr("five.alpha")`. Now `$1` is `fo`, `$2` is `fum`.
- `setjsonroot("five.beta"); print @"hey"` returns `How's tricks?`.
Questions: