Streaming support #53
Hi @kellyjonbrazil, please let me know if you'd be willing to discuss a PR and I'll open one. I'm also open to recommendations on organization, design choices, and style. This also serves as a workaround for #67, or leaves only a small amount of work to implement it. The core changes:
Relative to your proposal, this only supports ndjson/jsonl; it does not stream a single large JSON document as ijson might. This fits all of my use cases, although others may have different needs. Some examples: flattening to generate data with little memory:
The user's Python query, and therefore the resulting function, returns a generator expression, which is streamed.
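A minimal sketch of the flattening idea in plain Python, not the actual jello code path. The record shape (an `id` plus a `values` list) and the helper name `read_ndjson` are assumptions made up for illustration. The query is a generator expression, so only one input line is held in memory at a time.

```python
import io
import json

def read_ndjson(stream):
    """Lazily yield one parsed object per non-empty line (hypothetical helper)."""
    for line in stream:
        if line.strip():
            yield json.loads(line)

# Hypothetical input: each record carries a list that we want to flatten.
sample = io.StringIO(
    '{"id": 1, "values": [10, 20]}\n'
    '{"id": 2, "values": [30]}\n'
)

records = read_ndjson(sample)

# Generator expression: nothing is read or parsed until it is iterated.
query = (
    {"id": record["id"], "value": value}
    for record in records
    for value in record["values"]
)

# Materialized here only so the result can be inspected.
flattened = list(query)
```

In a real pipeline the generator would be consumed and printed item by item rather than collected into a list.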
Reading the above input as a stream and taking the sum of all the "value"s:
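Roughly what the streaming sum amounts to, sketched with the standard library only. The function name `sum_values` and the sample input are assumptions; the point is that `sum` pulls one parsed line at a time from the generator, so the full input is never held in memory.

```python
import io
import json

def sum_values(stream):
    """Sum the "value" field of every non-empty ndjson line, one line at a time."""
    return sum(json.loads(line)["value"] for line in stream if line.strip())

# Hypothetical input; in practice the stream would be sys.stdin.
sample = io.StringIO('{"value": 1}\n{"value": 2}\n\n{"value": 3}\n')
total = sum_values(sample)
```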
DotMap works -- value may be accessed as
Performing a line-by-line transformation in a streaming manner. In this case, adding an attribute "is_even":
output:
in general the "lines" behavior requested in #67 looks like
where "f" is the transformation to be performed. This is also how I'd implement that feature (using Python's ast package), but at this point I think it's a marginal improvement.
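The "apply f to every line" pattern can be sketched as follows. This is an illustration in plain Python, not the PR's implementation: `transform_lines` and `add_is_even` are hypothetical names, and the input records are assumed to have an integer `value` field.

```python
import io
import json

def add_is_even(record):
    """Example transformation f: return a copy of the record with "is_even" added."""
    return {**record, "is_even": record["value"] % 2 == 0}

def transform_lines(stream, f):
    """Lazily apply f to each ndjson line and yield the result as a JSON string."""
    for line in stream:
        if line.strip():
            yield json.dumps(f(json.loads(line)))

# Hypothetical input; in practice the stream would be sys.stdin.
sample = io.StringIO('{"value": 1}\n{"value": 2}\n')
out_lines = list(transform_lines(sample, add_is_even))
```

Because `transform_lines` is a generator, each output line can be printed as soon as its input line has been read.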
Hi there - thank you so much for offering this contribution! I think this looks fantastic. Please do create a PR... I'd like to play around with this a bit. I think we should also include some updated documentation in the PR. Thanks for including the tests, too! One thing to note - I'll do my best to troubleshoot/fix if any bugs are reported after release of this feature, but could I ping you for help if I get stuck?
Also, could you make sure to branch from |
Had already realized my mistake and just rebased. Streaming input doesn't support raw input, which I missed because I was working off of master. PR is up at #69
Of course
Enhancement to add streaming support so the entire JSON document doesn't need to be loaded to start processing.
Looks like the `ijson` library might handle a lot of this. I think I might be able to create a `jello -S` (for streaming) option that uses the `ijson` library to parse `STDIN` and return `_` as a generator/iterator of JSON objects - whether it's an array or a top-level of JSON objects.
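For the "top-level of JSON objects" case, here is a rough sketch that uses only the standard library's `json.JSONDecoder.raw_decode` in place of `ijson`. It is a stand-in to show the generator shape, not the proposed implementation. The function name `iter_json_objects` and the chunk size are assumptions, and it does not stream the elements of a single large array, which is where `ijson` would be needed.

```python
import io
import json

def iter_json_objects(stream, chunk_size=65536):
    """Yield top-level JSON values from a text stream as they become complete.

    Assumes the top-level values are objects or arrays; a bare number split
    across two chunks could be decoded prematurely.
    """
    decoder = json.JSONDecoder()
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        buffer = (buffer + chunk).lstrip()
        while buffer:
            try:
                obj, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                if not chunk:
                    raise  # end of input with truncated or invalid JSON
                break      # value is incomplete: read another chunk
            yield obj
            buffer = buffer[end:].lstrip()
        if not chunk:
            return

# Hypothetical input; a tiny chunk size forces values to span several reads.
sample = io.StringIO('{"a": 1} {"b": [1, 2]}\n{"c": null}')
objs = list(iter_json_objects(sample, chunk_size=5))
```

Memory use is bounded by the largest single top-level value plus one chunk, rather than by the whole input.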