-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Iterate over more than one prefix? #109
Comments
@hotaru355 I'm currently just messaging to acknowledge that I've seen this message and I'm aware of it. I'm however unable to properly reply for a couple of weeks, so don't expect any action on this immediately. Thanks! |
An update, a month later, an update: Yes, the idea of having a high-level function allowing users to obtain items from more than one prefix is something that has long been asked for, one way or another. Yesterday I released ijson 3.3.0 with the latest master, and I have a few janitorial tasks I'd like to undertake before tackling this feature, which is something I think would be slightly popular indeed. I'll keep updating this issue whenever I have something to share, but once again don't hold your breath. |
Hi, I've looked at the code to understand basic concepts of the library and done some experiments. Thought that maybe I could help a bit with the implementation. Before I commit to producing a pull request I would like to ask what is the preferred way to add the feature. I've focused on the First change I made was to allow both This brings me to the actual question - do you think that the pipeline for pipeline_node coro_pipeline[] = {
{&ItemsBasecoro_Type, items_args, NULL},
{&ParseBasecoro_Type, NULL, NULL},
{&BasicParseBasecoro_Type, NULL, kwargs},
{NULL}
};
pipeline_node coro_pipeline[] = {
{&StripPath_Type, NULL, NULL}, // leave just the value, strip the path
{&ItemsBasecoro_Type, items_args, NULL},
{&ParseBasecoro_Type, NULL, NULL},
{&BasicParseBasecoro_Type, NULL, kwargs},
{NULL}
}; with the downside of or
Both solutions would require Another variant would be to return sub-generators that allow something like: for prefix, items in ijson.multiple_items(f, ["foo.item", "bar.item"]):
for item in items:
...
|
@wodny first of all, thanks a lot for your efforts! I seldom see people delving deep into the library, let alone the C backend, so I'm happy to see someone giving it a shot. Personally, whenever I add a feature to the library, I do it first for the pure-python backends (basically in ijson/common.py), as it's easier to try concepts there and iterate. But maybe you already did that.
Maybe you can be more generous and accept any iterable of strings, then put it internally in a set (and warn if repeated values were given)
The answer is "it depends", as usual, althogh my first reaction would be to go for an option 3, where items gets a multi_items sibling, but that items can internally call directly to avoid a tuple being created only to deconstruct it and extract its second value. I went to great lengths though the yajl2_c backend to avoid this kind of wasteful tuple creation + unpacking combo when passing the values from one level to the next. I can't recall the actual difference from the benchmarks (do run those! See the README), but it was big enough for me to care about this detail. goes find a link Ok, in the penultimate paragraph if https://github.com/ICRAR/ijson/blob/master/notes/design_notes.rst it says that the C backend got 25% faster (median across different dimensions) when avoiding those tuples plus other things. So yes, definitely worth trying to avoid those I'd say. Having said that, I also realise that option 1 is way easier though. And also thinking more about it, it might be a better strategy to go with it actually. If you write it, we could see how performance decreases for items(), and then we make the decision on what to do for the longer-term. Once again, thanks a lot for the work you're doing. Let me know if you need more guidance in other parts of the codebase. I have myself a few changes committed locally that I haven't polished enough to push yet, but hopefully won't be clashing with yours. Also, today I'm answering, but I might not be able to actually review code for at least two weeks, and then it's Xmas, NY and all of that... Busy time of the year. |
I thought that I would be able to write Python code one way or another but I have never really used the Python C API for anything serious yet so I wanted to check if I was able to write anything working before I offered to prepare some proposals. I created the #127 pull request including a description of what I did.
I struggle to find time for fun stuff like this myself. From my point of view there is no hurry. Probably batching tasks is more important than the overall time. |
Description
Using higher level interfaces, is there a way to iterate over more than one prefix?
Detailed description
First of all, thank you very much for this great library!
I am using ijson to transform a long JSON response from a graphQL server into python objects. Unfortunately, if the server encounters an error, the response status code is still 200 with the response body containing some JSON error message. So, the response can be in one of two formats:
Is there a way to transform the data for a good response and raise an error in case there is a bad response?
I know I can achieve the desired behaviour using the low-level
parse
function. The issues I have with this approach is thatparse
results would need to build dictionaries out of token eventsHow bad would you say is the performance loss when building dictionaries from token events compared to the C implementation? Is it even worth using ijson in this case?
If not already requested, I would like to follow up this question with a feature request that enhances the high-level interfaces to something like:
Why is this not clear from the documentation
The use case is not mentioned
The text was updated successfully, but these errors were encountered: