-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: iterate over more than one prefix (#109) #127
base: master
Are you sure you want to change the base?
Conversation
prepare to create prefixed_items_basecoro using the same code
keep set for prefixed_items
Reviewer's Guide by SourceryThis PR adds support for iterating over multiple prefixes in ijson by introducing a new Class diagram for the new prefixed_items functionalityclassDiagram
class ItemsCommonBasecoro {
+builder_t builder
+int pending_builder_reset
+PyObject *target_send
+PyObject *prefix
+int object_depth
+yajl2_state *module_state
}
class PrefixedItemsBasecoro {
+PyObject *target_send
+PyObject *prefix
+int object_depth
+int pending_builder_reset
+builder_t builder
+yajl2_state *module_state
+PyObject* prefixed_items_basecoro_send_impl(PyObject *self, PyObject *path, PyObject *event, PyObject *value)
}
class ItemsBasecoro {
+PyObject *target_send
+PyObject *prefix
+int object_depth
+int pending_builder_reset
+builder_t builder
+yajl2_state *module_state
}
ItemsCommonBasecoro <|-- PrefixedItemsBasecoro
ItemsCommonBasecoro <|-- ItemsBasecoro
PrefixedItemsBasecoro <|-- PrefixedItemsGen
PrefixedItemsBasecoro <|-- PrefixedItemsAsync
class PrefixedItemsGen {
+reading_generator_t reading_gen
}
class PrefixedItemsAsync {
+async_reading_generator *reading_generator
}
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @wodny - I've reviewed your changes and they look great!
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟡 Complexity: 1 issue found
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
N_N(retval = items_common_send_top(self, path, event, value)); | ||
if (retval == Py_None) | ||
Py_RETURN_NONE; | ||
PyObject *tuple = PyTuple_Pack(2, path, retval); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (bug_risk): Add NULL check after PyTuple_Pack
PyTuple_Pack can return NULL on error. Should check the return value and handle the error case appropriately.
PyObject *tuple = PyTuple_Pack(2, path, retval); | |
PyObject *tuple = PyTuple_Pack(2, path, retval); | |
if (tuple == NULL) { | |
return NULL; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is correct, but it is not the only place with lacking checks (kvitems_basecoro.c
, parse_basecoro.c
). Probably a separate commit correcting all of them at once would be beneficial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have created a GitHub issue for your comment: #128
@@ -244,6 +252,14 @@ def _items_pipeline(backend, prefix, map_type, config): | |||
) | |||
|
|||
|
|||
def _prefixed_items_pipeline(backend, prefix, map_type, config): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (complexity): Consider consolidating the parallel implementations into a single core prefixed pipeline with wrappers for non-prefixed functionality
The parallel implementations for prefixed and non-prefixed items create unnecessary complexity. We can simplify by making the prefixed version the core implementation and removing duplicate pipeline definitions. Here's how:
def _prefixed_items_pipeline(backend, prefix, map_type, config):
return (
(backend['prefixed_items_basecoro'], (prefix,), {'map_type': map_type}),
(backend['parse_basecoro'], [], {}),
(backend['basic_parse_basecoro'], [], config)
)
def _items_pipeline(backend, prefix, map_type, config):
# Reuse prefixed pipeline with wrapper to strip prefix
@utils.coroutine
def strip_prefix(target):
while True:
prefix, value = (yield)
target.send(value)
return _prefixed_items_pipeline(backend, prefix, map_type, config)[:-1] + \
((strip_prefix, [], {}),)
This approach:
- Makes prefixed_items_pipeline the core implementation
- Removes duplicate pipeline logic
- Maintains all functionality while reducing code complexity
- Keeps the separation between prefixed and non-prefixed interfaces
The same pattern can be applied to the generator and async implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wodny as promised in #109, I took some time to come back to this. Once again, thanks a lot! I was now able to read your description, and have a cursory look at the changes. I'll kick CI now to run, which might flag any potential obvious issues. In the meantime I have only some minor comments for things I saw on passing. From what I saw, I think the solution looks good, although I do wonder whether there's more code we can unify -- that can be done later though.
I missed seeing some tests. The test_items.py
test suite should be a good starting point, and you'll probably want to add some tests that are specific to the new functionality. There's also a test that should be added to test_misc.py
for the new API entry point.
You said you benchmarked this, so presumably you have some changes for benchmark.py
too? And hopefully for dump.py
as well -- both of which also have respective tests to be added. All of these should be fairly straightforward though. If you don't feel like adding them, I can do it after the fact. The tests are a must though, otherwise we can't really be sure things are working as they should.
@@ -49,11 +49,13 @@ def _default_backend(): | |||
parse_coro = backend.parse_coro | |||
items = backend.items | |||
items_coro = backend.items_coro | |||
prefixed_items = backend.prefixed_items | |||
prefixed_items_coro = backend.prefixed_items_coro |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the async variant is missing
PyObject *retval; | ||
N_N(retval = items_common_send_top(self, path, event, value)); | ||
if (retval == Py_None) | ||
Py_RETURN_NONE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is incrementing a reference to None
unnecessarily, you could just return retval
@@ -49,11 +49,13 @@ def _default_backend(): | |||
parse_coro = backend.parse_coro | |||
items = backend.items | |||
items_coro = backend.items_coro | |||
prefixed_items = backend.prefixed_items |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good place as any to discuss the name. I'm not convinced by "prefixed items" doesn't convey the fact that now you can specify multiple prefixes. I originally thought of "multi_items" or something along those lines, to make it clearer that there was no restriction on the number of prefixes.
@wodny while you're here, could you rebase on top of the latest master? I updated the CI infra in the meanwhile and macos runners should be running correctly now |
I prepared some proposals to check if the way I tackle the problem from #109 is acceptable.
I implemented the changes in Python and
yajl2_c
backends.The commit stack contains two variants:
items()
,items()
and the newprefixed_items()
, but has a drawback of performing 10% worse than the original.I benchmarked the code both using defaults and some input I parse in production which is a map of very long lists of maps for which using
items()
is essential.Both versions use
items_common.h
which containsstatic inline
functions used byitems()
and the newprefixed_items()
. An analogy to thebuilder.h
file. The code is all that I was able to move fromitems_basecoro.c
. Because the code is shared, in the non-set-items()
variant a macro is used to define the comparison method (PyObject_RichCompareBool
vsPySet_Contains
). The shared function is divided into top and bottom halves so the calling side can do a specializedCORO_SEND
in between and avoid mangling a generic tuple. Division into halves required one additional state information -pending_builder_reset
.Before I did the
items_common.h
I tried extending pipelines but the built-in tests changed my mind. All elements of the pipelines have input and output formats already defined and eg. the items basecoro is tested directly with events as inputs.I think I managed to avoid additional tuple creation/destruction with the exception of the pure Python code where I assumed it won't matter. I can always write two specialized functions.
I extended the list of
if
s inparse_basecoro.c
to detect the newPrefixedItemsBasecoro
:Everything else is just copying
items*.{c,h}
as a template and adding theprefixed
prefix ;) I made it in a separate commit ("add prefixed_items") because it's the biggest one and I didn't want to mix it with the actual changes.All tests (including memleaks) pass without modification.
I did not touch the
kvitems()
part yet nor did I write any tests forprefixed_items()
in case you decide the changes should be done in a completely different manner.Summary by Sourcery
Add support for iterating over multiple prefixes in the ijson library, enhancing data parsing capabilities. This includes new coroutine and generator implementations for prefixed_items, with shared code to maintain efficiency.
New Features:
Enhancements: