-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: iterate over more than one prefix (#109) #127
base: master
Are you sure you want to change the base?
Conversation
prepare to create prefixed_items_basecoro using the same code
keep set for prefixed_items
Reviewer's Guide by SourceryThis PR adds support for iterating over multiple prefixes in ijson by introducing a new Class diagram for the new prefixed_items functionalityclassDiagram
class ItemsCommonBasecoro {
+builder_t builder
+int pending_builder_reset
+PyObject *target_send
+PyObject *prefix
+int object_depth
+yajl2_state *module_state
}
class PrefixedItemsBasecoro {
+PyObject *target_send
+PyObject *prefix
+int object_depth
+int pending_builder_reset
+builder_t builder
+yajl2_state *module_state
+PyObject* prefixed_items_basecoro_send_impl(PyObject *self, PyObject *path, PyObject *event, PyObject *value)
}
class ItemsBasecoro {
+PyObject *target_send
+PyObject *prefix
+int object_depth
+int pending_builder_reset
+builder_t builder
+yajl2_state *module_state
}
ItemsCommonBasecoro <|-- PrefixedItemsBasecoro
ItemsCommonBasecoro <|-- ItemsBasecoro
PrefixedItemsBasecoro <|-- PrefixedItemsGen
PrefixedItemsBasecoro <|-- PrefixedItemsAsync
class PrefixedItemsGen {
+reading_generator_t reading_gen
}
class PrefixedItemsAsync {
+async_reading_generator *reading_generator
}
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @wodny - I've reviewed your changes and they look great!
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟡 Complexity: 1 issue found
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
N_N(retval = items_common_send_top(self, path, event, value)); | ||
if (retval == Py_None) | ||
Py_RETURN_NONE; | ||
PyObject *tuple = PyTuple_Pack(2, path, retval); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (bug_risk): Add NULL check after PyTuple_Pack
PyTuple_Pack can return NULL on error. Should check the return value and handle the error case appropriately.
PyObject *tuple = PyTuple_Pack(2, path, retval); | |
PyObject *tuple = PyTuple_Pack(2, path, retval); | |
if (tuple == NULL) { | |
return NULL; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is correct, but it is not the only place with lacking checks (kvitems_basecoro.c
, parse_basecoro.c
). Probably a separate commit correcting all of them at once would be beneficial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have created a GitHub issue for your comment: #128
@@ -244,6 +252,14 @@ def _items_pipeline(backend, prefix, map_type, config): | |||
) | |||
|
|||
|
|||
def _prefixed_items_pipeline(backend, prefix, map_type, config): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (complexity): Consider consolidating the parallel implementations into a single core prefixed pipeline with wrappers for non-prefixed functionality
The parallel implementations for prefixed and non-prefixed items create unnecessary complexity. We can simplify by making the prefixed version the core implementation and removing duplicate pipeline definitions. Here's how:
def _prefixed_items_pipeline(backend, prefix, map_type, config):
return (
(backend['prefixed_items_basecoro'], (prefix,), {'map_type': map_type}),
(backend['parse_basecoro'], [], {}),
(backend['basic_parse_basecoro'], [], config)
)
def _items_pipeline(backend, prefix, map_type, config):
# Reuse prefixed pipeline with wrapper to strip prefix
@utils.coroutine
def strip_prefix(target):
while True:
prefix, value = (yield)
target.send(value)
return _prefixed_items_pipeline(backend, prefix, map_type, config)[:-1] + \
((strip_prefix, [], {}),)
This approach:
- Makes prefixed_items_pipeline the core implementation
- Removes duplicate pipeline logic
- Maintains all functionality while reducing code complexity
- Keeps the separation between prefixed and non-prefixed interfaces
The same pattern can be applied to the generator and async implementations.
I prepared some proposals to check if the way I tackle the problem from #109 is acceptable.
I implemented the changes in Python and
yajl2_c
backends.The commit stack contains two variants:
items()
,items()
and the newprefixed_items()
, but has a drawback of performing 10% worse than the original.I benchmarked the code both using defaults and some input I parse in production which is a map of very long lists of maps for which using
items()
is essential.Both versions use
items_common.h
which containsstatic inline
functions used byitems()
and the newprefixed_items()
. An analogy to thebuilder.h
file. The code is all that I was able to move fromitems_basecoro.c
. Because the code is shared, in the non-set-items()
variant a macro is used to define the comparison method (PyObject_RichCompareBool
vsPySet_Contains
). The shared function is divided into top and bottom halves so the calling side can do a specializedCORO_SEND
in between and avoid mangling a generic tuple. Division into halves required one additional state information -pending_builder_reset
.Before I did the
items_common.h
I tried extending pipelines but the built-in tests changed my mind. All elements of the pipelines have input and output formats already defined and eg. the items basecoro is tested directly with events as inputs.I think I managed to avoid additional tuple creation/destruction with the exception of the pure Python code where I assumed it won't matter. I can always write two specialized functions.
I extended the list of
if
s inparse_basecoro.c
to detect the newPrefixedItemsBasecoro
:Everything else is just copying
items*.{c,h}
as a template and adding theprefixed
prefix ;) I made it in a separate commit ("add prefixed_items") because it's the biggest one and I didn't want to mix it with the actual changes.All tests (including memleaks) pass without modification.
I did not touch the
kvitems()
part yet nor did I write any tests forprefixed_items()
in case you decide the changes should be done in a completely different manner.Summary by Sourcery
Add support for iterating over multiple prefixes in the ijson library, enhancing data parsing capabilities. This includes new coroutine and generator implementations for prefixed_items, with shared code to maintain efficiency.
New Features:
Enhancements: