-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[c++] Fix offsets for nullable columns #3611
base: main
Are you sure you want to change the base?
[c++] Fix offsets for nullable columns #3611
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #3611 +/- ##
==========================================
+ Coverage 86.27% 86.31% +0.04%
==========================================
Files 55 55
Lines 6375 6380 +5
==========================================
+ Hits 5500 5507 +7
+ Misses 875 873 -2
Flags with carried forward coverage won't be shown. Click here to find out more.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to add some additional tests:
- Call Sparse/Dense NDArray
write
with an array that has a validity buffer, and ensure it acts as expected. - For DataFrame, need to verify expected behavior if an index (dimension) column is written with a validity buffer.
- In test_dataframe, I'd also test the "invalid" timestamp types (NaT) alongside the other nullable types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testing requests only at this point - the C++ code looks OK (I did a quick pass only - would be good to get @XanthosXanthopoulos or @jp-dark to review it in more depth)
Co-authored-by: Julia Dark <[email protected]>
d89b2f4
to
4271442
Compare
Issue and/or context:
[sc-62104]
We were adding the offset to the
ArrowArray
's validity buffer directly which is incorrect because the validity is stored as a bit, not as auint8_t
.Changes:
util::bitmap_to_uint8
which takes a pointer to a bitmap, length, and offset, and returns avector<uint8_t>
representation; if the bitmap is a nullptr, then return a nulloptManagedQuery::_cast_bool_data
, formallyutil::cast_bit_to_uint8
, andManagedQuery::_cast_validity_buffer
, which takes the validity buffer of the passed inArrowArray
and returnsvector<uint8_t>
with the properly shifted offsetManagedQuery::setup_write_column
andColumnBuffer::set_data
to take in anstd::optional<std::vector<uint8_t>>
where the validity buffer has already been offset and casted to uint8