-
Notifications
You must be signed in to change notification settings - Fork 30
[c++] Optimize memory management on write path #4311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #4311 +/- ##
==========================================
- Coverage 86.85% 86.85% -0.01%
==========================================
Files 137 138 +1
Lines 20705 20712 +7
Branches 15 16 +1
==========================================
+ Hits 17983 17989 +6
- Misses 2722 2723 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
bkmartinjr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there any measurements of the time/space impact of this PR?
I have ingested a couple of h5ad files and the result was about 20% faster with this PR with lower memory usage as well |
| } else { | ||
| // Casting is needed and casted data ownership should pass to the column | ||
|
|
||
| std::unique_ptr<std::byte[]> data_buffer = std::make_unique_for_overwrite<std::byte[]>( | ||
| array->length * sizeof(DiskType)); | ||
|
|
||
| std::span<UserType> original_data_buffer_view(buf, array->length); | ||
| std::span<DiskType> data_buffer_view(reinterpret_cast<DiskType*>(data_buffer.get()), array->length); | ||
|
|
||
| for (int64_t i = 0; i < array->length; ++i) { | ||
| data_buffer_view[i] = static_cast<DiskType>(original_data_buffer_view[i]); | ||
| } | ||
|
|
||
| setup_write_column( | ||
| schema->name, | ||
| array->length, | ||
| std::move(data_buffer), | ||
| (uint64_t*)nullptr, | ||
| _cast_validity_buffer_ptr(array)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should just throw. Since arrow/pyarrow/nanoarrow have an explicit schema, we should just require the user pass us data with appropriate types. The only case we want to auto-cast is when the user provides us with a dictionary where the values match the on disk data type of a non-enumerated column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change may break existing unit tests in Python/R and will definitely require new unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this change will require non trivial changes and may break current workflows, should be in a separate PR?
|
|
||
| CArrayColumnBuffer() = delete; | ||
| CArrayColumnBuffer(const CArrayColumnBuffer&) = delete; | ||
| CArrayColumnBuffer(CArrayColumnBuffer&&) = default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get the following warning when compiling:
/home/jules/Software/TileDB-Inc/TileDB-SOMA/libtiledbsoma/src/soma/column_buffer.h:401:5: warning: explicitly defaulted move constructor is implicitly deleted [-Wdefaulted-function-deleted]
401 | CArrayColumnBuffer(CArrayColumnBuffer&&) = default;
| ^
/home/jules/Software/TileDB-Inc/TileDB-SOMA/libtiledbsoma/src/soma/column_buffer.h:375:28: note: move constructor of 'CArrayColumnBuffer' is implicitly deleted because base class 'ReadColumnBuffer' has a deleted move constructor
|
|
||
| VectorColumnBuffer() = delete; | ||
| VectorColumnBuffer(const VectorColumnBuffer&) = delete; | ||
| VectorColumnBuffer(VectorColumnBuffer&&) = default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get the following warning when compiling:
/home/jules/Software/TileDB-Inc/TileDB-SOMA/libtiledbsoma/src/soma/column_buffer.h:474:5: warning: explicitly defaulted move constructor is implicitly deleted [-Wdefaulted-function-deleted]
474 | VectorColumnBuffer(VectorColumnBuffer&&) = default;
| ^
/home/jules/Software/TileDB-Inc/TileDB-SOMA/libtiledbsoma/src/soma/column_buffer.h:423:28: note: move constructor of 'VectorColumnBuffer' is implicitly deleted because base class 'ReadColumnBuffer' has a deleted move constructor
14d71e2 to
f2da391
Compare
Issue and/or context: SOMA-528 SOMA-714 SOMA-688
Changes:
This PR changes the memory management for the read/write operations implemented by
ManagedQuery. Specifically:std::vectorbacked buffers with C++ arrays wrapped instd::unique_ptrSOMAArrayprovided schema to type casts in advanceNotes for Reviewer: