-
Notifications
You must be signed in to change notification settings - Fork 3.9k
GH-48168: [C++][Parquet] Fix setting column-specific options when writing an encrypted Dataset #48170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This duplicates the Python test so we don't really need both.
| content_defined_chunking_options_( | ||
| properties.content_defined_chunking_options()) {} | ||
| properties.content_defined_chunking_options()) { | ||
| for (const auto& [col_path, col_props] : properties.column_properties_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noticed that WriterProperties and WriterProperties::Builder use different forms to represent column properties. Is it possible to simplify this by using the same form?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the builder needs to work this way so that different column-specific properties can be changed as well as changing default column properties without the order of operations affecting the result. I don't think the WriterProperties can be changed to use the Builder's representation without breaking the API as ColumnProperties is part of the public API and returned as a reference:
arrow/cpp/src/parquet/properties.h
Line 911 in e6ad2c4
| const ColumnProperties& column_properties( |
rok
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, strange we don't currently do this. Just a minor change request.
| content_defined_chunking_options_( | ||
| properties.content_defined_chunking_options()) {} | ||
| properties.content_defined_chunking_options()) { | ||
| for (const auto& [col_path, col_props] : properties.column_properties_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this logic is non-trivial we might want to move it out of the header?
Rationale for this change
Fixes creation of writer properties when writing a Parquet Dataset with encryption, so that column-specific settings aren't lost.
What changes are included in this PR?
Changes the
WriterProperties::Builderconstructor that uses existing properties so that it also sets any column-specific settings.Are these changes tested?
Yes.
Are there any user-facing changes?
Yes, this fixes a user-facing bug.