-
Notifications
You must be signed in to change notification settings - Fork 55
feat: manifest writer and adapter impl part2 #216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
1 parse v1v2v3 manifest schema in adapter 2 convert ManifestFile&ManifestEntry into ArrowArray 3 add e2e case
f79e978
to
478a6ea
Compare
|
||
#define ICEBERG_NANOARROW_RETURN_IF_NOT_OK(status, error) \ | ||
if (status != NANOARROW_OK) [[unlikely]] { \ | ||
return InvalidArrowData("Nanoarrow error msg: {}", error.message); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return InvalidArrowData("Nanoarrow error msg: {}", error.message); \ | |
return InvalidArrowData("nanoarrow error: {}", error.message); \ |
/// | ||
/// \return Status of write results. | ||
virtual Status Write(ArrowArray data) = 0; | ||
virtual Status Write(ArrowArray& data) = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ArrowArray
is a simple C struct and will be moved away (be invalid) after the call. Therefore it looks a little bit strange to use a reference here. Perhaps changing it to ArrowArray*
and document the behavior?
#include "iceberg/schema_internal.h" | ||
#include "iceberg/util/checked_cast.h" | ||
#include "iceberg/util/macros.h" | ||
#include "nanoarrow/nanoarrow.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include "nanoarrow/nanoarrow.h" | |
#include <nanoarrow/nanoarrow.h> |
const std::span<const uint8_t>& value); | ||
|
||
protected: | ||
std::shared_ptr<ArrowArray> array_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not recommended to use smart pointers on a C struct, especially when we don't add a custom deleter. Can we revert this line?
#define NANOARROW_RETURN_IF_FAILED(status) \ | ||
if (status != NANOARROW_OK) [[unlikely]] { \ | ||
return InvalidArrowData("Nanoarrow error code: {}", status); \ | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#define NANOARROW_RETURN_IF_FAILED(status) \ | |
if (status != NANOARROW_OK) [[unlikely]] { \ | |
return InvalidArrowData("Nanoarrow error code: {}", status); \ | |
} |
We should use ICEBERG_NANOARROW_RETURN_IF_NOT_OK
consistently.
/// \brief Get a view of the partition fields. | ||
std::span<const PartitionField> fields() const; | ||
|
||
Result<std::shared_ptr<Schema>> partition_schema(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Result<std::shared_ptr<Schema>> partition_schema(); | |
Result<std::shared_ptr<Schema>> GetPartitionSchema(); |
This is not a trivial getter so we cannot use the snake case form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, can we add a test case for this?
partition_fields.emplace_back(partition_field.field_id(), | ||
std::string(partition_field.name()), | ||
std::move(result_type), | ||
true // optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true // optional | |
/*optional=*/true |
ICEBERG_ASSIGN_OR_RAISE(auto source_field, | ||
schema_->FindFieldById(partition_field.source_id())); | ||
if (!source_field.has_value()) { | ||
return InvalidSchema("Cannot find source field for partition field:{}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a TODO comment to use unknown type when source field is missing.
|
||
auto expected_entries = PreparePartitionedTestData(); | ||
auto write_manifest_path = CreateNewTempFilePath(); | ||
TestWriteManifest(write_manifest_path, partition_spec, expected_entries); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TestManifestReadingByPath
is missing? Same for below.
ManifestEntry::kDataFileFieldId, | ||
ManifestEntry::kSequenceNumber.field_id(), | ||
ManifestEntry::kFileSequenceNumber.field_id(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ManifestEntry::kDataFileFieldId, | |
ManifestEntry::kSequenceNumber.field_id(), | |
ManifestEntry::kFileSequenceNumber.field_id(), | |
ManifestEntry::kSequenceNumber.field_id(), | |
ManifestEntry::kFileSequenceNumber.field_id(), | |
ManifestEntry::kDataFileFieldId, |
Reorder them to match the actual schema.
1 parse v1v2v3 manifest schema in adapter
2 convert ManifestFile&ManifestEntry into ArrowArray
3 add e2e case