Skip to content

Conversation

dongxiao1198
Copy link
Contributor

1 parse v1v2v3 manifest schema in adapter
2 convert ManifestFile&ManifestEntry into ArrowArray
3 add e2e case

xiao.dong added 4 commits September 9, 2025 11:09
1 parse v1v2v3 manifest schema in adapter
2 convert ManifestFile&ManifestEntry into ArrowArray
3 add e2e case
@dongxiao1198 dongxiao1198 marked this pull request as ready for review September 9, 2025 04:47

#define ICEBERG_NANOARROW_RETURN_IF_NOT_OK(status, error) \
if (status != NANOARROW_OK) [[unlikely]] { \
return InvalidArrowData("Nanoarrow error msg: {}", error.message); \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return InvalidArrowData("Nanoarrow error msg: {}", error.message); \
return InvalidArrowData("nanoarrow error: {}", error.message); \

///
/// \return Status of write results.
virtual Status Write(ArrowArray data) = 0;
virtual Status Write(ArrowArray& data) = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ArrowArray is a simple C struct and will be moved away (be invalid) after the call. Therefore it looks a little bit strange to use a reference here. Perhaps changing it to ArrowArray* and document the behavior?

#include "iceberg/schema_internal.h"
#include "iceberg/util/checked_cast.h"
#include "iceberg/util/macros.h"
#include "nanoarrow/nanoarrow.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include "nanoarrow/nanoarrow.h"
#include <nanoarrow/nanoarrow.h>

const std::span<const uint8_t>& value);

protected:
std::shared_ptr<ArrowArray> array_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not recommended to use smart pointers on a C struct, especially when we don't add a custom deleter. Can we revert this line?

Comment on lines +31 to +34
#define NANOARROW_RETURN_IF_FAILED(status) \
if (status != NANOARROW_OK) [[unlikely]] { \
return InvalidArrowData("Nanoarrow error code: {}", status); \
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#define NANOARROW_RETURN_IF_FAILED(status) \
if (status != NANOARROW_OK) [[unlikely]] { \
return InvalidArrowData("Nanoarrow error code: {}", status); \
}

We should use ICEBERG_NANOARROW_RETURN_IF_NOT_OK consistently.

/// \brief Get a view of the partition fields.
std::span<const PartitionField> fields() const;

Result<std::shared_ptr<Schema>> partition_schema();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Result<std::shared_ptr<Schema>> partition_schema();
Result<std::shared_ptr<Schema>> GetPartitionSchema();

This is not a trivial getter so we cannot use the snake case form.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, can we add a test case for this?

partition_fields.emplace_back(partition_field.field_id(),
std::string(partition_field.name()),
std::move(result_type),
true // optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
true // optional
/*optional=*/true

ICEBERG_ASSIGN_OR_RAISE(auto source_field,
schema_->FindFieldById(partition_field.source_id()));
if (!source_field.has_value()) {
return InvalidSchema("Cannot find source field for partition field:{}",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a TODO comment to use unknown type when source field is missing.


auto expected_entries = PreparePartitionedTestData();
auto write_manifest_path = CreateNewTempFilePath();
TestWriteManifest(write_manifest_path, partition_spec, expected_entries);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestManifestReadingByPath is missing? Same for below.

Comment on lines +32 to +34
ManifestEntry::kDataFileFieldId,
ManifestEntry::kSequenceNumber.field_id(),
ManifestEntry::kFileSequenceNumber.field_id(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ManifestEntry::kDataFileFieldId,
ManifestEntry::kSequenceNumber.field_id(),
ManifestEntry::kFileSequenceNumber.field_id(),
ManifestEntry::kSequenceNumber.field_id(),
ManifestEntry::kFileSequenceNumber.field_id(),
ManifestEntry::kDataFileFieldId,

Reorder them to match the actual schema.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants