feat: manifest writer and adapter impl part2 #216

dongxiao1198 · 2025-09-09T03:18:27Z

1. Parse the v1/v2/v3 manifest schemas in the adapter
2. Convert ManifestFile and ManifestEntry into ArrowArray
3. Add an e2e test case

src/iceberg/manifest_adapter.h

src/iceberg/manifest_adapter.cc

src/iceberg/v1_metadata.cc

test/manifest_reader_writer_test.cc

src/iceberg/util/macros.h

src/iceberg/CMakeLists.txt

wgtmac · 2025-09-22T02:59:15Z

src/iceberg/arrow/nanoarrow_error_transform_internal.h

+
+#define ICEBERG_NANOARROW_RETURN_IF_NOT_OK(status, error)              \
+  if (status != NANOARROW_OK) [[unlikely]] {                           \
+    return InvalidArrowData("Nanoarrow error msg: {}", error.message); \


Suggested change

-    return InvalidArrowData("Nanoarrow error msg: {}", error.message); \
+    return InvalidArrowData("nanoarrow error: {}", error.message); \

wgtmac · 2025-09-22T03:02:48Z

src/iceberg/file_writer.h

  ///
  /// \return Status of write results.
-  virtual Status Write(ArrowArray data) = 0;
+  virtual Status Write(ArrowArray& data) = 0;


ArrowArray is a simple C struct and will be moved from (and therefore invalid) after the call, so using a reference here looks a little strange. Perhaps change it to ArrowArray* and document the behavior?
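
To illustrate the pointer-based contract suggested above, here is a minimal sketch. It assumes a stand-in struct (`DemoArray`) instead of the real `ArrowArray`, and `DemoWriter` and `NoopRelease` are hypothetical names, not code from this PR. The only convention borrowed from the Arrow C data interface is that a struct whose `release` callback is null counts as released.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the Arrow C data interface struct; only the fields needed here.
struct DemoArray {
  int64_t length = 0;
  void (*release)(DemoArray*) = nullptr;
};

// Hypothetical writer that takes ownership of the array passed by pointer.
class DemoWriter {
 public:
  ~DemoWriter() { Reset(); }

  /// \param data Must be non-null and not yet released. On success its
  /// contents are moved into the writer and data->release is set to nullptr.
  /// \return true if the array was accepted, false otherwise.
  bool Write(DemoArray* data) {
    if (data == nullptr || data->release == nullptr) return false;
    Reset();
    held_ = *data;            // shallow copy of the C struct
    data->release = nullptr;  // mark the caller's copy as moved-from
    rows_written_ += held_.length;
    return true;
  }

  int64_t rows_written() const { return rows_written_; }

 private:
  // Releases the currently held array, if any.
  void Reset() {
    if (held_.release != nullptr) {
      held_.release(&held_);
      assert(held_.release == nullptr);  // release callbacks must null themselves
    }
  }

  DemoArray held_;
  int64_t rows_written_ = 0;
};

// Release callback for arrays that own no buffers.
inline void NoopRelease(DemoArray* array) { array->release = nullptr; }
```

With a pointer parameter the call site reads `writer.Write(&batch)`, which makes it visible that `batch` may be modified and must not be used afterwards.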

wgtmac · 2025-09-22T03:03:55Z

src/iceberg/manifest_adapter.cc

+#include "iceberg/schema_internal.h"
+#include "iceberg/util/checked_cast.h"
+#include "iceberg/util/macros.h"
+#include "nanoarrow/nanoarrow.h"


Suggested change

-#include "nanoarrow/nanoarrow.h"
+#include <nanoarrow/nanoarrow.h>

wgtmac · 2025-09-22T04:15:19Z

src/iceberg/manifest_adapter.h

+                            const std::span<const uint8_t>& value);
+
+ protected:
+  std::shared_ptr<ArrowArray> array_;


It is not recommended to use smart pointers on a C struct, especially when we don't add a custom deleter. Can we revert this line?
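
One plausible reading of the concern: the default deleter of a smart pointer only frees the struct itself and never invokes its `release` callback, so any buffers the struct owns would leak. The sketch below shows what a custom deleter could look like if a smart pointer were kept. It uses a stand-in struct (`DemoCArray`), and every name in it is hypothetical. The review asks to revert the line rather than do this.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Stand-in for a C struct with a release callback, as in the Arrow C data interface.
struct DemoCArray {
  int64_t length = 0;
  void (*release)(DemoCArray*) = nullptr;
};

// Deleter that honors the release callback before freeing the struct itself.
struct DemoCArrayDeleter {
  void operator()(DemoCArray* array) const {
    if (array == nullptr) return;
    if (array->release != nullptr) array->release(array);
    delete array;
  }
};

using DemoCArrayPtr = std::unique_ptr<DemoCArray, DemoCArrayDeleter>;

// Counts release invocations so the behavior can be observed.
static int g_release_calls = 0;

inline void CountingRelease(DemoCArray* array) {
  ++g_release_calls;
  array->release = nullptr;
}

// Builds a heap-allocated array whose release callback is CountingRelease.
inline DemoCArrayPtr MakeDemoArray(int64_t length) {
  DemoCArrayPtr array(new DemoCArray());
  array->length = length;
  array->release = &CountingRelease;
  return array;
}
```

Holding the struct by value and releasing it explicitly in the owner's destructor avoids the extra allocation and the deleter altogether.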

wgtmac · 2025-09-22T14:24:41Z

src/iceberg/manifest_adapter.cc

+#define NANOARROW_RETURN_IF_FAILED(status)                       \
+  if (status != NANOARROW_OK) [[unlikely]] {                     \
+    return InvalidArrowData("Nanoarrow error code: {}", status); \
+  }


Suggested change

-#define NANOARROW_RETURN_IF_FAILED(status)                       \
-  if (status != NANOARROW_OK) [[unlikely]] {                     \
-    return InvalidArrowData("Nanoarrow error code: {}", status); \
-  }

We should use ICEBERG_NANOARROW_RETURN_IF_NOT_OK consistently.
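
As a rough sketch of what using one shared status-check macro looks like at a call site: everything below is a stand-in. `kDemoOk`, `DemoError`, `DEMO_RETURN_IF_NOT_OK` and `DemoAppend` are hypothetical names, and the `do { } while (false)` wrapper and parenthesized arguments are choices made for this example, not taken from the PR's macro.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Stand-ins for NANOARROW_OK and nanoarrow's ArrowError (assumptions for this sketch).
constexpr int kDemoOk = 0;
struct DemoError {
  char message[64];
};

// Hypothetical shared macro: on failure, return the error text from the
// enclosing function. The do/while wrapper keeps it safe inside if/else.
#define DEMO_RETURN_IF_NOT_OK(status, error)                     \
  do {                                                           \
    if ((status) != kDemoOk) {                                   \
      return std::string("nanoarrow error: ") + (error).message; \
    }                                                            \
  } while (false)

// Returns std::nullopt on success, or the formatted error text on failure.
inline std::optional<std::string> DemoAppend(int status, const DemoError& error) {
  DEMO_RETURN_IF_NOT_OK(status, error);
  return std::nullopt;
}
```

With a single macro, every failure carries the message text rather than a bare status code, and the wording of the error stays the same across files.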

wgtmac · 2025-09-23T05:47:16Z

src/iceberg/partition_spec.h

  /// \brief Get a view of the partition fields.
  std::span<const PartitionField> fields() const;

+  Result<std::shared_ptr<Schema>> partition_schema();


Suggested change

-  Result<std::shared_ptr<Schema>> partition_schema();
+  Result<std::shared_ptr<Schema>> GetPartitionSchema();

This is not a trivial getter so we cannot use the snake case form.

BTW, can we add a test case for this?
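
The naming rule referred to above can be shown with a toy class. This is a hypothetical `ToySpec`, not the `PartitionSpec` from this PR. A member that simply returns stored state keeps a variable-like name, while a member that computes something and can fail gets a function-style name.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy class used only to contrast the two naming forms.
class ToySpec {
 public:
  explicit ToySpec(std::vector<std::string> names) : names_(std::move(names)) {}

  // Trivial accessor: returns a stored member, so snake_case fits.
  const std::vector<std::string>& names() const { return names_; }

  // Does real work and can fail, so it is named like a regular function.
  std::optional<std::string> GetJoinedNames() const {
    if (names_.empty()) return std::nullopt;
    std::string joined;
    for (const auto& name : names_) {
      if (!joined.empty()) joined += ",";
      joined += name;
    }
    return joined;
  }

 private:
  std::vector<std::string> names_;
};
```

A test for the real method would presumably follow the same shape: build a spec, call the method, and check both the success and the failure path.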

wgtmac · 2025-09-23T05:49:02Z

src/iceberg/partition_spec.cc

+    partition_fields.emplace_back(partition_field.field_id(),
+                                  std::string(partition_field.name()),
+                                  std::move(result_type),
+                                  true  // optional


Suggested change

-                                  true  // optional
+                                  /*optional=*/true

wgtmac · 2025-09-23T05:50:52Z

src/iceberg/partition_spec.cc

+    ICEBERG_ASSIGN_OR_RAISE(auto source_field,
+                            schema_->FindFieldById(partition_field.source_id()));
+    if (!source_field.has_value()) {
+      return InvalidSchema("Cannot find source field for partition field:{}",


Add a TODO comment to use unknown type when source field is missing.

wgtmac · 2025-09-23T05:55:10Z

test/manifest_reader_writer_test.cc

+
+  auto expected_entries = PreparePartitionedTestData();
+  auto write_manifest_path = CreateNewTempFilePath();
+  TestWriteManifest(write_manifest_path, partition_spec, expected_entries);


Is the TestManifestReadingByPath call missing here? Same for the cases below.

wgtmac · 2025-09-23T05:57:51Z

src/iceberg/v2_metadata.cc

+      ManifestEntry::kDataFileFieldId,
+      ManifestEntry::kSequenceNumber.field_id(),
+      ManifestEntry::kFileSequenceNumber.field_id(),


Suggested change

-      ManifestEntry::kDataFileFieldId,
-      ManifestEntry::kSequenceNumber.field_id(),
-      ManifestEntry::kFileSequenceNumber.field_id(),
+      ManifestEntry::kSequenceNumber.field_id(),
+      ManifestEntry::kFileSequenceNumber.field_id(),
+      ManifestEntry::kDataFileFieldId,

Reorder them to match the actual schema.

xiao.dong added 4 commits September 9, 2025 11:09

feat: manifest writer and adapter impl part2

0b20283

1 parse v1v2v3 manifest schema in adapter 2 convert ManifestFile&ManifestEntry into ArrowArray 3 add e2e case

fix warning

0f1699c

fix schema release

8fb60aa

fix windows build

30320bd

dongxiao1198 marked this pull request as ready for review September 9, 2025 04:47

fix adapter init

8d4b744

HeartLinked reviewed Sep 11, 2025

xiao.dong added 2 commits September 12, 2025 09:23

change some func into static

1ca9b0c

fix partition literal support type list

478a6ea

dongxiao1198 force-pushed the manifest_adapter_impl branch from f79e978 to 478a6ea Compare September 12, 2025 02:04

wgtmac reviewed Sep 12, 2025

xiao.dong and others added 5 commits September 12, 2025 16:08

fix comments

c860fe2

fix lint

a0e342f

Merge branch 'main' into manifest_adapter_impl

bbc6849

fix cpplint

74f17e3

remove useless virtual define

bafed12

wgtmac suggested changes Sep 23, 2025
