Refactor complex child column layout metadata #55

Merged: JingsongLi merged 2 commits into apache:main from JingsongLi:codex/explicit-complex-child-layout on Jun 8, 2026

@JingsongLi (Contributor)

Summary

Make ARRAY/MAP physical child layout explicit in the in-memory metadata used by bucket read/write paths. This keeps the on-disk format unchanged while removing order-based inference for nested complex child columns.

Changes

  • Add ChildColumnRole plus length_physical_index to ChildColumnMeta so child columns know whether they are list elements, map keys, or map values, and which physical lengths column owns them.
  • Refactor complex type expansion and writer traversal to locate nested children through explicit layout metadata instead of sibling index offsets.
  • Refactor reader reassembly to rebuild ARRAY/MAP columns from explicit roles and length ownership rather than adjacent-child heuristics.
  • Add a unit test covering the physical layout plan for a nested ARRAY<MAP<INT, ARRAY<UTF8>>> shape.
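
The layout described in the list above can be sketched roughly as follows. This is a toy model, not the code from this PR: only the names ChildColumnRole, ChildColumnMeta, and length_physical_index come from the description. The variant names, the physical_index field, LogicalType, LayoutPlan, and plan_layout are hypothetical, and the sketch assumes one physical column per type node, numbered in pre-order, which may differ from the real format.

```rust
/// Toy logical type tree (hypothetical; not the crate's real type enum).
#[derive(Debug, Clone)]
pub enum LogicalType {
    Int,
    Utf8,
    Array(Box<LogicalType>),
    Map(Box<LogicalType>, Box<LogicalType>),
}

/// Role a child column plays inside its parent (variant names are assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChildColumnRole {
    ListElement,
    MapKey,
    MapValue,
}

/// Explicit layout record for one child column.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ChildColumnMeta {
    /// Physical column holding this child (assumed field).
    pub physical_index: usize,
    pub role: ChildColumnRole,
    /// Physical lengths column of the ARRAY/MAP that owns this child.
    pub length_physical_index: usize,
}

#[derive(Debug, Default)]
pub struct LayoutPlan {
    /// Total number of physical columns assigned so far.
    pub physical_columns: usize,
    pub children: Vec<ChildColumnMeta>,
}

/// Builds the plan for one top-level column.
pub fn plan_layout(ty: &LogicalType) -> LayoutPlan {
    let mut plan = LayoutPlan::default();
    assign(ty, None, &mut plan);
    plan
}

fn assign(ty: &LogicalType, parent: Option<(ChildColumnRole, usize)>, plan: &mut LayoutPlan) {
    // Assumption: every node takes one physical column, either the values
    // of a primitive or the lengths of an ARRAY/MAP.
    let index = plan.physical_columns;
    plan.physical_columns += 1;
    if let Some((role, length_physical_index)) = parent {
        plan.children.push(ChildColumnMeta { physical_index: index, role, length_physical_index });
    }
    match ty {
        LogicalType::Int | LogicalType::Utf8 => {}
        LogicalType::Array(element) => {
            assign(element, Some((ChildColumnRole::ListElement, index)), plan);
        }
        LogicalType::Map(key, value) => {
            assign(key, Some((ChildColumnRole::MapKey, index)), plan);
            assign(value, Some((ChildColumnRole::MapValue, index)), plan);
        }
    }
}
```

Under these assumptions, ARRAY<MAP<INT, ARRAY<UTF8>>> yields five physical columns. The MAP lengths column is a list element owned by column 0, the INT keys and the inner ARRAY lengths are a map key and a map value owned by column 1, and the UTF8 values are list elements owned by column 3. A reader holding this plan can look up each child by role and owner instead of assuming that siblings sit at adjacent indices.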

Testing

  • cargo fmt --check
  • git diff --check
  • cargo test -p paimon-mosaic-core --lib
  • cargo test -p paimon-mosaic-core --test array_type_test
  • cargo test -p paimon-mosaic-core --test gen_fixtures

Notes

cargo test -p paimon-mosaic-core was also run. It passed through unit, ARRAY/MAP, encoding, format, and fixture tests, then failed in interop_read_test because /tmp/mosaic_interop/python_written.mosaic was not present; that test expects externally generated interop fixtures.

@QuakeWang (Contributor) left a comment

LGTM

Non-blocking: the new test mainly verifies the metadata layout. It would be useful to add a real roundtrip case such as List<Map<Int32, List<Utf8>>>, covering the writer pending queue and reader reassembly together.
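
A roundtrip of the kind suggested here might look like the sketch below. It is a self-contained toy model rather than the paimon-mosaic-core API: Value, Column, Layout, write, and read are all hypothetical names, and the real writer pending queue is not modeled. The sketch only illustrates flattening nested values into lengths and leaf columns, then rebuilding them by explicit column index instead of by position.

```rust
/// Toy nested value (hypothetical; stands in for a real row value).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i32),
    Utf8(String),
    List(Vec<Value>),
    Map(Vec<(Value, Value)>),
}

/// Toy physical column: leaf values or per-row lengths.
#[derive(Debug, Clone, PartialEq)]
pub enum Column {
    Empty,
    Ints(Vec<i32>),
    Strs(Vec<String>),
    Lengths(Vec<usize>),
}

/// Explicit layout: every node names the physical column it uses.
#[derive(Debug, Clone)]
pub enum Layout {
    Int { col: usize },
    Utf8 { col: usize },
    List { lengths: usize, element: Box<Layout> },
    Map { lengths: usize, key: Box<Layout>, value: Box<Layout> },
}

/// Flattens `values` into the physical columns named by `layout`.
pub fn write(layout: &Layout, values: &[Value], cols: &mut [Column]) {
    match layout {
        Layout::Int { col } => {
            cols[*col] = Column::Ints(values.iter().map(|v| match v {
                Value::Int(i) => *i,
                _ => panic!("expected INT value"),
            }).collect());
        }
        Layout::Utf8 { col } => {
            cols[*col] = Column::Strs(values.iter().map(|v| match v {
                Value::Utf8(s) => s.clone(),
                _ => panic!("expected UTF8 value"),
            }).collect());
        }
        Layout::List { lengths, element } => {
            let (mut lens, mut flat) = (Vec::new(), Vec::new());
            for v in values {
                let Value::List(items) = v else { panic!("expected LIST value") };
                lens.push(items.len());
                flat.extend(items.iter().cloned());
            }
            cols[*lengths] = Column::Lengths(lens);
            write(element, &flat, cols);
        }
        Layout::Map { lengths, key, value } => {
            let (mut lens, mut keys, mut vals) = (Vec::new(), Vec::new(), Vec::new());
            for v in values {
                let Value::Map(entries) = v else { panic!("expected MAP value") };
                lens.push(entries.len());
                for (k, val) in entries {
                    keys.push(k.clone());
                    vals.push(val.clone());
                }
            }
            cols[*lengths] = Column::Lengths(lens);
            write(key, &keys, cols);
            write(value, &vals, cols);
        }
    }
}

fn lengths_of(col: &Column) -> &[usize] {
    match col {
        Column::Lengths(l) => l.as_slice(),
        _ => panic!("expected lengths column"),
    }
}

/// Rebuilds nested values by looking up columns through explicit indices.
pub fn read(layout: &Layout, cols: &[Column]) -> Vec<Value> {
    match layout {
        Layout::Int { col } => match &cols[*col] {
            Column::Ints(v) => v.iter().map(|i| Value::Int(*i)).collect(),
            _ => panic!("expected INT column"),
        },
        Layout::Utf8 { col } => match &cols[*col] {
            Column::Strs(v) => v.iter().map(|s| Value::Utf8(s.clone())).collect(),
            _ => panic!("expected UTF8 column"),
        },
        Layout::List { lengths, element } => {
            let mut items = read(element, cols).into_iter();
            lengths_of(&cols[*lengths]).iter()
                .map(|&n| Value::List(items.by_ref().take(n).collect()))
                .collect()
        }
        Layout::Map { lengths, key, value } => {
            let mut entries = read(key, cols).into_iter().zip(read(value, cols));
            lengths_of(&cols[*lengths]).iter()
                .map(|&n| Value::Map(entries.by_ref().take(n).collect()))
                .collect()
        }
    }
}
```

A test in this style would build a Layout for List<Map<Int32, List<Utf8>>>, call write on a few rows that include empty lists and empty maps, and assert that read returns the same rows. Empty containers are worth including because they are the cases where a lengths column attached to the wrong owner would go unnoticed with fully populated data.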

@JingsongLi merged commit 961e74f into apache:main on Jun 8, 2026 (5 checks passed)

2 participants