Refactor complex child column layout metadata #55
Merged
JingsongLi merged 2 commits on Jun 8, 2026
QuakeWang approved these changes on Jun 8, 2026
QuakeWang (Contributor) left a comment:
LGTM
Non-blocking: the new test mainly verifies the metadata layout. It would be useful to add a real roundtrip case such as List<Map<Int32, List<Utf8>>>, covering the writer pending queue and reader reassembly together.
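A minimal sketch of the property such a roundtrip case would check, assuming a simplified lengths-based flattening with no nulls. It does not use the crate's writer or reader API; `Flat`, `flatten`, and `reassemble` are hypothetical names introduced only for illustration.

```rust
// One List<Map<Int32, List<Utf8>>> value: a list of maps, each map a list of entries.
type MapEntries = Vec<(i32, Vec<String>)>;
type Row = Vec<MapEntries>;

/// Hypothetical flattened form: one lengths column per nesting level plus leaf columns.
#[derive(Debug, Default, PartialEq)]
struct Flat {
    list_lengths: Vec<usize>,  // maps per row
    map_lengths: Vec<usize>,   // entries per map
    keys: Vec<i32>,            // map keys
    value_lengths: Vec<usize>, // strings per map value
    strings: Vec<String>,      // innermost list elements
}

/// Write path: walk the nested rows and append to each flat column.
fn flatten(rows: &[Row]) -> Flat {
    let mut flat = Flat::default();
    for row in rows {
        flat.list_lengths.push(row.len());
        for map in row {
            flat.map_lengths.push(map.len());
            for (key, values) in map {
                flat.keys.push(*key);
                flat.value_lengths.push(values.len());
                flat.strings.extend(values.iter().cloned());
            }
        }
    }
    flat
}

/// Read path: consume the flat columns in order, guided by the lengths columns.
fn reassemble(flat: &Flat) -> Vec<Row> {
    let mut map_lengths = flat.map_lengths.iter().copied();
    let mut keys = flat.keys.iter().copied();
    let mut value_lengths = flat.value_lengths.iter().copied();
    let mut strings = flat.strings.iter().cloned();
    let mut rows = Vec::with_capacity(flat.list_lengths.len());
    for &n_maps in &flat.list_lengths {
        let mut row = Row::with_capacity(n_maps);
        for _ in 0..n_maps {
            let n_entries = map_lengths.next().expect("missing map length");
            let mut map = MapEntries::with_capacity(n_entries);
            for _ in 0..n_entries {
                let key = keys.next().expect("missing key");
                let n_values = value_lengths.next().expect("missing value length");
                let values: Vec<String> = strings.by_ref().take(n_values).collect();
                map.push((key, values));
            }
            row.push(map);
        }
        rows.push(row);
    }
    rows
}
```

A roundtrip test in this style would build rows that include empty lists and empty maps, then assert that `reassemble(&flatten(&rows))` equals `rows`. The real test would go through the actual writer and reader instead.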
Summary
Make ARRAY/MAP physical child layout explicit in the in-memory metadata used by bucket read/write paths. This keeps the on-disk format unchanged while removing order-based inference for nested complex child columns.
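A rough sketch of what explicit child layout metadata could look like, for illustration only. The names `ChildColumnRole`, `ChildColumnMeta`, and `length_physical_index` come from this PR; the variant names, the `physical_index` field, the column numbering, and the `child_with_role` helper are assumptions and may not match the actual code.

```rust
/// Role a physical child column plays inside its parent ARRAY or MAP.
/// Variant names are assumed; the PR only names the three roles.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ChildColumnRole {
    ListElement,
    MapKey,
    MapValue,
}

/// Hypothetical in-memory metadata for one child column.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ChildColumnMeta {
    /// Physical index of this child column (assumed field).
    physical_index: usize,
    /// What this child is within its parent.
    role: ChildColumnRole,
    /// Physical index of the lengths column that owns this child.
    length_physical_index: usize,
}

/// Assumed physical numbering for ARRAY<MAP<INT, ARRAY<UTF8>>>:
/// 0 = outer array lengths, 1 = map lengths, 2 = INT keys,
/// 3 = inner array lengths, 4 = UTF8 elements.
fn example_layout() -> Vec<ChildColumnMeta> {
    vec![
        ChildColumnMeta { physical_index: 1, role: ChildColumnRole::ListElement, length_physical_index: 0 },
        ChildColumnMeta { physical_index: 2, role: ChildColumnRole::MapKey, length_physical_index: 1 },
        ChildColumnMeta { physical_index: 3, role: ChildColumnRole::MapValue, length_physical_index: 1 },
        ChildColumnMeta { physical_index: 4, role: ChildColumnRole::ListElement, length_physical_index: 3 },
    ]
}

/// Look a child up by owner and role instead of by its position in the list.
fn child_with_role(
    metas: &[ChildColumnMeta],
    length_physical_index: usize,
    role: ChildColumnRole,
) -> Option<&ChildColumnMeta> {
    metas
        .iter()
        .find(|m| m.length_physical_index == length_physical_index && m.role == role)
}
```

With this shape, `child_with_role(&example_layout(), 1, ChildColumnRole::MapValue)` finds the map's value column regardless of where it sits in the list, which is the kind of lookup that order-based inference cannot guarantee for nested shapes.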
Changes
- Add `ChildColumnRole` plus `length_physical_index` to `ChildColumnMeta` so child columns know whether they are list elements, map keys, or map values, and which physical lengths column owns them.
- … `ARRAY<MAP<INT, ARRAY<UTF8>>>` shape.
Testing
- `cargo fmt --check`
- `git diff --check`
- `cargo test -p paimon-mosaic-core --lib`
- `cargo test -p paimon-mosaic-core --test array_type_test`
- `cargo test -p paimon-mosaic-core --test gen_fixtures`
Notes
`cargo test -p paimon-mosaic-core` was also run. It passed through unit, ARRAY/MAP, encoding, format, and fixture tests, then failed in `interop_read_test` because `/tmp/mosaic_interop/python_written.mosaic` was not present; that test expects externally generated interop fixtures.