[core] Support append read/write for MAP shared-shredding #8392
lxy-9602 wants to merge 5 commits into
Updated from 48b8d94 to 82dec42.

    }

    @Override
    public ColumnVector[] getChildren() {
This vector is still exposed through the vectorized read path: `ShreddingFormatReader` copies the delegate `VectorizedRowIterator`, so callers such as `ArrowVectorizedBatchConverter` can consume the returned batch directly. The Arrow map writer calls `mapColumnVector.getChildren()[0]` and `getChildren()[1]`, so returning `null` here makes a shared-shredding MAP column fail with an NPE when it is exported through the vectorized Arrow path. Could we either materialize key/value child vectors for this wrapper, or force this reader to fall back to a row iterator for shared-shredding maps?
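To make the failure mode concrete, here is a minimal sketch using hypothetical, simplified types (`SimpleColumnVector`, `SimpleMapVector`, `RowOnlyMapVector` and `ChildAccessDemo` are illustrative names, not Paimon or Arrow classes). A consumer that indexes the children directly throws a `NullPointerException` when the wrapper returns `null`, while a guard like `supportsColumnarExport` corresponds roughly to the row-iterator fallback option.

```java
// Hypothetical, simplified stand-ins for the vector types discussed in the review;
// these are NOT Paimon's real ColumnVector / MapColumnVector interfaces.
final class ChildAccessDemo {

    // Mimics a columnar consumer (such as an Arrow map writer) that reads
    // the key and value children directly by index.
    static String describeChildren(SimpleMapVector map) {
        SimpleColumnVector keys = map.getChildren()[0]; // throws NPE if getChildren() returns null
        SimpleColumnVector values = map.getChildren()[1];
        return "keys=" + keys.getClass().getSimpleName()
                + ", values=" + values.getClass().getSimpleName();
    }

    // One possible guard: report whether columnar export is possible, so the
    // caller can fall back to row-wise reads instead of failing.
    static boolean supportsColumnarExport(SimpleMapVector map) {
        SimpleColumnVector[] children = map.getChildren();
        return children != null && children.length >= 2
                && children[0] != null && children[1] != null;
    }

    public static void main(String[] args) {
        SimpleMapVector vector = new RowOnlyMapVector();
        System.out.println("columnar export supported: " + supportsColumnarExport(vector));
        try {
            describeChildren(vector);
        } catch (NullPointerException e) {
            System.out.println("NPE: key/value children are not available");
        }
    }
}

interface SimpleColumnVector {
    boolean isNullAt(int row);
}

// Contract assumed here: getChildren()[0] holds keys, getChildren()[1] holds values.
interface SimpleMapVector extends SimpleColumnVector {
    SimpleColumnVector[] getChildren();
}

// A wrapper that supports only row-wise access and exposes no child vectors,
// i.e. the behavior flagged in the review.
final class RowOnlyMapVector implements SimpleMapVector {
    @Override
    public boolean isNullAt(int row) {
        return false;
    }

    @Override
    public SimpleColumnVector[] getChildren() {
        return null;
    }
}
```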
Thanks for the suggestion! Updated `MapSharedShreddingReadPlan` to materialize MAP values columnarly with key/value child vectors. Also added an end-to-end append read/write test covering shared-shredding MAP values with the supported nested types.
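As a rough illustration of what materializing MAPs columnarly with key/value child vectors involves, the sketch below flattens row-wise MAP values into an offsets array plus flat key and value children, with a per-row null flag. `MaterializedMapColumn` and `MaterializeDemo` are hypothetical names; this is a generic offsets-based layout, similar in spirit to Arrow's map encoding, and not the actual `MapSharedShreddingReadPlan` code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MaterializeDemo {
    public static void main(String[] args) {
        Map<String, Integer> first = new LinkedHashMap<>();
        first.put("a", 1);
        first.put("b", 2);
        Map<String, Integer> third = new LinkedHashMap<>();
        third.put("c", null); // null value inside a non-null MAP

        // Arrays.asList (unlike List.of) accepts the null MAP in the middle.
        MaterializedMapColumn column =
                MaterializedMapColumn.from(Arrays.asList(first, null, third));
        System.out.println(Arrays.toString(column.offsets)); // prints [0, 2, 2, 3]
        System.out.println(column.get(1)); // prints null
    }
}

// Hypothetical columnar MAP representation: offsets plus flat key/value children.
final class MaterializedMapColumn {
    final int[] offsets;        // row i spans [offsets[i], offsets[i + 1]) in keys/values
    final boolean[] nulls;      // nulls[i] is true when the MAP value of row i is null
    final List<String> keys;    // flattened key child vector
    final List<Integer> values; // flattened value child vector (entries may be null)

    private MaterializedMapColumn(
            int[] offsets, boolean[] nulls, List<String> keys, List<Integer> values) {
        this.offsets = offsets;
        this.nulls = nulls;
        this.keys = keys;
        this.values = values;
    }

    // Flatten row-wise logical MAP values into columnar children.
    static MaterializedMapColumn from(List<Map<String, Integer>> rows) {
        int[] offsets = new int[rows.size() + 1];
        boolean[] nulls = new boolean[rows.size()];
        List<String> keys = new ArrayList<>();
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            Map<String, Integer> row = rows.get(i);
            if (row == null) {
                nulls[i] = true; // a null MAP occupies an empty range
            } else {
                for (Map.Entry<String, Integer> entry : row.entrySet()) {
                    keys.add(entry.getKey());
                    values.add(entry.getValue());
                }
            }
            offsets[i + 1] = keys.size();
        }
        return new MaterializedMapColumn(offsets, nulls, keys, values);
    }

    // Rebuild the logical MAP of one row from the columnar children.
    Map<String, Integer> get(int row) {
        if (nulls[row]) {
            return null;
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        for (int j = offsets[row]; j < offsets[row + 1]; j++) {
            result.put(keys.get(j), values.get(j));
        }
        return result;
    }
}
```

Keeping offsets separate from the flat children is what lets a columnar consumer read all keys or all values of a batch without touching per-row objects.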
Purpose
This is a subtask of MAP shared-shredding support.
This PR adds append-only read/write support for the MAP shared-shredding layout. MAP fields configured with `map.storage-layout=shared-shredding` are written with a physical shredded layout and per-field metadata in the ORC/Parquet footers, and are reconstructed back into logical MAP values during reads.

The read path is also refactored to use a unified shredding read-plan model for both Variant and MAP shared-shredding. ORC/Parquet readers now create a `ShreddingReadPlan` from the file metadata and schema, read the corresponding physical row type, and wrap the raw reader with a common `ShreddingFormatReader` that assembles physical batches back into logical batches. Variant and MAP shared-shredding therefore share the same high-level read flow instead of having separate format-specific conversion models.

Supported scope
Limitations
Rewrite/compaction paths are not supported for MAP shared-shredding yet. Unsupported table modes and incompatible combinations, such as primary-key tables or combining Variant with MAP shared-shredding, are rejected by validation or fail fast in write paths.
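The validation behavior described above can be sketched as two checks: one that rejects unsupported table configurations up front, and one that fails fast in write paths that are not supported yet. `MapSharedShreddingValidation`, `TableConfig` and `WriteKind` are hypothetical names for illustration; Paimon's actual validation may be structured differently and cover more cases.

```java
// Hypothetical sketch of the validation rules listed above. Class names, fields and
// messages are illustrative assumptions, not Paimon's actual validation code.
final class MapSharedShreddingValidation {

    enum WriteKind { APPEND, REWRITE, COMPACTION }

    static final class TableConfig {
        final boolean primaryKeyTable;
        final boolean variantShredding;
        final boolean mapSharedShredding;

        TableConfig(boolean primaryKeyTable, boolean variantShredding, boolean mapSharedShredding) {
            this.primaryKeyTable = primaryKeyTable;
            this.variantShredding = variantShredding;
            this.mapSharedShredding = mapSharedShredding;
        }
    }

    // Reject unsupported table modes and option combinations up front.
    static void validate(TableConfig config) {
        if (!config.mapSharedShredding) {
            return;
        }
        if (config.primaryKeyTable) {
            throw new IllegalArgumentException(
                    "MAP shared-shredding is not supported for primary-key tables");
        }
        if (config.variantShredding) {
            throw new IllegalArgumentException(
                    "Variant shredding cannot be combined with MAP shared-shredding");
        }
    }

    // Fail fast in write paths that do not support the layout yet.
    static void checkWritePath(TableConfig config, WriteKind kind) {
        if (config.mapSharedShredding && kind != WriteKind.APPEND) {
            throw new UnsupportedOperationException(
                    kind + " is not supported for MAP shared-shredding yet");
        }
    }

    public static void main(String[] args) {
        TableConfig appendTable = new TableConfig(false, false, true);
        validate(appendTable);
        checkWritePath(appendTable, WriteKind.APPEND);
        try {
            checkWritePath(appendTable, WriteKind.COMPACTION);
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```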
Tests
Added coverage for MAP shared-shredding metadata, write plans, read plans, end-to-end append-table read/write, projection, null values, overflow values, adaptive column restoration, layout switching, interaction with data evolution, and validation of unsupported configurations. Existing Variant shredding tests continue to cover the refactored Variant read path.
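The round trip these tests protect, in particular null MAP values, null entry values and overflow entries, can be illustrated with a small sketch. It assumes a hypothetical physical layout in which a fixed list of keys is stored in dedicated columns with a presence flag and every other entry goes to an overflow map; `ShredRoundTrip`, `PhysicalRow`, `shred` and `reassemble` are illustrative names, and the PR's real layout and metadata may differ.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: assumes a layout where a chosen list of keys gets
// dedicated physical columns and all other entries spill into an overflow map.
// This is NOT the physical format implemented in the PR.
final class ShredRoundTrip {

    // Physical form of one logical MAP value under the assumed layout.
    static final class PhysicalRow {
        final boolean mapIsNull;             // the whole MAP value is null
        final boolean[] present;             // present[i]: shredded key i occurs in the row
        final Integer[] columns;             // columns[i]: value of shredded key i (may be null)
        final Map<String, Integer> overflow; // entries whose keys are not shredded

        PhysicalRow(boolean mapIsNull, boolean[] present, Integer[] columns,
                Map<String, Integer> overflow) {
            this.mapIsNull = mapIsNull;
            this.present = present;
            this.columns = columns;
            this.overflow = overflow;
        }
    }

    // Split a logical MAP into shredded columns plus an overflow map.
    static PhysicalRow shred(Map<String, Integer> map, List<String> shreddedKeys) {
        int n = shreddedKeys.size();
        boolean[] present = new boolean[n];
        Integer[] columns = new Integer[n];
        Map<String, Integer> overflow = new LinkedHashMap<>();
        if (map == null) {
            return new PhysicalRow(true, present, columns, overflow);
        }
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            int index = shreddedKeys.indexOf(entry.getKey());
            if (index >= 0) {
                present[index] = true;
                columns[index] = entry.getValue();
            } else {
                overflow.put(entry.getKey(), entry.getValue());
            }
        }
        return new PhysicalRow(false, present, columns, overflow);
    }

    // Rebuild the logical MAP from the physical row.
    static Map<String, Integer> reassemble(PhysicalRow row, List<String> shreddedKeys) {
        if (row.mapIsNull) {
            return null;
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        for (int i = 0; i < shreddedKeys.size(); i++) {
            if (row.present[i]) {
                result.put(shreddedKeys.get(i), row.columns[i]);
            }
        }
        result.putAll(row.overflow);
        return result;
    }

    public static void main(String[] args) {
        List<String> shreddedKeys = Arrays.asList("a", "b");
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put("a", 1);
        map.put("b", null); // present key with a null value
        map.put("x", 9);    // not shredded, goes to overflow
        PhysicalRow physical = shred(map, shreddedKeys);
        System.out.println("overflow = " + physical.overflow); // prints overflow = {x=9}
        System.out.println("round trip ok = " + map.equals(reassemble(physical, shreddedKeys))); // prints round trip ok = true
    }
}
```

The presence flag is the detail worth testing: without it, a shredded key that is absent and one that is present with a null value would reassemble to the same MAP.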