[core] Support append read/write for MAP shared-shredding #8392

Open
lxy-9602 wants to merge 5 commits into apache:master from lxy-9602:append-shredding-write2
lxy-9602 (Contributor) commented Jun 30, 2026

Purpose

This is a subtask of MAP shared-shredding support.

This PR adds append-only read/write support for the MAP shared-shredding layout. MAP fields configured with map.storage-layout=shared-shredding are written with a physical shredded layout, with per-field metadata stored in the ORC/Parquet footers, and are reconstructed into logical MAP values on read.

The read path is also refactored to use a unified shredding read-plan model for both Variant and MAP shared-shredding. ORC/Parquet readers now create a ShreddingReadPlan from file metadata/schema, read the corresponding physical row type, and wrap the raw reader with a common ShreddingFormatReader to assemble physical batches back into logical batches. Variant and MAP shared-shredding therefore share the same high-level read flow instead of having separate format-specific conversion models.
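The read flow described above (build a plan, read the physical rows, wrap the raw reader, assemble logical rows) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `ReadPlanSketch`, `MapShreddingPlanSketch`, and `ShreddingReaderSketch` are hypothetical names, rows are modelled as plain Java maps instead of columnar batches, and the physical layout (one column per shredded key, with a null meaning "key absent") is an assumption made only for this example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a read plan: which physical columns to read,
// and how to turn one physical row back into a logical row.
interface ReadPlanSketch {
    List<String> physicalColumns();

    Map<String, Object> assemble(Map<String, Object> physicalRow);
}

// Assumed layout for illustration only: each shredded key of the MAP field
// is stored in its own physical column named "<field>.<key>".
final class MapShreddingPlanSketch implements ReadPlanSketch {
    private final String logicalField;
    private final List<String> shreddedKeys;

    MapShreddingPlanSketch(String logicalField, List<String> shreddedKeys) {
        this.logicalField = logicalField;
        this.shreddedKeys = shreddedKeys;
    }

    @Override
    public List<String> physicalColumns() {
        List<String> columns = new ArrayList<>();
        for (String key : shreddedKeys) {
            columns.add(logicalField + "." + key);
        }
        return columns;
    }

    @Override
    public Map<String, Object> assemble(Map<String, Object> physicalRow) {
        Map<String, Object> logicalMap = new LinkedHashMap<>();
        for (String key : shreddedKeys) {
            Object value = physicalRow.get(logicalField + "." + key);
            if (value != null) { // in this sketch, null means the key is absent
                logicalMap.put(key, value);
            }
        }
        Map<String, Object> logicalRow = new LinkedHashMap<>();
        logicalRow.put(logicalField, logicalMap);
        return logicalRow;
    }
}

// Wraps a raw physical reader and applies the plan to every row it returns.
final class ShreddingReaderSketch {
    private final Iterator<Map<String, Object>> rawReader;
    private final ReadPlanSketch plan;

    ShreddingReaderSketch(Iterator<Map<String, Object>> rawReader, ReadPlanSketch plan) {
        this.rawReader = rawReader;
        this.plan = plan;
    }

    // Returns the next logical row, or null when the raw reader is exhausted.
    Map<String, Object> next() {
        return rawReader.hasNext() ? plan.assemble(rawReader.next()) : null;
    }
}
```

The point of the shape is that the wrapper knows nothing about MAP or Variant; only the plan does, which is what lets both features share one high-level read flow.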

Supported scope

  • Append-only tables.
  • ORC and Parquet data files.
  • MAP shared-shredding write path with footer metadata.
  • MAP shared-shredding read path with schema evolution/projection.
  • Variant read materialization moved into the shared shredding read-plan framework.

Limitations

Rewrite/compaction paths are not supported for MAP shared-shredding yet. Unsupported table modes and incompatible combinations, such as primary-key tables or combining Variant with MAP shared-shredding, are rejected by validation or fail fast in write paths.
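A fail-fast check of the kind described above might look like the sketch below. It is illustrative only: the class and method names are hypothetical, the option is treated as a single table-level key (the PR configures it per MAP field), and the boolean flags stand in for whatever schema inspection the real validation performs.

```java
import java.util.Map;

// Hypothetical validator; not the PR's actual validation code.
final class SharedShreddingValidationSketch {
    // Option key and value as quoted in the PR description.
    static final String LAYOUT_KEY = "map.storage-layout";
    static final String SHARED_SHREDDING = "shared-shredding";

    // Throws IllegalArgumentException for combinations the PR lists as unsupported.
    static void validate(
            Map<String, String> options, boolean hasPrimaryKey, boolean usesVariantShredding) {
        if (!SHARED_SHREDDING.equals(options.get(LAYOUT_KEY))) {
            return; // layout not enabled, nothing to check
        }
        if (hasPrimaryKey) {
            throw new IllegalArgumentException(
                    "MAP shared-shredding is only supported on append-only tables.");
        }
        if (usesVariantShredding) {
            throw new IllegalArgumentException(
                    "MAP shared-shredding cannot be combined with Variant shredding.");
        }
    }
}
```

Rejecting these cases at validation time, before any file is written, avoids producing data files that later rewrite or compaction paths cannot handle.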

Tests

Added coverage for MAP shared-shredding metadata, write plans, read plans, append table end-to-end read/write, projection, null values, overflow values, adaptive column restoration, layout switching, data evolution interaction, and validation of unsupported configurations. Existing Variant shredding tests continue to cover the refactored Variant read path.

lxy-9602 force-pushed the append-shredding-write2 branch from 48b8d94 to 82dec42 on June 30, 2026 09:46
}

@Override
public ColumnVector[] getChildren() {

Contributor

This vector is still exposed through the vectorized read path: ShreddingFormatReader copies the delegate VectorizedRowIterator, so callers such as ArrowVectorizedBatchConverter can consume the returned batch directly. The Arrow map writer calls mapColumnVector.getChildren()[0/1], so returning null here makes a shared-shredding MAP column fail with an NPE when exported through the vectorized Arrow path. Could we either materialize key/value child vectors for this wrapper or force this reader to fall back to a row iterator for shared-shredding maps?
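To make the failure mode concrete, here is a small self-contained sketch. The types are hypothetical stand-ins, not Paimon's `ColumnVector` or the Arrow converter; the only behaviour taken from the comment above is that the consumer indexes `getChildren()[0]` and `getChildren()[1]` directly.

```java
// Hypothetical minimal vector interface for illustration.
interface VectorSketch {
    VectorSketch[] getChildren();
}

// A leaf vector has no children.
final class LeafVectorSketch implements VectorSketch {
    @Override
    public VectorSketch[] getChildren() {
        return new VectorSketch[0];
    }
}

// Mirrors the reviewed behaviour: the wrapper exposes no child vectors.
final class WrapperMapVectorSketch implements VectorSketch {
    @Override
    public VectorSketch[] getChildren() {
        return null;
    }
}

// One possible fix: expose key and value child vectors.
final class MaterializedMapVectorSketch implements VectorSketch {
    private final VectorSketch keys = new LeafVectorSketch();
    private final VectorSketch values = new LeafVectorSketch();

    @Override
    public VectorSketch[] getChildren() {
        return new VectorSketch[] {keys, values};
    }
}

final class MapConsumerSketch {
    // Indexes the children directly, like the columnar map writer described above.
    // Returns false instead of propagating the NPE so the outcome is easy to inspect.
    static boolean canExport(VectorSketch mapVector) {
        try {
            VectorSketch keys = mapVector.getChildren()[0];
            VectorSketch values = mapVector.getChildren()[1];
            return keys != null && values != null;
        } catch (NullPointerException e) {
            return false; // indexing a null array throws NullPointerException
        }
    }
}
```

With these stand-ins, `MapConsumerSketch.canExport(new WrapperMapVectorSketch())` is false while the materialized variant succeeds, which is the difference between the two options proposed in the comment.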

Contributor Author

Thanks for the suggestion! Updated MapSharedShreddingReadPlan to materialize MAPs columnarly with key/value child vectors. Also added an e2e append read/write test covering shared-shredding MAP values with the supported nested types.
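For readers unfamiliar with columnar MAP materialization, the general idea is to flatten all entries of a batch into one key vector and one value vector, with per-row offsets marking each row's slice. The sketch below shows that generic technique with a hypothetical `ColumnarMapSketch` class and plain Java collections; it is not the `MapSharedShreddingReadPlan` implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical columnar representation of a batch of MAP values.
final class ColumnarMapSketch {
    final List<String> keys = new ArrayList<>();   // key child vector
    final List<Object> values = new ArrayList<>(); // value child vector
    final int[] offsets;                           // row i spans [offsets[i], offsets[i + 1])
    final boolean[] isNull;                        // true when the whole MAP value is null

    ColumnarMapSketch(List<Map<String, Object>> rows) {
        offsets = new int[rows.size() + 1];
        isNull = new boolean[rows.size()];
        for (int i = 0; i < rows.size(); i++) {
            Map<String, Object> row = rows.get(i);
            if (row == null) {
                isNull[i] = true; // null map contributes no entries
            } else {
                for (Map.Entry<String, Object> entry : row.entrySet()) {
                    keys.add(entry.getKey());
                    values.add(entry.getValue());
                }
            }
            offsets[i + 1] = keys.size();
        }
    }

    // Number of entries in the given row (0 for null or empty maps).
    int sizeOf(int row) {
        return offsets[row + 1] - offsets[row];
    }
}
```

A consumer that reads the two child vectors plus the offsets never needs a row-by-row fallback, which is why this shape suits vectorized export paths.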
