add shred_variant support for LargeUtf8 and LargeBinary #9554

Merged
scovich merged 1 commit into apache:main from sdf-jkl:shred-LargeUtf8-LargeBinary
Mar 16, 2026
Conversation

@sdf-jkl sdf-jkl commented Mar 13, 2026

Which issue does this PR close?

Rationale for this change

See the linked issue.

What changes are included in this PR?

Add shred_variant support for LargeUtf8 and LargeBinary

Are these changes tested?

Yes, unit tests.

Are there any user-facing changes?

No

@github-actions github-actions bot added the parquet-variant parquet-variant* crates label Mar 13, 2026
sdf-jkl commented Mar 13, 2026

@klion26 @codephage2020 @scovich Please take a look when you are available. Thanks!

@scovich scovich left a comment

LGTM! I assume #9518 is a fast-follow?

sdf-jkl commented Mar 13, 2026

@scovich It is, thanks!

@klion26 klion26 left a comment

LGTM, thanks for the improvement

@scovich scovich merged commit 55ff6eb into apache:main Mar 16, 2026
17 checks passed
@sdf-jkl sdf-jkl deleted the shred-LargeUtf8-LargeBinary branch March 16, 2026 14:32
alamb pushed a commit that referenced this pull request Mar 20, 2026
#9576)

# Which issue does this PR close?

- Closes #9526 

# Rationale for this change

`shred_variant` already supports the Binary and LargeBinary types (#9525,
#9554), but `unshred_variant` does not handle them. As a result,
shredded Binary/LargeBinary columns cannot be converted back to
unshredded VariantArrays.

# What changes are included in this PR?

Adds `unshred_variant` support for `DataType::Binary` and
`DataType::LargeBinary` in `parquet-variant-compute/src/unshred_variant.rs`:
  - New enum variants `PrimitiveBinary` and `PrimitiveLargeBinary`
  - Match arms in `append_row` and `try_new_opt`
  - `AppendToVariantBuilder` impls for `BinaryArray` and `LargeBinaryArray`
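
The shape of the change listed above (one enum variant per array type, a match arm that dispatches to it, and a per-array "append row" impl) can be illustrated with a small std-only sketch. Everything here is a hypothetical stand-in written for illustration: `Value`, `Offset`, `BytesColumn`, `AppendRow`, `RowBuilder`, and `unshred_rows` are not the real `parquet-variant-compute` or arrow types, and the sketch simply emits `Null` for invalid rows, whereas the real null handling may be more involved.

```rust
// Hypothetical stand-ins for illustration only; none of these are the real
// parquet-variant-compute or arrow types.

/// Simplified output value: only the cases this sketch needs.
#[derive(Debug, PartialEq)]
enum Value {
    Null,
    Binary(Vec<u8>),
}

/// Offset width abstraction: Binary uses 32-bit offsets, LargeBinary 64-bit.
trait Offset: Copy {
    fn as_usize(self) -> usize;
}
impl Offset for i32 {
    fn as_usize(self) -> usize {
        self as usize
    }
}
impl Offset for i64 {
    fn as_usize(self) -> usize {
        self as usize
    }
}

/// Toy binary column: `offsets[i]..offsets[i + 1]` delimits row `i` in `data`.
struct BytesColumn<O: Offset> {
    offsets: Vec<O>,
    data: Vec<u8>,
    valid: Vec<bool>,
}

/// Assumed analogue of a per-array "append one row to the output" impl.
trait AppendRow {
    fn append_row(&self, row: usize, out: &mut Vec<Value>);
}

/// One generic impl covers both offset widths.
impl<O: Offset> AppendRow for BytesColumn<O> {
    fn append_row(&self, row: usize, out: &mut Vec<Value>) {
        if !self.valid[row] {
            // Simplification: invalid rows become Null in this sketch.
            out.push(Value::Null);
            return;
        }
        let start = self.offsets[row].as_usize();
        let end = self.offsets[row + 1].as_usize();
        out.push(Value::Binary(self.data[start..end].to_vec()));
    }
}

/// One enum variant per supported column type, mirroring the variant names
/// mentioned in the commit message.
enum RowBuilder {
    PrimitiveBinary(BytesColumn<i32>),
    PrimitiveLargeBinary(BytesColumn<i64>),
}

impl RowBuilder {
    /// Each match arm forwards to the column's own `append_row`.
    fn append_row(&self, row: usize, out: &mut Vec<Value>) {
        match self {
            RowBuilder::PrimitiveBinary(col) => col.append_row(row, out),
            RowBuilder::PrimitiveLargeBinary(col) => col.append_row(row, out),
        }
    }
}

/// Converts the first `len` rows of the column back into plain values.
fn unshred_rows(builder: &RowBuilder, len: usize) -> Vec<Value> {
    let mut out = Vec::with_capacity(len);
    for row in 0..len {
        builder.append_row(row, &mut out);
    }
    out
}
```

In this sketch the two variants differ only in offset width, so a single generic impl serves both; whether the actual implementation shares code the same way is not stated in the commit message.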



# Are these changes tested?

Yes

# Are there any user-facing changes?

No breaking changes

---------

Signed-off-by: Kunal Singh Dadhwal <kunalsinghdadhwal@gmail.com>
Development

Successfully merging this pull request may close these issues.

[Variant] Add shred_variant support for LargeUtf8 and LargeBinary types

3 participants