Skip to content

UnsafeAppendBoolToBitmap for dictionary and REE builders#758

Merged
zeroshade merged 4 commits intoapache:mainfrom
serramatutu:serramatutu/UnsafeAppendBoolToBitmap-ree
Apr 14, 2026
Merged

UnsafeAppendBoolToBitmap for dictionary and REE builders#758
zeroshade merged 4 commits intoapache:mainfrom
serramatutu:serramatutu/UnsafeAppendBoolToBitmap-ree

Conversation

@serramatutu
Copy link
Copy Markdown
Contributor

@serramatutu serramatutu commented Apr 10, 2026

Rationale for this change

For more context: https://github.com/apache/arrow-go/pull/558/changes#r2758798540

The Builders for REE and dict-encoded arrays were inheriting the UnsafeAppendBoolToBitmap() and UnsafeAppend() from the Builder interface defined in arrow/array/builder.go. The default implementation does not bump the length of the inner idxBuilder or runEndsBuilder, which causes it to be in an inconsistent state where the parent has greater length than the index/run-ends builder. This causes a panic when instancing a record batch.

What changes are included in this PR?

This PR makes UnsafeAppendBoolToBitmap() for RunEndEncodedBuilder panic as there is not a good semantic meaning for this method in case of REE.

It also implements UnsafeAppendBoolToBitmap() and UnsafeAppend() for the Dictionary builders. This allows users to use bldr.Reserve(n) followed by n calls to UnsafeAppend() or UnsafeAppendBoolToBitmap() without the need to check if the array needs to grow everytime.

Are these changes tested?

Yes.

Are there any user-facing changes?

No. The API was already there (inherited from Builder), but the behavior was incorrect.

@serramatutu serramatutu force-pushed the serramatutu/UnsafeAppendBoolToBitmap-ree branch from c2b7711 to 374a42f Compare April 10, 2026 23:26
@serramatutu serramatutu changed the title Implement UnsafeAppendBoolToBitmap for dictionary and REE builders Panic on UnsafeAppendBoolToBitmap for dictionary and REE builders Apr 10, 2026
@serramatutu serramatutu marked this pull request as ready for review April 10, 2026 23:28
@serramatutu serramatutu requested a review from zeroshade as a code owner April 10, 2026 23:28
@serramatutu serramatutu force-pushed the serramatutu/UnsafeAppendBoolToBitmap-ree branch from 374a42f to 4551ae0 Compare April 12, 2026 14:03
@serramatutu serramatutu changed the title Panic on UnsafeAppendBoolToBitmap for dictionary and REE builders UnsafeAppendBoolToBitmap for dictionary and REE builders Apr 12, 2026
@zeroshade
Copy link
Copy Markdown
Member

Can you fix the lint issues?

@zeroshade
Copy link
Copy Markdown
Member

You might also need to rebase from main

@serramatutu serramatutu force-pushed the serramatutu/UnsafeAppendBoolToBitmap-ree branch from 4551ae0 to 1834882 Compare April 14, 2026 09:58
@serramatutu
Copy link
Copy Markdown
Contributor Author

serramatutu commented Apr 14, 2026

@zeroshade just rebased and fixed all the errors.

Lint is still failing due to unformatted import on main:

I can change it here if you want me to :)

dictionaryBuilder
}

func (b *dictBuilder[T]) UnsafeAppend(v T) error {
Copy link
Copy Markdown
Contributor Author

@serramatutu serramatutu Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: Since Append() returns error, I had done the same thing for and UnsafeAppend(). However, I noticed the only thing that could return an error from appendValue, appendBytes or unsafeAppendValue is b.memoTable.GetOrInsert(val) since val is interface{} and the type ID could be wrong.

Since v is typed explicitly with T in Append()/UnsafeAppend(), it is safe to assume there should never be any error, unless there is an implementation bug, in which case a panic is better.

In the latest commit, I changed UnsafeAppend() to be infallible, and added TODO comments for all the other Append(), as that would be a breaking change to the public API.

What do you think?

@serramatutu serramatutu force-pushed the serramatutu/UnsafeAppendBoolToBitmap-ree branch 4 times, most recently from b450496 to 72a283f Compare April 14, 2026 10:30
@zeroshade
Copy link
Copy Markdown
Member

Lint is still failing due to unformatted import on main:

I actually just merged that fix. So you can just rebase it to fix the lint :)

For more context: https://github.com/apache/arrow-go/pull/558/changes#r2758798540

This commit makes `UnsafeAppendBoolToBitmap()` and `RunEndEncodedBuilder`
panic. The current implementation pulled from `arrow/array/builder.go`
would leave the builder in an invalid state that would panic when
trying to finish a record batch.
This commit adds a proper implementation of `UnsafeAppend` and
`UnsafeAppendBoolToBitmap` to the dictionary-encoded array builder.

This allows using `Reserve(n)` followed by `n` calls to `UnsafeAppend`
or `UnsafeAppendBoolToBitmap`.

I also added a test to it.
@zeroshade zeroshade force-pushed the serramatutu/UnsafeAppendBoolToBitmap-ree branch from 72a283f to 2cb62c1 Compare April 14, 2026 16:12
Copy link
Copy Markdown
Member

@zeroshade zeroshade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rebased with the lint fix, once CI passes this is good and I'll merge

@zeroshade zeroshade merged commit e0df4f9 into apache:main Apr 14, 2026
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants