UnsafeAppendBoolToBitmap for dictionary and REE builders#758
Conversation
c2b7711 to
374a42f
Compare
UnsafeAppendBoolToBitmap for dictionary and REE buildersUnsafeAppendBoolToBitmap for dictionary and REE builders
374a42f to
4551ae0
Compare
UnsafeAppendBoolToBitmap for dictionary and REE buildersUnsafeAppendBoolToBitmap for dictionary and REE builders
|
Can you fix the lint issues? |
|
You might also need to rebase from main |
4551ae0 to
1834882
Compare
|
@zeroshade just rebased and fixed all the errors. Lint is still failing due to unformatted import on main: arrow-go/arrow/datatype_fixedwidth.go Line 20 in 1769405 I can change it here if you want me to :) |
arrow/array/dictionary.go
Outdated
| dictionaryBuilder | ||
| } | ||
|
|
||
| func (b *dictBuilder[T]) UnsafeAppend(v T) error { |
There was a problem hiding this comment.
NOTE: Since Append() returns error, I had done the same thing for and UnsafeAppend(). However, I noticed the only thing that could return an error from appendValue, appendBytes or unsafeAppendValue is b.memoTable.GetOrInsert(val) since val is interface{} and the type ID could be wrong.
Since v is typed explicitly with T in Append()/UnsafeAppend(), it is safe to assume there should never be any error, unless there is an implementation bug, in which case a panic is better.
In the latest commit, I changed UnsafeAppend() to be infallible, and added TODO comments for all the other Append(), as that would be a breaking change to the public API.
What do you think?
b450496 to
72a283f
Compare
I actually just merged that fix. So you can just rebase it to fix the lint :) |
For more context: https://github.com/apache/arrow-go/pull/558/changes#r2758798540 This commit makes `UnsafeAppendBoolToBitmap()` and `RunEndEncodedBuilder` panic. The current implementation pulled from `arrow/array/builder.go` would leave the builder in an invalid state that would panic when trying to finish a record batch.
This commit adds a proper implementation of `UnsafeAppend` and `UnsafeAppendBoolToBitmap` to the dictionary-encoded array builder. This allows using `Reserve(n)` followed by `n` calls to `UnsafeAppend` or `UnsafeAppendBoolToBitmap`. I also added a test to it.
72a283f to
2cb62c1
Compare
zeroshade
left a comment
There was a problem hiding this comment.
I rebased with the lint fix, once CI passes this is good and I'll merge
Rationale for this change
For more context: https://github.com/apache/arrow-go/pull/558/changes#r2758798540
The Builders for REE and dict-encoded arrays were inheriting the
UnsafeAppendBoolToBitmap()andUnsafeAppend()from theBuilderinterface defined inarrow/array/builder.go. The default implementation does not bump the length of the inneridxBuilderorrunEndsBuilder, which causes it to be in an inconsistent state where the parent has greater length than the index/run-ends builder. This causes a panic when instancing a record batch.What changes are included in this PR?
This PR makes
UnsafeAppendBoolToBitmap()forRunEndEncodedBuilderpanic as there is not a good semantic meaning for this method in case of REE.It also implements
UnsafeAppendBoolToBitmap()andUnsafeAppend()for the Dictionary builders. This allows users to usebldr.Reserve(n)followed byncalls toUnsafeAppend()orUnsafeAppendBoolToBitmap()without the need to check if the array needs to grow everytime.Are these changes tested?
Yes.
Are there any user-facing changes?
No. The API was already there (inherited from
Builder), but the behavior was incorrect.