Add has_true() and has_false() to BooleanArray #9511
adriangb wants to merge 5 commits into apache:main from
Short-circuiting methods that return early on the first matching chunk, avoiding full popcount scans. Useful for replacing common patterns like `true_count() == 0` or `true_count() == len`. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Switch the no-nulls paths from BitChunks::iter_padded() (opaque iterator, prevents auto-vectorization) to UnalignedBitChunk (which exposes an aligned &[u64] slice that LLVM can vectorize). Process chunks in blocks of 64 u64s with a fold and a short-circuit check between blocks. The worst-case full scan at 65536 elements drops from ~255ns to ~50ns (5x) and is now 2x faster than true_count() (~100ns) thanks to simpler per-element ops (OR/AND fold vs popcount).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
run benchmark boolean_array
🤖 Hi @adriangb, thanks for the request (#9511 (comment)).
Please choose one or more of these with
You can also set environment variables on subsequent lines:
Unsupported benchmarks: boolean_array.
cc @Dandandan
Benchmark job started for this request (job
🤖 Arrow criterion benchmark running (GKE) | trigger

Benchmark for this request failed. Last 20 lines of output:
run benchmark record_batch

Benchmark job started for this request (job

🤖 Arrow criterion benchmark running (GKE) | trigger

🤖 Arrow criterion benchmark completed (GKE) | trigger
Resource usage: base (merge-base), branch
run benchmark boolean_array

Benchmark job started for this request (job

🤖 Arrow criterion benchmark running (GKE) | trigger

🤖 Arrow criterion benchmark completed (GKE) | trigger
New benchmark: branch-only results (no baseline comparison)
Resource usage: branch
Motivation
When working with `BooleanArray`, a common pattern is checking whether any true or false value exists, e.g. `arr.true_count() > 0` or `arr.false_count() == 0`. This currently requires `true_count()`/`false_count()`, which scan the entire bitmap to count every set bit (via popcount), even though we only need to know whether at least one exists.

This PR adds `has_true()` and `has_false()` methods that short-circuit as soon as they find a matching value, providing both:
- Performance: an early exit instead of a full popcount scan
- Readability: `arr.has_true()` expresses intent more clearly than `arr.true_count() > 0`

Callsites in DataFusion
There are several places in DataFusion that would benefit from these methods:
- `datafusion/functions-nested/src/array_has.rs`: `eq_array.true_count() > 0` → `eq_array.has_true()`
- `datafusion/physical-plan/src/topk/mod.rs`: `filter.true_count() == 0` check → `!filter.has_true()`
- `datafusion/datasource-parquet/src/metadata.rs`: `exactness.true_count() == 0` and `combined_mask.true_count() > 0`
- `datafusion/physical-plan/src/joins/nested_loop_join.rs`: `bitmap.true_count() == 0` checks
- `datafusion/physical-expr-common/src/physical_expr.rs`: `selection_count == 0` from `selection.true_count()`
- `datafusion/physical-expr/src/expressions/binary.rs`: short-circuit checks for AND/OR

Benchmark Results
The key wins are on larger arrays (65,536 elements), where `has_true`/`has_false` are up to 16-129x faster than `true_count()` in best-case scenarios (early short-circuit). Even in the worst case (the entire array must be scanned), performance is comparable to `true_count()`.

Implementation
The implementation processes bits in 64-bit chunks using `UnalignedBitChunk`, which handles arbitrary bit offsets and aligns data for SIMD-friendly processing.
- `has_true` (no nulls): OR-folds 64-bit chunks, short-circuits when any bit is set
- `has_false` (no nulls): AND-folds 64-bit chunks, short-circuits when any bit is unset (with padding bits masked to 1)
- With nulls: iterates `(null, value)` chunk pairs, where a set null-buffer bit marks a valid slot, checking `null & value != 0` (`has_true`) or `null & !value != 0` (`has_false`)
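A minimal, self-contained sketch of the chunked logic above, assuming a simplified layout: `len` bits packed LSB-first into `u64` words at bit offset 0, so no `UnalignedBitChunk` is needed. The free functions `has_true`, `has_false`, `has_true_nullable`, `has_false_nullable`, `tail_mask`, and the `BLOCK` constant are hypothetical illustrations, not the arrow-rs API. The no-nulls paths fold blocks of 64 words and only test for an early exit between blocks, mirroring the block-of-64 fold described in the commit message; the nullable paths check word by word for simplicity.

```rust
// Hypothetical sketch, not the arrow-rs implementation. Assumes `len` bits
// packed LSB-first into u64 words at bit offset 0; bits past `len` in the
// last word are padding and may hold any value.

/// Number of words folded before checking for an early exit.
const BLOCK: usize = 64;

/// Mask selecting the valid (non-padding) bits of the last word.
fn tail_mask(len: usize) -> u64 {
    match len % 64 {
        0 => u64::MAX,
        r => (1u64 << r) - 1,
    }
}

/// No nulls: OR-fold each block of full words, stop once any bit is set.
fn has_true(values: &[u64], len: usize) -> bool {
    let n = (len + 63) / 64;
    if n == 0 {
        return false;
    }
    let (body, last) = values[..n].split_at(n - 1);
    for block in body.chunks(BLOCK) {
        // A branch-free fold over the block is easy for LLVM to vectorize.
        if block.iter().fold(0u64, |acc, &w| acc | w) != 0 {
            return true;
        }
    }
    (last[0] & tail_mask(len)) != 0
}

/// No nulls: AND-fold each block of full words, stop once any bit is unset.
fn has_false(values: &[u64], len: usize) -> bool {
    let n = (len + 63) / 64;
    if n == 0 {
        return false;
    }
    let (body, last) = values[..n].split_at(n - 1);
    for block in body.chunks(BLOCK) {
        if block.iter().fold(u64::MAX, |acc, &w| acc & w) != u64::MAX {
            return true;
        }
    }
    // Force padding bits to 1 so they never look like `false` values.
    (last[0] | !tail_mask(len)) != u64::MAX
}

/// With nulls: a slot counts only if its validity bit is set.
fn has_true_nullable(values: &[u64], validity: &[u64], len: usize) -> bool {
    let n = (len + 63) / 64;
    (0..n).any(|i| {
        let mask = if i + 1 == n { tail_mask(len) } else { u64::MAX };
        (validity[i] & values[i] & mask) != 0
    })
}

fn has_false_nullable(values: &[u64], validity: &[u64], len: usize) -> bool {
    let n = (len + 63) / 64;
    (0..n).any(|i| {
        let mask = if i + 1 == n { tail_mask(len) } else { u64::MAX };
        (validity[i] & !values[i] & mask) != 0
    })
}

fn main() {
    // 65 values, all true: unset padding bits in word 1 must not count as false.
    assert!(has_true(&[u64::MAX, 1], 65));
    assert!(!has_false(&[u64::MAX, 1], 65));
    // 65 values: 64 true plus a trailing false.
    assert!(has_false(&[u64::MAX, 0], 65));
    // Several blocks: one set bit inside the second block.
    let mut words = vec![0u64; 129];
    words[100] = 1 << 7;
    assert!(has_true(&words, 8197));
    assert!(!has_true(&[0u64; 129], 8197));
    // [false, null]: the set value bit of the null slot is masked out.
    assert!(!has_true_nullable(&[0b10u64], &[0b01u64], 2));
    assert!(has_false_nullable(&[0b10u64], &[0b01u64], 2));
    println!("all checks passed");
}
```

The `tail_mask` step is what the "64+1 with trailing false" test case exercises: without it, the unset padding bits of an all-true 65-element array would make `has_false` report a false value that does not exist.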
Alternatives considered
- A full scan, like `true_count()` but with simpler bitwise ops instead of popcount. Marginally faster than `true_count()`, but misses the main optimization opportunity (early exit).
- `self.iter().any(|v| v == Some(true))`. Simple, but processes one bit at a time, missing SIMD vectorization of the inner loop. Our approach processes 64 bits at a time while still supporting early exit.
The chosen approach balances SIMD-friendly bulk processing (64 bits per iteration) with early termination, giving the best of both worlds.
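A hedged sketch contrasting the iterator-style alternative with a word-at-a-time check, assuming the same simplified layout (values and validity bitmaps packed LSB-first into `u64` words at offset 0). `iter_slots`, `has_true_per_bit`, and `has_true_per_word` are hypothetical helpers; `iter_slots` only imitates the `Option<bool>` items that iterating a `BooleanArray` yields. Both functions exit early, but the first inspects one slot per step while the second tests 64 slots with a single AND.

```rust
// Hypothetical sketch: values and validity bitmaps are `len` bits packed
// LSB-first into u64 words at bit offset 0.

/// Decodes one slot at a time, yielding None for null slots.
fn iter_slots<'a>(
    values: &'a [u64],
    validity: &'a [u64],
    len: usize,
) -> impl Iterator<Item = Option<bool>> + 'a {
    (0..len).map(move |i| {
        let (word, bit) = (i / 64, i % 64);
        if (validity[word] >> bit) & 1 == 1 {
            Some((values[word] >> bit) & 1 == 1)
        } else {
            None
        }
    })
}

/// Iterator-based alternative: simple and short-circuiting, one bit per step.
fn has_true_per_bit(values: &[u64], validity: &[u64], len: usize) -> bool {
    iter_slots(values, validity, len).any(|v| v == Some(true))
}

/// Word-at-a-time: 64 slots per step, still short-circuiting.
fn has_true_per_word(values: &[u64], validity: &[u64], len: usize) -> bool {
    let n = (len + 63) / 64;
    (0..n).any(|i| {
        // Ignore padding bits past `len` in the last word.
        let mask = if i + 1 == n && len % 64 != 0 {
            (1u64 << (len % 64)) - 1
        } else {
            u64::MAX
        };
        (values[i] & validity[i] & mask) != 0
    })
}

fn main() {
    let cases: [(&[u64], &[u64], usize); 4] = [
        (&[0b1010u64], &[0b0101u64], 4), // true bits only in null slots
        (&[0b0110u64], &[0b0101u64], 4), // slot 2 is valid and true
        (&[0u64, 1 << 5], &[u64::MAX, u64::MAX], 70), // slot 69 is true
        (&[0u64, 1 << 5], &[u64::MAX, u64::MAX], 69), // bit 69 is padding
    ];
    for (values, validity, len) in cases {
        assert_eq!(
            has_true_per_bit(values, validity, len),
            has_true_per_word(values, validity, len)
        );
    }
    println!("both approaches agree");
}
```

Both return the same answers; the difference is only in how much work each loop iteration does, which is the point of the comparison above.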
Test Plan
lengths (65 elements, 64+1 with trailing false)
`has_true`/`has_false` vs `true_count` across sizes (64, 1024, 65536) and data distributions

🤖 Generated with [Claude Code](https://claude.com/claude-code)