Try smaller segment length in binary fuse build #50

RaduBerinde · 2026-01-07T18:40:11Z

Improve binary fuse parameter testing

We add a test that shows the range of sizes and segment counts for
each segment length.

We also add a test that checks filter generation at "boundary" sizes
in terms of segment lengths. The test prints the average and max
number of iterations for each tested size. Output with numTrials=100:

size: 2  iterations: 1.02 avg (2 max)
size: 8  iterations: 1.02 avg (2 max)
size: 24  iterations: 1.13 avg (3 max)
size: 27  iterations: 1.02 avg (2 max)
size: 55  iterations: 1.02 avg (2 max)
size: 91  iterations: 1.04 avg (3 max)
size: 120  iterations: 1.00 avg (1 max)
size: 303  iterations: 1.09 avg (3 max)
size: 349  iterations: 1.04 avg (2 max)
size: 1009  iterations: 1.02 avg (2 max)
size: 1124  iterations: 1.13 avg (2 max)
size: 3361  iterations: 1.03 avg (3 max)
size: 3551  iterations: 9.45 avg (42 max)
size: 11192  iterations: 1.03 avg (2 max)
size: 11521  iterations: 109.79 avg (528 max)
size: 37272  iterations: 1.00 avg (1 max)
size: 37454  iterations: 15.42 avg (70 max)
size: 124117  iterations: 1.02 avg (2 max)
size: 126131  iterations: 1.70 avg (6 max)
size: 413309  iterations: 1.01 avg (2 max)
size: 416077  iterations: 1.83 avg (6 max)
size: 1376321  iterations: 1.00 avg (1 max)
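For readers who want to reproduce this kind of report, the per-size statistics can be computed roughly as follows. This is only a sketch: `summarize` is a hypothetical helper, the way the real test counts build iterations is not shown in this PR text, and the input values below are made up for illustration rather than measured.

```go
package main

import "fmt"

// summarize returns the mean and the maximum of per-trial iteration counts.
// It is a hypothetical helper, not part of the library.
func summarize(iterations []int) (mean float64, maxIter int) {
	if len(iterations) == 0 {
		return 0, 0
	}
	sum := 0
	for _, n := range iterations {
		sum += n
		if n > maxIter {
			maxIter = n
		}
	}
	return float64(sum) / float64(len(iterations)), maxIter
}

func main() {
	// Made-up iteration counts for four trials at one size; not measured data.
	trials := []int{1, 1, 2, 1}
	mean, maxIter := summarize(trials)
	// Same line format as the test output above.
	fmt.Printf("size: %d  iterations: %.2f avg (%d max)\n", 1000, mean, maxIter)
}
```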

Try smaller segment length in binary fuse build

Some sizes around segment length transitions require many iterations
and would work much better with the previous segment length.

We add a simple fix that is more robust than tweaking the formula:
once every four iterations, we try the previous segment length while
keeping the same capacity. Note that in most cases this won't affect
the build because it's rare to need more than 1-2 iterations.
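The retry policy described above can be sketched as follows. This is a minimal illustration, not the code in `binaryfusefilter.go`: the `layout` type and `layoutForAttempt` are hypothetical names, and it assumes that segment lengths are powers of two (so the "previous" length is half the current one), that the fallback happens on attempts 4, 8, and so on, and that capacity equals `(segmentCount + arity - 1) * segmentLength`.

```go
package main

import "fmt"

// layout is a hypothetical stand-in for the filter's sizing parameters.
type layout struct {
	segmentLength uint32 // assumed to be a power of two
	segmentCount  uint32
	capacity      uint32 // total slots; held fixed across attempts
}

const arity = 3 // 3-wise binary fuse filter

// layoutForAttempt returns the layout to use for the given 1-based attempt.
// Assumption: every fourth attempt uses half the segment length, and the
// segment count is recomputed so that
// (segmentCount + arity - 1) * segmentLength == capacity still holds.
func layoutForAttempt(base layout, attempt int) layout {
	if attempt%4 != 0 || base.segmentLength < 2 {
		return base
	}
	l := base
	l.segmentLength = base.segmentLength / 2
	l.segmentCount = base.capacity/l.segmentLength - (arity - 1)
	return l
}

func main() {
	base := layout{segmentLength: 512, segmentCount: 8, capacity: (8 + arity - 1) * 512}
	for attempt := 1; attempt <= 8; attempt++ {
		l := layoutForAttempt(base, attempt)
		fmt.Printf("attempt %d: segmentLength=%d segmentCount=%d capacity=%d\n",
			attempt, l.segmentLength, l.segmentCount, l.capacity)
	}
}
```

Because builds that succeed within the first three attempts never reach the fallback, a policy of this shape leaves the common case unchanged.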

TestBinaryFuseBoundarySizes output (with numTrials=100):

binaryfusefilter_test.go:490: size: 2  iterations: 1.02 avg (2 max)
binaryfusefilter_test.go:490: size: 8  iterations: 1.08 avg (3 max)
binaryfusefilter_test.go:490: size: 24  iterations: 1.08 avg (3 max)
binaryfusefilter_test.go:490: size: 27  iterations: 1.03 avg (2 max)
binaryfusefilter_test.go:490: size: 55  iterations: 1.02 avg (2 max)
binaryfusefilter_test.go:490: size: 91  iterations: 1.02 avg (2 max)
binaryfusefilter_test.go:490: size: 120  iterations: 1.04 avg (2 max)
binaryfusefilter_test.go:490: size: 303  iterations: 1.04 avg (2 max)
binaryfusefilter_test.go:490: size: 349  iterations: 1.01 avg (2 max)
binaryfusefilter_test.go:490: size: 1009  iterations: 1.01 avg (2 max)
binaryfusefilter_test.go:490: size: 1124  iterations: 1.16 avg (4 max)
binaryfusefilter_test.go:490: size: 3361  iterations: 1.03 avg (2 max)
binaryfusefilter_test.go:490: size: 3551  iterations: 2.05 avg (6 max)
binaryfusefilter_test.go:490: size: 11192  iterations: 1.04 avg (3 max)
binaryfusefilter_test.go:490: size: 11521  iterations: 2.10 avg (6 max)
binaryfusefilter_test.go:490: size: 37272  iterations: 1.01 avg (2 max)
binaryfusefilter_test.go:490: size: 37454  iterations: 2.09 avg (6 max)
binaryfusefilter_test.go:490: size: 124117  iterations: 1.03 avg (2 max)
binaryfusefilter_test.go:490: size: 126131  iterations: 1.53 avg (4 max)
binaryfusefilter_test.go:490: size: 413309  iterations: 1.00 avg (1 max)
binaryfusefilter_test.go:490: size: 416077  iterations: 1.50 avg (4 max)
binaryfusefilter_test.go:490: size: 1376321  iterations: 1.02 avg (3 max)

Informs #23, #24


lemire · 2026-01-07T23:12:47Z

One point to take into consideration is that if your application is to create a set of 100 or 1000 elements, these probabilistic filters are probably not worth the effort. I am not opposed to changing the formula but we should be clear on the objectives.

Ping @thomasmueller


RaduBerinde · 2026-01-08T01:02:02Z

> One point to take into consideration is that if your application is to create a set of 100 or 1000 elements, these probabilistic filters are probably not worth the effort. I am not opposed to changing the formula but we should be clear on the objectives.

I agree. I am experimenting with binary fuse filters for Pebble (https://github.com/cockroachdb/pebble), where in a typical LSM we would see ~40K keys in the upper LSM level files and ~600K keys in the lowest level files. But these numbers could be very different for various workloads and I don't want to have a separate code path if we happen to have a smaller set of keys.

lemire · 2026-01-08T01:04:47Z

Merged. I will release.

lemire · 2026-01-08T01:09:30Z

@RaduBerinde You may want to check this link https://gihub.com/cockroachdb/pebble

It is likely not pointing at what you expect.

RaduBerinde force-pushed the test-seg-length branch from 2317f0f to f529bf5 on January 7, 2026 22:38

lemire reviewed Jan 7, 2026

RaduBerinde force-pushed the test-seg-length branch from f529bf5 to 87e37e6 on January 8, 2026 00:55

lemire mentioned this pull request Jan 8, 2026

Implement small set trick FastFilter/xor_singleheader#76

Open

lemire merged commit 9e0c9da into FastFilter:master Jan 8, 2026
5 checks passed
