Try smaller segment length in binary fuse build #50
We add a test that shows the range of sizes and segment counts for each segment length. We also add a test that checks filter generation at "boundary" sizes in terms of segment lengths. The test prints the average and max number of iterations for each tested size. Output with numTrials=100:

```
size: 2 iterations: 1.02 avg (2 max)
size: 8 iterations: 1.02 avg (2 max)
size: 24 iterations: 1.13 avg (3 max)
size: 27 iterations: 1.02 avg (2 max)
size: 55 iterations: 1.02 avg (2 max)
size: 91 iterations: 1.04 avg (3 max)
size: 120 iterations: 1.00 avg (1 max)
size: 303 iterations: 1.09 avg (3 max)
size: 349 iterations: 1.04 avg (2 max)
size: 1009 iterations: 1.02 avg (2 max)
size: 1124 iterations: 1.13 avg (2 max)
size: 3361 iterations: 1.03 avg (3 max)
size: 3551 iterations: 9.45 avg (42 max)
size: 11192 iterations: 1.03 avg (2 max)
size: 11521 iterations: 109.79 avg (528 max)
size: 37272 iterations: 1.00 avg (1 max)
size: 37454 iterations: 15.42 avg (70 max)
size: 124117 iterations: 1.02 avg (2 max)
size: 126131 iterations: 1.70 avg (6 max)
size: 413309 iterations: 1.01 avg (2 max)
size: 416077 iterations: 1.83 avg (6 max)
size: 1376321 iterations: 1.00 avg (1 max)
```
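The avg/max figures above are per-size aggregates over numTrials builds. A minimal sketch of that aggregation follows, assuming a hypothetical `build` callback that reports how many construction attempts one trial needed; it does not assume the library's public constructors expose that count, and the fake builder is invented for illustration and does not reproduce the measured numbers.

```go
package main

import "fmt"

// iterationStats runs a hypothetical build callback numTrials times and
// returns the average and maximum number of construction attempts.
func iterationStats(numTrials int, build func(trial int) int) (avg float64, maxIters int) {
	if numTrials <= 0 {
		return 0, 0 // nothing measured; avoid dividing by zero
	}
	total := 0
	for t := 0; t < numTrials; t++ {
		n := build(t)
		total += n
		if n > maxIters {
			maxIters = n
		}
	}
	return float64(total) / float64(numTrials), maxIters
}

func main() {
	// Invented stand-in for one filter build: every 50th trial needs a
	// second attempt, all others succeed on the first.
	fakeBuild := func(trial int) int {
		if trial%50 == 0 {
			return 2
		}
		return 1
	}
	size := 1000 // arbitrary label for the printout
	avg, maxIters := iterationStats(100, fakeBuild)
	fmt.Printf("size: %d iterations: %.2f avg (%d max)\n", size, avg, maxIters)
}
```

Reporting the max alongside the average matters here: a size like 11521 can look tolerable on average in some runs while occasionally needing hundreds of attempts.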
One point to consider: if your application builds sets of 100 or 1000 elements, these probabilistic filters are probably not worth the effort. I am not opposed to changing the formula, but we should be clear on the objectives. Ping @thomasmueller
Some sizes around segment length transitions require many iterations and would work much better with the previous segment length. We add a simple fix that is more robust than tweaking the formula: once every four iterations, we try the previous segment length while keeping the same capacity. Note that in most cases this won't affect the build because it's rare to need more than 1-2 iterations.

`TestBinaryFuseBoundarySizes` output (with numTrials=100):

```
binaryfusefilter_test.go:490: size: 2 iterations: 1.02 avg (2 max)
binaryfusefilter_test.go:490: size: 8 iterations: 1.08 avg (3 max)
binaryfusefilter_test.go:490: size: 24 iterations: 1.08 avg (3 max)
binaryfusefilter_test.go:490: size: 27 iterations: 1.03 avg (2 max)
binaryfusefilter_test.go:490: size: 55 iterations: 1.02 avg (2 max)
binaryfusefilter_test.go:490: size: 91 iterations: 1.02 avg (2 max)
binaryfusefilter_test.go:490: size: 120 iterations: 1.04 avg (2 max)
binaryfusefilter_test.go:490: size: 303 iterations: 1.04 avg (2 max)
binaryfusefilter_test.go:490: size: 349 iterations: 1.01 avg (2 max)
binaryfusefilter_test.go:490: size: 1009 iterations: 1.01 avg (2 max)
binaryfusefilter_test.go:490: size: 1124 iterations: 1.16 avg (4 max)
binaryfusefilter_test.go:490: size: 3361 iterations: 1.03 avg (2 max)
binaryfusefilter_test.go:490: size: 3551 iterations: 2.05 avg (6 max)
binaryfusefilter_test.go:490: size: 11192 iterations: 1.04 avg (3 max)
binaryfusefilter_test.go:490: size: 11521 iterations: 2.10 avg (6 max)
binaryfusefilter_test.go:490: size: 37272 iterations: 1.01 avg (2 max)
binaryfusefilter_test.go:490: size: 37454 iterations: 2.09 avg (6 max)
binaryfusefilter_test.go:490: size: 124117 iterations: 1.03 avg (2 max)
binaryfusefilter_test.go:490: size: 126131 iterations: 1.53 avg (4 max)
binaryfusefilter_test.go:490: size: 413309 iterations: 1.00 avg (1 max)
binaryfusefilter_test.go:490: size: 416077 iterations: 1.50 avg (4 max)
binaryfusefilter_test.go:490: size: 1376321 iterations: 1.02 avg (3 max)
```
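A minimal sketch of that retry schedule follows. It is not the merged code: `attemptParams` and `segmentCountFor` are hypothetical names, attempts are assumed to be numbered from 1 so the fallback fires on attempts 4, 8, 12, and so on, "previous segment length" is taken to mean half the default (segment lengths are powers of two), and the arity-3 segment-count rule is an assumption.

```go
package main

import "fmt"

// segmentCountFor derives the segment count from a fixed capacity, assuming
// an arity-3 layout whose array holds segmentCount+2 segments.
func segmentCountFor(capacity, segLen uint32) uint32 {
	n := (capacity + segLen - 1) / segLen // ceil(capacity / segLen)
	if n <= 2 {
		return 1
	}
	return n - 2
}

// attemptParams returns the segment length and segment count for a 1-based
// build attempt. Every fourth attempt retries with the previous (halved)
// segment length; the capacity is never recomputed.
func attemptParams(attempt int, defaultSegLen, capacity uint32) (segLen, segCount uint32) {
	segLen = defaultSegLen
	if attempt%4 == 0 && segLen > 1 {
		segLen /= 2
	}
	return segLen, segmentCountFor(capacity, segLen)
}

func main() {
	// Illustrative values only, not taken from a real build.
	const defaultSegLen, capacity = 1024, 14336
	for attempt := 1; attempt <= 8; attempt++ {
		l, c := attemptParams(attempt, defaultSegLen, capacity)
		fmt.Printf("attempt %d: segment length %d, segment count %d\n", attempt, l, c)
	}
}
```

Because the capacity stays fixed, halving the segment length roughly doubles the segment count, so the total array length stays close to the capacity rather than growing.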
I agree. I am experimenting with binary fuse filters for Pebble (https://github.com/cockroachdb/pebble), where in a typical LSM we would see ~40K keys in the files of the upper LSM levels and ~600K keys in the files of the lowest level. However, these numbers can vary widely across workloads, and I don't want a separate code path for the case where we happen to have a smaller set of keys.
Merged. I will release.
@RaduBerinde You may want to check this link: https://gihub.com/cockroachdb/pebble It likely does not point where you expect.
Improve binary fuse parameter testing
We add a test that shows the range of sizes and segment counts for
each segment length.
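The relationship that this test tabulates (set size to segment length to segment count) can be sketched as below. This is a minimal sketch, not this library's code: `segmentLength`, `filterCapacity`, and `segmentCount` are hypothetical helpers, and the constants (ln(3.33), +2.25, the 2^18 cap, the 1.125 floor on the size factor, the segmentCount+2 layout) are assumptions modeled on the commonly published arity-3 binary fuse heuristic.

```go
package main

import (
	"fmt"
	"math"
)

// segmentLength uses an assumed arity-3 heuristic:
// 2^floor(ln(size)/ln(3.33) + 2.25), capped at 2^18.
func segmentLength(size uint32) uint32 {
	if size == 0 {
		return 4 // avoid ln(0); treat like a tiny set
	}
	shift := int(math.Floor(math.Log(float64(size))/math.Log(3.33) + 2.25))
	if shift > 18 {
		shift = 18
	}
	return uint32(1) << uint(shift)
}

// filterCapacity over-provisions the key count by an assumed size factor
// that shrinks toward 1.125 as the set grows.
func filterCapacity(size uint32) uint32 {
	if size <= 1 {
		return 0 // ln(1) = 0 would divide by zero below
	}
	f := math.Max(1.125, 0.875+0.25*math.Log(1e6)/math.Log(float64(size)))
	return uint32(math.Round(float64(size) * f))
}

// segmentCount derives the number of segments for a capacity, assuming an
// arity-3 layout whose array holds segmentCount+2 segments.
func segmentCount(capacity, segLen uint32) uint32 {
	n := (capacity + segLen - 1) / segLen // ceil(capacity / segLen)
	if n <= 2 {
		return 1
	}
	return n - 2
}

func main() {
	for _, size := range []uint32{2, 27, 1009, 3551, 11521, 1376321} {
		l := segmentLength(size)
		c := segmentCount(filterCapacity(size), l)
		fmt.Printf("size %8d: segment length %6d, segment count %4d\n", size, l, c)
	}
}
```

Under this heuristic the segment length doubles at fixed size thresholds, so just past a threshold the segment count drops sharply; sizes in that region are the natural "boundary" cases to test.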
We also add a test that checks filter generation at "boundary" sizes
in terms of segment lengths. The test prints the average and max
number of iterations for each tested size. Output with numTrials=100:
Try smaller segment length in binary fuse build
Some sizes around segment length transitions require many iterations
and would work much better with the previous segment length.
We add a simple fix that is more robust than tweaking the formula:
once every four iterations, we try the previous segment length while
keeping the same capacity. Note that in most cases this won't affect
the build because it's rare to need more than 1-2 iterations.
`TestBinaryFuseBoundarySizes` output (with numTrials=100):

Informs #23, #24