
Use SkipBlockRangeIterator as approximation in SortedNumericDocValuesRangeQuery #15954

Open

sgup432 wants to merge 3 commits into apache:main from sgup432:sorted_block_range_iter_approx

@sgup432 (Contributor) commented Apr 13, 2026

Description

Replace DocValuesRangeIterator with SkipBlockRangeIterator as the two-phase approximation when a DocValuesSkipper is available. The approximation then works at block level only (no doc values decoding), and value reads are deferred to matches(). In conjunctions, this avoids wasted doc values decoding when another clause's skipper classifies the current block as NO (no values in the query range).
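The split between a block-level approximation and per-doc verification can be illustrated with a small, self-contained toy model in plain Java. `BlockApproxDemo`, `ToySkipIndex`, and `Match` are hypothetical names, not Lucene classes; the block size of 4 and the dense, single-valued field are simplifying assumptions. The approximation decides per block from min/max alone, and per-doc values are only read in `matches()`.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Lucene code: ToySkipIndex stands in for a DocValuesSkipper (per-block
// min/max), and the loop in search() stands in for a block-level two-phase approximation.
class BlockApproxDemo {
  enum Match { NO, MAYBE, YES }

  static final class ToySkipIndex {
    final long[] values; // one value per doc (dense field, an assumption of this model)
    final int blockSize;
    final long[] blockMin;
    final long[] blockMax;

    ToySkipIndex(long[] values, int blockSize) {
      this.values = values;
      this.blockSize = blockSize;
      int numBlocks = (values.length + blockSize - 1) / blockSize;
      blockMin = new long[numBlocks];
      blockMax = new long[numBlocks];
      for (int b = 0; b < numBlocks; b++) { // precomputed metadata, like a skip index
        long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
        for (int d = b * blockSize; d < Math.min((b + 1) * blockSize, values.length); d++) {
          min = Math.min(min, values[d]);
          max = Math.max(max, values[d]);
        }
        blockMin[b] = min;
        blockMax[b] = max;
      }
    }
  }

  // Block-level classification from min/max only; no per-doc values are read.
  static Match classify(long min, long max, long lo, long hi) {
    if (max < lo || min > hi) return Match.NO;
    if (min >= lo && max <= hi) return Match.YES;
    return Match.MAYBE;
  }

  int valueReads = 0; // per-doc value reads, which only happen in matches()

  boolean matches(ToySkipIndex idx, int doc, long lo, long hi) {
    valueReads++;
    long v = idx.values[doc];
    return v >= lo && v <= hi;
  }

  List<Integer> search(ToySkipIndex idx, long lo, long hi) {
    List<Integer> hits = new ArrayList<>();
    for (int b = 0; b < idx.blockMin.length; b++) {
      if (classify(idx.blockMin[b], idx.blockMax[b], lo, hi) == Match.NO) {
        continue; // approximation skips the block without touching its values
      }
      int end = Math.min((b + 1) * idx.blockSize, idx.values.length);
      for (int doc = b * idx.blockSize; doc < end; doc++) {
        if (matches(idx, doc, lo, hi)) hits.add(doc); // verification phase
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    ToySkipIndex idx = new ToySkipIndex(new long[] {1, 2, 3, 4, 10, 11, 12, 13, 5, 6, 7, 20}, 4);
    BlockApproxDemo q = new BlockApproxDemo();
    List<Integer> hits = q.search(idx, 3, 6);
    System.out.println(hits + " valueReads=" + q.valueReads);
  }
}
```

Running `main` prints `[2, 3, 8, 9] valueReads=8`: the middle block (values 10..13) is classified NO from its min/max, so its four values are never read.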

This follows the suggestion by @romseygeek in #15770 (comment).

The PR also includes a JMH benchmark.

Results (JMH, 1M docs, throughput in ops/s, higher is better):

| Pattern | Fields | Before (ops/s) | After (ops/s) | Speedup |
|---|---|---|---|---|
| clustered | 3 | 15,890 | 18,505 | 1.16x |
| clustered | 5 | 11,546 | 12,959 | 1.12x |
| mixed | 3 | 854 | 937 | 1.10x |
| mixed | 5 | 512 | 723 | 1.41x |
| random | 3 | 60 | 68 | 1.13x |
| random | 5 | 44 | 51 | 1.16x |

Data patterns:

  • clustered: all field values increase with docID (time-series style)
  • mixed: field0 monotonic, field1 low-cardinality (20 distinct values), remaining fields random (e-commerce style)
  • random: all fields uniformly random, wide query ranges (worst case)
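A hedged illustration of why the data pattern matters to a per-block min/max approximation. The generators below are hypothetical stand-ins, not the benchmark's code, and `BLOCK_SIZE = 1024` is an assumed value. When values increase with docID, each block spans a narrow range, so blocks tend to be classified NO or YES; with uniformly random values, nearly every block spans the whole domain and is MAYBE, so the approximation prunes almost nothing.

```java
import java.util.Random;

// Hypothetical data generators; the PR's benchmark may generate its data differently.
class PatternDemo {
  static final int BLOCK_SIZE = 1024; // assumed block size for this illustration

  // Returns {NO, MAYBE, YES} block counts for the query range [lo, hi].
  static int[] classifyBlocks(long[] values, long lo, long hi) {
    int[] counts = new int[3];
    for (int start = 0; start < values.length; start += BLOCK_SIZE) {
      long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
      for (int d = start; d < Math.min(start + BLOCK_SIZE, values.length); d++) {
        min = Math.min(min, values[d]);
        max = Math.max(max, values[d]);
      }
      if (max < lo || min > hi) counts[0]++;         // NO: block can be skipped
      else if (min >= lo && max <= hi) counts[2]++;  // YES: every doc matches
      else counts[1]++;                              // MAYBE: values must be checked
    }
    return counts;
  }

  // "clustered": the value increases with docID.
  static long[] clustered(int n) {
    long[] v = new long[n];
    for (int d = 0; d < n; d++) v[d] = d;
    return v;
  }

  // "random": values uniform in [0, n).
  static long[] random(int n, long seed) {
    Random r = new Random(seed);
    long[] v = new long[n];
    for (int d = 0; d < n; d++) v[d] = r.nextInt(n);
    return v;
  }

  public static void main(String[] args) {
    int n = 100 * BLOCK_SIZE;
    long lo = 25L * BLOCK_SIZE, hi = 75L * BLOCK_SIZE - 1; // middle half of the domain
    int[] c = classifyBlocks(clustered(n), lo, hi);
    int[] r = classifyBlocks(random(n, 42), lo, hi);
    System.out.printf("clustered NO=%d MAYBE=%d YES=%d%n", c[0], c[1], c[2]);
    System.out.printf("random    NO=%d MAYBE=%d YES=%d%n", r[0], r[1], r[2]);
  }
}
```

With the query covering the middle half of the domain, the clustered array yields 50 NO and 50 YES blocks out of 100, while the random array leaves all 100 blocks as MAYBE.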

Query: a BooleanQuery with N FILTER clauses (N is the Fields column in the table above), each a SortedNumericDocValuesField.newSlowRangeQuery on a distinct field indexed with a skip index.
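The conjunction effect described in the PR description can be sketched with another toy model. `ConjunctionDemo`, `Field`, `advance()`, and `conjunction()` are hypothetical; the leapfrog loop is a simplified stand-in for how a conjunction aligns two-phase approximations before verifying matches, not Lucene's actual implementation. Each clause's approximation consults only its own per-block min/max, and values are read only once all approximations agree on a doc.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Lucene code: each Field has per-block min/max (like a skip index) and
// a query range. Approximations use block metadata only; values are read in matches().
class ConjunctionDemo {
  static final int BLOCK_SIZE = 4;
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  static final class Field {
    final long[] values; // dense: one value per doc (assumption of this model)
    final long lo, hi;
    final long[] min, max;
    int valueReads = 0;

    Field(long[] values, long lo, long hi) {
      this.values = values;
      this.lo = lo;
      this.hi = hi;
      int blocks = (values.length + BLOCK_SIZE - 1) / BLOCK_SIZE;
      min = new long[blocks];
      max = new long[blocks];
      for (int b = 0; b < blocks; b++) {
        min[b] = Long.MAX_VALUE;
        max[b] = Long.MIN_VALUE;
        for (int d = b * BLOCK_SIZE; d < Math.min((b + 1) * BLOCK_SIZE, values.length); d++) {
          min[b] = Math.min(min[b], values[d]);
          max[b] = Math.max(max[b], values[d]);
        }
      }
    }

    // Block-level approximation: first doc >= target whose block is not NO.
    int advance(int target) {
      int doc = target;
      while (doc < values.length) {
        int b = doc / BLOCK_SIZE;
        if (max[b] >= lo && min[b] <= hi) return doc;
        doc = (b + 1) * BLOCK_SIZE; // skip a NO block without reading its values
      }
      return NO_MORE_DOCS;
    }

    boolean matches(int doc) {
      valueReads++; // the only place per-doc values are read
      return values[doc] >= lo && values[doc] <= hi;
    }
  }

  // Leapfrog the approximations until they agree on a doc, then verify every clause.
  static List<Integer> conjunction(Field... fields) {
    List<Integer> hits = new ArrayList<>();
    int candidate = 0;
    while (true) {
      boolean moved = true;
      while (moved && candidate != NO_MORE_DOCS) {
        moved = false;
        for (Field f : fields) {
          int d = f.advance(candidate);
          if (d != candidate) {
            candidate = d;
            moved = true;
            if (d == NO_MORE_DOCS) break;
          }
        }
      }
      if (candidate == NO_MORE_DOCS) return hits;
      boolean all = true;
      for (Field f : fields) {
        if (!f.matches(candidate)) { all = false; break; }
      }
      if (all) hits.add(candidate);
      candidate++;
    }
  }

  public static void main(String[] args) {
    Field f0 = new Field(new long[] {1, 2, 3, 4, 5, 6, 7, 8, 1, 5, 9, 2}, 2, 6);
    Field f1 = new Field(new long[] {50, 51, 52, 53, 1, 2, 3, 4, 10, 11, 12, 13}, 0, 5);
    List<Integer> hits = conjunction(f0, f1);
    System.out.println(hits + " f0 reads=" + f0.valueReads + " f1 reads=" + f1.valueReads);
  }
}
```

Running `main` prints `[4, 5] f0 reads=4 f1 reads=2`. Because f1's first and last blocks are NO, the conjunction never lands on docs 0-3 or 8-11, so none of f0's values there are read even though every f0 block is MAYBE.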

github-actions bot added this to the 11.0.0 milestone Apr 13, 2026
@romseygeek (Contributor) left a comment


Thanks @sgup432, this is a great start. I think we can improve things further by caching the docid of the end of a block and avoiding comparisons on every match call for fully matching blocks.

```java
new TwoPhaseIterator(skipApprox) {
  @Override
  public boolean matches() throws IOException {
    int blockMatch = classifyBlock();
    // ... remainder of matches() (and matchCost()) not shown in the review excerpt
  }
};
```

Can we cache this somehow so that we're not doing the comparison on every match request? For example, if you know that the whole block matches then you can just check that the current docId() is lower than the block end.
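A sketch of the reviewer's suggestion under the same toy assumptions: hypothetical names, a dense single-valued field, block size 4, and docs visited in increasing order as a two-phase approximation would deliver them. Going slightly beyond the comment, it caches the classification of whichever block was entered most recently (not only YES blocks), so `matches()` classifies once per block and, inside a fully matching block, answers from the cache without reading values.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the review suggestion, not the PR's code: remember the last docID of the
// current block and whether the whole block matches, instead of re-classifying per call.
class CachedBlockMatchDemo {
  static final int BLOCK_SIZE = 4;
  final long[] values; // dense field assumed: every doc has exactly one value
  final long lo, hi;
  final long[] min, max;
  int classifyCalls = 0, valueReads = 0;

  private int cachedBlockLastDoc = -1; // last docID of the most recently classified block
  private boolean cachedBlockIsYes = false;

  CachedBlockMatchDemo(long[] values, long lo, long hi) {
    this.values = values;
    this.lo = lo;
    this.hi = hi;
    int blocks = (values.length + BLOCK_SIZE - 1) / BLOCK_SIZE;
    min = new long[blocks];
    max = new long[blocks];
    for (int b = 0; b < blocks; b++) {
      min[b] = Long.MAX_VALUE;
      max[b] = Long.MIN_VALUE;
      for (int d = b * BLOCK_SIZE; d < Math.min((b + 1) * BLOCK_SIZE, values.length); d++) {
        min[b] = Math.min(min[b], values[d]);
        max[b] = Math.max(max[b], values[d]);
      }
    }
  }

  // Relies on docs arriving in increasing order, so doc <= cachedBlockLastDoc means
  // the doc lies in the cached block.
  boolean matches(int doc) {
    if (doc > cachedBlockLastDoc) { // entered a new block: classify it once
      int b = doc / BLOCK_SIZE;
      classifyCalls++;
      cachedBlockIsYes = min[b] >= lo && max[b] <= hi;
      cachedBlockLastDoc = Math.min((b + 1) * BLOCK_SIZE, values.length) - 1;
    }
    if (cachedBlockIsYes) return true; // whole block matches: no value read
    valueReads++;
    return values[doc] >= lo && values[doc] <= hi;
  }

  // Driver standing in for the approximation: visits every doc in non-NO blocks.
  List<Integer> run() {
    List<Integer> hits = new ArrayList<>();
    for (int b = 0; b < min.length; b++) {
      if (max[b] < lo || min[b] > hi) continue; // NO block: skipped by the approximation
      int end = Math.min((b + 1) * BLOCK_SIZE, values.length);
      for (int doc = b * BLOCK_SIZE; doc < end; doc++) {
        if (matches(doc)) hits.add(doc);
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    CachedBlockMatchDemo q =
        new CachedBlockMatchDemo(new long[] {3, 4, 5, 6, 1, 9, 4, 5, 20, 21, 22, 23}, 3, 6);
    List<Integer> hits = q.run();
    System.out.println(hits + " classifyCalls=" + q.classifyCalls + " valueReads=" + q.valueReads);
  }
}
```

Running `main` prints `[0, 1, 2, 3, 6, 7] classifyCalls=2 valueReads=4`: the first block is YES, so docs 1-3 are answered from the cache, and only the MAYBE block's four values are read.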
