
[common] Accelerate endsWith and like '%x' with reverse btree global index #8371

Open: ArnavBalyan wants to merge 1 commit into apache:master from ArnavBalyan:arnavb/btree-suffix-support



ArnavBalyan (Member) commented Jun 28, 2026

Purpose:

  • Introduce a reverse btree global index:
    • It can serve a variety of queries that use endsWith and LIKE '%abc' patterns.
    • It stores keys in reversed form (using ReversedKeySerializer), so a suffix query becomes a prefix scan over the reversed values.
  • ReverseLazyFilteredBTreeReader routes endsWith and LIKE queries through the existing prefix scan, and uses min/max pruning instead of a full scan.
  • Reuses the existing btree file format (only the key bytes are reversed). In the future, more complex LIKE clauses can be offloaded by combining this index with the straight btree index.
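A minimal sketch of the reversed-key idea described above, written in plain Java rather than against Paimon's API. It assumes keys are UTF-8 byte strings ordered by unsigned lexicographic comparison; `ReverseKeySketch`, `mayContain`, and the `TreeMap` standing in for a sorted SST file are hypothetical and are not ReversedKeySerializer or ReverseLazyFilteredBTreeReader. It shows how `endsWith(suffix)` turns into a prefix scan over reversed keys, and how a file's min/max reversed keys can prune it.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReverseKeySketch {

    // Reverse the UTF-8 bytes of a key: a suffix of the original becomes a prefix of the result.
    static byte[] reverse(byte[] key) {
        byte[] out = new byte[key.length];
        for (int i = 0; i < key.length; i++) {
            out[i] = key[key.length - 1 - i];
        }
        return out;
    }

    static boolean startsWith(byte[] key, byte[] prefix) {
        return key.length >= prefix.length
                && Arrays.equals(key, 0, prefix.length, prefix, 0, prefix.length);
    }

    // Hypothetical stand-in for the sorted keys of one index file: reversed key -> row ids.
    private final TreeMap<byte[], List<Long>> index =
            new TreeMap<byte[], List<Long>>(Arrays::compareUnsigned);

    void add(String value, long rowId) {
        byte[] reversed = reverse(value.getBytes(StandardCharsets.UTF_8));
        index.computeIfAbsent(reversed, k -> new ArrayList<>()).add(rowId);
    }

    // endsWith(suffix) / LIKE '%suffix' answered as a prefix scan over reversed keys.
    List<Long> endsWith(String suffix) {
        byte[] prefix = reverse(suffix.getBytes(StandardCharsets.UTF_8));
        List<Long> rows = new ArrayList<>();
        // Keys sharing a prefix are contiguous and start at the first key >= prefix.
        for (Map.Entry<byte[], List<Long>> e : index.tailMap(prefix, true).entrySet()) {
            if (!startsWith(e.getKey(), prefix)) {
                break;
            }
            rows.addAll(e.getValue());
        }
        return rows;
    }

    // Min/max pruning: a file whose reversed keys span [min, max] can only hold a match
    // if that range overlaps the interval of keys starting with the prefix.
    static boolean mayContain(byte[] min, byte[] max, byte[] prefix) {
        return Arrays.compareUnsigned(max, prefix) >= 0
                && (Arrays.compareUnsigned(min, prefix) < 0 || startsWith(min, prefix));
    }

    public static void main(String[] args) {
        ReverseKeySketch idx = new ReverseKeySketch();
        idx.add("report.csv", 1L);
        idx.add("data.json", 2L);
        idx.add("backup.csv", 3L);
        System.out.println(idx.endsWith(".csv"));
    }
}
```

Running `main` prints `[3, 1]`: the reversed keys `vsc.pukcab` and `vsc.troper` both start with `vsc.` (the reversed `.csv`), while `nosj.atad` sorts before them and is never visited. Reversing bytes rather than characters is safe for this purpose because a string suffix is also a byte suffix of its UTF-8 encoding.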

Tests

  • Unit Tests

ArnavBalyan (Member, Author):

cc @JingsongLi @leaves12138 thanks! :)

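    // Value = "row" + a random number + a randomly chosen suffix; i is the row id.
    String v = "row" + rnd.nextInt(1_000_000) + suffixes[rnd.nextInt(suffixes.length)];
    data.add(Pair.of(BinaryString.fromString(v), (long) i));
}
// Sort with the reversed-key comparator so keys reach the writer in index order.
data.sort((a, b) -> cmp.compare(a.getKey(), b.getKey()));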

Contributor review comment:

[P1] Please cover the production build path before registering this index type. This test produces a valid reverse-btree only because it sorts with the reversed-key comparator here. The real create_global_index paths do not: reverse-btree is not in the Spark/Flink SortedIndexTopoBuilder support lists, so it falls through to the default/generic builders, which pass rows to BTreeIndexWriter in scan order. Even SortedGlobalIndexBuilder sorts by the original index field, not by the reversed bytes. Since SstFileWriter/BTreeIndexWriter require keys to be monotonically increasing under the writer comparator, an index built through the normal procedure can produce an incorrectly ordered SST and return wrong suffix lookups. Please wire reverse-btree into a builder that sorts by ReversedKeySerializer ordering, and add an integration test for the procedure path.
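To illustrate the ordering problem raised in this comment, here is a small, self-contained sketch in plain Java. It assumes the writer compares reversed UTF-8 key bytes with unsigned lexicographic order and rejects any key that is smaller than its predecessor; `ReverseSortOrderSketch`, `isMonotonic`, and `BY_REVERSED_KEY` are hypothetical names, not Paimon classes. Sorting rows by the original field value does not yield reversed-key order, while sorting by the reversed bytes does.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ReverseSortOrderSketch {

    static byte[] reversedUtf8(String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[b.length];
        for (int i = 0; i < b.length; i++) {
            out[i] = b[b.length - 1 - i];
        }
        return out;
    }

    // Hypothetical check mirroring the writer precondition: reversed keys must never
    // decrease in the order they are handed to the writer.
    static boolean isMonotonic(List<String> keys, Comparator<byte[]> writerOrder) {
        for (int i = 1; i < keys.size(); i++) {
            byte[] prev = reversedUtf8(keys.get(i - 1));
            byte[] cur = reversedUtf8(keys.get(i));
            if (writerOrder.compare(prev, cur) > 0) {
                return false;
            }
        }
        return true;
    }

    // The ordering a builder would need: rows sorted by their reversed key bytes.
    static final Comparator<String> BY_REVERSED_KEY =
            (a, b) -> Arrays.compareUnsigned(reversedUtf8(a), reversedUtf8(b));

    public static void main(String[] args) {
        Comparator<byte[]> writerOrder = Arrays::compareUnsigned;
        List<String> rows = new ArrayList<>(List.of("ab", "ba", "ca"));

        rows.sort(Comparator.naturalOrder()); // sorted by the original field
        System.out.println(isMonotonic(rows, writerOrder)); // false: "ba" > "ab" after reversal

        rows.sort(BY_REVERSED_KEY); // sorted by reversed bytes
        System.out.println(isMonotonic(rows, writerOrder)); // true
    }
}
```

With the original-field order, the reversed keys arrive as `ba`, `ab`, `ac`, which violates the ascending requirement; sorting by `BY_REVERSED_KEY` gives rows `ba`, `ca`, `ab` (reversed `ab`, `ac`, `ba`), which the writer can accept.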
