Performance: optimize MatchHashesAndScoreQuery for case where a hash occurs once (96% recall at 190 qps) #612

Merged (6 commits) on Nov 30, 2023


@alexklibisz (Owner) commented Nov 29, 2023

Related Issue

#611

Changes

One LSH model (PermutationLsh) can emit the same hash multiple times, so MatchHashesAndScoreQuery has to account for a hash occurring more than once. Through some trial and error, I found that handling this general case has a measurable impact on performance.

So this PR adds an optimized case to MatchHashesAndScoreQuery for hashes that occur once.
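The idea can be illustrated with a toy sketch. This is not the code from this PR: the `postings` layout, the function name `count_matches`, and the use of `min(query_freq, doc_freq)` as the per-hash contribution are all assumptions made for illustration. The point is only that when a hash occurs once in the query, the contribution is always 1, so the per-document frequency never needs to be read.

```python
from collections import Counter, defaultdict


def count_matches(query_hashes, postings):
    """Count hash matches per document (hypothetical toy model).

    postings maps hash -> {doc_id: frequency of that hash in the doc}.
    This layout is an assumption for illustration only.
    """
    counts = defaultdict(int)
    for h, query_freq in Counter(query_hashes).items():
        docs = postings.get(h, {})
        if query_freq == 1:
            # Fast path: min(1, doc_freq) is always 1 because doc_freq >= 1,
            # so the stored frequency is never looked at.
            for doc_id in docs:
                counts[doc_id] += 1
        else:
            # General path: the hash repeats in the query, so the
            # document's frequency has to be read and compared.
            for doc_id, doc_freq in docs.items():
                counts[doc_id] += min(query_freq, doc_freq)
    return dict(counts)


# Hypothetical data: hash "a" occurs once in the query, hash "b" three times.
example_postings = {"a": {1: 1, 2: 3}, "b": {2: 2}}
example_counts = count_matches(["a", "b", "b", "b"], example_postings)
```

In this sketch both branches produce the same counts; the fast path only skips work that cannot change the result.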

With this change, throughput at 96% recall reaches 190 qps, only 10 qps short of the goal in #611.

image

(One of the benchmark runs reached 195 qps, but I decided to advertise the lower value.)

Testing and Validation

Standard CI and benchmarking

@alexklibisz changed the title from "Performance: optimize MatchHashesAndScoreQuery for case where a hash occurs once" to "Performance: optimize MatchHashesAndScoreQuery for case where a hash occurs once (195 qps at 95% recall)" on Nov 29, 2023
@alexklibisz changed the title from "Performance: optimize MatchHashesAndScoreQuery for case where a hash occurs once (195 qps at 95% recall)" to "Performance: optimize MatchHashesAndScoreQuery for case where a hash occurs once (190 qps at 95% recall)" on Nov 30, 2023
@alexklibisz changed the title from "Performance: optimize MatchHashesAndScoreQuery for case where a hash occurs once (190 qps at 95% recall)" to "Performance: optimize MatchHashesAndScoreQuery for case where a hash occurs once (96% recall at 190 qps)" on Nov 30, 2023
@alexklibisz marked this pull request as ready for review November 30, 2023 05:38
@alexklibisz merged commit 504589b into main Nov 30, 2023
5 checks passed
@alexklibisz deleted the optimize-single-freq branch November 30, 2023 05:38