Skip to content

Add specific BuildScoreProvider for diversity to avoid extra encoding… #503

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 23, 2025

Conversation

tjake
Copy link
Member

@tjake tjake commented Jul 18, 2025

… and decoding of nodes

For a JMH just benchmarking the diversity calculation this is a huge win

Before:

PQDistanceCalculationBenchmark.distanceCalculation          0         1536           100          10000  avgt    5   418.095 ±  4.628  ms/op
PQDistanceCalculationBenchmark.distanceCalculation         16         1536           100          10000  avgt    5   940.306 ±  2.556  ms/op
PQDistanceCalculationBenchmark.distanceCalculation         64         1536           100          10000  avgt    5  1214.263 ± 70.999  ms/op
PQDistanceCalculationBenchmark.distanceCalculation        192         1536           100          10000  avgt    5  2019.785 ± 67.312  ms/op
Benchmark                                           (M)  (dimension)  (queryCount)  (vectorCount)  Mode  Cnt    Score   Error  Units
PQDistanceCalculationBenchmark.distanceCalculation    0         1536           100          10000  avgt    5  417.770 ± 3.297  ms/op
PQDistanceCalculationBenchmark.distanceCalculation   16         1536           100          10000  avgt    5  261.959 ± 3.048  ms/op
PQDistanceCalculationBenchmark.distanceCalculation   64         1536           100          10000  avgt    5  376.058 ± 3.726  ms/op
PQDistanceCalculationBenchmark.distanceCalculation  192         1536           100          10000  avgt    5  666.985 ± 8.505  ms/op

For actual Graph Build using PQ diversity this is more like 25% boost

Before

Benchmark                                                    (numBaseVectors)  (numberOfPQSubspaces)  (originalDimension)  Mode  Cnt      Score      Error  Units
IndexConstructionWithRandomSetBenchmark.buildIndexBenchmark             10000                     48                  384  avgt    3   2998.352 ±  292.077  ms/op
IndexConstructionWithRandomSetBenchmark.buildIndexBenchmark            100000                     48                  384  avgt    3  30923.062 ± 1689.046  ms/op

After

Benchmark                                                    (numBaseVectors)  (numberOfPQSubspaces)  (originalDimension)  Mode  Cnt      Score      Error  Units
IndexConstructionWithRandomSetBenchmark.buildIndexBenchmark             10000                     48                  384  avgt    3   2370.760 ±  230.455  ms/op
IndexConstructionWithRandomSetBenchmark.buildIndexBenchmark            100000                     48                  384  avgt    3  25302.423 ± 2256.221  ms/op

int numBaseVectors;
@Param({"0", "16"})
@Param({"48"})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"0" is the test permutation that uses FP vectors as baseline, I think you want to keep that on in the list of parameters.
While commenting the parameters in the code is a sin I'm also fully guilty of... :)
if you want to avoid forgetting to uncommenting them back can also use the command line to set them when wanting to make a change in the following way (from the README.md):

-p <param>=<value> - Benchmark parameters

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure this was accidental commit

@@ -41,7 +42,7 @@
* Benchmark that compares the distance calculation of Product Quantized vectors vs full precision vectors.
*/
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good change!


for (int q = 0; q < queryCount; q++) {
for (int i = 0; i < vectorCount; i++) {
final ScoreFunction sf = buildScoreProvider.diversityProviderFor(i).scoreFunction();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this test is not the same as the original.
Query vectors are random and not a derivation of the original dataset. Might want to change that to be similar and re-test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diversity provider is for querying the local graph relative to another point in the graph. So maybe I should just change the name of the benchmark?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, so I misunderstood in that case. It sounds like it's ok to just leave as is.
Approving, just remember to uncomment the test permutations :)

Copy link
Collaborator

@sam-herman sam-herman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving, just remember to uncomment test permutations before merging.
thanks for the change!

@tjake tjake merged commit c5c3ff9 into main Jul 23, 2025
8 checks passed
@tjake tjake deleted the diversity-perf branch July 23, 2025 13:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants