Add specific BuildScoreProvider for diversity to avoid extra encoding… #503

tjake · 2025-07-18T09:39:44Z

… and decoding of nodes

For a JMH benchmark that measures only the diversity calculation, this is a huge win.

Before:

Benchmark                                           (M)  (dimension)  (queryCount)  (vectorCount)  Mode  Cnt     Score    Error  Units
PQDistanceCalculationBenchmark.distanceCalculation    0         1536           100          10000  avgt    5   418.095 ±  4.628  ms/op
PQDistanceCalculationBenchmark.distanceCalculation   16         1536           100          10000  avgt    5   940.306 ±  2.556  ms/op
PQDistanceCalculationBenchmark.distanceCalculation   64         1536           100          10000  avgt    5  1214.263 ± 70.999  ms/op
PQDistanceCalculationBenchmark.distanceCalculation  192         1536           100          10000  avgt    5  2019.785 ± 67.312  ms/op

After:

Benchmark                                           (M)  (dimension)  (queryCount)  (vectorCount)  Mode  Cnt    Score   Error  Units
PQDistanceCalculationBenchmark.distanceCalculation    0         1536           100          10000  avgt    5  417.770 ± 3.297  ms/op
PQDistanceCalculationBenchmark.distanceCalculation   16         1536           100          10000  avgt    5  261.959 ± 3.048  ms/op
PQDistanceCalculationBenchmark.distanceCalculation   64         1536           100          10000  avgt    5  376.058 ± 3.726  ms/op
PQDistanceCalculationBenchmark.distanceCalculation  192         1536           100          10000  avgt    5  666.985 ± 8.505  ms/op

For an actual graph build using PQ diversity, the boost is more like 25%.

Before

Benchmark                                                    (numBaseVectors)  (numberOfPQSubspaces)  (originalDimension)  Mode  Cnt      Score      Error  Units
IndexConstructionWithRandomSetBenchmark.buildIndexBenchmark             10000                     48                  384  avgt    3   2998.352 ±  292.077  ms/op
IndexConstructionWithRandomSetBenchmark.buildIndexBenchmark            100000                     48                  384  avgt    3  30923.062 ± 1689.046  ms/op

After

Benchmark                                                    (numBaseVectors)  (numberOfPQSubspaces)  (originalDimension)  Mode  Cnt      Score      Error  Units
IndexConstructionWithRandomSetBenchmark.buildIndexBenchmark             10000                     48                  384  avgt    3   2370.760 ±  230.455  ms/op
IndexConstructionWithRandomSetBenchmark.buildIndexBenchmark            100000                     48                  384  avgt    3  25302.423 ± 2256.221  ms/op
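
As a quick cross-check of the numbers above, here is a minimal sketch that derives speedup factors and time reductions from the reported `avgt` scores. The class and method names (`SpeedupCheck`, `speedup`, `timeReductionPct`) are hypothetical helpers and are not part of this PR or of jvector; the only inputs are the ms/op values copied from the tables.

```java
import java.util.Locale;

// Hypothetical helper, not part of the PR: recomputes gains from the posted scores.
class SpeedupCheck {
    // How many times faster the "after" run is (before / after).
    static double speedup(double beforeMs, double afterMs) {
        return beforeMs / afterMs;
    }

    // Percentage of time per operation saved by the change.
    static double timeReductionPct(double beforeMs, double afterMs) {
        return 100.0 * (beforeMs - afterMs) / beforeMs;
    }

    public static void main(String[] args) {
        // {before, after} avgt scores in ms/op, copied from the tables above.
        // M = 0 is the full-precision baseline and is left out.
        double[][] distanceCalculation = {
            {940.306, 261.959},    // M = 16
            {1214.263, 376.058},   // M = 64
            {2019.785, 666.985},   // M = 192
        };
        double[][] buildIndex = {
            {2998.352, 2370.760},    // numBaseVectors = 10000
            {30923.062, 25302.423},  // numBaseVectors = 100000
        };

        report("distanceCalculation", distanceCalculation);
        report("buildIndexBenchmark", buildIndex);
    }

    // Prints one line per {before, after} pair.
    static void report(String name, double[][] rows) {
        for (double[] r : rows) {
            System.out.println(String.format(Locale.ROOT,
                "%s: %.3f -> %.3f ms/op, speedup %.2fx, time reduction %.1f%%",
                name, r[0], r[1], speedup(r[0], r[1]), timeReductionPct(r[0], r[1])));
        }
    }
}
```

Measured as speedup (before / after), the two graph-build rows come out at roughly 1.22x to 1.26x, which is in line with the "25%" figure; measured as time saved per operation, the same rows are closer to 18% to 21%. The error bars on these runs (Cnt = 3) are wide, so the figures are indicative only.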


sam-herman · 2025-07-18T22:58:50Z

...h/src/main/java/io/github/jbellis/jvector/bench/IndexConstructionWithRandomSetBenchmark.java

    int numBaseVectors;
-    @Param({"0", "16"})
+    @Param({"48"})


"0" is the test permutation that uses full-precision (FP) vectors as the baseline, so I think you want to keep it in the list of parameters.
While commenting out parameters in the code is a sin I'm also fully guilty of... :)
If you want to avoid forgetting to uncomment them, you can also set them on the command line whenever you want to make a change, as follows (from the README.md):

-p <param>=<value> - Benchmark parameters

Sure, this was an accidental commit.

sam-herman · 2025-07-18T22:59:09Z

...hmarks-jmh/src/main/java/io/github/jbellis/jvector/bench/PQDistanceCalculationBenchmark.java

@@ -41,7 +42,7 @@
 * Benchmark that compares the distance calculation of Product Quantized vectors vs full precision vectors.
 */
 @BenchmarkMode(Mode.AverageTime)
-@OutputTimeUnit(TimeUnit.MICROSECONDS)
+@OutputTimeUnit(TimeUnit.MILLISECONDS)


good change!

sam-herman · 2025-07-18T23:04:51Z

...hmarks-jmh/src/main/java/io/github/jbellis/jvector/bench/PQDistanceCalculationBenchmark.java

+
+        for (int q = 0; q < queryCount; q++) {
+            for (int i = 0; i < vectorCount; i++) {
+                final ScoreFunction sf = buildScoreProvider.diversityProviderFor(i).scoreFunction();


I think this test is not the same as the original.
The query vectors are random rather than derived from the original dataset. You might want to change that to match the original and re-test.

The diversity provider is for querying the local graph relative to another point in the graph. So maybe I should just change the name of the benchmark?

Oh I see, so I misunderstood in that case. It sounds like it's ok to just leave as is.
Approving, just remember to uncomment the test permutations :)

sam-herman

Approving, just remember to uncomment the test permutations before merging.
Thanks for the change!

Add specific BuildScoreProvider for diversity to avoid extra encoding…

bf6b09f

… and decoding of nodes

tlwillke requested review from tlwillke and marianotepper July 18, 2025 14:27

sam-herman reviewed Jul 18, 2025


sam-herman approved these changes Jul 21, 2025


Review

904dba1

tjake merged commit c5c3ff9 into main Jul 23, 2025
8 checks passed

tjake deleted the diversity-perf branch July 23, 2025 13:46
