feat: SIMD sum vector aggregators (Long/Double/Float) by clintropolis · Pull Request #19561 · apache/druid

clintropolis · 2026-06-05T00:46:59Z

Description

Follow-up to #19512, this PR adds opt-in jdk.incubator.vector specializations of LongSumVectorAggregator, DoubleSumVectorAggregator, and FloatSumVectorAggregator's ungrouped aggregate(buf, position, startRow, endRow) path. This functionality is controlled by the same druid.expressions.useVectorApi flag introduced for expression vector processors in the prior PR (off by default).

Shows a pretty nice improvement:

// 0: non-expression timeseries reference, 1 columns
"SELECT SUM(long1) FROM expressions",
// 4: non-expression timeseries reference, 5 columns
"SELECT SUM(long1), SUM(long4), SUM(double1), SUM(float3), SUM(long5) FROM expressions",
// 7: math op - 2 longs
"SELECT SUM(long1 * long2) FROM expressions",
// 11: all same math op - 3 longs, 1 double, 1 float
"SELECT SUM(long5 * float3 * long1 * long4 * double1) FROM expressions",

Benchmark                        (complexCompression)  (deferExpressionDimensions)  (jsonObjectStorageEncoding)  (query)  (rowsPerSegment)  (schemaType)  (storageType)  (stringEncoding)  (useVectorApi)  (vectorize)  Mode  Cnt   Score   Error  Units
SqlExpressionBenchmark.querySql                  NONE                 singleString                        SMILE        0           1500000      explicit           MMAP              UTF8           false        force  avgt    5   5.793 ± 0.554  ms/op
SqlExpressionBenchmark.querySql                  NONE                 singleString                        SMILE        0           1500000      explicit           MMAP              UTF8            true        force  avgt    5   5.270 ± 0.264  ms/op
SqlExpressionBenchmark.querySql                  NONE                 singleString                        SMILE        4           1500000      explicit           MMAP              UTF8           false        force  avgt    5  32.833 ± 0.879  ms/op
SqlExpressionBenchmark.querySql                  NONE                 singleString                        SMILE        4           1500000      explicit           MMAP              UTF8            true        force  avgt    5  25.135 ± 0.905  ms/op
SqlExpressionBenchmark.querySql                  NONE                 singleString                        SMILE        7           1500000      explicit           MMAP              UTF8           false        force  avgt    5  14.073 ± 0.602  ms/op
SqlExpressionBenchmark.querySql                  NONE                 singleString                        SMILE        7           1500000      explicit           MMAP              UTF8            true        force  avgt    5   9.602 ± 0.362  ms/op
SqlExpressionBenchmark.querySql                  NONE                 singleString                        SMILE       11           1500000      explicit           MMAP              UTF8           false        force  avgt    5  61.997 ± 1.508  ms/op
SqlExpressionBenchmark.querySql                  NONE                 singleString                        SMILE       11           1500000      explicit           MMAP              UTF8            true        force  avgt    5  25.191 ± 0.953  ms/op

I changed query 0 to be sum on long5 which has nulls which shows a nicer improvement than the original 0 on long1 which has no nulls.

Benchmark                        (complexCompression)  (deferExpressionDimensions)  (jsonObjectStorageEncoding)  (query)  (rowsPerSegment)  (schemaType)  (storageType)  (stringEncoding)  (useVectorApi)  (vectorize)  Mode  Cnt   Score   Error  Units
SqlExpressionBenchmark.querySql                  NONE                 singleString                        SMILE        0           1500000      explicit           MMAP              UTF8           false        force  avgt    5  16.936 ± 1.073  ms/op
SqlExpressionBenchmark.querySql                  NONE                 singleString                        SMILE        0           1500000      explicit           MMAP              UTF8            true        force  avgt    5  10.676 ± 0.752  ms/op

changes:

add NullAwareVectorAggregator marker interface declaring aggregate(buf, position, startRow, endRow, nullVector) for delegates that handle nulls themselves; return value reports whether any non-null row contributed.
update NullableNumericVectorAggregator.aggregate(buf, position, startRow, endRow) to a three-way dispatch: null-free fast path; instanceof NullAwareVectorAggregator -> new null-aware overload (set null marker iff it returned true); else existing scatter-gather fallback.
add SimdLongSumVectorAggregator, SimdDoubleSumVectorAggregator, and SimdFloatSumVectorAggregator under query/aggregation/simd/. Each extends its scalar parent, overrides the ungrouped no-null aggregate with a va.add(vb) + reduceLanes(VectorOperators.ADD) SIMD reduction, and implements the null-aware overload using VectorMask-based masked accumulation with notNull.trueCount() for the non-null check.
wire Long/Double/FloatSumAggregatorFactory.factorizeVector to dispatch on ExpressionProcessing.useVectorApi().
add SimdSumVectorAggregatorTest, for each of the three types, tries various vector sizes and null patterns and asserts SIMD output matches the scalar reference (exact for long, within relative tolerance for double/float to accommodate SIMD's tree-reduce vs scalar's left-to-right reduce).

changes: * add `NullAwareVectorAggregator` marker interface declaring `aggregate(buf, position, startRow, endRow, nullVector)` for delegates that handle nulls themselves; return value reports whether any non-null row contributed. * update `NullableNumericVectorAggregator.aggregate(buf, position, startRow, endRow)` to a three-way dispatch: null-free fast path; `instanceof NullAwareVectorAggregator` -> new null-aware overload (set null marker iff it returned true); else existing scatter-gather fallback. * add `SimdLongSumVectorAggregator`, `SimdDoubleSumVectorAggregator`, and `SimdFloatSumVectorAggregator` under `query/aggregation/simd/`. Each extends its scalar parent, overrides the ungrouped no-null aggregate with a `va.add(vb)` + `reduceLanes(VectorOperators.ADD)` SIMD reduction, and implements the null-aware overload using `VectorMask`-based masked accumulation with `notNull.trueCount()` for the non-null check. * wire `Long/Double/FloatSumAggregatorFactory.factorizeVector` to dispatch on `ExpressionProcessing.useVectorApi()`. * add `SimdSumVectorAggregatorTest`, for each of the three types, tries various vector sizes and null patterns and asserts SIMD output matches the scalar reference (exact for long, within relative tolerance for double/float to accommodate SIMD's tree-reduce vs scalar's left-to-right reduce).

FrankChen021

I have reviewed the code for correctness, edge cases, concurrency, and integration risks; no issues found.

Reviewed 11 of 11 changed files.

This is an automated review by Codex GPT-5.5

gianm

In addition to the line comments, a documentation note: I noticed that druid.expressions.useVectorApi appears once in the docs, just in docs/operations/java.md. It should also be on the main docs/configuration/index.md. Actually I think all of the ExpressionProcessingConfig stuff is missing. They should all be added.

gianm · 2026-06-05T17:49:16Z

+      final ByteBuffer simdBuf = ByteBuffer.allocate(Float.BYTES);
+      scalar.init(scalarBuf, 0);
+      simd.init(simdBuf, 0);
+      scalar.aggregate(scalarBuf, 0, 0, size);


all the tests seem to use buf, 0, 0, size as parameters. Please improve coverage by varying the position, startRow, and endRow. It should be ok to do one call with buf, 0, 0, size and another with buf, 1, 1, size + 1 (and put some junk in the first row of the selector that the startRow of 1 is meant to ignore).

gianm · 2026-06-05T17:51:13Z

+ *     {@link org.apache.druid.query.aggregation.NullAwareVectorAggregator}), compared against a manually-computed
+ *     reference sum.
+ */
+public class SimdSumVectorAggregatorTest extends InitializedNullHandlingTest


For test coverage: add useVectorApi as a parameter in some of the parameterized tests? Maybe the timeseries tests and groupBy tests.

gianm · 2026-06-05T17:51:38Z

      // ===========================
      // 0: non-expression timeseries reference, 1 columns
-      "SELECT SUM(long1) FROM expressions",
+      "SELECT SUM(long5) FROM expressions",


why this change?

oh i can revert it, its mentioned in the PR description that I changed it because long1 has no nulls and long5 has a bunch of them, so they exercise different paths

jtuglu1 · 2026-06-05T21:00:57Z

within relative tolerance for double/float to accommodate SIMD's tree-reduce vs scalar's left-to-right reduce).

Does this error accumulate any different than normal, non-vectorized instructions? In other words, do we risk returning results with more/less error than non-vectorized aggregators?

clintropolis · 2026-06-05T21:42:13Z

Does this error accumulate any different than normal, non-vectorized instructions? In other words, do we risk returning results with more/less error than non-vectorized aggregators?

I wasn't that sure tbh, the internet tells me that there should be significantly less error than our standard aggregators (relevant arxiv link from img https://arxiv.org/pdf/2107.01604):

github-actions Bot added the Area - Dependencies label Jun 5, 2026

clintropolis added the Performance label Jun 5, 2026

FrankChen021 reviewed Jun 5, 2026

View reviewed changes

gianm reviewed Jun 5, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: SIMD sum vector aggregators (Long/Double/Float)#19561

feat: SIMD sum vector aggregators (Long/Double/Float)#19561
clintropolis wants to merge 1 commit into
apache:masterfrom
clintropolis:simd-vector-aggs

clintropolis commented Jun 5, 2026

Uh oh!

FrankChen021 left a comment

Uh oh!

gianm left a comment

Uh oh!

gianm Jun 5, 2026

Uh oh!

gianm Jun 5, 2026

Uh oh!

gianm Jun 5, 2026

Uh oh!

clintropolis Jun 5, 2026

Uh oh!

jtuglu1 commented Jun 5, 2026

Uh oh!

clintropolis commented Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

clintropolis commented Jun 5, 2026

Description

Uh oh!

FrankChen021 left a comment

Choose a reason for hiding this comment

Uh oh!

gianm left a comment

Choose a reason for hiding this comment

Uh oh!

gianm Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

gianm Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

gianm Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

clintropolis Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

jtuglu1 commented Jun 5, 2026

Uh oh!

clintropolis commented Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants