
Add vector distance, array math, and array aggregate functions #21536

@crm26


Summary

This issue tracks adding vector math and array aggregate scalar functions to DataFusion. These functions close feature gaps relative to DuckDB and LanceDB for vector search and array analytics workloads.

Replaces #21371 and #21376. Per @alamb's review, those PRs were to be split into one PR per function.

Functions

Vector math (with shared vector_math.rs primitives)

| Function | Signature | Reference |
|---|---|---|
| cosine_distance | (array, array) → float64 | DuckDB array_cosine_similarity |
| inner_product | (array, array) → float64 | DuckDB array_inner_product |
| array_normalize | (array) → array | NumPy / scipy convention |
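The vector math primitives can be sketched in plain Rust as follows. This is a minimal illustration under stated assumptions, not the planned vector_math.rs: the helpers take `&[f64]` slices rather than Arrow list arrays, `l2_norm` is a hypothetical helper, and returning `None` for mismatched lengths or zero-norm vectors is an assumed policy that this issue does not specify. Cosine distance is computed as 1 minus cosine similarity.

```rust
// Hypothetical sketch of shared vector primitives on plain f64 slices.
// Not DataFusion's API; real kernels would operate on Arrow list arrays.

/// Dot product of two equal-length vectors; `None` if lengths differ (assumed policy).
pub fn inner_product(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(b).map(|(x, y)| x * y).sum())
}

/// Euclidean (L2) norm of a vector (hypothetical helper).
pub fn l2_norm(a: &[f64]) -> f64 {
    a.iter().map(|x| x * x).sum::<f64>().sqrt()
}

/// Cosine distance = 1 - cosine similarity.
/// Returns `None` on length mismatch or when either vector has zero norm,
/// since similarity is undefined there (assumed policy).
pub fn cosine_distance(a: &[f64], b: &[f64]) -> Option<f64> {
    let dot = inner_product(a, b)?;
    let denom = l2_norm(a) * l2_norm(b);
    if denom == 0.0 {
        return None;
    }
    Some(1.0 - dot / denom)
}

/// Scales a vector to unit L2 norm; `None` for the zero vector (assumed policy).
pub fn array_normalize(a: &[f64]) -> Option<Vec<f64>> {
    let norm = l2_norm(a);
    if norm == 0.0 {
        return None;
    }
    Some(a.iter().map(|x| x / norm).collect())
}

fn main() {
    let a = [1.0, 0.0];
    let b = [0.0, 1.0];
    println!("{:?}", inner_product(&a, &b)); // orthogonal vectors: Some(0.0)
    println!("{:?}", cosine_distance(&a, &b)); // Some(1.0)
    println!("{:?}", array_normalize(&[3.0, 4.0])); // Some([0.6, 0.8])
}
```

Sharing `inner_product` and `l2_norm` between `cosine_distance` and `array_normalize` is the kind of reuse a common primitives module would allow.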

Array element-wise math

| Function | Signature | Description |
|---|---|---|
| array_add | (array, array) → array | Element-wise addition |
| array_subtract | (array, array) → array | Element-wise subtraction |
| array_scale | (array, scalar) → array | Multiply each element by a scalar |
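A minimal sketch of the element-wise semantics, again on `f64` slices rather than Arrow arrays. The `zip_with` helper is hypothetical, and rejecting inputs of different lengths with an error is an assumption; the issue does not say how mismatched lengths should be handled.

```rust
// Hypothetical element-wise helpers; not DataFusion's API.

/// Applies `f` pairwise; errors if the inputs differ in length (assumed policy).
fn zip_with(a: &[f64], b: &[f64], f: impl Fn(f64, f64) -> f64) -> Result<Vec<f64>, String> {
    if a.len() != b.len() {
        return Err(format!("length mismatch: {} vs {}", a.len(), b.len()));
    }
    Ok(a.iter().zip(b).map(|(&x, &y)| f(x, y)).collect())
}

/// Element-wise a + b.
pub fn array_add(a: &[f64], b: &[f64]) -> Result<Vec<f64>, String> {
    zip_with(a, b, |x, y| x + y)
}

/// Element-wise a - b.
pub fn array_subtract(a: &[f64], b: &[f64]) -> Result<Vec<f64>, String> {
    zip_with(a, b, |x, y| x - y)
}

/// Multiplies every element by the scalar `k`.
pub fn array_scale(a: &[f64], k: f64) -> Vec<f64> {
    a.iter().map(|x| x * k).collect()
}

fn main() {
    println!("{:?}", array_add(&[1.0, 2.0, 3.0], &[10.0, 20.0, 30.0])); // Ok([11.0, 22.0, 33.0])
    println!("{:?}", array_subtract(&[10.0, 20.0, 30.0], &[1.0, 2.0, 3.0])); // Ok([9.0, 18.0, 27.0])
    println!("{:?}", array_scale(&[1.0, 2.0, 3.0], 2.0)); // [2.0, 4.0, 6.0]
}
```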

Array aggregate scalars

| Function | Signature | Reference |
|---|---|---|
| array_sum / list_sum | (array) → numeric | DuckDB list_sum |
| array_product / list_product | (array) → numeric | DuckDB list_product |
| array_avg / list_avg | (array) → float64 | DuckDB list_avg |
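A sketch of the per-row aggregate semantics, assuming SQL aggregate conventions: NULL elements (modeled here as `None`) are skipped, and an empty or all-NULL list yields NULL. Those NULL rules are an assumption, not something this issue specifies. The sketch also uses `f64` throughout, whereas the table above says sum and product return a numeric type that may follow the input type.

```rust
// Hypothetical scalar aggregates over one list value; not DataFusion's API.
// `None` elements stand in for SQL NULLs and are skipped (assumed semantics).

/// Sum of non-NULL elements; `None` if there are none.
pub fn list_sum(xs: &[Option<f64>]) -> Option<f64> {
    let mut it = xs.iter().flatten();
    let first = *it.next()?; // empty or all-NULL list -> None
    Some(it.fold(first, |acc, x| acc + x))
}

/// Product of non-NULL elements; `None` if there are none.
pub fn list_product(xs: &[Option<f64>]) -> Option<f64> {
    let mut it = xs.iter().flatten();
    let first = *it.next()?;
    Some(it.fold(first, |acc, x| acc * x))
}

/// Mean of non-NULL elements; `None` if there are none.
pub fn list_avg(xs: &[Option<f64>]) -> Option<f64> {
    let count = xs.iter().flatten().count();
    // list_sum is Some only when count >= 1, so the division is safe.
    list_sum(xs).map(|s| s / count as f64)
}

fn main() {
    let xs = [Some(1.0), None, Some(2.0), Some(3.0)];
    println!("{:?}", list_sum(&xs)); // Some(6.0)
    println!("{:?}", list_product(&xs)); // Some(6.0)
    println!("{:?}", list_avg(&xs)); // Some(2.0)
}
```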

Alias fix

| Fix | Description |
|---|---|
| list_min | Missing alias on ArrayMin (parity with the existing list_max alias on ArrayMax) |

Submission plan

One PR per function, submitted serially. Each PR will reference this issue.

References
