[WIP][SPARK-54179][SQL] Add Native Support for Apache Tuple Sketches #52883

cboumalh · 2025-11-05T00:42:45Z

What changes were proposed in this pull request?

Implement support for tuple sketches in Apache Spark to enable efficient approximate set cardinality, frequency, and similarity computations over multiple dimensions.

Why are the changes needed?

Spark currently lacks support for tuple sketches, which allow efficient approximate computations over key–value data.
These changes add tuple sketch support to enable fast and memory-efficient estimates of distinct counts, frequencies, and set similarities across multiple dimensions.
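
For readers unfamiliar with the idea, here is a minimal, purely illustrative sketch of what a tuple sketch does. It is not the code in this PR and not the Apache DataSketches API. The class `ToyTupleSketch`, its methods, and the choice of a summed `float` as the per-key summary are all assumptions made for the example. It keeps at most `k` hashed keys below a sampling threshold `theta`, stores a summary value with each retained key, and scales the results by `1 / theta` to estimate totals over all keys.

```python
import hashlib


class ToyTupleSketch:
    """Hypothetical toy tuple sketch: a theta-style sample of keys, each with a summed value."""

    def __init__(self, k=1024):
        self.k = k            # maximum number of retained keys
        self.theta = 1.0      # sampling threshold; only hashes below it are kept
        self.entries = {}     # normalized key hash -> summed value for that key

    @staticmethod
    def _hash(key):
        # Map the key to a pseudo-uniform float in [0, 1).
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def update(self, key, value):
        h = self._hash(key)
        if h >= self.theta:
            return  # key falls outside the current sample
        # Repeated keys hash to the same slot, so their values accumulate.
        self.entries[h] = self.entries.get(h, 0.0) + value
        if len(self.entries) > self.k:
            # Over capacity: lower theta to the largest retained hash and evict it.
            self.theta = max(self.entries)
            del self.entries[self.theta]

    def estimate_distinct(self):
        # Each retained key represents roughly 1 / theta keys of the full stream.
        return len(self.entries) / self.theta

    def estimate_sum(self):
        # Scale the sampled sum of values by the same factor.
        return sum(self.entries.values()) / self.theta
```

While fewer than `k` distinct keys have been seen, `theta` stays at 1.0 and both estimates are exact. Beyond that point, the sketch uses bounded memory and the estimates become approximate.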

Does this PR introduce any user-facing change?

Yes

How was this patch tested?

WIP

Was this patch authored or co-authored using generative AI tooling?

Yes

cboumalh · 2025-11-05T00:43:24Z

cc @dtenedor @mkaravel @gengliangwang (still WIP)

[WIP][SPARK-54179][SQL] Add Native Support for Apache Tuple Sketches

47e1470

cboumalh marked this pull request as draft November 5, 2025 00:42

github-actions bot added the SQL label Nov 5, 2025

Chris Boumalhab added 3 commits November 5, 2025 02:59

format

d2d1300

aggregate functions

38e7c45

fix

c7067a0
