Conversation

Contributor

@jerry-024 jerry-024 commented Jan 4, 2026

Purpose

Add vector search support to Spark through a vector_search table function, for example:

SELECT * FROM vector_search([table_name], [vector_column_name], array(50.0f, 51.0f, 52.0f), 5)
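As a rough illustration of the call shape above, the query can be assembled programmatically. This is a minimal sketch, not part of the PR: the helper name `build_vector_search_query` is hypothetical, and the table and column arguments are inserted verbatim because the example does not show whether they should be quoted.

```python
def build_vector_search_query(table, column, query_vector, limit):
    """Return a SQL string in the shape shown in the PR description.

    Hypothetical helper for illustration only. `table` and `column` are
    inserted verbatim (assumption: the caller handles any quoting).
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    if not query_vector:
        raise ValueError("query_vector must not be empty")
    # Render each component as a float literal with an 'f' suffix,
    # matching the array(50.0f, 51.0f, 52.0f) form used in the example.
    literals = ", ".join(f"{float(x)!r}f" for x in query_vector)
    return (
        f"SELECT * FROM vector_search({table}, {column}, "
        f"array({literals}), {limit})"
    )


query = build_vector_search_query(
    "[table_name]", "[vector_column_name]", [50.0, 51.0, 52.0], 5
)
print(query)
```

The resulting string could then be passed to a Spark session's `sql` method; the last argument (5 in the example) presumably caps the number of rows returned.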

Tests

  • VectorSearchPushDownTest
  • BaseVectorSearchPushDownTest

API and Format

Documentation

@jerry-024 jerry-024 marked this pull request as draft January 4, 2026 07:29
@JingsongLi
Contributor

https://docs.databricks.com/aws/en/sql/language-manual/functions/vector_search

Let's introduce something like the Databricks vector_search function linked above.

@jerry-024 jerry-024 force-pushed the spark_vector_index_read branch from 5fafa65 to d2138d1 Compare January 5, 2026 01:46
@jerry-024 jerry-024 closed this Jan 5, 2026
@jerry-024 jerry-024 reopened this Jan 5, 2026
@jerry-024 jerry-024 force-pushed the spark_vector_index_read branch from d2138d1 to 4fe7397 Compare January 5, 2026 02:00
@jerry-024 jerry-024 force-pushed the spark_vector_index_read branch from 4fe7397 to d29b44a Compare January 5, 2026 07:00

This comment was marked as resolved.

@jerry-024 jerry-024 force-pushed the spark_vector_index_read branch from 29aefb6 to 2e1eacd Compare January 6, 2026 03:12
@jerry-024 jerry-024 force-pushed the spark_vector_index_read branch from 2e1eacd to 2236d3d Compare January 6, 2026 03:34
@jerry-024 jerry-024 marked this pull request as ready for review January 6, 2026 03:35
Contributor

@Zouxxyy Zouxxyy left a comment

LGTM!

Contributor

@JingsongLi JingsongLi left a comment

+1

@JingsongLi JingsongLi merged commit 708ea8f into apache:master Jan 6, 2026
23 checks passed
@jerry-024 jerry-024 deleted the spark_vector_index_read branch January 6, 2026 05:14
jerry-024 added a commit to jerry-024/paimon that referenced this pull request Jan 6, 2026
* upstream/master: (35 commits)
  [spark] Spark support vector search (apache#6950)
  [doc] update Apache Doris document with DLF 3.0 (apache#6954)
  [variant] Fix reading empty shredded variant via variantAccess (apache#6953)
  [python] support alterTable (apache#6952)
  [python] support ray data sink to paimon (apache#6883)
  [python] Rename to TableScan.withSlice to specific start_pos and end_pos
  [python] sync to_ray method args with ray data api (apache#6948)
  [python] light refactor for stats collect (apache#6941)
  [doc] Update cdc ingestion related docs
  [rest] Add tagNamePrefix definition for listTagsPaged (apache#6947)
  [python] support table scan with row range (apache#6944)
  [spark] Fix EqualNullSafe is not correct when column has null value. (apache#6943)
  [python] fix value_stats containing system fields for primary key tables (apache#6945)
  [test][rest] add test case for two sessions with cache for rest commitTable (apache#6438)
  [python] do not retry for connect exception in rest (apache#6942)
  [spark] Fix read shredded and unshredded variant both (apache#6936)
  [python] Let Python write file without value stats by default (apache#6940)
  [python] ray version compatible (apache#6937)
  [core] Unify conflict detect in FileStoreCommitImpl (apache#6932)
  [test] Fix unstable case in CompactActionITCase
  ...
3 participants