[YSQL] Vector index placeholder costing #26588

Open
1 task done
tanujnay112 opened this issue Mar 28, 2025 · 0 comments
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug priority/medium Medium priority issue status/awaiting-triage Issue awaiting triage

Comments

Contributor

tanujnay112 commented Mar 28, 2025

Jira Link: DB-15959

Description

Currently the costing function ybvectorcostestimate returns without doing anything. It leaves its output parameters untouched, so its caller is left reading uninitialized data, which can lead to undefined behavior during planning. This is reproducible on an LTO build of the latest master.

create extension vector;
CREATE TABLE public.pg_vector_collection (
    id bigint PRIMARY KEY,
    embedding vector(3)
);

CREATE INDEX pgvector_index 
ON public.pg_vector_collection USING ybhnsw (embedding vector_cosine_ops);

explain select * from pg_vector_collection order by embedding <=> '[1,1,1]';
                                   QUERY PLAN                                   
--------------------------------------------------------------------------------
 Sort  (cost=152.33..154.83 rows=1000 width=48)
   Sort Key: ((embedding <=> '[1,1,1]'::vector))
   ->  Seq Scan on pg_vector_collection  (cost=0.00..102.50 rows=1000 width=48)
(3 rows)

explain select * from pg_vector_collection order by embedding <=> '[1,1,1]';
                                          QUERY PLAN                                           
-----------------------------------------------------------------------------------------------
 Index Scan using pgvector_index on pg_vector_collection  (cost=0.00..6.51 rows=1000 width=48)
   Order By: (embedding <=> '[1,1,1]'::vector)
(2 rows)

Debugging showed that, in the run that chose the sequential scan, the index scan had been costed at more than 10e200.
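To illustrate the failure mode, here is a minimal standalone C sketch, not the actual YugabyteDB code. It assumes the cost callback reports its results through out-pointers, as PostgreSQL's amcostestimate interface does. The function names, the simplified parameter list, and the per-tuple constant are all hypothetical. The first function mirrors the placeholder behavior described above, the second writes every output, and the third is a sanity check a caller could apply.

```c
#include <math.h>
#include <stdbool.h>

typedef double Cost;        /* stand-ins for the planner's typedefs */
typedef double Selectivity;

/* Hypothetical per-tuple CPU cost, chosen only for illustration. */
#define SKETCH_CPU_INDEX_TUPLE_COST 0.005

/*
 * Placeholder pattern: returns without writing any output, so the caller
 * keeps whatever values its variables held before the call.
 */
static void
sketch_cost_estimate_noop(double num_index_tuples,
                          Cost *startup_cost, Cost *total_cost,
                          Selectivity *selectivity,
                          double *correlation, double *pages)
{
    (void) num_index_tuples;
    (void) startup_cost;
    (void) total_cost;
    (void) selectivity;
    (void) correlation;
    (void) pages;
}

/*
 * Generic pattern: assign every output, even if the model is crude, so the
 * caller never reads an indeterminate value.
 */
static void
sketch_cost_estimate_generic(double num_index_tuples,
                             Cost *startup_cost, Cost *total_cost,
                             Selectivity *selectivity,
                             double *correlation, double *pages)
{
    *startup_cost = 0.0;
    *total_cost = num_index_tuples * SKETCH_CPU_INDEX_TUPLE_COST;
    *selectivity = 1.0;   /* assume every row may be returned */
    *correlation = 0.0;   /* assume no correlation with physical order */
    *pages = 1.0;         /* hypothetical page count */
}

/* Sanity check a caller could run on the estimates it received. */
static bool
sketch_costs_are_sane(Cost startup_cost, Cost total_cost,
                      Selectivity selectivity)
{
    return isfinite(startup_cost) && isfinite(total_cost) &&
           startup_cost >= 0.0 && total_cost >= startup_cost &&
           selectivity >= 0.0 && selectivity <= 1.0;
}
```

With the no-op variant the planner ends up comparing whatever happened to be in memory, which would be consistent with the same query flipping between a sequential scan and an index scan from one run to the next.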

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@tanujnay112 tanujnay112 added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Mar 28, 2025
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Mar 28, 2025
tanujnay112 pushed a commit that referenced this issue Mar 31, 2025
Summary: This change adds generic costing logic to the vector index costing function. Before this change, vector index scan costs were left uninitialized, leading to the undefined behavior shown in the GHI.

Test Plan:
Jenkins
Manually tested the GHI scenario on an LTO build

Reviewers: kramanathan, telgersma

Reviewed By: telgersma

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D42828