[Bug] Pinecone vector search by id returns incorrect vectors. #346

Open
2 tasks done
ayansengupta17 opened this issue May 14, 2024 · 3 comments
Labels: enhancement (New feature or request), wontfix (This will not be worked on)

Comments

@ayansengupta17

Is this a new bug in the Pinecone Python client?

  • I believe this is a new bug in the Pinecone Python Client
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

If you query a Pinecone index by vector ID with top_k=1, the returned vector sometimes has a different ID. With top_k > 1, the correct vector sometimes appears at a position below the first.

Expected Behavior

When I search by vector ID, the whole point is to get back the vector whose ID matches the query, followed by other vectors with high similarity scores.

Steps To Reproduce

It's hard to provide reproducible steps because it happens only intermittently. We see it a lot in our production environment, so I'm attaching relevant screenshots from the UI instead.
Screenshot 2024-05-11 at 21 24 33
See more examples at https://community.pinecone.io/t/bug-pinecone-search-by-id-is-returning-incorrect-result/5554

Relevant log output

See https://community.pinecone.io/t/bug-pinecone-search-by-id-is-returning-incorrect-result/5554

Environment

- OS:
- Python:
- pinecone:

Additional Context

No response

@ayansengupta17 added the bug (Something isn't working) label on May 14, 2024
@zackproser
Contributor

Hi @ayansengupta17,

Thank you for your post, and thank you for taking the time to get screenshots and file an issue on GitHub.

I’ve discussed this with the relevant teams to double-check, but this is actually not a bug!

Please see our guide on the Limitations of querying by ID to understand why this is happening.

If you want to ensure your results contain the vector you’re requesting by ID, you can use fetch instead, as outlined here.
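
As a rough illustration of that workaround, the sketch below fetches the vector by ID first and then runs a similarity query with its values, so the requested ID is always the first result. It assumes the index object exposes `fetch(ids=...)` returning an object with a `vectors` mapping, and `query(vector=..., top_k=...)` returning an object with a `matches` list, which is how the Pinecone Python client is generally used. The helper name `query_with_exact_first` and the `FakeIndex` stand-in are hypothetical and exist only so the example runs without a real index.

```python
from types import SimpleNamespace


def query_with_exact_first(index, vector_id, top_k=5):
    """Return up to top_k IDs, with vector_id guaranteed to be first."""
    # fetch is an exact lookup by ID: it returns the vector or nothing.
    fetched = index.fetch(ids=[vector_id])
    if vector_id not in fetched.vectors:
        raise KeyError(f"vector {vector_id!r} not found")
    values = fetched.vectors[vector_id].values

    # The similarity query may rank the source vector below position 1 or
    # omit it (the behaviour reported in this issue), so drop it from the
    # matches and prepend it explicitly.
    result = index.query(vector=values, top_k=top_k)
    neighbours = [m.id for m in result.matches if m.id != vector_id]
    return [vector_id] + neighbours[: top_k - 1]


class FakeIndex:
    """Hypothetical in-memory stand-in, used only to exercise the helper."""

    _store = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}

    def fetch(self, ids):
        found = {
            i: SimpleNamespace(id=i, values=self._store[i])
            for i in ids
            if i in self._store
        }
        return SimpleNamespace(vectors=found)

    def query(self, vector, top_k):
        # Deliberately ranks "a" second to mimic the reported behaviour.
        ranked = [("b", 0.99), ("a", 0.98), ("c", 0.10)]
        matches = [SimpleNamespace(id=i, score=s) for i, s in ranked]
        return SimpleNamespace(matches=matches[:top_k])


print(query_with_exact_first(FakeIndex(), "a", top_k=2))  # ['a', 'b']
```

With a real index you would pass the client's index object instead of `FakeIndex()`. Depending on your setup, both calls may also need a `namespace` argument, which is omitted here.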

I hope this helps!

Best,
Zack

@ayansengupta17
Author

@zackproser Thanks for pointing to the documentation. That was really helpful. I want to suggest two things here:

  • When a user queries by a vector ID, the expected behaviour is to get that particular vector as the first hit and then its nearest neighbours as the other hits.
  • If I search by an ID that doesn't exist in the database, the response says that the ID doesn't exist. That implies that when you search by an ID that does exist, the ID should be in the results. Otherwise the product design is not consistent.

@likid1412

  • When a user queries by a vector ID, the expected behaviour is to get that particular vector as the first hit and then its nearest neighbours as the other hits.

I think @ayansengupta17 is right: the vector should be the first hit when you query by its ID. Below is what I'm using in another vector database; it returns the queried ID as the first hit, as I expected.

image

ref: Vector Database: Similarity Search Based on Doc ID - SDK Reference - Documentation Center - Tencent Cloud

@anawishnoff added the enhancement (New feature or request), status:needs-triage (An issue that needs to be triaged by the Pinecone team), and wontfix (This will not be worked on) labels, and removed the bug (Something isn't working) and status:needs-triage labels, on Aug 6, 2024