Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Build HNSW hierarchy from a base CAGRA graph on GPU #432

Open
cjnolet opened this issue Oct 28, 2024 · 0 comments
Open

[FEA] Build HNSW hierarchy from a base CAGRA graph on GPU #432

cjnolet opened this issue Oct 28, 2024 · 0 comments

Comments

@cjnolet
Copy link
Member

cjnolet commented Oct 28, 2024

This feature succeeds #431 and improves the performance (and likely the quality) by building the HNSW hierarchy on a base CAGRA graph on the GPU.

There are two ways in which we can construct this hierarchy (both of which are used to construct the k-means tree in Google's SCaNN algorithm):

  1. Top-down: nested (hierarchical) kmeans. I suspect we have a lot of this already in our balanced kmeans, but it hides the hierarchy away and extracts flattened clusters like HDBSCAN

  2. Bottom-up: Using agglomerative clustering: Basically it's the single-linkage + condense hierarchy steps from HDBSCAN.

We suspect that building the hierarchy after the base graph is built will improve the insertion capability for CAGRA, but it will also have a positive impact on recall, since the hierarchy will be built by taking all vectors in the index into consideration, rather than being done in a greedy manner like HNSW where the quality of the hierarchy depends on the completely random ordering of the vectors during construction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: No status
Development

No branches or pull requests

1 participant