@Iamdavidonuh
Collaborator

This PR begins the work on #184

* `ep` is **temporarily updated** at each layer to point to the best candidate from the previous layer; the global entry point remains unchanged.
* The algorithm mimics **INSERT’s first phase** but without actually inserting a new element. You’re just finding nearest neighbors.
* `ef` in K-NN-SEARCH could be treated as optional, defaulting to `efConstruction` if the user doesn’t provide a value. This way, the search would automatically use the same candidate-list size as insertion, while still allowing users to override it to trade search precision against speed.
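
To make the points above concrete, here is a minimal Python sketch of the search. It is an illustration under assumptions, not the code in this PR: the `Graph` container, its field names, the `l2` distance, and clamping `ef` to at least `k` are all hypothetical choices made for the example.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (assumed metric for this sketch)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


@dataclass
class Graph:
    """Hypothetical container: layers[l][node] -> neighbour ids, layer 0 is the base."""
    vectors: Dict[int, Sequence[float]]
    layers: List[Dict[int, List[int]]]
    entry_point: int
    ef_construction: int


def search_layer(g: Graph, q: Sequence[float], entry_points: List[int],
                 ef: int, layer: int) -> List[Tuple[float, int]]:
    """Best-first search on one layer; returns up to ef (distance, id) pairs, nearest first."""
    visited = set(entry_points)
    candidates = [(l2(q, g.vectors[e]), e) for e in entry_points]  # min-heap
    heapq.heapify(candidates)
    results = [(-d, e) for d, e in candidates]  # max-heap via negated distance
    heapq.heapify(results)
    while candidates:
        d, c = heapq.heappop(candidates)
        if d > -results[0][0]:  # closest candidate is worse than the worst result
            break
        for n in g.layers[layer].get(c, ()):
            if n in visited:
                continue
            visited.add(n)
            dn = l2(q, g.vectors[n])
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, n))
                heapq.heappush(results, (-dn, n))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst result
    return sorted((-nd, n) for nd, n in results)


def knn_search(g: Graph, q: Sequence[float], k: int,
               ef: Optional[int] = None) -> List[int]:
    """Read-only k-NN search: same descent as INSERT's first phase, nothing is inserted."""
    if ef is None:
        ef = g.ef_construction  # optional ef falls back to efConstruction
    ef = max(ef, k)  # assumption: never search narrower than k
    ep = g.entry_point  # local copy; the global entry point is never reassigned
    for layer in range(len(g.layers) - 1, 0, -1):
        ep = search_layer(g, q, [ep], 1, layer)[0][1]  # best candidate seeds the next layer
    return [n for _, n in search_layer(g, q, [ep], ef, 0)[:k]]
```

Calling `knn_search(g, q, k)` uses `efConstruction`, while `knn_search(g, q, k, ef=...)` overrides it for that query only.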

Owner

We don't seem to have covered deletion of a node.

Collaborator Author

Regarding deletion: the original HNSW paper does not describe any true node-deletion procedure, which leads me to assume full deletion isn’t supported. Given how interconnected the graph becomes, removing a node would effectively require rebuilding significant parts of the structure. The only related mechanism discussed is neighbor pruning via the neighbor-selection heuristics applied during insertion, not actual removal. We can mark full deletion as a TODO / requires research.

Owner

Not having deletion isn't particularly tenable, but I agree we can mark it as a TODO.

In order to merge it in, though, we'd need to assure users that after they remove embeddings, searching the index doesn't turn up stale vectors.

Collaborator Author

There's now an algorithm for delete; check it out.

If you're curious, this paper from Microsoft outlines a pretty solid deletion strategy that ensures your search won't degrade after multiple rounds of insertions/deletions.

Collaborator Author

Thank you for this reference! I’ll check it out.

Collaborator Author

@nnethercott Thanks again for the reference! I went through it and it’s a solid approach for deletion.

For my implementation, I’m leaning toward a slightly different strategy: maintaining back-links for each node. This makes deletion straightforward, as we only need to remove references from incoming neighbors (back-links), keeping the operation localized. There’s a memory overhead, but for the speed and simplicity of deletions, it’s a trade-off I’m willing to accept for now.

I understand that this approach means insertions will involve more updates to other nodes, and I’m curious to see how costly that will be in practice. For now, I plan to stick with this approach and will consider reverting only if the drawbacks outweigh the benefits.
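
A rough Python sketch of the back-link idea for a single layer, to show where the extra insertion work and the localized deletion come from. The class and method names are hypothetical, and the sketch only removes references; it does not try to reconnect the deleted node's former neighbors.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Set


class BackLinkedLayer:
    """One graph layer storing outgoing edges plus their reverse (back-links)."""

    def __init__(self) -> None:
        self.out: DefaultDict[Hashable, Set[Hashable]] = defaultdict(set)
        self.back: DefaultDict[Hashable, Set[Hashable]] = defaultdict(set)

    def add_edge(self, src: Hashable, dst: Hashable) -> None:
        # Every new edge costs one extra write: the back-link on dst.
        self.out[src].add(dst)
        self.back[dst].add(src)

    def remove_edge(self, src: Hashable, dst: Hashable) -> None:
        # Pruning an edge must keep both maps in sync.
        self.out[src].discard(dst)
        self.back[dst].discard(src)

    def delete_node(self, node: Hashable) -> None:
        # Incoming edges: back-links name exactly the nodes that reference `node`,
        # so no scan over the whole layer is needed.
        for src in self.back.pop(node, set()):
            self.out[src].discard(node)
        # Outgoing edges: drop `node` from its neighbours' back-link sets.
        for dst in self.out.pop(node, set()):
            self.back[dst].discard(node)

    def has_reference_to(self, node: Hashable) -> bool:
        # Full scan, intended only for tests that check no stale reference survives.
        return node in self.out or any(node in nbrs for nbrs in self.out.values())
```

With this layout the cost of `delete_node` is proportional to the node's in-degree plus out-degree, and the memory overhead is roughly one extra set entry per edge.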

We also plan to test recall, potentially by replicating the deletion experiment from the paper: randomly remove a percentage of nodes and reinsert them over multiple cycles, then observe whether search quality (recall) remains stable. This will allow us to compare how our back-link deletion strategy performs relative to the delete policies described in the paper.
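
A possible shape for that experiment, sketched in Python. Everything here is an assumption for illustration: the `insert`/`delete`/`search` interface, the parameter names, and `BruteForceIndex`, which is only an exact stand-in so the harness runs end to end; the real HNSW index would be plugged in instead.

```python
import random
from typing import Dict, List, Sequence

Vector = Sequence[float]


def l2_sq(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_knn(data: Dict[int, Vector], q: Vector, k: int) -> List[int]:
    """Ground truth by exhaustive scan; ties are broken by key."""
    return sorted(data, key=lambda key: (l2_sq(data[key], q), key))[:k]


class BruteForceIndex:
    """Exact stand-in exposing the interface assumed for the index under test."""

    def __init__(self) -> None:
        self._data: Dict[int, Vector] = {}

    def insert(self, key: int, vec: Vector) -> None:
        self._data[key] = vec

    def delete(self, key: int) -> None:
        del self._data[key]

    def search(self, q: Vector, k: int) -> List[int]:
        return exact_knn(self._data, q, k)


def recall_at_k(index, data: Dict[int, Vector], queries: List[Vector], k: int) -> float:
    """Fraction of true top-k neighbours that the index returns, averaged over queries."""
    hits = 0
    for q in queries:
        hits += len(set(exact_knn(data, q, k)) & set(index.search(q, k)))
    return hits / (k * len(queries))


def churn_experiment(index, data: Dict[int, Vector], queries: List[Vector], k: int,
                     fraction: float, cycles: int, seed: int = 0) -> List[float]:
    """Build the index, then repeatedly delete and reinsert a random fraction of nodes.

    Returns recall measured after the initial build and after each cycle.
    """
    rng = random.Random(seed)
    for key, vec in data.items():
        index.insert(key, vec)
    recalls = [recall_at_k(index, data, queries, k)]
    n_remove = max(1, int(fraction * len(data)))
    for _ in range(cycles):
        removed = rng.sample(sorted(data), n_remove)
        for key in removed:
            index.delete(key)
        for key in removed:
            index.insert(key, data[key])
        recalls.append(recall_at_k(index, data, queries, k))
    return recalls
```

A flat recall curve across cycles would suggest the deletion strategy holds up; a downward trend would be the signal to revisit it.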

@Iamdavidonuh changed the title from "HNSW" to "HNSW Graph: Design Specification" on Dec 15, 2025
@Iamdavidonuh marked this pull request as ready for review on December 15, 2025 13:45
@deven96 merged commit 3aedec8 into main on Dec 15, 2025
@deven96 deleted the david/hnsw branch on December 15, 2025 13:49
