-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
introduce optimized nndescent to build graph index #272
Comments
Algorithm Design and Related Issues Building a proximal graph for search involves addressing two key issues: finding approximate K-nearest neighbors (KNN) and ensuring graph pruning while maintaining connectivity. For the former, NSG and NNDescent primarily obtain an approximate KNN graph through iterative methods, while HNSW and Vamana derive an optimal KNN graph mainly through search methods. For the latter, HNSW and Vamana ensure the reachability of each point by inserting reverse edges during the search process, whereas NSG ensures reachability through searches by constructing a tree and connecting each search node to the tree. Despite this, the aforementioned methods still have some issues: There are no constraints on the in-degree of nodes, which can result in some points having an in-degree of zero. Algorithm Implementation The algorithm mainly consists of three parts: Finding the nearest neighbors through iterative methods. #./build-release/tools/test_performance '/tmp/dataset/gist-1m-960-euclidean.hdf5' 'build' 'diskann' '{"dim": 960, "dtype": "float32", "metric_type": "l2", "hnsw": {"max_degree": 12, "ef_construction": 100}, "diskann": {"max_degree": 24, "ef_construction": 100, "graph_type":"vamana", "alpha":1.2, "turn": 20, "pq_dims": 64, "pq_sample_rate": 0.1}}' '{"hnsw":{"ef_search":100},"diskann":{"ef_search":100,"beam_search":4,"io_limit":200,"use_reorder":true}}'
[2024-12-31 13:18:02.090] [info] diskann build full (graph) cost 2141.208s
[2024-12-31 13:20:05.831] [info] diskann build full (pq) cost 123.741s
[2024-12-31 13:20:11.570] [info] diskann build full (disk layout) cost 5.740s
[2024-12-31 13:20:19.164] [info] diskann serialize cost 2.466s
{
"build_parameters": "{\"dim\": 960, \"dtype\": \"float32\", \"metric_type\": \"l2\", \"hnsw\": {\"max_degree\": 12, \"ef_construction\": 100}, \"diskann\": {\"max_degree\": 24, \"ef_construction\": 100, \"graph_type\":\"vamana\", \"alpha\":1.2, \"turn\": 20, \"pq_dims\": 64, \"pq_sample_rate\": 0.1}}",
"build_time_in_second": 2275.817019665,
"dataset": "/tmp/dataset/gist-1m-960-euclidean.hdf5",
"index_name": "diskann",
"num_base": 1000000,
"tps": 439.40263710093876
}
#./build-release/tools/test_performance '/tmp/dataset/gist-1m-960-euclidean.hdf5' 'build' 'diskann' '{"dim": 960, "dtype": "float32", "metric_type": "l2", "hnsw": {"max_degree": 12, "ef_construction": 100}, "diskann": {"max_degree": 24, "ef_construction": 100, "graph_type":"odescent", "alpha":1.2, "turn": 20, "pq_dims": 64, "pq_sample_rate": 0.1}}' '{"hnsw":{"ef_search":100},"diskann":{"ef_search":100,"beam_search":4,"io_limit":200,"use_reorder":true}}'
[2024-12-31 14:45:13.505] [info] odescent build full (graph) cost 2170.382s
[2024-12-31 14:46:57.184] [info] diskann build full (pq) cost 103.679s
[2024-12-31 14:47:00.827] [info] diskann build full (disk layout) cost 3.643s
[2024-12-31 14:47:04.318] [info] diskann serialize cost 921.343ms
{
"build_parameters": "{\"dim\": 960, \"dtype\": \"float32\", \"metric_type\": \"l2\", \"hnsw\": {\"max_degree\": 12, \"ef_construction\": 100}, \"diskann\": {\"max_degree\": 24, \"ef_construction\": 100, \"graph_type\":\"odescent\", \"alpha\":1.2, \"turn\": 20, \"pq_dims\": 64, \"pq_sample_rate\": 0.1}}",
"build_time_in_second": 2280.273783652,
"dataset": "/tmp/dataset/gist-1m-960-euclidean.hdf5",
"index_name": "diskann",
"num_base": 1000000,
"tps": 438.543830644072
}
#./build-release/tools/test_performance '/tmp/dataset/gist-1m-960-euclidean.hdf5' 'search' 'diskann' '{"dim": 960, "dtype": "float32", "metric_type": "l2", "hnsw": {"max_degree": 12, "ef_construction": 100}, "diskann": {"max_degree": 24, "ef_construction": 100, "graph_type":"vamana", "alpha":1.2, "turn": 20, "pq_dims": 64, "pq_sample_rate": 0.1}}' '{"hnsw":{"ef_search":100},"diskann":{"ef_search":100,"beam_search":4,"io_limit":200,"use_reorder":true}}'
[2024-12-31 14:48:47.351] [info] diskann deserialize cost 174.976ms
{
"correct": 903,
"dataset": "/tmp/dataset/gist-1m-960-euclidean.hdf5",
"estimate_used_memory": 74952960,
"index_name": "diskann",
"memory": 125366272,
"num_query": 1000,
"qps": 722.1118474664544,
"recall": 0.902999997138977,
"search_parameters": "{\"hnsw\":{\"ef_search\":100},\"diskann\":{\"ef_search\":100,\"beam_search\":4,\"io_limit\":200,\"use_reorder\":true}}",
"search_time_in_second": 1.384827023,
"top_k": 1
} SIFT #./build-release/tools/test_performance '/tmp/dataset/sift-1m-128-euclidean.hdf5' 'build' 'diskann' '{"dim": 128, "dtype": "float32", "metric_type": "l2", "hnsw": {"max_degree": 12, "ef_construction": 100}, "diskann": {"max_degree": 24, "ef_construction": 200, "graph_type":"vamana", "alpha":1.2, "turn": 20, "pq_dims": 64, "pq_sample_rate": 0.1}}' '{"hnsw":{"ef_search":100},"diskann":{"ef_search":100,"beam_search":4,"io_limit":200,"use_reorder":true}}'
[2024-12-31 15:22:07.361] [info] diskann build full (graph) cost 821.301s
[2024-12-31 15:24:25.050] [info] diskann build full (pq) cost 137.689s
[2024-12-31 15:24:26.217] [info] diskann build full (disk layout) cost 1.167s
[2024-12-31 15:24:27.774] [info] diskann serialize cost 444.411ms
{
"build_parameters": "{\"dim\": 128, \"dtype\": \"float32\", \"metric_type\": \"l2\", \"hnsw\": {\"max_degree\": 12, \"ef_construction\": 100}, \"diskann\": {\"max_degree\": 24, \"ef_construction\": 200, \"graph_type\":\"vamana\", \"alpha\":1.2, \"turn\": 20, \"pq_dims\": 64, \"pq_sample_rate\": 0.1}}",
"build_time_in_second": 961.269516893,
"dataset": "/tmp/dataset/sift-1m-128-euclidean.hdf5",
"index_name": "diskann",
"num_base": 1000000,
"tps": 1040.290971914083
}
#./build-release/tools/test_performance '/tmp/dataset/sift-1m-128-euclidean.hdf5' 'search' 'diskann' '{"dim": 128, "dtype": "float32", "metric_type": "l2", "hnsw": {"max_degree": 12, "ef_construction": 100}, "diskann": {"max_degree": 24, "ef_construction": 200, "graph_type":"vamana", "alpha":1.2, "turn": 20, "pq_dims": 64, "pq_sample_rate": 0.1}}' '{"hnsw":{"ef_search":100},"diskann":{"ef_search":100,"beam_search":4,"io_limit":200,"use_reorder":true}}'
[2024-12-31 15:25:05.247] [info] diskann deserialize cost 173.148ms
{
"correct": 9851,
"dataset": "/tmp/dataset/sift-1m-128-euclidean.hdf5",
"estimate_used_memory": 72393728,
"index_name": "diskann",
"memory": 117366784,
"num_query": 10000,
"qps": 897.6470347211786,
"recall": 0.9850999712944031,
"search_parameters": "{\"hnsw\":{\"ef_search\":100},\"diskann\":{\"ef_search\":100,\"beam_search\":4,\"io_limit\":200,\"use_reorder\":true}}",
"search_time_in_second": 11.14023621,
"top_k": 1
} |
No description provided.
The text was updated successfully, but these errors were encountered: