Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

define the "pyramid" index to support vector search with paths #166

Open
inabao opened this issue Nov 27, 2024 · 0 comments
Open

define the "pyramid" index to support vector search with paths #166

inabao opened this issue Nov 27, 2024 · 0 comments
Assignees
Labels
kind/feature New feature or request

Comments

@inabao
Copy link
Collaborator

inabao commented Nov 27, 2024

Background

  • Problem: pyramid aims to solve the vector retrieval problem with paths.
  • Example: Each vector is associated with paths such as "a/b/c", "a/b/c/d", "a/b", "a/b/c/e". When given a query path "a/b/c", it retrieves the nearest vector under "a/b/c/*" to the target vector.

Interface Definition
Users need to pass the vector paths through the following interface. Other behaviors of pyramid are defined similarly to other indices.

// Pass the corresponding path to the index
virtual DatasetPtr Paths(const std::string* paths) = 0;

Parameter Design
Build Parameters

{
    "pyramid": {
        "sub_index_type": "index_name", // Specify a sub-index
        "index_param": {
            ………… // Parameters corresponding to the sub-index
        }
    }
}

Search Parameters

{
    "pyramid": {
        ………… // Search parameters corresponding to the sub-index
    }
}

Implementation Principle

  • Based on the path of the vector, we can build an index tree, where each node in the index tree contains the corresponding data volume.
  • For each node in the index tree, we construct the corresponding index.
  • The index can share the underlying data storage, and the redundancy of the index equals the average number of levels.

Usage Demo
Create Index

auto pyramid_build_parameters = R"(
{
    "dtype": "float32",
    "metric_type": "l2",
    "dim": 128,
    "index_param": {
        "sub_index_type": "hnsw",
        "index_param": {
            "max_degree": 16,
            "ef_construction": 100
        }
    }
}
)";
auto index = vsag::Factory::CreateIndex("pyramid", pyramid_build_parameters).value();
auto base = vsag::Dataset::Make();
base->NumElements(num_vectors)->Dim(dim)->Ids(ids)->Float32Vectors(vectors)->Paths(paths)->Owner(false);
index->Build(base);

Prepare a Query Vector

// Memory will be released by querying the dataset since owner is set to true when creating the query.
auto query_vector = new float[dim];
auto query_path = new std::string[1];
for (int64_t i = 0; i < dim; ++i) {
    query_vector[i] = distrib_real(rng);
}
query_path[0] = ......; // generate a random path for query vector.

Search on the Index

auto pyramid_search_parameters = R"(
{
    "pyramid": {
        "ef_search": 100
    }
}
)";
int64_t topk = 10;
auto query = vsag::Dataset::Make();
query->NumElements(1)->Dim(dim)->Float32Vectors(query_vector)->Paths(query_path)->Owner(true);
auto result = index->KnnSearch(query, topk, pyramid_search_parameters).value();

Implementation Detail
The work will be finished by following PRs:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants