Error when building index #705

Closed
sammlapp opened this issue Nov 10, 2024 · 4 comments

Comments

@sammlapp

Following the hoplite readme, I try to build an index for ANN with:

index_jax.build_sharded_index(
    db,
    shard_size=500_000,
    shard_degree_bound=128,
    degree_bound=512,
    max_delegates=256,
    alpha=1.5,
    num_steps=-1,
    random_seed=42,
    max_violations=1,
    # sample_size=0, #tried with and without this arg
)

and get OperationalError: too many SQL variables.

Full stack trace:

OperationalError                          Traceback (most recent call last)
Cell In[13], line 3
      1 from chirp.projects.hoplite import index_jax
----> 3 index_jax.build_sharded_index(
      4     db,
      5     shard_size=500_000,
      6     shard_degree_bound=128,
      7     degree_bound=512,
      8     max_delegates=256,
      9     alpha=1.5,
     10     num_steps=-1,
     11     random_seed=42,
     12     max_violations=1,
     13     sample_size=0,
     14 )

File ~/miniconda3/envs/opso_tf_cuda/lib/python3.11/site-packages/chirp/projects/hoplite/index_jax.py:125, in build_sharded_index(db, shard_size, shard_degree_bound, degree_bound, alpha, num_steps, random_seed, max_violations, **kwargs)
    123 shard = np.concatenate([shard, [root_node]], axis=0)
    124 shards.append(shard)
--> 125 new_edges = index_shard(
    126     db,
    127     shard,
    128     alpha=alpha,
    129     shard_degree_bound=shard_degree_bound,
    130     max_violations=max_violations,
    131     **kwargs,
    132 )
    133 for s, e in zip(shard, new_edges):
    134   edges[s].append(e)

File ~/miniconda3/envs/opso_tf_cuda/lib/python3.11/site-packages/chirp/projects/hoplite/index_jax.py:361, in index_shard(db, shard, shard_degree_bound, **kwargs)
    359 def index_shard(db, shard, shard_degree_bound, **kwargs):
    360   """Index a subset of embeddings."""
--> 361   shard, embs = db.get_embeddings(shard)
    362   embs = jnp.asarray(embs)
    363   shard_edges = -1 * jnp.ones([embs.shape[0], shard_degree_bound], jnp.int32)

File ~/miniconda3/envs/opso_tf_cuda/lib/python3.11/site-packages/chirp/projects/hoplite/sqlite_impl.py:279, in SQLiteGraphSearchDB.get_embeddings(self, embedding_ids)
    274 query = (
    275     'SELECT id, embedding FROM hoplite_embeddings WHERE id IN '
    276     f'({placeholders}) ORDER BY id;'
    277 )
    278 cursor = self._get_cursor()
--> 279 results = cursor.execute(
    280     query, tuple(int(c) for c in embedding_ids)
    281 ).fetchall()
    282 result_ids = np.array(tuple(int(c[0]) for c in results))
    283 embeddings = np.array(
    284     tuple(
    285         deserialize_embedding(c[1], self.embedding_dtype) for c in results
    286     )
    287 )

OperationalError: too many SQL variables

@sdenton4
Collaborator

Interesting; probably needs some batching...

I'm currently looking to move everything over to sqlite_usearch_impl and probably drop the index_jax methods entirely. Could you give that a shot? In that case, the index is built online during the embedding process.
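
For anyone who hits the same error before switching backends: the traceback shows get_embeddings binding one SQL variable per id in a single `IN (...)` clause, and SQLite caps the number of bound variables per statement (999 by default before version 3.32.0, 32766 from 3.32.0 on), so a 500,000-id shard exceeds it either way. The batching mentioned above could look roughly like the sketch below. This is not the hoplite implementation: the helper name `get_rows_batched`, the `batch_size` default, and the demo table schema are assumptions for illustration; only the table and column names come from the query in the traceback.

```python
import sqlite3


def get_rows_batched(conn, ids, batch_size=500):
    """Fetch (id, embedding) rows, binding at most `batch_size` ids per query.

    Hypothetical helper, not part of hoplite.
    """
    ids = [int(i) for i in ids]
    rows = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        # One '?' per id in this batch, so the count stays under the limit.
        placeholders = ','.join('?' * len(batch))
        query = (
            'SELECT id, embedding FROM hoplite_embeddings '
            f'WHERE id IN ({placeholders})'
        )
        rows.extend(conn.execute(query, batch).fetchall())
    # The original query used ORDER BY id; sort once after merging batches.
    rows.sort(key=lambda row: row[0])
    return rows


# Demo against an in-memory table (assumed schema, for illustration only).
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE hoplite_embeddings (id INTEGER PRIMARY KEY, embedding BLOB)'
)
conn.executemany(
    'INSERT INTO hoplite_embeddings VALUES (?, ?)',
    ((i, bytes([i % 256])) for i in range(2000)),
)
rows = get_rows_batched(conn, range(2000))
```

Keeping `batch_size` below 999 makes the sketch work on older SQLite builds too, at the cost of more round trips.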

@sammlapp
Author

Sure. I guess I misunderstood the documentation/readme: I thought sqlite_usearch_impl was for brute-force search and index_jax was for ANN search.

Could you provide an example of how to perform the approximate nearest neighbors search? I've just been following the readme so far, so I created the db with

db = sqlite_impl.SQLiteGraphSearchDB.create(
    db_path=db_file_path,
    embedding_dim=1280,
)
db.setup()

and can brute force query using

results, scores = brutalism.threaded_brute_search(db, query, score_fn)

@sdenton4
Collaborator

Sure thing. ANN search with usearch is not yet properly hooked up or documented... But here's how to do it! :P
Note that you'll need to embed into the SQLiteUsearchDB first.

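# Assumed module path, matching the index_jax import in the traceback above.
from chirp.projects.hoplite import search_results, sqlite_usearch_impl

db = sqlite_usearch_impl.SQLiteUsearchDB(db_path)
query_embedding = db.get_embedding(1234)  # or provide something new...
# Search the index; matches exposes parallel keys and distances.
matches = db.ui.search(query_embedding)
results = search_results.TopKSearchResults(top_k=len(matches))
for m, d in zip(matches.keys, matches.distances):
  results.update(search_results.SearchResult(m, d))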

@sdenton4
Collaborator

This is now covered by the dedicated Hoplite repository, so closing here.
