Update tiledb.py vectorstore #105

BBC-Esq · 2025-06-11T03:14:32Z

Enable 8-bit Vector Types & Extra Distance Metrics in `langchain_community/vectorstores/tiledb.py`

Background

TileDB-Vector-Search already supports:

8-bit vector storage (TILEDB_INT8, TILEDB_UINT8)
Distance metrics — L2 (Euclidean), squared-L2 (sum-of-squares) and Cosine (TileDB transparently normalises vectors for cosine)
INT8 indices since the May-2024 release

The upstream LangChain wrapper always cast embeddings to float32 and exposed only "euclidean".
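
For reference, the three metrics listed above can be written out in a few lines of NumPy. This is only an illustrative sketch: the function names are made up for this example, and cosine distance is assumed to follow the usual definition of 1 minus cosine similarity, which may differ in detail from TileDB's internal implementation.

```python
import numpy as np


def l2(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean (L2) distance."""
    return float(np.linalg.norm(a - b))


def squared_l2(a: np.ndarray, b: np.ndarray) -> float:
    """Sum of squared differences, i.e. L2 without the square root."""
    d = a - b
    return float(np.dot(d, d))


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 - cosine similarity (assumed definition for this sketch)."""
    sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 - float(sim)


a = np.array([3.0, 4.0], dtype=np.float32)
b = np.array([4.0, 3.0], dtype=np.float32)

# Unit-length copies, as a cosine workflow would use internally.
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)

print(l2(a, b), squared_l2(a, b), cosine_distance(a, b))
# On unit vectors, squared L2 equals 2 * cosine distance.
print(squared_l2(a_unit, b_unit), 2 * cosine_distance(a, b))
```

Squared L2 preserves the ranking of plain L2 while skipping the square root, and on unit-length vectors it is a fixed multiple of cosine distance, which is why normalisation matters for the cosine metric.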

What this PR adds

Area	Change
Metric support	`INDEX_METRICS` now allows `"euclidean"`, `"squared_l2"` and `"cosine"`, mapped to `vspy.DistanceMetric`.
Dtype handling	Hard-coded `astype(np.float32)` casts removed. Wrapper accepts `np.float32`, `np.int8`, `np.uint8`. Half-precision inputs (`float16`,`bfloat16`) auto-upcast to `float32` for storage.
Cosine workflow	Normalisation is left to TileDB’s internal routines; wrapper performs no ingest-time or query-time normalisation (except a local copy for MMR post-processing).
Index creation	`TileDB.create()` forwards chosen dtype + metric to `flat_index` / `ivf_flat_index`.
Query helper	New `_prepare_query_vector()` guarantees correct shape/dtype, upcasts half-precision if needed.
Ingestion paths	`from_texts()`, `from_embeddings()`, `add_texts()` honour an optional `vector_dtype` parameter and keep the selected dtype end-to-end.
Validation	Clear `ValueError` for unsupported metric/dtype; float16/bfloat16 guard for older NumPy; pickle-safety flag retained.
Backward compatibility	Default (`float32`, `"euclidean"`) behaviour unchanged—existing code runs without modification.
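
The dtype handling and query preparation described in the table can be sketched roughly as follows. This is not the code from the PR: `prepare_query_vector` and `SUPPORTED_DTYPES` are hypothetical names used only for illustration, and `bfloat16` is left out because NumPy has no native `bfloat16` dtype.

```python
import numpy as np

# Storage dtypes the table above says the wrapper accepts.
SUPPORTED_DTYPES = (np.float32, np.int8, np.uint8)


def prepare_query_vector(vec, vector_dtype=np.float32) -> np.ndarray:
    """Hypothetical stand-in for the PR's `_prepare_query_vector()`.

    Returns a 2-D, C-contiguous array of shape (n_queries, dim)
    in the index's storage dtype.
    """
    target = np.dtype(vector_dtype)

    # Half precision is upcast to float32 for storage and querying.
    if target == np.float16:
        target = np.dtype(np.float32)

    if target not in [np.dtype(d) for d in SUPPORTED_DTYPES]:
        raise ValueError(f"Unsupported vector dtype: {target}")

    arr = np.asarray(vec)
    if arr.ndim == 1:
        # A single query vector becomes a batch of one.
        arr = arr[np.newaxis, :]
    elif arr.ndim != 2:
        raise ValueError(f"Expected a 1-D or 2-D query, got {arr.ndim}-D")

    return np.ascontiguousarray(arr.astype(target, copy=False))


q = prepare_query_vector([0.1, 0.2, 0.3])
print(q.shape, q.dtype)
```

Rejecting unsupported dtypes up front, rather than silently casting, keeps a mismatch between the query dtype and the index dtype from surfacing later as a harder-to-read error.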

Usage Examples

```python
import numpy as np
from langchain_community.vectorstores import TileDB
from langchain_community.embeddings import SentenceTransformerEmbeddings

emb = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
texts = ["Vector search is fast.", "Cosine similarity loves unit vectors!"]

# 1 – INT8 IVF_FLAT index with Cosine distance
db = TileDB.from_texts(
    texts,
    emb,
    metric="cosine",
    vector_dtype=np.int8,
    index_type="IVF_FLAT",
    index_uri="/tmp/tiledb_int8_cosine",
)

docs = db.similarity_search("speedy vector search", k=2)

# 2 – UINT8 FLAT index with squared-L2
pairs = list(zip(texts, emb.embed_documents(texts)))
db2 = TileDB.from_embeddings(
    pairs,
    emb,
    metric="squared_l2",
    vector_dtype=np.uint8,
    index_type="FLAT",
    index_uri="/tmp/tiledb_uint8_sumsq",
)

# 3 – Load existing index and query
db3 = TileDB.load("/tmp/tiledb_uint8_sumsq", emb, metric="squared_l2")
print(db3.similarity_search_with_score("vector maths", k=1))
```

BBC-Esq · 2025-06-11T14:27:39Z

@tomaarsen perhaps you could review as well, since you're familiar with the sentence transformers side of things?

BBC-Esq · 2025-06-11T14:28:43Z

@ihnorton I forgot to mention that it would be helpful if you could review as well since you're familiar with the tiledb vector search side?

BBC-Esq · 2025-06-15T09:05:08Z

Can this get a review please?

Update tiledb.py

c2834b3

BBC-Esq changed the title ~~Update tiledb.py~~ Update tiledb.py vectorstore Jun 11, 2025

BBC-Esq mentioned this pull request Jun 11, 2025

formally support int8 and uint8 within langchain and 2 distance metrics TileDB-Inc/TileDB-Vector-Search#561

Open

BBC-Esq added 3 commits June 11, 2025 00:36

Update tiledb.py

242ac06

Update tiledb.py

ff31adb

Update tiledb.py

9cb2e4e
