Update tiledb.py vectorstore #105
Open
Enable 8-bit Vector Types & Extra Distance Metrics in `langchain_community/vectorstores/tiledb.py`
Background
TileDB-Vector-Search already supports 8-bit vector types (`TILEDB_INT8`, `TILEDB_UINT8`). The upstream LangChain wrapper always cast embeddings to `float32` and exposed only `"euclidean"`.
What this PR adds
- `INDEX_METRICS` now allows `"euclidean"`, `"squared_l2"` and `"cosine"`, mapped to `vspy.DistanceMetric`.
- `.astype(np.float32)` casts removed. Wrapper accepts `np.float32`, `np.int8`, `np.uint8`. Half-precision inputs (`float16`, `bfloat16`) auto-upcast to `float32` for storage.
- `TileDB.create()` forwards chosen dtype + metric to `flat_index` / `ivf_flat_index`.
- `_prepare_query_vector()` guarantees correct shape/dtype, upcasts half-precision if needed.
- `from_texts()`, `from_embeddings()`, `add_texts()` honour an optional `vector_dtype` parameter and keep the selected dtype end-to-end.
- `ValueError` for unsupported metric/dtype; float16/bfloat16 guard for older NumPy; pickle-safety flag retained.
- Default (`float32`, `"euclidean"`) behaviour unchanged; existing code runs without modification.
Usage Examples
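As a standalone illustration of the dtype and metric rules summarised under "What this PR adds", here is a minimal NumPy-only sketch. It is not the code from this PR: the helper names `resolve_storage_dtype`, `check_metric` and `prepare_query_vector` are hypothetical, the mapping to `vspy.DistanceMetric` is left out so the snippet has no TileDB dependency, and `bfloat16` is omitted because it is not a built-in NumPy dtype.

```python
import numpy as np

# Storage dtypes the PR description says the wrapper accepts.
SUPPORTED_DTYPES = (np.dtype(np.float32), np.dtype(np.int8), np.dtype(np.uint8))
# Metric names listed in the PR description (assumed to be plain strings here).
SUPPORTED_METRICS = ("euclidean", "squared_l2", "cosine")


def resolve_storage_dtype(dtype):
    """Hypothetical helper: return the dtype vectors would be stored as."""
    dt = np.dtype(dtype)
    if dt == np.float16:
        # Half-precision input is upcast to float32 for storage.
        return np.dtype(np.float32)
    if dt in SUPPORTED_DTYPES:
        return dt
    raise ValueError(f"Unsupported vector dtype: {dt}")


def check_metric(metric):
    """Hypothetical helper: reject metric names outside the allowed set."""
    if metric not in SUPPORTED_METRICS:
        raise ValueError(f"Unsupported metric: {metric!r}")
    return metric


def prepare_query_vector(query, index_dtype):
    """Hypothetical helper: return `query` as a 2-D array in the index dtype."""
    target = resolve_storage_dtype(index_dtype)
    arr = np.atleast_2d(np.asarray(query))  # shape (d,) becomes (1, d)
    if arr.ndim != 2:
        raise ValueError(f"Expected a 1-D or 2-D query, got {arr.ndim} dimensions")
    # Plain cast; real code may also want to check value ranges for int8/uint8.
    return arr.astype(target, copy=False)


if __name__ == "__main__":
    query = prepare_query_vector([1, 2, 3], np.uint8)
    print(query.shape, query.dtype)
    print(resolve_storage_dtype(np.float16))
    print(check_metric("cosine"))
```

The sketch keeps dtype resolution separate from query preparation so that the same check could serve both index creation and querying. Whether the PR structures its code this way is an assumption.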