A practical vector database from first principles. Built for clarity, correctness, and performance.
Kairon — derived from Kairos (καιρός).
Kairon is a production-quality vector database designed from first principles, implementing multiple indexing strategies (HNSW, KD-Tree, IVF) with deterministic testing, reproducibility, and practical features like metadata filtering and hybrid search.
```python
from kairon import HNSWIndex
import numpy as np
# Generate some vectors
points = np.random.randn(1000, 128).astype(np.float32)
# Build index
index = HNSWIndex.build(points, M=16, ef_construct=200)
# Save index
index.save('models/hnsw.idx')
# Search
query = np.random.randn(128).astype(np.float32)
results = index.search(query, k=10, ef_search=100)
print(results)  # List of (id, distance) tuples
```

Features:

- Multiple Index Types: HNSW, KD-Tree, and IVF (Inverted File Index)
- Metadata Filtering: Filter results by metadata predicates
- Hybrid Search: Combine vector similarity with metadata scores
- Persistence: Save and load indices with a versioned format (see the sketch after this list)
- Incremental Updates: Add vectors to indices (delete via metadata tombstoning planned)
- Deterministic: All operations use seeded RNG for reproducibility
- Benchmarks: Comprehensive benchmark harness with recall/QPS metrics
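A round trip through the persistence and incremental-update features might look like the sketch below. `HNSWIndex.load` and `index.add` are assumed names chosen to mirror the `save()` call shown in the quickstart, not confirmed signatures; check the package for the actual API.

```python
import numpy as np
from kairon import HNSWIndex

points = np.random.randn(1000, 128).astype(np.float32)
index = HNSWIndex.build(points, M=16, ef_construct=200)
index.save('models/hnsw.idx')

# Reload the persisted index; load() is assumed to mirror save().
index = HNSWIndex.load('models/hnsw.idx')

# Incrementally add new vectors; add() is a hypothetical method name.
new_points = np.random.randn(100, 128).astype(np.float32)
index.add(new_points)
```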
Project layout:

```
kairon/
    src/
        kairon/     # Main package
    tests/          # Test suite
    examples/       # Usage examples
    bench/          # Benchmark scripts and datasets
    docs/           # Design documentation
```
Install in editable mode:

```bash
pip install -e .
```

Three index types are available.

HNSW: Fast approximate nearest neighbor search using multi-layer graphs. Recommended for high-dimensional data.

```python
index = HNSWIndex.build(points, M=16, ef_construct=200)
results = index.search(query, k=10, ef_search=100)
```
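ef_search trades speed against recall. One quick way to see the effect, reusing the quickstart variables and assuming Euclidean distance and that returned ids are row positions in `points` (neither is confirmed above):

```python
# Brute-force ground truth for this query (Euclidean distance assumed).
dists = np.linalg.norm(points - query, axis=1)
true_top10 = set(np.argsort(dists)[:10].tolist())

for ef in (10, 50, 200):
    results = index.search(query, k=10, ef_search=ef)
    found = {vec_id for vec_id, _ in results}
    print(f"ef_search={ef}: recall@10 = {len(true_top10 & found) / 10:.2f}")
```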
KD-Tree: Balanced binary tree with dimension-based splits. Good for low-dimensional data (< 10 dimensions).

```python
from kairon import KDIndex
index = KDIndex.build(points, leaf_size=10)
results = index.search(query, k=10)
```

IVF: Coarse quantization with inverted lists. Efficient for very large datasets.

```python
from kairon import IVFIndex
index = IVFIndex.build(points, nlist=100, nprobe=10)
results = index.search(query, k=10, nprobe=10)
```

Metadata filtering and hybrid search:

```python
# Add metadata to vectors
metadata = [
{"severity": "P1", "service": "auth", "timestamp": 1680000000},
{"severity": "P2", "service": "api", "timestamp": 1680001000},
# ...
]
index.add_metadata(metadata)
# Filter by metadata
filters = {"severity": {"eq": "P1"}, "service": {"eq": "auth"}}
results = index.search(query, k=10, filters=filters)
# Hybrid search: combine vector similarity + metadata score
results = index.search(query, k=10, hybrid_weight=0.7)
```
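What hybrid_weight does is not spelled out above; a common convention, assumed here rather than confirmed, is a convex blend of the two scores, with 1.0 meaning pure vector similarity and 0.0 pure metadata score:

```python
def blended_score(vector_score: float, metadata_score: float,
                  hybrid_weight: float = 0.7) -> float:
    # Hypothetical scoring rule: hybrid_weight=0.7 leans toward vector similarity.
    return hybrid_weight * vector_score + (1.0 - hybrid_weight) * metadata_score
```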
Run benchmarks to evaluate performance:

```bash
cd bench
python run_bench.py --dataset synthetic --n 10000 --dim 128
```

This produces bench_results.json and visualization PNGs in bench/output/.
Run the test suite with:

```bash
pytest tests/
```

All tests use deterministic seeds for reproducibility.
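For a sense of what such a test can look like, here is a hypothetical sketch that uses only the API shown above and assumes returned ids correspond to row positions in the input array:

```python
import numpy as np
from kairon import HNSWIndex

def test_self_neighbor_is_found():
    # Seeded RNG: the data, and therefore the outcome, is identical on every run.
    rng = np.random.default_rng(42)
    points = rng.standard_normal((1000, 64)).astype(np.float32)
    index = HNSWIndex.build(points, M=16, ef_construct=200)

    # Querying with a stored vector should return that vector as the top hit.
    hits = sum(
        int(index.search(points[i], k=1, ef_search=100)[0][0] == i)
        for i in range(100)
    )
    assert hits >= 95  # small margin for the approximate search
```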
License: MIT
References:

- Malkov, Y. A., & Yashunin, D. A. (2018). Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Jégou, H., Douze, M., & Schmid, C. (2010). Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence.