
salma2vec/kairon


🧭 Kairon

A practical vector database from first principles. Built for clarity, correctness, and performance.

License: MIT · Language: Python

Kairon — derived from Kairos (καιρός).

Kairon is a production-quality vector database built from first principles. It implements three indexing strategies (HNSW, KD-Tree, and IVF), uses seeded randomness so that tests and results are deterministic and reproducible, and supports practical features such as metadata filtering and hybrid search.

Quick Start

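import os

import numpy as np
from kairon import HNSWIndex

# Generate 1,000 random 128-dimensional vectors (seeded for reproducibility)
rng = np.random.default_rng(42)
points = rng.standard_normal((1000, 128)).astype(np.float32)

# Build an HNSW index
index = HNSWIndex.build(points, M=16, ef_construct=200)

# Save the index (create the target directory first in case it does not exist)
os.makedirs('models', exist_ok=True)
index.save('models/hnsw.idx')

# Search for the 10 nearest neighbors of a random query
query = rng.standard_normal(128).astype(np.float32)
results = index.search(query, k=10, ef_search=100)
print(results)  # List of (id, distance) tuples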

Features

  • Multiple Index Types: HNSW, KD-Tree, and IVF (Inverted File Index)
  • Metadata Filtering: Filter results by metadata predicates
  • Hybrid Search: Combine vector similarity with metadata scores
  • Persistence: Save and load indices with versioned format
  • Incremental Updates: Add vectors to existing indices (deletion via metadata tombstoning is planned)
  • Deterministic: All operations use a seeded random number generator (RNG) for reproducibility
  • Benchmarks: Benchmark harness that reports recall and queries per second (QPS)
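The README does not document Kairon's on-disk layout. As a hypothetical illustration of what a "versioned format" usually means, the sketch below writes a magic tag and a format version ahead of the vector data and refuses to load files whose tag or version it does not recognize. The header layout, `MAGIC`, `FORMAT_VERSION`, `save_vectors`, and `load_vectors` are all invented for this example and are not Kairon's API.

```python
import io
import struct

# Hypothetical header layout (NOT Kairon's actual format): 4-byte magic tag,
# uint16 format version, uint32 vector count, uint32 dimension, little-endian.
MAGIC = b"KAIR"
FORMAT_VERSION = 1
HEADER = struct.Struct("<4sHII")


def save_vectors(stream, vectors):
    """Write the versioned header, then each vector as little-endian float32."""
    dim = len(vectors[0]) if vectors else 0
    stream.write(HEADER.pack(MAGIC, FORMAT_VERSION, len(vectors), dim))
    row = struct.Struct(f"<{dim}f")
    for vec in vectors:
        stream.write(row.pack(*vec))


def load_vectors(stream):
    """Read vectors back, rejecting unknown magic tags or format versions."""
    magic, version, n, dim = HEADER.unpack(stream.read(HEADER.size))
    if magic != MAGIC:
        raise ValueError("not a vector file")
    if version != FORMAT_VERSION:
        raise ValueError(f"unsupported format version {version}")
    row = struct.Struct(f"<{dim}f")
    return [list(row.unpack(stream.read(row.size))) for _ in range(n)]


# Round-trip through an in-memory buffer instead of a real file.
buf = io.BytesIO()
save_vectors(buf, [[1.0, 2.0], [3.0, 4.5]])
buf.seek(0)
print(load_vectors(buf))  # [[1.0, 2.0], [3.0, 4.5]]
```

Checking the version before reading the payload lets a newer reader fail loudly on (or migrate) files written by an older layout instead of silently misreading them.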

Repository Structure

/kairon/
  src/
    kairon/            # Main package
  tests/               # Test suite
  examples/            # Usage examples
  bench/               # Benchmark scripts and datasets
  docs/                # Design documentation

Installation

Install in editable mode from the repository root:

pip install -e .

Index Types

HNSW (Hierarchical Navigable Small World)

Fast approximate nearest neighbor search using multi-layer graphs. Recommended for high-dimensional data.

index = HNSWIndex.build(points, M=16, ef_construct=200)
results = index.search(query, k=10, ef_search=100)
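To show roughly what `M` and `ef_search` control, here is a minimal pure-Python sketch of two pieces described in the Malkov & Yashunin paper: the random layer assignment (with normalization factor mL = 1/ln(M)) and the ef-bounded beam search run on each layer. It is not Kairon's implementation; `random_level`, `search_layer`, and the demo graph are illustrative names, and the demo links each node to its exact nearest neighbors rather than using HNSW's real construction.

```python
import heapq
import math
import random


def random_level(m, rng):
    """Top layer for a new node: floor(-ln(U) * mL), with mL = 1 / ln(M)."""
    return int(-math.log(1.0 - rng.random()) / math.log(m))


def search_layer(query, entry, ef, graph, vectors):
    """Greedy beam search on one layer, keeping the ef closest nodes seen so far.

    graph maps node id -> list of neighbor ids. Returns (id, distance) pairs,
    closest first.
    """
    d0 = math.dist(query, vectors[entry])
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexpanded node on top
    results = [(-d0, entry)]    # max-heap via negation: worst kept node on top
    while candidates:
        dist, node = heapq.heappop(candidates)
        if dist > -results[0][0]:
            break  # every remaining candidate is farther than the worst result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d = math.dist(query, vectors[nb])
            if len(results) < ef or d < -results[0][0]:
                heapq.heappush(candidates, (d, nb))
                heapq.heappush(results, (-d, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # evict the current worst
    return [(i, -neg) for neg, i in sorted(results, reverse=True)]


rng = random.Random(42)
vectors = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
# Stand-in graph: each node linked to its 8 exact nearest neighbors. Real HNSW
# builds its layers incrementally with a neighbor-selection heuristic instead.
graph = {
    i: sorted((j for j in range(200) if j != i),
              key=lambda j: math.dist(vectors[i], vectors[j]))[:8]
    for i in range(200)
}
levels = [random_level(16, rng) for _ in range(200)]
print(search_layer([0.0] * 8, entry=0, ef=10, graph=graph, vectors=vectors)[:3])
print("nodes above layer 0:", sum(lvl > 0 for lvl in levels))
```

A larger `ef` keeps more candidates alive, so the search explores more of the graph: recall goes up and throughput goes down. With M=16, roughly 1 in 16 nodes is expected to reach layer 1 or above.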

KD-Tree

Balanced binary tree with axis-aligned splits (each node splits the data along a single coordinate). Good for low-dimensional data (fewer than about 10 dimensions).

from kairon import KDIndex

index = KDIndex.build(points, leaf_size=10)
results = index.search(query, k=10)
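A minimal pure-Python sketch of the general KD-tree technique follows: median splits on an axis that cycles with depth, leaves of at most `leaf_size` points, and an exact k-NN search that visits the far side of a split only when it could still contain a closer point. `build_kdtree` and `knn_search` are hypothetical names, and this is not necessarily how `KDIndex` is implemented.

```python
import heapq
import math
import random


def build_kdtree(points, ids=None, depth=0, leaf_size=10):
    """Split at the median of the axis that cycles with depth; stop at leaf_size."""
    if ids is None:
        ids = list(range(len(points)))
    if len(ids) <= leaf_size:
        return {"leaf": ids}
    axis = depth % len(points[0])
    ids = sorted(ids, key=lambda i: points[i][axis])
    mid = len(ids) // 2
    return {
        "axis": axis,
        "split": points[ids[mid]][axis],  # left side <= split <= right side
        "left": build_kdtree(points, ids[:mid], depth + 1, leaf_size),
        "right": build_kdtree(points, ids[mid:], depth + 1, leaf_size),
    }


def knn_search(tree, points, query, k):
    """Exact k nearest neighbors as (id, distance) pairs, closest first."""
    heap = []  # max-heap via negation: worst of the current k best on top

    def visit(node):
        if "leaf" in node:
            for i in node["leaf"]:
                d = math.dist(query, points[i])
                if len(heap) < k:
                    heapq.heappush(heap, (-d, i))
                elif d < -heap[0][0]:
                    heapq.heapreplace(heap, (-d, i))
            return
        diff = query[node["axis"]] - node["split"]
        near, far = ((node["left"], node["right"]) if diff <= 0
                     else (node["right"], node["left"]))
        visit(near)
        # Every point on the far side is at least |diff| away along this axis.
        if len(heap) < k or abs(diff) < -heap[0][0]:
            visit(far)

    visit(tree)
    return [(i, -neg) for neg, i in sorted(heap, reverse=True)]


rng = random.Random(0)
pts = [[rng.random(), rng.random()] for _ in range(1000)]
tree = build_kdtree(pts, leaf_size=10)
print(knn_search(tree, pts, [0.5, 0.5], k=3))
```

The pruning test `abs(diff) < worst` is what makes low dimensions fast; as dimensionality grows, almost every far side passes the test and the search degrades toward a full scan, which is why HNSW is recommended for high-dimensional data.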

IVF (Inverted File Index)

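Coarse quantization with inverted lists: vectors are grouped into nlist clusters, each stored as an inverted list, and a query scans only the nprobe lists closest to it. Efficient for very large datasets.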

from kairon import IVFIndex

index = IVFIndex.build(points, nlist=100, nprobe=10)
results = index.search(query, k=10, nprobe=10)
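The sketch below illustrates the IVF idea in pure Python, assuming plain k-means (Lloyd's algorithm) as the coarse quantizer; the README does not say which clustering Kairon uses. `kmeans`, `build_ivf`, and `ivf_search` are hypothetical helpers, not Kairon's API.

```python
import heapq
import math
import random


def kmeans(points, nlist, iters=10, seed=0):
    """Plain Lloyd's k-means; the resulting centroids act as the coarse quantizer."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, nlist)]
    for _ in range(iters):
        buckets = [[] for _ in range(nlist)]
        for p in points:
            c = min(range(nlist), key=lambda j: math.dist(p, centroids[j]))
            buckets[c].append(p)
        for j, members in enumerate(buckets):
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def build_ivf(points, nlist, seed=0):
    """Put every vector id into the inverted list of its nearest centroid."""
    centroids = kmeans(points, nlist, seed=seed)
    lists = [[] for _ in range(nlist)]
    for i, p in enumerate(points):
        c = min(range(nlist), key=lambda j: math.dist(p, centroids[j]))
        lists[c].append(i)
    return centroids, lists


def ivf_search(points, centroids, lists, query, k, nprobe):
    """Scan only the nprobe closest lists; return (id, distance) pairs."""
    probe = heapq.nsmallest(nprobe, range(len(centroids)),
                            key=lambda j: math.dist(query, centroids[j]))
    candidates = (i for j in probe for i in lists[j])
    best = heapq.nsmallest(k, ((math.dist(query, points[i]), i) for i in candidates))
    return [(i, d) for d, i in best]


rng = random.Random(0)
data = [[rng.random(), rng.random()] for _ in range(500)]
centroids, lists = build_ivf(data, nlist=10)
print(ivf_search(data, centroids, lists, [0.5, 0.5], k=3, nprobe=2))
```

A query scans about nprobe/nlist of the data, which is where the speedup on large datasets comes from. A true neighbor is missed only when it sits in a list that was not probed, so raising `nprobe` trades speed for recall; with `nprobe == nlist` the search is exact.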

Metadata and Hybrid Search

# Add metadata to vectors
metadata = [
    {"severity": "P1", "service": "auth", "timestamp": 1680000000},
    {"severity": "P2", "service": "api", "timestamp": 1680001000},
    # ...
]
index.add_metadata(metadata)

# Filter by metadata
filters = {"severity": {"eq": "P1"}, "service": {"eq": "auth"}}
results = index.search(query, k=10, filters=filters)

# Hybrid search: combine vector similarity + metadata score
results = index.search(query, k=10, hybrid_weight=0.7)
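The README does not define how filters are evaluated or how `hybrid_weight` blends scores, so the following is only a hedged sketch under stated assumptions. It evaluates filters of the form `{field: {op: value}}` (only `eq` appears in the README; `gte`, `lte`, and `in` are hypothetical extras), applies them as a post-filter over `(id, distance)` hits, and assumes a hybrid score of `weight * 1 / (1 + distance) + (1 - weight) * metadata_score`, where the metadata score is a hypothetical "fraction of predicates matched". All function names are illustrative.

```python
import operator

# "eq" appears in the README; "gte", "lte" and "in" are hypothetical extras.
OPS = {
    "eq": operator.eq,
    "gte": operator.ge,
    "lte": operator.le,
    "in": lambda value, options: value in options,
}


def _predicates(filters):
    """Flatten {field: {op: operand}} into (field, op, operand) triples."""
    return [(f, op, v) for f, preds in filters.items() for op, v in preds.items()]


def matches(meta, filters):
    """True if this metadata dict satisfies every predicate (missing field = fail)."""
    return all(f in meta and OPS[op](meta[f], v) for f, op, v in _predicates(filters))


def post_filter(hits, metadata, filters, k):
    """Keep the first k (id, distance) hits whose metadata passes the filters."""
    return [(i, d) for i, d in hits if matches(metadata[i], filters)][:k]


def metadata_score(meta, filters):
    """Hypothetical soft score: fraction of predicates this record satisfies."""
    preds = _predicates(filters)
    if not preds:
        return 0.0
    return sum(f in meta and OPS[op](meta[f], v) for f, op, v in preds) / len(preds)


def hybrid_rank(hits, metadata, filters, weight, k):
    """Rank (id, distance) hits by an assumed blend; returns (id, score), best first."""
    scored = [
        (weight / (1.0 + d) + (1.0 - weight) * metadata_score(metadata[i], filters), i)
        for i, d in hits
    ]
    scored.sort(reverse=True)
    return [(i, s) for s, i in scored[:k]]


metadata = [
    {"severity": "P1", "service": "auth", "timestamp": 1680000000},
    {"severity": "P2", "service": "api", "timestamp": 1680001000},
]
hits = [(1, 0.3), (0, 0.9)]  # (id, distance) pairs from a vector search
filters = {"severity": {"eq": "P1"}, "service": {"eq": "auth"}}
print(post_filter(hits, metadata, filters, k=10))  # [(0, 0.9)]
print(hybrid_rank(hits, metadata, filters, weight=0.7, k=2))
```

Post-filtering can return fewer than k results when the filter is selective, so a real engine would typically over-fetch candidates (or filter during traversal) before trimming to k.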

Benchmarks

Run benchmarks to evaluate performance:

cd bench
python run_bench.py --dataset synthetic --n 10000 --dim 128

This produces bench_results.json and visualization PNGs in bench/output/.
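The two headline metrics can be computed as sketched below: recall@k compares the ids an approximate search returns against exact brute-force ground truth, and QPS is the number of queries divided by wall-clock search time. This is a generic illustration, not the code in `run_bench.py`; `brute_force_knn`, `recall_at_k`, and `benchmark` are hypothetical names, and the demo's "approximate" search is a toy that only scans half the data.

```python
import math
import random
import time


def brute_force_knn(points, query, k):
    """Exact ground truth: the k nearest points as (id, distance), closest first."""
    dists = sorted((math.dist(query, p), i) for i, p in enumerate(points))
    return [(i, d) for d, i in dists[:k]]


def recall_at_k(approx, exact):
    """Fraction of the true k nearest ids that the approximate result recovered."""
    true_ids = {i for i, _ in exact}
    return len(true_ids & {i for i, _ in approx}) / len(true_ids)


def benchmark(search_fn, points, queries, k):
    """Mean recall@k and QPS for search_fn(query, k) -> [(id, distance), ...]."""
    start = time.perf_counter()
    results = [search_fn(q, k) for q in queries]
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero timer delta
    recalls = [recall_at_k(r, brute_force_knn(points, q, k))
               for r, q in zip(results, queries)]
    return sum(recalls) / len(recalls), len(queries) / elapsed


rng = random.Random(0)
points = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(2000)]
queries = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(20)]


def half_scan(query, k):
    """Toy lossy search: exact search over only the first 1,000 points."""
    return brute_force_knn(points[:1000], query, k)


recall, qps = benchmark(half_scan, points, queries, k=10)
print(f"recall@10={recall:.3f}  QPS={qps:.0f}")
```

Timing only the search calls and computing ground truth afterwards keeps the expensive brute-force pass out of the QPS figure.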

Testing

pytest tests/

All tests use deterministic seeds for reproducibility.

Documentation

  • DESIGN.md - Architecture and design decisions
  • TUNING.md - Tuning guide for production workloads

License

MIT

References

  • Malkov, Y. A., & Yashunin, D. A. (2018). Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Jégou, H., Douze, M., & Schmid, C. (2010). Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence.

