Name	Name	Last commit message	Last commit date
parent directory ..
graph_brain	graph_brain
README.md	README.md
enrich.py	enrich.py
episode_detector.py	episode_detector.py
fact_extractor.py	fact_extractor.py
hot_commit.py	hot_commit.py
memory_engine.py	memory_engine.py

Hybrid Memory System

A memory layer combining vector search (Qdrant), knowledge graphs (FalkorDB), and cross-encoder reranking for AI agent long-term memory.

Architecture

User Query
    │
    ▼
┌──────────────────┐
│ Query Expansion   │ → 5+ search angles from one question
│ (entity, topic,   │   (original, entity-specific, topic,
│  temporal, source) │    source-filtered, semantic opposite)
└────────┬─────────┘
         │
    ┌────┴────┬──────────┐
    ▼         ▼          ▼
┌────────┐ ┌────────┐ ┌────────┐
│ Qdrant │ │FalkorDB│ │  BM25  │
│ Vector │ │ Graph  │ │ Sparse │
│ Search │ │Traversal│ │ Search │
└────┬───┘ └────┬───┘ └────┬───┘
     └──────────┼──────────┘
                ▼
         ┌──────────┐
         │ Reranker │ → bge-reranker-v2-m3
         │(cross-enc)│
         └────┬─────┘
              ▼
         ┌──────────┐
         │  Dedup   │ → Content hash deduplication
         │ + Format │   Smart truncation for LLM context
         └──────────┘

Components

`memory_engine.py` — Core Search & Commit

Multi-angle query expansion: One user question generates 5+ search queries, each catching different memories
Entity graph lookup: Maintains a local entity graph for fast pre-search context
Observational memory: Sub-millisecond keyword lookup against compressed observations
Hybrid search: Vector (Qdrant) + sparse (BM25) + graph (FalkorDB)
Cross-encoder reranking: bge-reranker-v2-m3 reranks combined results
Smart deduplication: Content-based dedup across multiple search angles
Modes: recall, commit, briefing, challenge, deep, whois

`enrich.py` — Importance Scoring Pipeline

Batches 10 chunks per LLM call for importance scoring (1-10) and 1-line summaries
Runs nightly, processing 500 chunks per run (~8 hours for 96K chunks total)
Scores written back to Qdrant payload for priority-weighted retrieval
Zero-cost: uses local Qwen model for all scoring

`hot_commit.py` — Real-Time Fact Extraction (No LLM)

Heuristic-based fact extraction from live sessions — no model call needed
Scans recent session files for user statements, decisions, preferences
Pattern matching for "I am", "decided", "changed", "starting" etc.
Sub-second execution, safe for high-frequency cron

`episode_detector.py` — Episodic Memory Detection

Parses session transcripts into coherent "episodes" (topic-bounded conversation segments)
Time-gap segmentation (30min threshold) + topic coherence scoring
LLM-powered episode summarization and embedding
Stored in dedicated Qdrant collection for temporal retrieval

`fact_extractor.py` — LLM-Powered Personal Knowledge Mining

Deep fact extraction from conversation history using local Qwen
Structured output: who, what, when, significance
Deduplication against existing facts (content hash)
Runs every 4 hours via cron

`graph_brain/` — Knowledge Graph Layer

FalkorDB graph database alongside Qdrant vectors
Node types: Memory, Person, Organization, Project, Topic, Location
Relationship types: MENTIONS, ABOUT, RELATED_TO, WORKS_AT, etc.
Entity extraction via local LLM (structured JSON output)
Multi-hop traversal for "who is connected to what?" queries
Full-text index on memory text for graph-side keyword search

What Makes This Novel

Multi-angle query expansion — most RAG systems do one embedding search. This generates 5+ search queries per question (entity-specific, topic, temporal, source-filtered) and deduplicates results. Catches memories that single-vector search misses.
Hybrid three-way search — vector similarity + BM25 sparse + graph traversal, combined and reranked. Each modality catches different types of memories.
Zero-cost enrichment — importance scoring runs on local inference, processing the entire 96K vector store for $0. Enables priority-weighted retrieval without API costs.
Heuristic hot commit — the fast path doesn't even call a model. Pattern matching extracts facts in <100ms, so crons can run every few minutes without GPU contention.
Episodic memory — not just "facts" but "stories". Episode detection clusters conversations into coherent narratives with summaries, enabling "what happened last time we discussed X?" queries.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Hybrid Memory System

Architecture

Components

`memory_engine.py` — Core Search & Commit

`enrich.py` — Importance Scoring Pipeline

`hot_commit.py` — Real-Time Fact Extraction (No LLM)

`episode_detector.py` — Episodic Memory Detection

`fact_extractor.py` — LLM-Powered Personal Knowledge Mining

`graph_brain/` — Knowledge Graph Layer

What Makes This Novel

FilesExpand file tree

memory

Directory actions

More options

Directory actions

More options

Latest commit

History

memory

Folders and files

parent directory

README.md

Hybrid Memory System

Architecture

Components

memory_engine.py — Core Search & Commit

enrich.py — Importance Scoring Pipeline

hot_commit.py — Real-Time Fact Extraction (No LLM)

episode_detector.py — Episodic Memory Detection

fact_extractor.py — LLM-Powered Personal Knowledge Mining

graph_brain/ — Knowledge Graph Layer

What Makes This Novel

`memory_engine.py` — Core Search & Commit

`enrich.py` — Importance Scoring Pipeline

`hot_commit.py` — Real-Time Fact Extraction (No LLM)

`episode_detector.py` — Episodic Memory Detection

`fact_extractor.py` — LLM-Powered Personal Knowledge Mining

`graph_brain/` — Knowledge Graph Layer