This repository contains implementations of Knowledge Graph-based RAG (Retrieval-Augmented Generation) approaches, together with baseline methods for comparison. The code is structured as a Python package with modular components.
The repository implements several RAG approaches:
- Baseline approaches:
  - Standard RAG: traditional retrieval based on vector similarity
  - Chain-of-Thought RAG: retrieval combined with explicit reasoning steps
- KG-RAG approaches:
  - Entity-based approach: uses embedding-based entity matching and beam search to find relevant information in the knowledge graph
  - Cypher-based approach: uses Cypher queries to retrieve information from a Neo4j graph database
  - GraphRAG-based approach: uses community detection and a hierarchical search strategy
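Standard RAG ranks stored text chunks by how similar their embeddings are to the query embedding. The following is a minimal sketch of that ranking step using cosine similarity, assuming embeddings are already available as plain Python lists; `cosine_similarity` and `retrieve_top_k` are hypothetical names, and the repository itself performs this search through the vector store built by `scripts.build_baseline_vectordb`.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 3
) -> list[tuple[str, float]]:
    """Return the k chunk ids most similar to the query, best first (hypothetical helper)."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A real vector store replaces the linear scan with an approximate nearest-neighbour index, but the ranking criterion is the same idea.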
This project uses uv for dependency management.
# Clone the repository
git clone https://github.com/yourusername/kg-rag.git
cd kg-rag
# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync
source .venv/bin/activate
For development, you can install the dev dependencies:
uv sync --dev
source .venv/bin/activate
Export the following environment variables:
export OPENAI_API_KEY=your_openai_api_key
For the Cypher-based approach, also export:
export NEO4J_URI=bolt://localhost:7687
export NEO4J_USER=neo4j
export NEO4J_PASSWORD=your_password
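Checking these variables up front avoids a confusing failure deep inside an API or database call. A minimal fail-fast sketch, assuming a hypothetical `load_settings` helper that is not part of the repository as documented here; it only reads the variable names listed above.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class Settings:
    openai_api_key: str
    neo4j_uri: Optional[str] = None
    neo4j_user: Optional[str] = None
    neo4j_password: Optional[str] = None


def load_settings(require_neo4j: bool = False) -> Settings:
    """Read credentials from the environment; raise if a required one is missing or empty.

    Hypothetical helper: pass require_neo4j=True only for the Cypher-based approach.
    """
    required = ["OPENAI_API_KEY"]
    if require_neo4j:
        required += ["NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD"]
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return Settings(
        openai_api_key=os.environ["OPENAI_API_KEY"],
        neo4j_uri=os.environ.get("NEO4J_URI"),
        neo4j_user=os.environ.get("NEO4J_USER"),
        neo4j_password=os.environ.get("NEO4J_PASSWORD"),
    )
```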
First, build a vector store for the baseline RAG methods:
python -m scripts.build_baseline_vectordb \
--docs-dir data/sec-10-q/docs \
--collection-name sec_10q \
--persist-dir chroma_db \
--verbose
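Before documents can be searched by similarity, they are typically split into chunks and each chunk is embedded. The chunking strategy used by `scripts.build_baseline_vectordb` is not described in this README; the sketch below assumes simple fixed-size character windows with overlap, and `chunk_text` with its default sizes is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap` characters.

    Hypothetical helper; the default sizes are illustrative, not the repository's settings.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.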
Build a knowledge graph for the KG-RAG methods:
python -m scripts.build_entity_graph \
--docs-dir data/sec-10-q/docs \
--output-dir data/graphs \
--graph-name sec10q_entity_graph \
--verbose
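The internal format of sec10q_entity_graph.pkl is not documented here. As a toy illustration only, this sketch assumes entities and relations have already been extracted as (head, relation, tail) triples and stores them as a plain dict adjacency map that can be pickled; `build_entity_graph`, `save_graph` and `load_graph` are hypothetical names.

```python
import pickle
from collections import defaultdict
from pathlib import Path


def build_entity_graph(triples: list[tuple[str, str, str]]) -> dict[str, dict[str, str]]:
    """Build an undirected adjacency map: entity -> {neighbour: relation}.

    Toy representation; if a pair appears twice, the last relation wins.
    """
    graph: dict[str, dict[str, str]] = defaultdict(dict)
    for head, relation, tail in triples:
        graph[head][tail] = relation
        graph[tail][head] = relation
    return dict(graph)


def save_graph(graph: dict[str, dict[str, str]], path: Path) -> None:
    """Pickle the graph, creating the output directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(graph, fh)


def load_graph(path: Path) -> dict[str, dict[str, str]]:
    """Load a graph previously written by save_graph."""
    with path.open("rb") as fh:
        return pickle.load(fh)
```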
To query the baseline methods interactively:
python -m scripts.run_baseline_rag \
--collection-name sec_10q \
--persist-dir chroma_db \
--model gpt-4o \
--verbose
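According to the approach list above, Chain-of-Thought RAG adds explicit reasoning steps on top of standard retrieval. One place that difference can be expressed is prompt assembly. This is a hedged sketch only; the `build_prompt` helper and its wording are hypothetical and are not the repository's actual prompt templates.

```python
def build_prompt(question: str, chunks: list[str], chain_of_thought: bool = False) -> str:
    """Assemble an LLM prompt from retrieved context chunks (hypothetical template)."""
    # Number the chunks so the model can refer back to specific passages.
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    if chain_of_thought:
        instruction = (
            "Think step by step: first list the relevant facts from the context, "
            "then derive the answer from them."
        )
    else:
        instruction = "Answer using only the context above."
    return f"Context:\n{context}\n\nQuestion: {question}\n\n{instruction}"
```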
To query the entity-based KG-RAG approach interactively:
python -m scripts.run_entity_rag \
--graph-path data/graphs/sec10q_entity_graph.pkl \
--beam-width 10 \
--max-depth 8 \
--top-k 100 \
--verbose
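The --beam-width and --max-depth flags control a beam search over the graph: at each depth only the best-scoring partial paths are kept, and expansion stops after a fixed number of hops. The sketch below is a simplified illustration, assuming seed entities have already been matched by embedding similarity and that a path is scored by the relevance of its last entity. The repository's actual scoring and the exact role of --top-k may differ and are not modelled here; `beam_search` and the dict-based graph are hypothetical.

```python
from typing import Callable

Graph = dict[str, dict[str, str]]  # entity -> {neighbour: relation}


def beam_search(
    graph: Graph,
    seeds: list[str],
    score: Callable[[str], float],
    beam_width: int = 10,
    max_depth: int = 8,
) -> list[list[str]]:
    """Expand paths from seed entities, keeping only the beam_width best at each depth.

    Simplifying assumption: a path's score is the score of its last entity.
    """
    beam = [[s] for s in seeds if s in graph]
    beam.sort(key=lambda p: score(p[-1]), reverse=True)
    beam = beam[:beam_width]
    best = list(beam)
    for _ in range(max_depth):
        candidates = []
        for path in beam:
            for neighbour in graph.get(path[-1], {}):
                if neighbour not in path:  # skip entities already on the path (no cycles)
                    candidates.append(path + [neighbour])
        if not candidates:
            break  # nothing left to expand
        candidates.sort(key=lambda p: score(p[-1]), reverse=True)
        beam = candidates[:beam_width]
        best.extend(beam)
    # Return the highest-scoring paths found at any depth.
    best.sort(key=lambda p: score(p[-1]), reverse=True)
    return best[:beam_width]
```

A wider beam explores more alternatives per hop at a higher cost; a larger depth allows longer multi-hop chains between the question's entities and the answer.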
To evaluate the RAG methods on a test dataset (--method all runs every method):
python -m kg_rag.evaluation.run_evaluation \
--data-path data/test_questions.csv \
--graph-path data/graphs/sec10q_entity_graph.pkl \
--method all \
--output-dir evaluation_results \
--collection-name sec_10q \
--persist-dir chroma_db \
--max-samples 50 \
--verbose
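When several methods answer the same questions, results are naturally compared per method. The metrics written to evaluation_results are not described in this README; this sketch only assumes each answer has already been judged correct or incorrect and computes per-method accuracy. The record format and `accuracy_by_method` are hypothetical.

```python
from collections import defaultdict


def accuracy_by_method(results: list[dict]) -> dict[str, float]:
    """Fraction of questions judged correct, per method.

    Each result is assumed to look like {"method": str, "correct": bool}.
    """
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for row in results:
        totals[row["method"]] += 1
        hits[row["method"]] += int(row["correct"])
    return {method: hits[method] / totals[method] for method in totals}
```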
To search for the best-performing hyperparameters for a method (here, the entity-based approach):
python -m kg_rag.evaluation.hyperparameter_search \
--data-path data/test_questions.csv \
--graph-path data/graphs/sec10q_entity_graph.pkl \
--method entity \
--configs-path kg_rag/evaluation/hyperparameter_configs.json \
--output-dir hyperparameter_search \
--max-samples 10 \
--verbose
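The schema of hyperparameter_configs.json is not shown in this README. Assuming a grid search in which the file maps each parameter name to a list of candidate values, the configurations to evaluate are the Cartesian product of those lists. The keys below mirror the run_entity_rag flags but are illustrative only, and `expand_grid` is a hypothetical helper.

```python
import itertools
import json


def expand_grid(grid: dict[str, list]) -> list[dict]:
    """Turn {"param": [v1, v2], ...} into one dict per combination of values."""
    names = sorted(grid)  # fixed order so the output is deterministic
    return [
        dict(zip(names, values))
        for values in itertools.product(*(grid[name] for name in names))
    ]


# Illustrative grid; not the contents of the repository's config file.
config_json = '{"beam_width": [5, 10], "max_depth": [4, 8]}'
configs = expand_grid(json.loads(config_json))
```

The number of runs grows multiplicatively with each parameter, which is one reason to keep --max-samples small during the search.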
This project uses pre-commit hooks to ensure code quality:
# Run pre-commit hooks on all files
pre-commit run --all-files
# Run tests
pytest
# Run tests with coverage
pytest --cov=kg_rag tests/