KG-RAG: Knowledge Graph-based Retrieval Augmented Generation

This repository implements several Knowledge Graph-based Retrieval Augmented Generation (KG-RAG) approaches, along with baseline methods for comparison. The code is structured as a Python package with modular components.

Overview

The repository implements several RAG approaches:

Baseline approaches:
- Standard RAG: Traditional retrieval-based approach using vector similarity
- Chain-of-Thought RAG: Enhanced retrieval with explicit reasoning steps

KG-RAG approaches:
- Entity-based approach: Uses embedding-based entity matching and beam search to find relevant information in the knowledge graph
- Cypher-based approach: Uses Cypher queries to retrieve information from a Neo4j graph database
- GraphRAG-based approach: Implements a community detection and hierarchical search strategy
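
To make the entity-based approach more concrete, here is a minimal sketch of beam search over a toy graph. It is not the repository's implementation: the adjacency-list layout, the `overlap_score` function (token overlap standing in for embedding similarity), the `AcmeCorp` data, and all function names are assumptions made for illustration.

```python
# Toy knowledge graph: entity -> list of (relation, neighbor) edges.
# Hypothetical data, not taken from the repository.
GRAPH: dict[str, list[tuple[str, str]]] = {
    "AcmeCorp": [("reported", "revenue"), ("filed", "quarterly report")],
    "revenue": [("amount", "10 million")],
    "quarterly report": [("period", "Q2")],
}


def overlap_score(query: str, text: str) -> float:
    """Stand-in for embedding similarity: fraction of query tokens found in text."""
    query_tokens = set(query.lower().split())
    text_tokens = set(text.lower().split())
    return len(query_tokens & text_tokens) / len(query_tokens) if query_tokens else 0.0


def beam_search(graph, start, query, beam_width=2, max_depth=2, score=overlap_score):
    """Expand paths from `start`, keeping only the `beam_width` best paths per depth."""
    beam = [(0.0, [start])]
    kept = []
    for _ in range(max_depth):
        candidates = []
        for path_score, path in beam:
            for relation, neighbor in graph.get(path[-1], []):
                if neighbor in path:  # skip cycles
                    continue
                step = score(query, f"{relation} {neighbor}")
                candidates.append((path_score + step, path + [neighbor]))
        if not candidates:
            break
        # Sort by score only, so ties never fall back to comparing paths.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]
        kept.extend(beam)
    return sorted(kept, key=lambda c: c[0], reverse=True)


results = beam_search(GRAPH, "AcmeCorp", "revenue amount", beam_width=1, max_depth=2)
```

A wider beam keeps more alternative paths alive at each depth at the cost of more scoring calls, which is the trade-off a beam-width setting controls.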

Installation

Using uv (Recommended)

This project uses uv for dependency management.

# Clone the repository
git clone https://github.com/VectorInstitute/kg-rag.git
cd kg-rag

# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies and activate the virtual environment
uv sync
source .venv/bin/activate

For development, you can install the dev dependencies:

uv sync --dev
source .venv/bin/activate

Environment Variables

Export the following environment variable:

export OPENAI_API_KEY=your_openai_api_key

For the Cypher-based approach, also export:

export NEO4J_URI=bolt://localhost:7687
export NEO4J_USER=neo4j
export NEO4J_PASSWORD=your_password
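
Before running any script, it can help to confirm that these variables are actually set. The check below is a small sketch, not part of the package; the function name `missing_env_vars` and its `use_cypher` parameter are hypothetical.

```python
import os

REQUIRED_VARS = ["OPENAI_API_KEY"]
NEO4J_VARS = ["NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD"]


def missing_env_vars(use_cypher: bool = False, env=None) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    names = REQUIRED_VARS + (NEO4J_VARS if use_cypher else [])
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(use_cypher=True)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
```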

Usage

1. Building Vector Store for Baseline Methods

First, build a vector store for the baseline RAG methods:

python -m scripts.build_baseline_vectordb \
    --docs-dir data/sec-10-q/docs \
    --collection-name sec_10q \
    --persist-dir chroma_db \
    --verbose
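
The baseline methods retrieve documents by vector similarity. As a rough illustration of what happens at query time, the sketch below ranks documents by cosine similarity in plain Python. The vectors are hand-written toy values and the function names are assumptions; a real vector store also handles embedding, indexing, and persistence.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy two-dimensional "embeddings" for three documents.
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
hits = top_k([1.0, 0.0], docs, k=2)
```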

2. Building Knowledge Graphs

Build a knowledge graph for KG-RAG methods:

python -m scripts.build_entity_graph \
    --docs-dir data/sec-10-q/docs \
    --output-dir data/graphs \
    --graph-name sec10q_entity_graph \
    --verbose

3. Running Interactive Query Mode

To interactively query using baseline methods:

python -m scripts.run_baseline_rag \
    --collection-name sec_10q \
    --persist-dir chroma_db \
    --model gpt-4o \
    --verbose

To interactively query using the entity-based KG-RAG method:

python -m scripts.run_entity_rag \
    --graph-path data/graphs/sec10q_entity_graph.pkl \
    --beam-width 10 \
    --max-depth 8 \
    --top-k 100 \
    --verbose

4. Running Evaluation

To evaluate the performance of various RAG methods on a test dataset:

python -m kg_rag.evaluation.run_evaluation \
    --data-path data/test_questions.csv \
    --graph-path data/graphs/sec10q_entity_graph.pkl \
    --method all \
    --output-dir evaluation_results \
    --collection-name sec_10q \
    --persist-dir chroma_db \
    --max-samples 50 \
    --verbose
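
The README does not state which metric the evaluation uses or how the CSV is laid out. Purely as an illustration of the general loop, the sketch below computes exact-match accuracy, assuming hypothetical `question` and `answer` columns and a hypothetical `answer_fn` callable that stands in for a RAG method.

```python
import csv
import io


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count as errors."""
    return " ".join(text.lower().split())


def exact_match_accuracy(rows, answer_fn, max_samples=None) -> float:
    """Fraction of rows whose predicted answer equals the reference after normalization."""
    rows = list(rows)[:max_samples]  # a None limit keeps every row
    if not rows:
        return 0.0
    correct = sum(
        normalize(answer_fn(row["question"])) == normalize(row["answer"]) for row in rows
    )
    return correct / len(rows)


# In-memory stand-in for a test file; the column names are assumptions.
SAMPLE_CSV = "question,answer\nWhat is 2+2?,4\nCapital of France?,Paris\n"
rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
accuracy = exact_match_accuracy(rows, lambda question: "4")
```

Exact match is strict for free-form answers, so a real evaluation may well use a softer comparison.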

5. Running Hyperparameter Search

To find the optimal hyperparameters for a method:

python -m kg_rag.evaluation.hyperparameter_search \
    --data-path data/test_questions.csv \
    --graph-path data/graphs/sec10q_entity_graph.pkl \
    --method entity \
    --configs-path kg_rag/evaluation/hyperparameter_configs.json \
    --output-dir hyperparameter_search \
    --max-samples 10 \
    --verbose
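
The format of hyperparameter_configs.json is not shown here. As a sketch of how a grid search over such a file might work, the example below assumes a JSON object that maps parameter names to lists of candidate values. The parameter names are borrowed from the `run_entity_rag` flags; the values, `expand_grid`, and `best_config` are hypothetical.

```python
import itertools
import json


def expand_grid(search_space: dict[str, list]) -> list[dict]:
    """Return every combination of the candidate values as a list of configs."""
    names = list(search_space)
    value_lists = (search_space[name] for name in names)
    return [dict(zip(names, values)) for values in itertools.product(*value_lists)]


def best_config(configs: list[dict], evaluate) -> dict:
    """Pick the config with the highest score under `evaluate`."""
    return max(configs, key=evaluate)


# Assumed file content; the real file may be structured differently.
SPACE_JSON = '{"beam_width": [5, 10], "max_depth": [4, 8], "top_k": [100]}'
configs = expand_grid(json.loads(SPACE_JSON))

# Dummy scoring function standing in for an evaluation run.
best = best_config(configs, lambda cfg: cfg["beam_width"] - cfg["max_depth"])
```

Because the grid grows multiplicatively with each parameter, a small sample limit per configuration keeps the total cost of a search manageable.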

Development

Pre-commit hooks

This project uses pre-commit hooks to ensure code quality:

# Run pre-commit hooks on all files
pre-commit run --all-files

Running tests

# Run tests
pytest

# Run tests with coverage
pytest --cov=kg_rag tests/
