A production-ready, enterprise-grade FastAPI service for GraphRAG (Graph Retrieval-Augmented Generation) operations with comprehensive security, performance monitoring, and multi-workspace support.
- Zero Critical Vulnerabilities: Comprehensive security audit passed
- Path Traversal Protection: Advanced input validation and containment checking
- JWT Authentication: Secure token-based authentication with RBAC
- API Key Management: Granular access control with role-based permissions
- Security Middleware: Rate limiting, CORS, security headers
- GraphQL API: Modern, flexible query language with real-time subscriptions
- REST API: Traditional endpoints with comprehensive OpenAPI documentation
- 100% Feature Parity: All operations available through both interfaces
- Interactive Playgrounds: Built-in GraphQL Playground and Swagger UI
- Multi-Workspace Support: Isolated environments for different projects
- Performance Monitoring: Real-time metrics with Prometheus integration
- Distributed Caching: Redis-based caching with intelligent invalidation
- Database Abstraction: SQLite for lightweight deployment, PostgreSQL ready
- Containerized Deployment: Docker and Docker Compose support
- Microsoft GraphRAG: Complete integration with the GraphRAG framework
- Knowledge Graph Operations: Entity and relationship querying with advanced analytics
- Indexing Management: Background job processing with progress tracking
- Advanced Query Engine: Multi-hop queries, community detection, centrality analysis
- Automatic Cleanup: Configurable TTL-based workspace expiration
- Usage Tracking: Comprehensive metrics for access patterns and resource utilization
- Idle Detection: Smart cleanup based on workspace activity patterns
- Size Management: Automatic cleanup when workspaces exceed size limits
- Graceful Handling: Active operation detection with configurable grace periods
- Manual Controls: Force cleanup and manual cleanup cycle triggers via API
graph TD
A[Workspace Created] --> B[Usage Tracking Started]
B --> C{Workspace Active?}
C -->|Yes| D[Update Access Time]
C -->|No| E{Check Cleanup Criteria}
D --> F[Track Operations]
F --> G[Update Metrics]
G --> C
E --> H{TTL Expired?}
E --> I{Idle Too Long?}
E --> J{Size Exceeded?}
H -->|Yes| K[Mark for Cleanup]
I -->|Yes| K
J -->|Yes| K
H -->|No| L[Continue Monitoring]
I -->|No| L
J -->|No| L
K --> M{Active Operations?}
M -->|Yes| N[Wait Grace Period]
M -->|No| O[Archive Workspace]
N --> P{Grace Period Expired?}
P -->|No| M
P -->|Yes| O
O --> Q[Remove Files]
Q --> R[Delete from Database]
R --> S[Update Statistics]
L --> T[Sleep Interval]
T --> C
style A fill:#e1f5fe
style K fill:#fff3e0
style O fill:#ffebee
style S fill:#e8f5e8
# System Requirements
Python 3.9+
Redis 6.0+ (optional, for caching)
Docker & Docker Compose (for containerized deployment)
# Hardware Recommendations
RAM: 4GB+ (8GB+ for production)
CPU: 2+ cores
Storage: 10GB+ available space# 1. Clone the repository
git clone https://github.com/pierregrothe/graphrag-api.git
cd graphrag-api
# 2. Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Configure environment
cp .env.example .env
# Edit .env with your configuration (see Configuration section)
# 5. Initialize database
python -m src.graphrag_api_service.database.init_db
# 6. Start the service
python -m uvicorn src.graphrag_api_service.main:app --reload --host 0.0.0.0 --port 8000# 1. Clone and configure
git clone https://github.com/pierregrothe/graphrag-api.git
cd graphrag-api
cp .env.example .env
# 2. Start with Docker Compose
docker-compose up -d
# 3. Check service health
curl http://localhost:8000/health# 1. Access the APIs
# REST API Documentation: http://localhost:8000/docs
# GraphQL Playground: http://localhost:8000/graphql
# Health Check: http://localhost:8000/health
# 2. Create your first workspace
curl -X POST "http://localhost:8000/api/workspaces" \
-H "Content-Type: application/json" \
-d '{"name": "my-project", "description": "My first GraphRAG workspace"}'
# 3. Start indexing (replace {workspace_id} with actual ID)
curl -X POST "http://localhost:8000/api/indexing/jobs" \
-H "Content-Type: application/json" \
-d '{"workspace_id": "{workspace_id}", "data_path": "/path/to/your/data"}'graph TB
subgraph "Client Layer"
A[Web Client]
B[API Client]
C[GraphQL Client]
end
subgraph "API Gateway"
D[FastAPI Router]
E[GraphQL Schema]
F[Authentication]
G[Rate Limiting]
end
subgraph "Business Logic"
H[Workspace Manager]
I[Graph Operations]
J[Indexing Engine]
K[Query Processor]
end
subgraph "Data Layer"
L[(SQLite/PostgreSQL)]
M[Redis Cache]
N[File Storage]
O[Vector Store]
end
subgraph "External Services"
P[LLM Providers]
Q[Monitoring]
R[Logging]
end
A --> D
B --> D
C --> E
D --> F
D --> G
E --> F
F --> H
F --> I
F --> J
F --> K
H --> L
I --> M
J --> N
K --> O
I --> P
J --> P
K --> P
D --> Q
E --> Q
H --> R
I --> R
J --> R
K --> R
Create a .env file with the following configuration:
# Application Settings
APP_NAME="GraphRAG API Service"
DEBUG=false
PORT=8000
LOG_LEVEL=INFO
# Security (REQUIRED for production)
JWT_SECRET_KEY=your-super-secret-jwt-key-here
API_KEY_SECRET=your-api-key-secret-here
# GraphRAG Configuration
GRAPHRAG_DATA_PATH=/path/to/graphrag/data
BASE_WORKSPACES_PATH=workspaces
# Workspace Lifecycle Management
WORKSPACE_TTL_HOURS=24 # Default workspace TTL (0 = no expiration)
WORKSPACE_CLEANUP_ENABLED=true # Enable automatic cleanup
WORKSPACE_CLEANUP_INTERVAL_MINUTES=60 # Cleanup cycle interval
WORKSPACE_MAX_IDLE_HOURS=12 # Max idle time before cleanup
WORKSPACE_GRACE_PERIOD_MINUTES=30 # Grace period for active operations
WORKSPACE_MAX_SIZE_MB=1000 # Max workspace size (0 = no limit)
WORKSPACE_USAGE_TRACKING_ENABLED=true # Enable usage tracking
# LLM Provider Settings
LLM_PROVIDER=ollama # or google_gemini
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_LLM_MODEL=gemma:4b
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
# Google Gemini (if using)
GOOGLE_API_KEY=your-google-api-key
GOOGLE_PROJECT_ID=your-project-id
# Database
DATABASE_URL=sqlite:///./data/graphrag.db
# Redis Cache (optional)
REDIS_URL=redis://localhost:6379/0
CACHE_TTL=3600
# Monitoring
ENABLE_METRICS=true
METRICS_PORT=9090For production deployment, ensure:
# Generate secure keys
JWT_SECRET_KEY=$(openssl rand -hex 32)
API_KEY_SECRET=$(openssl rand -hex 32)
# Set secure permissions
chmod 600 .env
# Use environment-specific configurations
export ENVIRONMENT=productionOur service implements comprehensive path traversal protection:
# Secure workspace path validation
def validate_workspace_path(workspace_id: str, settings) -> str:
"""Validate and resolve workspace path to prevent traversal attacks."""
# Input format validation
# Path resolution and containment checking
# Directory existence verification
# Secure error handling- JWT Tokens: Secure, stateless authentication with singleton token management
- API Keys: Service-to-service authentication with granular scopes
- Role-Based Access Control (RBAC): Granular permissions and workspace isolation
- Rate Limiting: Advanced protection against abuse with configurable thresholds
- Session Management: Secure session handling with automatic cleanup
- π Caching Security: Migrated from pickle to JSON serialization, eliminating deserialization vulnerabilities
- π‘οΈ JWT Token Management: Implemented singleton pattern for consistent token blacklist across requests
- β‘ Authentication Exception Handling: Enhanced error handling with proper HTTP status code mapping
- π Password Validation: Strengthened password requirements with comprehensive validation
- π¨ Security Logging: Enhanced security event logging with detailed audit trails
- Content Security Policy (CSP)
- X-Frame-Options
- X-Content-Type-Options
- Strict-Transport-Security
- β Zero Critical Vulnerabilities: All security issues resolved
- β 100% Test Coverage: Comprehensive authentication and security testing
- β Production Ready: Enterprise-grade security controls implemented
| Endpoint | Method | Description |
|---|---|---|
/health |
GET | Service health check |
/api/workspaces |
GET, POST | Workspace management |
/api/workspaces/{id} |
GET, PUT, DELETE | Individual workspace operations |
/api/workspaces/{id}/cleanup |
POST | Force cleanup specific workspace |
/api/workspaces/cleanup/stats |
GET | Get cleanup service statistics |
/api/workspaces/cleanup/run |
POST | Trigger manual cleanup cycle |
/api/graph/entities |
GET | Query graph entities |
/api/graph/relationships |
GET | Query graph relationships |
/api/graph/communities |
GET | Query graph communities |
/api/indexing/jobs |
GET, POST | Indexing job management |
/api/system/status |
GET | System status and metrics |
type Query {
workspaces(limit: Int, offset: Int): WorkspaceConnection
entities(workspaceId: String!, limit: Int): EntityConnection
relationships(workspaceId: String!, limit: Int): RelationshipConnection
communities(workspaceId: String!): [Community]
systemHealth: SystemHealth
}
type Mutation {
createWorkspace(input: WorkspaceCreateInput!): Workspace
updateWorkspace(id: String!, input: WorkspaceUpdateInput!): Workspace
deleteWorkspace(id: String!): Boolean
startIndexing(input: IndexingJobInput!): IndexingJob
}
type Subscription {
indexingProgress(jobId: String!): IndexingProgress
systemMetrics: SystemMetrics
}The service is organized into focused, loosely-coupled modules:
auth/- Authentication & authorization systemcaching/- Redis caching implementationdatabase/- SQLite database abstractiongraph/- Graph operations & analysisgraphql/- GraphQL API layermiddleware/- Security & performance middlewareperformance/- Monitoring & optimizationroutes/- REST API endpointssystem/- System health & operationsworkspace/- Multi-workspace management
version: "3.8"
services:
graphrag-api:
build: .
ports:
- "8000:8000"
environment:
- DEBUG=true
- REDIS_URL=redis://redis:6379/0
depends_on:
- redis
volumes:
- ./data:/app/data
- ./workspaces:/app/workspaces
redis:
image: redis:7-alpine
ports:
- "6379:6379"# Build production image
docker build -t graphrag-api:latest .
# Run with production settings
docker run -d \
--name graphrag-api \
-p 8000:8000 \
-e ENVIRONMENT=production \
-e JWT_SECRET_KEY=${JWT_SECRET_KEY} \
-v /data/graphrag:/app/data \
graphrag-api:latest- Request latency and throughput
- Database query performance
- Cache hit/miss rates
- Memory and CPU usage
- GraphRAG operation metrics
/health- Basic service health/health/detailed- Comprehensive system status/metrics- Prometheus metrics endpoint
# Run all tests
pytest
# Run with coverage
pytest --cov=src/graphrag_api_service --cov-report=html
# Run specific test categories
pytest tests/unit/
pytest tests/integration/
pytest tests/security/- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature) - Commit your changes (
git commit -m 'Add amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
# Install development dependencies
pip install -r requirements-dev.txt
# Install pre-commit hooks
pre-commit install
# Run code quality checks
ruff check src/
mypy src/
black src/
isort src/This project is licensed under the MIT License - see the LICENSE file for details.
- Microsoft GraphRAG - Core GraphRAG implementation
- FastAPI - Modern web framework
- Strawberry GraphQL - GraphQL library for Python
- Redis - In-memory data structure store
Built with β€οΈ for enterprise-grade knowledge graph applications