StudyFetch AI Tutor is a web application that helps students understand PDF documents through an interactive split-screen interface. Users can upload PDFs and chat with an AI about the document's content, with the AI able to reference and highlight relevant parts of the PDF in real-time.
- 🔐 User Authentication: Secure email/password signup and login with JWT-based session management
- 📄 PDF Upload & Viewing: Upload, store, and navigate PDF documents with streaming processing
- 💬 AI Chat Interface: Interact with the AI about document content via text with streaming responses
- 🔍 Advanced Document Search:
  - Hybrid Search: Combines semantic (vector) and keyword (full-text) search for better retrieval
  - Re-ranking: Cross-encoder re-ranking improves result quality
  - Query Expansion: Optional multi-query retrieval for complex queries
- 📌 Context-Aware Responses: AI references specific page numbers and content from the PDF
- 📝 Persistent Conversations: Chat history is saved and can be resumed later
- 🔄 Multi-Document Support: Upload and manage multiple documents with separate conversation histories
- ⚡ Performance Optimizations: Redis caching, background workers, rate limiting
The frontend is built with:
- Next.js 15+ with App Router
- React 19
- TailwindCSS for styling
- React PDF for PDF rendering
- React Markdown for formatted AI responses
The backend is built with:
- FastAPI - Modern Python web framework
- SQLAlchemy - ORM for database operations
- PostgreSQL with pgvector extension for vector similarity search
- Redis - Caching and rate limiting
- ARQ - Background job processing for document uploads
- AWS S3 - PDF file storage
- JWT - Token-based authentication
The backend is split into two Python services (see MICROSERVICES_ARCHITECTURE.md for details):
- Main Backend (`backend`, port 8001) – FastAPI app with business logic and database access (no heavy ML dependencies)
- Embedding Service (`embedding-service`, port 8002) – FastAPI microservice that hosts PyTorch/sentence-transformers models for embeddings and reranking
AI and retrieval components:
- OpenAI GPT-4/GPT-4o-mini - Chat completions with streaming support
- sentence-transformers - Embedding generation (all-mpnet-base-v2, 768 dimensions) running inside the embedding service (see the sketch below)
- LangChain - Document processing and core RAG utilities
- Cross-Encoder Re-ranking - ms-marco-MiniLM-L-6-v2 for improved retrieval quality
- LangGraph-based Agent Workflow - Optional multi-step RAG agent pipeline
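A minimal sketch of that embedding step, assuming direct use of sentence-transformers (the actual service wraps this behind its own FastAPI endpoint on port 8002):

```python
# Minimal sketch of embedding generation with sentence-transformers.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

chunks = [
    "Photosynthesis converts light energy into chemical energy.",
    "The Calvin cycle fixes carbon dioxide into sugars.",
]
embeddings = model.encode(chunks, normalize_embeddings=True)
print(embeddings.shape)  # (2, 768) - one 768-dimensional vector per chunk
```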
*(RAG workflow diagram; image credit: https://www.dailydoseofds.com/)*
The application follows an advanced Retrieval Augmented Generation (RAG) approach:
- Document Processing: PDF documents are processed into chunks with streaming support
- Embedding Generation: Each chunk gets a vector embedding representing its semantic meaning
- Hybrid Retrieval: When the user asks a question:
  - Query embedding is generated
  - Semantic Search: Vector similarity search using pgvector
  - Keyword Search: PostgreSQL full-text search (optional)
  - Hybrid Fusion: Combines both result sets with weighted scoring (see the sketch below)
  - Re-ranking: Cross-encoder re-ranks top candidates for better relevance
- Response Generation: The AI generates a streaming response based on retrieved context with citations
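A minimal sketch of the Hybrid Fusion step referenced above (illustrative only: score normalization and data shapes are assumptions; the 0.7/0.3 weights match the defaults documented later in this README):

```python
# Illustrative weighted fusion of semantic and keyword search results.
# Scores are assumed to be normalized to [0, 1].
SEMANTIC_WEIGHT = 0.7  # SEMANTIC_SEARCH_WEIGHT default
KEYWORD_WEIGHT = 0.3   # KEYWORD_SEARCH_WEIGHT default

def fuse(semantic: dict[str, float], keyword: dict[str, float]) -> list[tuple[str, float]]:
    """Combine per-chunk scores from both retrievers into one ranking."""
    fused: dict[str, float] = {}
    for chunk_id, score in semantic.items():
        fused[chunk_id] = fused.get(chunk_id, 0.0) + SEMANTIC_WEIGHT * score
    for chunk_id, score in keyword.items():
        fused[chunk_id] = fused.get(chunk_id, 0.0) + KEYWORD_WEIGHT * score
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

print(fuse({"c1": 0.9, "c2": 0.4}, {"c2": 0.8, "c3": 0.5}))
# c1 first (~0.63), then c2 (0.52), then c3 (0.15)
```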
At runtime the system consists of:
- Frontend: Next.js app (UI only, no API routes)
- Main Backend API: FastAPI server on port 8001 (business logic, authentication, RAG orchestration)
- Embedding Service: Separate FastAPI microservice on port 8002 for embeddings and reranking
- Background Workers: ARQ workers for async document processing
- PostgreSQL Database: Stores user data, documents, conversations, and vector embeddings
- Redis: Caching layer and rate limiting
The LangGraph-based agent workflow (via `RAGAgentService`) runs inside the main backend and can be toggled per request using the `use_agent` flag in chat requests.
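For example, enabling the agent for a single request might look like this (a sketch: every payload field except `use_agent` is an assumption, so check the Swagger UI at /docs for the real schema):

```python
# Hypothetical request toggling the LangGraph agent for one chat turn.
import requests

resp = requests.post(
    "http://localhost:8001/api/chat",
    json={
        "conversation_id": "abc123",   # hypothetical identifier
        "message": "Summarize page 3 of the document.",
        "use_agent": True,             # route through the LangGraph RAG agent
    },
    cookies={"token": "<jwt-from-login>"},  # auth cookie name is an assumption
)
print(resp.json())
```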
Prerequisites:
- Node.js v18+
- Python 3.9+
- Docker and Docker Compose
- OpenAI API key
- AWS S3 credentials (for production deployment)
Clone the repository and install dependencies:

```bash
git clone https://github.com/CruiseDevice/ai-tutor
cd ai-tutor
npm install
cd backend
pip install -r requirements.txt
cd ..
```

Build and start the backend stack:

```bash
# First-time build (builds backend + embedding service)
docker-compose down -v  # optional cleanup
DOCKER_BUILDKIT=1 docker-compose up --build

# Subsequent runs (reuses cached embedding service image)
docker-compose up -d

# Verify services are running
curl http://localhost:8001/health  # Backend API
curl http://localhost:5432         # PostgreSQL (the request fails, but confirms the port is open)
```

The Docker Compose setup includes:
- PostgreSQL (`db`, port 5432) - Database with pgvector extension
- Redis (`redis`, port 6379) - Caching and rate limiting
- Embedding Service (`embedding-service`, port 8002) - ML microservice for embeddings and reranking
- Main Backend API (`backend`, port 8001) - FastAPI server
- Worker (`worker`) - ARQ background worker for document processing
Create a .env file in the root directory:
```bash
# Database
DATABASE_URL="postgresql://postgres:postgres@localhost:5432/studyfetch"

# Redis
REDIS_URL="redis://localhost:6379/0"

# S3 Storage
AWS_REGION="us-east-1"
AWS_ACCESS_KEY_ID="your-access-key"
AWS_SECRET_ACCESS_KEY="your-secret-key"
S3_PDFBUCKET_NAME="your-bucket-name"

# Security
JWT_SECRET="change-this-secret-in-production"
ENCRYPTION_KEY="your-32-byte-encryption-key-for-api-keys"

# Backend URL (for frontend)
NEXT_PUBLIC_BACKEND_URL="http://localhost:8001"

# Environment
NODE_ENV="development"
```

Docker Compose automatically picks up `JWT_SECRET`, `AWS_REGION`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `S3_PDFBUCKET_NAME`, and `NODE_ENV` from this `.env` file when bringing up the backend and worker containers.
The database tables are automatically created on backend startup. The backend includes automatic migrations for:
- pgvector extension setup
- HNSW indexes for vector search (see the sketch after this list)
- Full-text search indexes for hybrid search
- Document status fields
- Other schema updates
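For reference, the pgvector and HNSW migrations boil down to SQL along these lines (a sketch: the table and column names are assumptions, not the app's actual schema):

```python
# Illustrative version of what the automatic migrations execute.
# Table and column names ("document_chunks", "embedding") are assumptions.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://postgres:postgres@localhost:5432/studyfetch")

with engine.begin() as conn:
    # Enable the pgvector extension (idempotent)
    conn.execute(text("CREATE EXTENSION IF NOT EXISTS vector"))
    # HNSW index for fast approximate cosine-similarity search
    conn.execute(text(
        "CREATE INDEX IF NOT EXISTS idx_chunks_embedding_hnsw "
        "ON document_chunks USING hnsw (embedding vector_cosine_ops)"
    ))
```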
If you need to manually verify:
```bash
# Connect to PostgreSQL
docker exec -it study-fetch-tutor-db-1 psql -U postgres -d studyfetch

# Check pgvector extension
\dx

# Check tables
\dt
```
To run with Docker:

1. Start the backend stack (from the repo root):

   ```bash
   DOCKER_BUILDKIT=1 docker-compose up --build
   ```

   This starts PostgreSQL, Redis, the embedding service, the backend API, and the worker.

2. Start the frontend locally:

   ```bash
   npm run dev
   ```

3. Visit the app at:
   - Frontend: http://localhost:3000
   - Backend API (in Docker): http://localhost:8001
   - Embedding Service (in Docker): http://localhost:8002/health
To run without Docker:

1. Start the backend directly:

   ```bash
   cd backend
   uvicorn app.main:app --reload --port 8001
   ```

2. Start the frontend:

   ```bash
   npm run dev
   ```

3. Start the worker:

   ```bash
   cd backend
   arq app.workers.arq_config.WorkerSettings
   ```
The application will be available at:
- Frontend: http://localhost:3000
- Backend API: http://localhost:8001
- API Documentation: http://localhost:8001/docs (Swagger UI)
Use Docker Compose for production deployment:
```bash
docker-compose up -d
```

Make sure to set appropriate environment variables in your .env file, especially:
- `JWT_SECRET` - Use a strong random secret
- `ENCRYPTION_KEY` - 32-byte key for API key encryption
- `COOKIE_SECURE=True` - For HTTPS
- `NODE_ENV=production`
1. Register/Login: Create an account or sign in
2. API Setup: Navigate to Settings and add your OpenAI API key (encrypted and stored securely)
3. Upload a PDF: On the dashboard, click "Upload PDF" to begin
4. Wait for Processing: Documents are processed in the background (check status in the UI)
5. Chat with the Document: Ask questions about the PDF content
6. Document History: Access previous documents and conversations from the sidebar
```
/
├── backend/ # Python FastAPI Backend
│ ├── app/
│ │ ├── api/ # API route handlers
│ │ │ ├── auth.py # Authentication endpoints
│ │ │ ├── chat.py # Chat endpoints (with streaming)
│ │ │ ├── documents.py # Document upload/management
│ │ │ ├── conversations.py # Conversation management
│ │ │ └── user.py # User profile/settings
│ │ ├── core/ # Core utilities
│ │ │ ├── security.py # JWT, password hashing
│ │ │ ├── deps.py # FastAPI dependencies
│ │ │ └── rate_limiting.py # Rate limiting middleware
│ │ ├── models/ # SQLAlchemy models
│ │ │ ├── user.py
│ │ │ ├── document.py
│ │ │ └── conversation.py
│ │ ├── schemas/ # Pydantic schemas
│ │ ├── services/ # Business logic
│ │ │ ├── chat_service.py # RAG pipeline with hybrid search
│ │ │ ├── document_service.py # PDF processing
│ │ │ ├── embedding_service.py # Embedding generation
│ │ │ ├── rerank_service.py # Cross-encoder re-ranking
│ │ │ ├── cache_service.py # Redis caching
│ │ │ └── auth_service.py # Authentication logic
│ │ ├── workers/ # Background jobs
│ │ │ ├── arq_config.py # ARQ configuration
│ │ │ └── document_jobs.py # Document processing jobs
│ │ ├── database.py # Database connection
│ │ ├── database_migrations.py # Schema migrations
│ │ ├── config.py # Settings and configuration
│ │ └── main.py # FastAPI app entry point
│ ├── Dockerfile
│ └── requirements.txt
│
├── src/ # Next.js Frontend
│ ├── app/ # Next.js app directory
│ │ ├── dashboard/ # Dashboard page
│ │ ├── login/ # Login page
│ │ ├── register/ # Registration page
│ │ ├── settings/ # Settings page
│ │ └── layout.tsx # Root layout
│ ├── components/ # React components
│ │ ├── ChatInterface.tsx # Chat UI with streaming
│ │ ├── Dashboard.tsx # Main application component
│ │ ├── EnhancedPDFViewer.tsx # PDF viewer with annotations
│ │ └── ...
│ └── lib/
│ └── api-client.ts # Backend API client
│
├── docker-compose.yml # Docker services configuration
└── package.json # Frontend dependencies
```
All backend APIs are available at http://localhost:8001 (a Python usage sketch follows the endpoint list):
Authentication:
- `POST /api/auth/register` - User registration
- `POST /api/auth/login` - User login (returns JWT cookie)
- `POST /api/auth/logout` - User logout
- `GET /api/auth/me` - Get current user
- `GET /api/auth/verify-session` - Verify session
- `POST /api/auth/password-reset/request` - Request password reset
- `POST /api/auth/password-reset/confirm` - Confirm password reset

Documents:
- `POST /api/documents` - Upload PDF
- `POST /api/documents/process` - Process document (triggers background job)
- `GET /api/documents` - List user's documents
- `GET /api/documents/{id}` - Get specific document with signed S3 URL
- `DELETE /api/documents/{id}` - Delete document

Chat & Conversations:
- `POST /api/chat` - Send message and get AI response
- `POST /api/chat/stream` - Send message and get streaming AI response (SSE)
- `GET /api/conversations` - List conversations
- `GET /api/conversations/{id}` - Get conversation with messages
- `DELETE /api/conversations/{id}` - Delete conversation

User:
- `GET /api/user/profile` - Get user profile
- `PUT /api/user/profile` - Update profile
- `POST /api/user/apikey` - Update OpenAI API key (encrypted)
- `GET /api/user/apikey/check` - Check if API key exists
- `DELETE /api/user/apikey` - Delete API key

Config:
- `GET /api/config` - Get backend configuration
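A quick end-to-end sketch of the API from Python (the JSON field names are assumptions; the authoritative schemas are in the Swagger UI at /docs):

```python
# Hypothetical walk-through of the auth + upload + chat flow.
import requests

s = requests.Session()  # keeps the JWT cookie set by /api/auth/login

s.post("http://localhost:8001/api/auth/register",
       json={"email": "student@example.com", "password": "s3cret!"})
s.post("http://localhost:8001/api/auth/login",
       json={"email": "student@example.com", "password": "s3cret!"})

with open("lecture.pdf", "rb") as f:
    doc = s.post("http://localhost:8001/api/documents",
                 files={"file": ("lecture.pdf", f, "application/pdf")}).json()

answer = s.post("http://localhost:8001/api/chat",
                json={"document_id": doc.get("id"),  # field name is an assumption
                      "message": "What is on page 2?"})
print(answer.json())
```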
Combines semantic (vector) and keyword (full-text) search for improved retrieval:
- Configurable weights: `SEMANTIC_SEARCH_WEIGHT` (default: 0.7) and `KEYWORD_SEARCH_WEIGHT` (default: 0.3)
- Falls back to semantic-only if keyword search fails
Cross-encoder re-ranking improves retrieval quality:
- Enabled by default: `RERANK_ENABLED=True`
- Retrieves top 20 candidates, re-ranks, returns top 5
- Uses the `cross-encoder/ms-marco-MiniLM-L-6-v2` model (see the sketch below)
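The re-ranking step, sketched with sentence-transformers' CrossEncoder (illustrative; the app hosts this model inside the embedding service):

```python
# Re-rank retrieved chunks with a cross-encoder: score each (query, passage)
# pair jointly, then keep the top 5 of the retrieved candidates.
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How does photosynthesis store energy?"
candidates = ["Photosynthesis converts light into chemical energy.",
              "The mitochondria is the powerhouse of the cell."]  # top 20 in practice

scores = model.predict([(query, passage) for passage in candidates])
reranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)[:5]
for passage, score in reranked:
    print(f"{score:.3f}  {passage}")
```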
Optional multi-query retrieval for complex queries:
- Disabled by default: `QUERY_EXPANSION_ENABLED=False`
- Generates query variations using GPT-4o-mini
- Uses Reciprocal Rank Fusion (RRF) to combine results (see the sketch below)
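A minimal sketch of Reciprocal Rank Fusion: each chunk is scored by summing 1/(k + rank) across the result lists of the expanded queries, with k = 60 as the conventional constant (this is the general technique, not the app's exact code):

```python
# Reciprocal Rank Fusion: each document scores sum(1 / (k + rank)) across
# the result lists produced by the expanded queries.
def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Result lists for three variations of the same question:
print(rrf([["c1", "c2", "c3"], ["c2", "c1"], ["c2", "c4"]]))
# c2 ranks first: it appears near the top of all three lists
```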
Redis-based caching for performance (a sketch follows this list):
- Embedding Cache: Caches query embeddings (24h TTL)
- Response Cache: Caches similar queries (2h TTL)
- Chunk Cache: Caches retrieval results (72h TTL)
- Compression for large cache values
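The embedding-cache pattern might look like this (key scheme, serialization, and zlib compression are assumptions about the implementation; the 24h TTL comes from the list above):

```python
# Cache query embeddings in Redis with a 24h TTL, compressing stored values.
import hashlib
import pickle
import zlib

import redis

r = redis.Redis.from_url("redis://localhost:6379/0")
EMBEDDING_TTL = 24 * 3600  # 24h, as in the list above

def cached_embedding(query: str, embed_fn):
    key = "emb:" + hashlib.sha256(query.encode()).hexdigest()
    blob = r.get(key)
    if blob is not None:
        return pickle.loads(zlib.decompress(blob))  # cache hit
    vector = embed_fn(query)                        # cache miss: compute
    r.setex(key, EMBEDDING_TTL, zlib.compress(pickle.dumps(vector)))
    return vector
```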
Streaming is used in two places:
- Document Processing: Streaming PDF processing for progressive availability
- Chat Responses: Server-Sent Events (SSE) for real-time streaming responses (client sketch below)
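Consuming the SSE chat stream from Python might look like this (a sketch; the request body and the `data:` payload framing are assumptions):

```python
# Read the Server-Sent Events stream from the chat endpoint line by line.
import requests

with requests.post(
    "http://localhost:8001/api/chat/stream",
    json={"message": "Summarize the introduction."},  # hypothetical payload
    stream=True,
) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if line and line.startswith("data:"):
            print(line[len("data:"):].strip(), end="", flush=True)
```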
Run the backend tests:

```bash
cd backend
python -m pytest test_*.py
```

Migrations are handled automatically on startup via database_migrations.py. For manual migrations:

```python
# In a Python shell or script
from app.database import engine
from app.database_migrations import add_pgvector_hnsw_index

add_pgvector_hnsw_index(engine)
```

To change the embedding model, update backend/app/services/embedding_service.py:

```python
EMBEDDING_MODEL = "sentence-transformers/all-mpnet-base-v2"
```

To change the re-ranking model, update backend/app/config.py:

```python
RERANK_MODEL = "cross-encoder/ms-marco-MiniLM-L-6-v2"
```

See backend/app/config.py for all available configuration options (a rough sketch follows this list):
- Rate limiting settings
- Cache TTLs and thresholds
- Search weights and parameters
- Feature flags (re-ranking, query expansion, streaming)
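For orientation, those options might be organized roughly like this (a hedged sketch assuming pydantic-settings; the setting names below appear elsewhere in this README, but their grouping and types here are assumptions):

```python
# Rough shape of the settings in backend/app/config.py (illustrative only).
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    # Search weights and parameters
    SEMANTIC_SEARCH_WEIGHT: float = 0.7
    KEYWORD_SEARCH_WEIGHT: float = 0.3
    # Feature flags
    RERANK_ENABLED: bool = True
    QUERY_EXPANSION_ENABLED: bool = False
    # Models
    RERANK_MODEL: str = "cross-encoder/ms-marco-MiniLM-L-6-v2"

settings = Settings()  # values can be overridden via environment variables
```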
To contribute:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
Thanks to:
- OpenAI for the GPT API
- Sentence Transformers for embeddings
- LangChain for document processing utilities
- FastAPI for the excellent Python web framework
- pgvector for PostgreSQL vector search capabilities
