AgenticRAG implements a powerful Deep Research Framework that combines intelligent query planning with reliable Retrieval-Augmented Generation (RAG) for comprehensive question answering. The system takes complex queries and automatically breaks them down into multiple focused sub-queries, processes each through the full Agentic RAG pipeline, and synthesizes all findings into comprehensive reports.
The core innovation is a deep research framework that orchestrates comprehensive analysis:
```
Deep Research Framework with AgenticRAG
│
├── Search Planner (generate multiple focused queries)
│
├── For each query:
│   ├── AgenticRAG Processing
│   │   ├── Query Routing
│   │   ├── RAG Pipeline (with all reliability features)
│   │   │   ├── Document Retrieval
│   │   │   ├── Document Grading
│   │   │   ├── Query Rewriting (if needed)
│   │   │   └── Web Search Fallback
│   │   └── Answer Generation
│   └── Store individual answer
│
└── Final Report Generator (aggregate all answers)
```
- Intelligent Query Planning: Automatically generates 3-5 diverse, focused search queries from complex topics
- Reliable Processing: Each sub-query is processed through the full Agentic RAG pipeline with all reliability features
- Comprehensive Synthesis: Results are aggregated into a well-structured, comprehensive report
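A minimal sketch of this orchestration loop is shown below. The helper names are illustrative placeholders, the single-query step is stubbed with a direct LLM call (in the notebook it is the full Agentic RAG pipeline), and the setup assumes Ollama's OpenAI-compatible endpoint on the default port with the `openai` client package installed:

```python
# Minimal sketch of the deep-research loop; helper names are illustrative
# placeholders, and ChatOpenAI needs the `openai` client package installed.
from typing import List

from langchain_community.chat_models import ChatOpenAI

# Assumption: Ollama serves its OpenAI-compatible API on the default port.
llm = ChatOpenAI(
    model="qwen3:8b",
    openai_api_base="http://localhost:11434/v1",
    openai_api_key="ollama",  # Ollama accepts any non-empty key
    temperature=0,
)

def plan_queries(topic: str) -> List[str]:
    """Search Planner: ask the LLM for 3-5 focused sub-queries."""
    reply = llm.invoke(
        f"Generate 3 to 5 diverse, focused search queries about: {topic}\n"
        "Return one query per line with no numbering."
    )
    return [line.strip() for line in reply.content.splitlines() if line.strip()]

def run_agentic_rag(query: str) -> str:
    """Stub for the full Agentic RAG pipeline (routing, retrieval, grading,
    rewriting, web-search fallback); here it just asks the LLM directly."""
    return llm.invoke(query).content

def deep_research(topic: str) -> str:
    """Plan sub-queries, answer each, then synthesize the final report."""
    findings = [(q, run_agentic_rag(q)) for q in plan_queries(topic)]
    notes = "\n\n".join(f"### {q}\n{a}" for q, a in findings)
    return llm.invoke(
        f"Write a comprehensive, well-structured report on '{topic}' "
        f"using only these findings:\n\n{notes}"
    ).content
```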
For a query like "What are the latest advancements in quantum computing?":

1. Query Planning: Generates focused sub-queries (see the planner sketch below):
   - "Recent breakthroughs in quantum hardware and qubit stability"
   - "Latest quantum algorithms for optimization and machine learning"
   - "Current industry applications of quantum computing"
2. Agentic RAG Processing: Each sub-query is processed with:
   - Intelligent routing (direct response, RAG, or web search)
   - Document retrieval and relevance grading
   - Query rewriting when needed (up to 3 retries)
   - Web search fallback for time-sensitive information
3. Report Generation: Synthesizes all findings into a comprehensive report
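One way to make the planning step robust is to parse the sub-queries into structured data. The schema, prompt wording, and model setup below are assumptions for this sketch, not the notebook's exact code:

```python
# Illustrative Search Planner: schema, prompt wording, and model setup are
# assumptions for this sketch, not the notebook's exact code.
from typing import List

from langchain_community.chat_models import ChatOpenAI
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field

class SearchPlan(BaseModel):
    queries: List[str] = Field(description="3-5 diverse, focused search queries")

llm = ChatOpenAI(model="qwen3:8b",
                 openai_api_base="http://localhost:11434/v1",
                 openai_api_key="ollama", temperature=0)

parser = PydanticOutputParser(pydantic_object=SearchPlan)
prompt = ChatPromptTemplate.from_template(
    "Break this research topic into 3-5 diverse, focused search queries.\n"
    "Topic: {topic}\n{format_instructions}"
).partial(format_instructions=parser.get_format_instructions())

planner = prompt | llm | parser
plan = planner.invoke({"topic": "What are the latest advancements in quantum computing?"})
print(plan.queries)
```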
The Agentic RAG system follows a modular architecture; its components are described later in this README. To run it, you will need:
- Python 3.8+
- Jupyter Notebook or JupyterLab
- Ollama with Qwen model (qwen3:8b)
- Serper API key for web search functionality
1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd AgenticRAG
   ```

2. Install the required dependencies:

   ```bash
   pip install langchain langchain-community langchain-huggingface langgraph faiss-cpu requests
   ```

3. Install Ollama and pull the Qwen model:

   ```bash
   # Install Ollama from https://ollama.com/
   ollama pull qwen3:8b
   ```

4. Set up your Serper API key:

   ```bash
   export SERPER_API_KEY=your_serper_api_key_here
   ```
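With the key exported, a quick smoke test using the `requests` dependency can confirm the setup. The endpoint and payload follow Serper's public search API; the query string is arbitrary:

```python
# Smoke test for the Serper setup; endpoint and payload follow Serper's
# documented search API.
import os
import requests

resp = requests.post(
    "https://google.serper.dev/search",
    headers={"X-API-KEY": os.environ["SERPER_API_KEY"]},
    json={"q": "quantum computing breakthroughs"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json()["organic"][0]["title"])  # title of the first organic result
```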
Open and run `notebooks/DeepResearch_with_AgenticRAG.ipynb`:
- The system takes complex queries and automatically breaks them down into multiple focused sub-queries
- Each sub-query is processed through the full Agentic RAG pipeline with all reliability features
- Results are synthesized into a comprehensive report
Example complex query: "What are the latest advancements in quantum computing?"
The system will:
- Generate focused sub-queries on different aspects
- Process each through the reliable Agentic RAG system
- Create a comprehensive report synthesizing all findings
The Agentic RAG system that processes each sub-query combines:
- Adaptive RAG: Intelligently routes queries to the most appropriate processing path
- Corrective RAG: Uses web search as a fallback when internal knowledge is insufficient
- Self-RAG: Implements self-correction mechanisms to reduce hallucinations
The query router directs each query to one of three processing paths:
- Direct Response: For simple greetings or general knowledge questions
- Vectorstore RAG: For domain-specific questions with document relevance grading
- Web Search: As a fallback when internal RAG fails
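A minimal sketch of such a router is shown below; the one-word classification prompt and labels are illustrative, and the notebook's actual routing prompt may differ:

```python
# Illustrative router: a one-word classification prompt; the actual prompt
# and labels in the notebook may differ.
from langchain_community.chat_models import ChatOpenAI

llm = ChatOpenAI(model="qwen3:8b",
                 openai_api_base="http://localhost:11434/v1",
                 openai_api_key="ollama", temperature=0)

def route_query(question: str) -> str:
    """Return 'direct', 'vectorstore', or 'web_search'."""
    verdict = llm.invoke(
        "Classify the question below. Reply with exactly one word:\n"
        "- direct: a greeting or general-knowledge question\n"
        "- vectorstore: a domain-specific question covered by the knowledge base\n"
        "- web_search: needs fresh or time-sensitive information\n\n"
        f"Question: {question}"
    ).content.strip().lower()
    # Fall back to the RAG path if the model answers off-script.
    return verdict if verdict in {"direct", "vectorstore", "web_search"} else "vectorstore"
```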
Additional reliability features (sketched in the loop below):
- Document relevance grading to ensure quality responses
- Query rewriting mechanism for improved retrieval on failed attempts
- Automatic fallback to web search after maximum RAG retries
- Stateful agent behavior with conversation history tracking
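Stripped to control flow, the retry logic looks roughly like this. The four callables are placeholders for the corresponding pipeline steps, and the retry limit mirrors the "up to 3 retries" behavior described above:

```python
# Control-flow sketch of the reliability loop. The four callables are
# placeholders for the corresponding pipeline steps.
MAX_RAG_RETRIES = 3  # mirrors the "up to 3 retries" behavior described above

def reliable_retrieve(question, retrieve, grade, rewrite, web_search):
    """Return relevant documents, rewriting the query between attempts and
    falling back to web search once the retry budget is exhausted."""
    query = question
    for _ in range(MAX_RAG_RETRIES):
        docs = retrieve(query)                               # Document Retrieval
        relevant = [d for d in docs if grade(question, d)]   # Document Grading
        if relevant:
            return relevant           # good context found -> generate answer
        query = rewrite(question)     # Query Rewriting (if needed)
    return web_search(question)       # Web Search Fallback
```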
The notebooks include code to rebuild the FAISS vector database from source documents. You can customize the knowledge base by:
- Modifying the source URLs in the database creation section
- Adjusting chunk size and overlap parameters
- Changing the embedding model
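A hedged sketch of the rebuild step follows; the URL, chunk parameters, embedding model, and save path are stand-ins for whatever the notebook configures:

```python
# Sketch of rebuilding the index; the URL, chunk parameters, and embedding
# model are stand-ins for whatever the notebook configures.
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader  # needs beautifulsoup4
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings  # needs sentence-transformers

urls = [  # example source, per the Lilian Weng credit below
    "https://lilianweng.github.io/posts/2023-06-23-agent/",
]
docs = [doc for url in urls for doc in WebBaseLoader(url).load()]

# Adjust chunk size and overlap here.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(docs)

# Swap in a different embedding model here.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(chunks, embeddings)
vectorstore.save_local("faiss_index")  # illustrative path
```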
You can customize the LLM by modifying the `ChatOpenAI` configuration:
- Change the model name
- Adjust temperature for creativity vs. consistency
- Modify other model parameters
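For example (the base URL assumes Ollama's OpenAI-compatible endpoint on the default port, and the values shown are placeholders):

```python
# Example configuration; the base URL assumes Ollama's OpenAI-compatible
# endpoint on the default port, and the values shown are placeholders.
from langchain_community.chat_models import ChatOpenAI

llm = ChatOpenAI(
    model="qwen3:8b",                             # change the model name here
    openai_api_base="http://localhost:11434/v1",
    openai_api_key="ollama",                      # Ollama accepts any non-empty key
    temperature=0.2,                              # lower = consistent, higher = creative
    max_tokens=2048,                              # example of another model parameter
)
```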
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- This implementation is based on concepts from the LangChain documentation and tutorials
- The knowledge base is built from Lilian Weng's blog posts on LLM topics
- Uses FAISS for efficient vector similarity search
- Integrates with Serper API for web search capabilities