A question-answering system built with LangChain and ChromaDB that answers questions using content retrieved from a local knowledge base.
- Document loading and text splitting
- Vector embeddings using OpenAI
- Persistent vector storage with ChromaDB
- Interactive question-answering interface
- Cross-platform compatibility
project/
├── data/ # Knowledge base files
│ └── knowledge.txt
├── db/ # Vector database storage
│ └── chroma_db/
├── src/ # Source code
│ └── qa_system.py
├── .env # Environment variables
├── .gitignore # Git ignore rules
├── requirements.txt # Python dependencies
└── README.md # This file
- Python 3.8 or higher
- OpenAI API key with sufficient credits
- Required Python packages:
  - langchain>=0.1.0
  - openai>=1.0.0
  - chromadb>=0.4.0
- Clone the repository:
    git clone <repository-url>
    cd <project-directory>
- Create and activate a virtual environment:
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies:
    pip install -r requirements.txt
- Create a .env file in the project root with your API key:
    OPENAI_API_KEY=your-api-key-here
- Add your knowledge base file to the data/ directory:
  - Place your text file in data/knowledge.txt
  - The system will automatically split and process the content
The system comes with a pre-configured knowledge base about migratory birds in data/knowledge.txt. To use your own content:
- Replace the content with your own knowledge base
- Ensure the text is well-structured with clear headings and sections
- Keep the file size reasonable for optimal performance
- Run the Q&A system:
    python src/qa_system.py
- Enter your questions when prompted
- Type 'quit' to exit the program
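The prompt-and-answer session described above amounts to a simple read loop. The following is a minimal sketch, not the code in src/qa_system.py: `run_session`, `typed_questions`, and the `answer_question` callable are hypothetical names, and treating 'quit' case-insensitively is an assumption.

```python
from typing import Callable, Iterable


def run_session(
    questions: Iterable[str],
    answer_question: Callable[[str], str],
    emit: Callable[[str], None] = print,
) -> int:
    """Answer each question in turn; stop at 'quit'. Returns the number answered."""
    answered = 0
    for raw in questions:
        question = raw.strip()
        if question.lower() == "quit":  # assumption: the match is case-insensitive
            break
        if not question:
            continue  # ignore blank lines
        emit(answer_question(question))
        answered += 1
    return answered


def typed_questions(prompt: str = "Question: "):
    """Yield lines typed by the user until end of input."""
    while True:
        try:
            yield input(prompt)
        except EOFError:
            return
```

An interactive session would then look like `run_session(typed_questions(), answer_question)`, where `answer_question` wraps whatever retrieval and generation step the project uses.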
The system is optimized for questions about:
- Basic migration concepts and navigation methods
- Different types of migration (Latitudinal, Longitudinal, Altitudinal)
- Notable migratory species and their records
- Conservation challenges and efforts
- Migration timing and seasons
- The system loads your knowledge base from data/knowledge.txt
- Documents are split into chunks for processing
- OpenAI embeddings are generated for each chunk
- ChromaDB stores the embeddings persistently
- When you ask a question:
  - The question is embedded
  - Similar chunks are retrieved
  - An answer is generated using the context
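The split, embed, and retrieve steps above can be illustrated offline with a toy version. This sketch is not the project's implementation: it swaps OpenAI embeddings for plain word-count vectors and ChromaDB for an in-memory list so that it runs without an API key. The chunk size, the overlap, and all function names are illustrative assumptions, and the answer-generation step is omitted.

```python
import math
import re
from collections import Counter
from typing import List


def split_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character chunks that overlap slightly."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += chunk_size - overlap
    return chunks


def embed(text: str) -> Counter:
    """Stand-in embedding: lowercase word counts (NOT an OpenAI embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

In the real system the retrieved chunks would be passed to the language model as context for the answer. Learned embeddings also match on meaning, whereas this stand-in matches only on shared words.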
- The vector database is stored in db/chroma_db/
- To force a fresh start, delete the db/chroma_db/ directory
- The system will automatically recreate the database on next run
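Deleting the database directory can also be scripted. A minimal sketch using only the standard library; `reset_vector_store` is a hypothetical helper, not part of the project, and it assumes the default location db/chroma_db/ relative to the project root.

```python
import shutil
from pathlib import Path


def reset_vector_store(db_dir: Path = Path("db/chroma_db")) -> bool:
    """Delete the persisted database directory; return True if it existed."""
    if not db_dir.is_dir():
        return False
    shutil.rmtree(db_dir)
    return True
```

Run it from the project root while the Q&A system is stopped, so that no process still holds the database files open.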
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Create a .env file in the project root
- Add your OpenAI API key to the .env file:
    OPENAI_API_KEY=your-api-key-here
- Ensure .env is listed in your .gitignore
- Never commit the .env file or share it publicly
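For illustration, here is one way a script could read the key without hard-coding it. This is a sketch under assumptions: how src/qa_system.py actually loads the key is not stated here (many projects use the python-dotenv package), and `read_env_file` and `get_api_key` are hypothetical helpers built on the standard library only.

```python
import os
from pathlib import Path
from typing import Dict


def read_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_api_key(env_path: Path = Path(".env")) -> str:
    """Return OPENAI_API_KEY from the environment, falling back to the .env file."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key and env_path.is_file():
        key = read_env_file(env_path).get("OPENAI_API_KEY")
    if not key:
        # The message names the variable but never echoes any key material.
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env")
    return key
```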
- Maximum query length: 500 characters
- Rate limits: Follow OpenAI's API rate limits
- Error handling: The system provides clear error messages without exposing sensitive information
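The query length limit can be enforced before any API call is made. A minimal sketch: `validate_query` and `QueryError` are hypothetical names, and measuring the length after trimming whitespace is an assumption.

```python
MAX_QUERY_LENGTH = 500  # the limit stated above


class QueryError(ValueError):
    """Raised when a question cannot be accepted."""


def validate_query(query: str) -> str:
    """Return the trimmed query, or raise QueryError with a user-facing message."""
    cleaned = query.strip()
    if not cleaned:
        raise QueryError("Please enter a question.")
    if len(cleaned) > MAX_QUERY_LENGTH:
        raise QueryError(
            f"Question is too long ({len(cleaned)} characters; "
            f"the limit is {MAX_QUERY_LENGTH})."
        )
    return cleaned
```

Rejecting oversized input locally avoids spending API credits on requests that would be refused anyway, and the error messages contain nothing sensitive.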
If you encounter errors:
- Check that your API key is valid
- Ensure the data/knowledge.txt file exists
- Verify you have sufficient API credits
- Check your internet connection
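Some of these checks can be automated before the system starts. The sketch below covers only what can be verified locally, namely that a key is set and that the knowledge file exists and is not empty. Key validity, remaining credits, and connectivity still require a real API call. `preflight_problems` is a hypothetical helper, not part of the project.

```python
import os
from pathlib import Path
from typing import List, Mapping


def preflight_problems(
    env: Mapping[str, str] = os.environ,
    knowledge_path: Path = Path("data/knowledge.txt"),
) -> List[str]:
    """Return human-readable descriptions of local setup problems (empty if none)."""
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    if not knowledge_path.is_file():
        problems.append(f"{knowledge_path} does not exist")
    elif knowledge_path.stat().st_size == 0:
        problems.append(f"{knowledge_path} is empty")
    return problems
```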
Current version: 1.0.0