# Gemma Model Document Q&A Application

This repository contains a Document Q&A application that uses LangChain, FAISS, and Google Generative AI Embeddings for vector-based similarity search and document-based question answering.
## Table of Contents

- Introduction
- Features
- Prerequisites
- Setup and Installation
- How to Run the Application
- Project Workflow
- Dependencies
- License
## Introduction

The Gemma Model Document Q&A application is a Streamlit-based tool that uses the Groq Llama3-8b-8192 language model together with Google Generative AI Embeddings to answer user queries based on the contents of uploaded documents.

It enables:

- PDF ingestion and processing.
- Chunked document embeddings for efficient retrieval.
- Question answering based on context-relevant documents.
## Features

- Streamlit UI: Interactive web interface for document embedding and querying.
- FAISS Vector Store: Efficient vector-based similarity search.
- LangChain Integration: Structured document chains and retrieval chains.
- Google Generative AI Embeddings: High-quality embedding generation for context retrieval.
- PDF Document Processing: Support for loading multiple PDF documents.
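Under the hood, the FAISS vector store returns the document chunks whose embeddings are closest to the query embedding. A minimal pure-Python sketch of that nearest-neighbor idea (illustrative helper names, not the FAISS API):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k document vectors most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

In the real application, FAISS performs this search over high-dimensional embedding vectors far more efficiently than this linear scan.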
## Prerequisites

- Python 3.8 or higher
- API keys for:
  - Groq API (`GROQ_API_KEY`)
  - Google API (`GOOGLE_API_KEY`)
## Setup and Installation

1. **Clone the Repository:**

   ```bash
   git clone https://github.com/your-username/gemma-document-qa.git
   cd gemma-document-qa
   ```

2. **Create a Virtual Environment:**

   ```bash
   python -m venv env
   source env/bin/activate  # On Windows: env\Scripts\activate
   ```

3. **Install Dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

4. **Environment Configuration:** Create a `.env` file in the root directory and add:

   ```
   GROQ_API_KEY=<your_groq_api_key>
   GOOGLE_API_KEY=<your_google_api_key>
   ```

5. **Data Setup:** Place the PDF documents you want to process in the `us_census` directory.
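At startup, python-dotenv's `load_dotenv()` reads the `.env` file into the process environment so the API keys are available via `os.getenv`. As a rough illustration, a simplified hand-rolled equivalent (hypothetical helper name, not the python-dotenv API) looks like:

```python
import os

def load_env_file(path=".env"):
    """Minimal .env parser -- a simplified stand-in for python-dotenv's
    load_dotenv(). Existing environment variables take precedence."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks, comments, and malformed lines.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```

In the app itself, simply calling `load_dotenv()` from python-dotenv achieves the same effect.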
## How to Run the Application

1. **Start the Streamlit App:**

   ```bash
   streamlit run app.py
   ```

2. **Steps to Use:**
   - **Step 1:** Click the "Documents Embedding" button to load and process the documents.
   - **Step 2:** Enter your question in the input field.
   - **Step 3:** View the AI-generated answer and related document excerpts.
## Project Workflow

1. **Data Ingestion:** Load PDF documents using `PyPDFDirectoryLoader`.
2. **Text Splitting:** Chunk documents into smaller pieces using `RecursiveCharacterTextSplitter`.
3. **Vector Store Creation:** Embed the chunks with Google Generative AI Embeddings and index them in a FAISS vector store for similarity search.
4. **Prompt Engineering:** Define a structured prompt for the language model.
5. **Query Processing:**
   - Use the retrieval chain to fetch relevant chunks.
   - Generate an answer based on the query and the retrieved context.
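The text-splitting step can be sketched as fixed-size chunking with overlap. This simplified stand-in is not the actual `RecursiveCharacterTextSplitter` (which additionally tries to split on natural boundaries such as paragraphs and sentences), but it shows why overlap matters: adjacent chunks share text so that context spanning a boundary is not lost.

```python
def split_text(text, chunk_size=1000, chunk_overlap=200):
    """Fixed-size character chunking with overlap -- a simplified sketch
    of the splitting step, not LangChain's actual splitter."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    chunks, start = [], 0
    step = chunk_size - chunk_overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

# Example: 10 characters, chunks of 4 with overlap of 2.
# split_text("abcdefghij", chunk_size=4, chunk_overlap=2)
# -> ["abcd", "cdef", "efgh", "ghij", "ij"]
```

Each chunk is then embedded and stored in the FAISS index, so retrieval operates on these overlapping windows rather than whole documents.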
## Dependencies

The project relies on the following libraries and frameworks:

- Streamlit: Web application framework.
- FAISS (faiss-cpu): Vector search library for similarity search.
- LangChain: Framework for building structured document and retrieval chains.
- PyPDF2: Library for handling PDF documents.
- Google Generative AI Embeddings: Pre-trained embeddings for document representation.
- python-dotenv: Manages environment variables.

Install dependencies via `requirements.txt`:

```
streamlit
faiss-cpu
langchain
langchain_google_genai
langchain_community
PyPDF2
python-dotenv
```
## License

This project is licensed under the MIT License.
Feel free to contribute and make this project better! 😊
Happy coding! 🌟