RAG-IMAGE-QA

A Streamlit-based application for extracting structured information from bill/invoice images and enabling question-answering over the extracted data using Google Generative AI (Gemini) via LangChain.



Project Overview

RAG-IMAGE-QA allows users to upload bill/invoice images, extract structured data from them using OCR and LLMs, and ask questions about the extracted data. It supports both default and user-uploaded images, storing results in JSON for efficient retrieval and QA.


Features

  • Image Upload & Parsing: Upload multiple bill images or use default samples.
  • OCR & LLM Extraction: Uses Tesseract OCR and Google Gemini (via LangChain) to extract and structure bill data.
  • Question Answering: Ask natural language questions about the extracted bill data.
  • Streamlit UI: Simple, interactive web interface.
  • JSON Storage: Extracted data is stored in JSON for easy access and further processing.
  • Robust Image Validation: Uses PIL to validate images before processing, ensuring only valid images are parsed.

Directory Structure

rag-image-qa/
│
├── app/
│   ├── main.py               # Streamlit app entry point
│   ├── billParser.py         # Image parsing and data extraction logic
│   ├── jsonStore.py          # JSON storage and QA logic
│   ├── models.py             # LLM model initialization
│   └── __pycache__/          # Python cache files
│
├── data/
│   ├── default/
│   │   ├── Images/           # Default bill images
│   │   └── images_data.json  # Stores structured data extracted from images in this folder
│   └── browse/
│       ├── Images/           # User-uploaded images
│       └── images_data.json
│
├── UI Specification.pdf      # UI details and functionalities
├── Application_Diagram.png   # Application architecture diagram
├── Sequence_Diagram.png      # Sequence flow diagram
├── requirements.txt          # Python dependencies
├── test.py                   # Script for testing extraction and parsing
├── Notes.txt                 # Project notes and ideas
└── README.md                 # Project documentation

Setup Instructions

1. Clone the Repository

git clone <repo-url>
cd rag-image-qa

2. Install Dependencies

Install all dependencies using the provided requirements.txt:

pip install -r requirements.txt

3. Environment Variables

Create a .env file in the project root and add your Google Generative AI API key:

GOOGLE_API_KEY=your_google_api_key_here
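The app expects this key in the process environment at runtime. As a minimal sketch of the kind of fail-fast check involved (the helper name `require_api_key` is hypothetical, not part of the repo):

```python
import os


def require_api_key() -> str:
    """Return GOOGLE_API_KEY from the environment, failing fast if absent.

    The app loads the .env file (e.g. via python-dotenv) before this
    kind of check runs.
    """
    key = os.getenv("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("GOOGLE_API_KEY not set; add it to your .env file")
    return key
```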

How It Works

  1. Image Selection: Use the default images or upload your own via the Streamlit UI.
  2. Text Extraction: Images are processed with Tesseract OCR to extract raw text. Only valid images (checked using PIL) are processed.
  3. Data Structuring: The extracted text is passed to a Google Gemini LLM (via LangChain) with a prompt that converts it into a structured JSON schema.
  4. Storage: The structured data is saved to data/default/images_data.json or data/browse/images_data.json, depending on the mode.
  5. Question Answering: Users can ask questions about the bills (e.g., "What is the total bill amount?"). The LLM is prompted with the question and the JSON data, and returns an answer based only on the provided data.
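One fiddly part of the data-structuring step is that LLM replies often wrap the JSON payload in markdown fences. A hypothetical helper for extracting it (not the repo's actual code) could look like:

```python
import json
import re


def parse_llm_json(reply: str) -> dict:
    """Extract a JSON object from an LLM reply, tolerating ```json fences."""
    # Grab everything from the first "{" to the last "}", ignoring any
    # surrounding markdown fences or commentary the model added.
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in LLM reply")
    return json.loads(match.group(0))
```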

Usage

Run the Streamlit App

streamlit run app/main.py

How to Use the UI

  1. Launch the App: Open your browser to the local Streamlit URL shown in the terminal (usually http://localhost:8501).
  2. Choose Image Source: At the top, use the "Use default images" toggle:
    • On: the app uses the sample bill images in data/default/Images/.
    • Off: you can upload your own bill images for analysis.
  3. Upload Images (if not using the default set): Click the file uploader and select one or more .png, .jpg, or .jpeg images from your computer. The app automatically processes and extracts data from the uploaded images.
  4. Ask a Question: Enter your question in the text input box (e.g., "What is the total bill amount?") and click the Submit button.
  5. View the Answer: The LLM's answer is displayed below the input box. If the answer cannot be determined from the data, you will see: "No information in the provided Images."
  6. Reload Default Images (if using the default set): Click the Reload Default images! button to reprocess the default images if needed.

Data Schema

Each bill is parsed into the following schema:

{
  "image_name": {
    "invoice_number": "string",
    "order_id": "string",
    "order_date": "string",
    "invoice_date": "string",
    "seller": "string",
    "buyer": "string",
    "billing_address": "string",
    "items": [
      {
        "name": "string",
        "quantity": "int",
        "unit_price": "float",
        "tax_percent": "float",
        "total_price": "float"
      }
    ],
    "subtotal": "float",
    "tax_percent": "float",
    "total": "float",
    "payment_method": "string"
  }
}
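As a hypothetical illustration of consuming this schema (field set trimmed for brevity), per-bill totals can be aggregated straight from the stored JSON:

```python
def sum_totals(data: dict) -> float:
    """Sum the "total" field across all bills keyed by image name."""
    return sum(bill["total"] for bill in data.values())


# Sample data in the documented shape, with most fields omitted.
sample = {
    "bill_001.png": {"total": 12.5, "payment_method": "card"},
    "bill_002.png": {"total": 7.5, "payment_method": "cash"},
}
print(sum_totals(sample))  # 20.0
```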

Customization

  • Use Your Own Images: You can upload multiple images from the UI. Note: images uploaded via the UI are processed for the current session only and are not stored for reuse.
  • Add More Images: Place them in data/default/Images/ to make them available as default images.
  • Change LLM Model: Edit app/models.py to use a different model or parameters.
  • Extend Schema: Modify the schema in app/billParser.py and app/jsonStore.py as needed.

Notes

  • The app uses multi-threading for faster OCR on multiple images.
  • All price units are assumed to be in USD.
  • If a question cannot be answered from the data, the app will respond with "No information in the provided Images."
  • For development/testing, see test.py for standalone extraction and parsing examples.
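The multi-threaded OCR mentioned above can be sketched with a ThreadPoolExecutor; here `ocr` stands in for the real Tesseract call (e.g. pytesseract.image_to_string), so this is an assumption-laden sketch rather than the app's actual code:

```python
from concurrent.futures import ThreadPoolExecutor


def extract_all(paths, ocr, max_workers=4):
    """Run OCR over several image paths concurrently.

    `ocr` is any callable mapping an image path to its extracted text;
    results are returned as a {path: text} dict in input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        texts = list(pool.map(ocr, paths))
    return dict(zip(paths, texts))
```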

Diagrams

Application Diagram

See Application_Diagram.png in the repository root.

Sequence Diagram

See Sequence_Diagram.png in the repository root.


License

MIT
