RAG-IMAGE-QA

A Streamlit-based application for extracting structured information from bill/invoice images and enabling question-answering over the extracted data using Google Generative AI (Gemini) via LangChain.



Project Overview

RAG-IMAGE-QA allows users to upload bill/invoice images, extract structured data from them using OCR and LLMs, and ask questions about the extracted data. It supports both default and user-uploaded images, storing results in JSON for efficient retrieval and QA.


Features

  • Image Upload & Parsing: Upload multiple bill images or use default samples.
  • OCR & LLM Extraction: Uses Tesseract OCR and Google Gemini (via LangChain) to extract and structure bill data.
  • Question Answering: Ask natural language questions about the extracted bill data.
  • Streamlit UI: Simple, interactive web interface.
  • JSON Storage: Extracted data is stored in JSON for easy access and further processing.
  • Robust Image Validation: Uses PIL to validate images before processing, ensuring only valid images are parsed.

Directory Structure

rag-image-qa/
│
├── app/
│   ├── main.py               # Streamlit app entry point
│   ├── billParser.py         # Image parsing and data extraction logic
│   ├── jsonStore.py          # JSON storage and QA logic
│   ├── models.py             # LLM model initialization
│   └── __pycache__/          # Python cache files
│
├── data/
│   ├── default/
│   │   ├── Images/           # Default bill images
│   │   └── images_data.json  # Stores structured data extracted from images in this folder
│   └── browse/
│       ├── Images/           # User-uploaded images
│       └── images_data.json
│
├── UI Specification.pdf      # UI details and functionalities
├── Application_Diagram.png   # Application architecture diagram
├── Sequence_Diagram.png      # Sequence flow diagram
├── requirements.txt          # Python dependencies
├── test.py                   # Script for testing extraction and parsing
├── Notes.txt                 # Project notes and ideas
└── README.md                 # Project documentation

Setup Instructions

1. Clone the Repository

git clone <repo-url>
cd rag-image-qa

2. Install Dependencies

Install all dependencies using the provided requirements.txt:

pip install -r requirements.txt

3. Environment Variables

Create a .env file in the project root and add your Google Generative AI API key:

GOOGLE_API_KEY=your_google_api_key_here
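The app expects this key in the process environment at runtime. As a minimal sketch of the kind of fail-fast check involved (the helper name `require_api_key` is hypothetical, not part of the repo):

```python
import os


def require_api_key() -> str:
    """Return GOOGLE_API_KEY from the environment, failing fast if absent.

    The app loads the .env file (e.g. via python-dotenv) before this
    kind of check runs.
    """
    key = os.getenv("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("GOOGLE_API_KEY not set; add it to your .env file")
    return key
```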

How It Works

  1. Image Selection: Use the default images or upload your own via the Streamlit UI.
  2. Text Extraction: Images are processed with Tesseract OCR to extract raw text. Only valid images (checked using PIL) are processed.
  3. Data Structuring: The extracted text is passed to a Google Gemini LLM (via LangChain) with a prompt that converts it into a structured JSON schema.
  4. Storage: The structured data is saved to data/default/images_data.json or data/browse/images_data.json, depending on the mode.
  5. Question Answering: Users can ask questions about the bills (e.g., "What is the total bill amount?"). The LLM is prompted with the question and the JSON data, and returns an answer based only on the provided data.
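One fiddly part of the data-structuring step is that LLM replies often wrap the JSON payload in markdown fences. A hypothetical helper for extracting it (not the repo's actual code) could look like:

```python
import json
import re


def parse_llm_json(reply: str) -> dict:
    """Extract a JSON object from an LLM reply, tolerating ```json fences."""
    # Grab everything from the first "{" to the last "}", ignoring any
    # surrounding markdown fences or commentary the model added.
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in LLM reply")
    return json.loads(match.group(0))
```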

Usage

Run the Streamlit App

streamlit run app/main.py

How to Use the UI

  1. Launch the App: Open your browser to the local Streamlit URL shown in the terminal (usually http://localhost:8501).
  2. Choose Image Source: At the top, use the "Use default images" toggle:
    • On: the app uses the sample bill images in data/default/Images/.
    • Off: you can upload your own bill images for analysis.
  3. Upload Images (if not using the default set): Click the file uploader and select one or more .png, .jpg, or .jpeg images from your computer. The app automatically processes and extracts data from the uploaded images.
  4. Ask a Question: Enter your question in the text input box (e.g., "What is the total bill amount?") and click the Submit button.
  5. View the Answer: The LLM's answer is displayed below the input box. If the answer cannot be determined from the data, you will see: "No information in the provided Images."
  6. Reload Default Images (if using the default set): Click the Reload Default images! button to reprocess the default images if needed.

Data Schema

Each bill is parsed into the following schema:

{
  "image_name": {
    "invoice_number": "string",
    "order_id": "string",
    "order_date": "string",
    "invoice_date": "string",
    "seller": "string",
    "buyer": "string",
    "billing_address": "string",
    "items": [
      {
        "name": "string",
        "quantity": "int",
        "unit_price": "float",
        "tax_percent": "float",
        "total_price": "float"
      }
    ],
    "subtotal": "float",
    "tax_percent": "float",
    "total": "float",
    "payment_method": "string"
  }
}
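As a hypothetical illustration of consuming this schema (field set trimmed for brevity), per-bill totals can be aggregated straight from the stored JSON:

```python
def sum_totals(data: dict) -> float:
    """Sum the "total" field across all bills keyed by image name."""
    return sum(bill["total"] for bill in data.values())


# Sample data in the documented shape, with most fields omitted.
sample = {
    "bill_001.png": {"total": 12.5, "payment_method": "card"},
    "bill_002.png": {"total": 7.5, "payment_method": "cash"},
}
print(sum_totals(sample))  # 20.0
```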

Customization

  • Use Your Own Images: You can upload multiple images from the UI. Note: images uploaded via the UI are processed for the current session only and are not stored for reuse.
  • Add More Images: Place them in data/default/Images/ to make them available as default images.
  • Change LLM Model: Edit app/models.py to use a different model or parameters.
  • Extend Schema: Modify the schema in app/billParser.py and app/jsonStore.py as needed.

Notes

  • The app uses multi-threading for faster OCR on multiple images.
  • All price units are assumed to be in USD.
  • If a question cannot be answered from the data, the app will respond with "No information in the provided Images."
  • For development/testing, see test.py for standalone extraction and parsing examples.
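The multi-threaded OCR mentioned above can be sketched with a ThreadPoolExecutor; here `ocr` stands in for the real Tesseract call (e.g. pytesseract.image_to_string), so this is an assumption-laden sketch rather than the app's actual code:

```python
from concurrent.futures import ThreadPoolExecutor


def extract_all(paths, ocr, max_workers=4):
    """Run OCR over several image paths concurrently.

    `ocr` is any callable mapping an image path to its extracted text;
    results are returned as a {path: text} dict in input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        texts = list(pool.map(ocr, paths))
    return dict(zip(paths, texts))
```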

Diagrams

Application Diagram

See Application_Diagram.png in the repository root.

Sequence Diagram

See Sequence_Diagram.png in the repository root.


License

MIT
