# RAG-IMAGE-QA

A Streamlit-based application for extracting structured information from bill/invoice images and enabling question-answering over the extracted data using Google Generative AI (Gemini) via LangChain.
## Table of Contents

- Project Overview
- Features
- Directory Structure
- Setup Instructions
- How It Works
- Usage
- Data Schema
- Customization
- Notes
- Diagrams
- License
## Project Overview

RAG-IMAGE-QA allows users to upload bill/invoice images, extract structured data from them using OCR and LLMs, and ask questions about the extracted data. It supports both default and user-uploaded images, storing results in JSON for efficient retrieval and QA.
## Features

- **Image Upload & Parsing**: Upload multiple bill images or use the default samples.
- **OCR & LLM Extraction**: Uses Tesseract OCR and Google Gemini (via LangChain) to extract and structure bill data.
- **Question Answering**: Ask natural-language questions about the extracted bill data.
- **Streamlit UI**: Simple, interactive web interface.
- **JSON Storage**: Extracted data is stored as JSON for easy access and further processing.
- **Robust Image Validation**: Validates images with PIL before processing, so only valid images are parsed.
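The image-validation step described above can be sketched as follows. This is a minimal illustration of checking bytes with PIL before OCR; the helper name `is_valid_image` is hypothetical and the actual check in the app may differ.

```python
from io import BytesIO

from PIL import Image


def is_valid_image(data: bytes) -> bool:
    """Return True if the bytes decode as an image PIL can open."""
    try:
        with Image.open(BytesIO(data)) as img:
            img.verify()  # checks file integrity without fully decoding pixels
        return True
    except Exception:
        return False
```

Running `verify()` catches truncated or non-image uploads cheaply, so invalid files can be skipped before the (slower) OCR step.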
## Directory Structure

```
rag-image-qa/
│
├── app/
│   ├── main.py              # Streamlit app entry point
│   ├── billParser.py        # Image parsing and data extraction logic
│   ├── jsonStore.py         # JSON storage and QA logic
│   ├── models.py            # LLM model initialization
│   └── __pycache__/         # Python cache files
│
├── data/
│   ├── default/
│   │   ├── Images/          # Default bill images
│   │   └── images_data.json # Structured data extracted from these images
│   └── browse/
│       ├── Images/          # User-uploaded images
│       └── images_data.json
│
├── UI Specification.pdf     # UI details and functionality
├── Application_Diagram.png  # Application architecture diagram
├── Sequence_Diagram.png     # Sequence flow diagram
├── requirements.txt         # Python dependencies
├── test.py                  # Script for testing extraction and parsing
├── Notes.txt                # Project notes and ideas
└── README.md                # Project documentation
```
## Setup Instructions

1. Clone the repository:

   ```bash
   git clone <repo-url>
   cd rag-image-qa
   ```

2. Install all dependencies using the provided `requirements.txt`:

   ```bash
   pip install -r requirements.txt
   ```

3. Create a `.env` file and add your Google Generative AI API key:

   ```
   GOOGLE_API_KEY=your_google_api_key_here
   ```
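Projects like this typically load the `.env` file with `python-dotenv`; the sketch below is a stdlib-only equivalent, shown purely to illustrate what that loading step does (the `load_env` helper is an assumption, not the app's actual code).

```python
import os
from pathlib import Path


def load_env(path: str = ".env") -> None:
    """Minimal .env loader: put KEY=VALUE lines into os.environ.

    Existing environment variables are not overwritten, and blank
    lines or '#' comments are ignored.
    """
    env_file = Path(path)
    if not env_file.exists():
        return
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```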
## How It Works

1. **Image Selection**: Use the default images or upload your own via the Streamlit UI.
2. **Text Extraction**: Images are processed with Tesseract OCR to extract raw text. Only valid images (checked with PIL) are processed.
3. **Data Structuring**: The extracted text is passed to a Google Gemini LLM (via LangChain) with a prompt that converts it into a structured JSON schema.
4. **Storage**: The structured data is saved in `data/default/images_data.json` or `data/browse/images_data.json`, depending on the mode.
5. **Question Answering**: Users can ask questions about the bills (e.g., "What is the total bill amount?"). The LLM is prompted with the question and the JSON data, and answers based only on the provided data.
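The question-answering step hinges on pairing the user's question with the stored JSON in a single prompt. Below is a minimal sketch of how such a prompt might be assembled; the template text and the `build_qa_prompt` helper are illustrative assumptions, not the exact code in `app/jsonStore.py`.

```python
import json

# Hypothetical prompt template: instructs the model to answer only from
# the supplied bill data, matching the app's fallback response.
QA_TEMPLATE = (
    "Answer the question using ONLY the bill data below. "
    'If the answer is not present, reply "No information in the provided Images."\n\n'
    "Bill data (JSON):\n{data}\n\n"
    "Question: {question}"
)


def build_qa_prompt(bills: dict, question: str) -> str:
    """Render the prompt combining the user's question with the stored JSON."""
    return QA_TEMPLATE.format(data=json.dumps(bills, indent=2), question=question)
```

The rendered string would then be sent to the Gemini model initialized in `app/models.py`.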
## Usage

Start the app:

```bash
streamlit run app/main.py
```
1. **Launch the App**: Open your browser to the local Streamlit URL shown in the terminal (usually http://localhost:8501).
2. **Choose Image Source**: At the top, use the "Use default images" toggle:
   - **On**: The app uses the sample bill images provided in `data/default/Images/`.
   - **Off**: You can upload your own bill images for analysis.
3. **Uploading Images** (if not using default): Click the file uploader to select one or more `.png`, `.jpg`, or `.jpeg` images from your computer. The app automatically processes and extracts data from the uploaded images.
4. **Ask a Question**: Enter your question in the text input box (e.g., "What is the total bill amount?") and click the **Submit** button.
5. **View the Answer**: The answer generated by the LLM is displayed below the input box. If the answer cannot be determined from the data, you will see: "No information in the provided Images."
6. **Reload Default Images** (if using default): Click the **Reload Default images!** button to reprocess the default images if needed.
## Data Schema

Each bill is parsed into the following schema:

```json
{
  "image_name": {
    "invoice_number": "string",
    "order_id": "string",
    "order_date": "string",
    "invoice_date": "string",
    "seller": "string",
    "buyer": "string",
    "billing_address": "string",
    "items": [
      {
        "name": "string",
        "quantity": "int",
        "unit_price": "float",
        "tax_percent": "float",
        "total_price": "float"
      }
    ],
    "subtotal": "float",
    "tax_percent": "float",
    "total": "float",
    "payment_method": "string"
  }
}
```
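Because the stored file maps each image name to one bill record in this schema, simple aggregates can be computed directly from `images_data.json` without the LLM. A small sketch (the `total_across_bills` helper is hypothetical, not part of the app):

```python
import json


def total_across_bills(json_path: str) -> float:
    """Sum the top-level 'total' field across every bill in an images_data.json file."""
    with open(json_path) as f:
        bills = json.load(f)  # {"image_name": {...bill fields...}, ...}
    return sum(float(bill.get("total", 0.0)) for bill in bills.values())
```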
## Customization

- **Use Your Own Images**: You can upload multiple images from the UI. Note: images uploaded via the UI are processed for the current session only and are not stored for reuse.
- **Add More Images**: Place them in `data/default/Images/` to make them available as default images.
- **Change LLM Model**: Edit `app/models.py` to use a different model or parameters.
- **Extend Schema**: Modify the schema in `app/billParser.py` and `app/jsonStore.py` as needed.
## Notes

- The app uses multi-threading for faster OCR across multiple images.
- All prices are assumed to be in USD.
- If a question cannot be answered from the data, the app responds with "No information in the provided Images."
- For development/testing, see `test.py` for standalone extraction and parsing examples.
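The multi-threaded OCR mentioned above could look like the sketch below, which fans a per-image extraction function out over a thread pool. This is an assumed pattern, not the app's actual implementation; in the real app `ocr_fn` would be something like `pytesseract.image_to_string`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def extract_all(
    image_paths: Iterable[str],
    ocr_fn: Callable[[str], str],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Run ocr_fn over each image path concurrently; return {path: extracted text}.

    Threads suit this workload because Tesseract calls release the GIL
    while the external OCR engine runs.
    """
    paths = list(image_paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        texts = pool.map(ocr_fn, paths)  # preserves input order
    return dict(zip(paths, texts))
```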