A microservice and workflow for extracting invoice data via OCR/LLM, matching products against an internal catalog, and managing orders—including handling uncertain items for manual review.
```bash
# Install the uv package manager
make install-uv
# Install Python dependencies
make dep
# Run the service and reinitialize the index each time
make run
# Run in development mode (does not reinitialize the index)
make dev
# Manually initialize or reinitialize the index
make init
```

Bring up the Docker stack:

```bash
docker-compose up -d
```

- n8n editor & workflow: http://localhost:5678
- invoice-agent API: http://invoice-agent:8000 (only accessible inside the Docker network)
To perform product matching against our internal catalog, index each product name (and its aliases) into embedding vectors with the following metadata:
```json
{
  "original_id": "<original_id_val>",
  "original_display": "<display_text_val>",
  "indexed_keyword": "<keyword_val>"
}
```

Because the raw `product_list.xlsx` contains multiple aliases per ID (e.g. "生花生\\花生仁"), first preprocess it into this JSON-ready structure:
```json
[
  {
    "id": "S021490",
    "display_text": "炸薯(地瓜)片",
    "keywords": ["炸薯(地瓜)片"]
  },
  {
    "id": "S023200",
    "display_text": "熟花生",
    "keywords": ["熟花生"]
  },
  {
    "id": "S023220",
    "display_text": "生花生\\花生仁",
    "keywords": ["生花生", "花生仁"]
  }
]
```

This enhances embedding richness and ensures that any alias query (e.g. "花生仁") hits the correct product.
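A minimal preprocessing sketch for this step, assuming `product_list.xlsx` has `id` and `name` columns (the column names are assumptions, not confirmed by the source) and that aliases are separated by a literal backslash:

```python
# preprocess_products.py — sketch only; real column names are assumptions.
import json

import pandas as pd

def build_product_records(xlsx_path: str) -> list[dict]:
    df = pd.read_excel(xlsx_path, dtype=str)
    records = []
    for _, row in df.iterrows():
        display = row["name"]  # e.g. "生花生\花生仁"
        records.append({
            "id": row["id"],
            "display_text": display,
            # Split multi-alias names on the backslash separator
            "keywords": [kw.strip() for kw in display.split("\\") if kw.strip()],
        })
    return records

if __name__ == "__main__":
    print(json.dumps(build_product_records("product_list.xlsx"),
                     ensure_ascii=False, indent=2))
```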
Goal: keep the RAG stack simple and iteration-friendly for fast indexing & retrieval.

- RAGatouille – Not chosen: ColBERT's raw score range makes uncertainty thresholds tricky.
- txtai – Chosen: minimal API surface, straightforward indexing and search.
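A minimal indexing-and-search sketch with txtai, indexing one entry per alias keyword and keeping the metadata from above in a side lookup. The embedding model name and this layout are assumptions, not the project's actual code:

```python
# txtai sketch: one index entry per alias keyword; metadata kept in a
# side dict. Model choice is an assumption.
from txtai.embeddings import Embeddings

products = [
    {"id": "S023200", "display_text": "熟花生", "keywords": ["熟花生"]},
    {"id": "S023220", "display_text": "生花生\\花生仁", "keywords": ["生花生", "花生仁"]},
]

embeddings = Embeddings({"path": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"})

metadata, rows = {}, []
for product in products:
    for keyword in product["keywords"]:
        uid = len(rows)
        metadata[uid] = {
            "original_id": product["id"],
            "original_display": product["display_text"],
            "indexed_keyword": keyword,
        }
        rows.append((uid, keyword, None))

embeddings.index(rows)

# Any alias query should resolve to the same product
for uid, score in embeddings.search("花生仁", 3):
    print(metadata[uid], score)
```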
Quick test method: use `eval_1.png` as a baseline for validating the extraction → matching pipeline.
- Retrieval task: fuzzy matching of extracted product names.
- Theory: not every top-K result is a true match.
- Strategy: compute `sim_gap = top1_score − top2_score`; a high `sim_gap` → high confidence.
- However:
  - ColBERT's score formula `simᵢ,ⱼ = Dⱼ · Qᵢ` yields a dynamic range of `−|Q| … |Q|`, so static thresholds are hard to set.
- Solutions considered:
  - Normalize by query length.
  - Use a relative gap, `(top1 − top2) / top1`, normalized to `[0…1]` (see the sketch after this list).
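A minimal sketch of this confidence heuristic, assuming search results arrive as scores sorted in descending order; the threshold value is an illustrative assumption, not the project's tuned setting:

```python
# Confidence heuristic sketch: flag a match as uncertain when the
# relative gap between the top two scores is small.
def relative_sim_gap(scores: list[float]) -> float:
    if len(scores) < 2 or scores[0] <= 0:
        return 1.0  # a single (or degenerate) result: treat as confident
    return (scores[0] - scores[1]) / scores[0]

def is_uncertain(scores: list[float], threshold: float = 0.1) -> bool:
    return relative_sim_gap(scores) < threshold

print(is_uncertain([0.92, 0.90]))  # True  — near-tie, send to manual review
print(is_uncertain([0.92, 0.40]))  # False — clear winner
```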
-
- Flow: image → EasyOCR → raw text → LLM parse → structured data
- Cost: Free, runs locally
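A minimal sketch of the EasyOCR stage, producing (y-position, text, confidence) tuples like the sample output below; the language codes and the bbox-to-y reduction are assumptions:

```python
# EasyOCR stage sketch: read a scanned invoice and keep one
# (y-position, text, confidence) tuple per detected line.
import easyocr

reader = easyocr.Reader(["ch_tra", "en"])  # traditional Chinese + English (assumed)

def ocr_lines(image_path: str) -> list[tuple[float, str, float]]:
    lines = []
    for bbox, text, conf in reader.readtext(image_path):
        y_center = sum(point[1] for point in bbox) / len(bbox)
        lines.append((y_center, text, conf))
    return sorted(lines)  # top-to-bottom reading order

print(ocr_lines("eval_1.png"))
```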
Sample output:

```python
extracted_texts = [
    (65.0, "幅塔6兩", 0.0897),
    (81.5, "#7", 0.1738),
    (174.5, "?23付", 0.0079),
    (189.5, "契枇并", 0.00007),
    (258.0, "酯.把", 0.0110),
    (289.0, "3絲", 0.00027),
    (334.5, "嵯之?", 0.00123),
    (348.5, "(-|!32&,,)5", 0.00084)
]
```
- Model: `google/gemini-flash-1.5-8b`
- Cost:
  - Input tokens: $0.038 / 1K tokens
  - Output tokens: $0.15 / 1K tokens
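A minimal sketch of the LLM parse step via OpenRouter's OpenAI-compatible API; the prompt wording and response handling are assumptions, not the project's actual code:

```python
# LLM parse sketch: send raw OCR text to google/gemini-flash-1.5-8b via
# OpenRouter and ask for structured line items. Prompt is an assumption.
import json
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

def parse_invoice_text(raw_text: str) -> list[dict]:
    response = client.chat.completions.create(
        model="google/gemini-flash-1.5-8b",
        messages=[
            {"role": "system", "content": (
                "Extract invoice line items as a JSON array of objects "
                "with keys: name, price, quantity, unit. Reply with JSON only."
            )},
            {"role": "user", "content": raw_text},
        ],
    )
    return json.loads(response.choices[0].message.content)
```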
Sample structured output:

```python
extracted_texts = [
    {'name': '九層塔', 'price': '6雨', 'quantity': '6', 'unit': '颗'},
    {'name': '熟花生', 'price': '3斤', 'quantity': '1', 'unit': '斤'},
    {'name': '腰果', 'price': '3件', 'quantity': '1', 'unit': '件'},
    {'name': '海帶絲', 'price': '3斤', 'quantity': '1', 'unit': '斤'},
    {'name': '醋', 'price': '1', 'quantity': '1', 'unit': '锅'},
    {'name': '韭黃', 'price': '1', 'quantity': '1', 'unit': '包'},
    {'name': '不明食材', 'price': '1', 'quantity': '1', 'unit': '包'}
]
```

Decision: in this phase, the OCR module is a swappable component; we use cloud-hosted LLMs now, with room to pivot later.
Each match result carries two fields:

- `score`: raw similarity score from the embedding search.
- `relative_sim_gap`: `(highest_score − second_highest_score) / highest_score`, an uncertainty metric: a small gap → flag for manual review.
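For illustration, a hypothetical match result carrying both fields; the shape and values are invented for this example, not taken from the service:

```python
match_result = {
    "input_name": "熟花生",       # name extracted from the invoice
    "matched_id": "S023200",      # catalog hit
    "score": 0.91,                # raw embedding similarity
    "relative_sim_gap": 0.04,     # near-tie with the runner-up
    "uncertain": True,            # small gap → manual review
}
```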
```
📁 invoice_agent/
├── tools/     # External integrations (Excel, OpenRouter, OCR, DB)
├── services/  # Core business logic (init, extract, match, order)
└── api/       # FastAPI routes & CLI entrypoint
```

- tools: low-level I/O, embedding index, DB schema
- services: orchestrates indexing, extraction, matching, order creation
- api: HTTP endpoints (FastAPI) & CLI (typer)
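A minimal sketch of how the api layer can expose both interfaces; the route, command, and service-function names here are assumptions:

```python
# api layer sketch: one FastAPI app plus a typer CLI over the same
# services. Route/command names are assumptions.
import typer
from fastapi import FastAPI, File, Form, UploadFile

app = FastAPI()
cli = typer.Typer()

@app.post("/extract-order")
async def extract_order(
    customer_name: str = Form(...),
    order_date: str = Form(...),
    file: UploadFile = File(...),
):
    contents = await file.read()
    # services.extract_texts_from_input(...) would be called here
    return {"status": "accepted", "filename": file.filename}

@cli.command()
def init():
    """Initialize or reinitialize the embedding index (mirrors `make init`)."""
    # services' index initialization would be called here
    typer.echo("index initialized")

if __name__ == "__main__":
    cli()
```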
This section outlines how to evaluate the invoice-agent pipelines for each candidate solution. The `test_ocr_llm.py` pytest script:

- Sets up a temporary environment and a dummy product list, then initializes the service.
- Runs `services.extract_texts_from_input(...)` against sample files (`eval_1.png`, `eval_2.png`, `eval_3.pdf`).
- Compares extracted and matched results to ground truth (`tests/gt.json`), computing:
  - Total ground-truth items
  - Matched count
  - Correctly matched count & accuracy
  - Uncertain item count
- Asserts overall accuracy > 0.0 to catch breaking changes.
- Outputs a timestamped CSV in `tests/evaluation_reports/` for deeper analysis.

Use this test harness to benchmark and compare future OCR/LLM or pure-OCR approaches before merging into `main`.
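A minimal sketch of the accuracy bookkeeping described above; the result field names and the comparison rule are assumptions, not the script's actual internals:

```python
# Evaluation sketch: compare matched results to tests/gt.json and emit
# the counts listed above as a timestamped CSV.
import csv
import json
from datetime import datetime
from pathlib import Path

def evaluate(results: list[dict], gt_path: str = "tests/gt.json") -> dict:
    ground_truth = json.loads(Path(gt_path).read_text())
    by_name = {item["name"]: item["id"] for item in ground_truth}

    matched = [r for r in results if r.get("matched_id")]
    correct = [r for r in matched if by_name.get(r["input_name"]) == r["matched_id"]]
    uncertain = [r for r in results if r.get("uncertain")]

    report = {
        "total_gt_items": len(ground_truth),
        "matched": len(matched),
        "correct": len(correct),
        "accuracy": len(correct) / len(ground_truth) if ground_truth else 0.0,
        "uncertain": len(uncertain),
    }

    out = Path("tests/evaluation_reports") / f"report_{datetime.now():%Y%m%d_%H%M%S}.csv"
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=report.keys())
        writer.writeheader()
        writer.writerow(report)

    assert report["accuracy"] > 0.0  # catch breaking changes
    return report
```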
For full code examples and tests, see the `./tests` folder and individual modules under `./src/invoice_agent/`.
This section describes the InvoiceAgent n8n workflow (`n8n_workflow/InvoiceAgent.json`), outlining the end-to-end process from form submission to Slack notifications:

- **On form submission** (`formTrigger`)
  - Presents a form with Name, File (image/PDF), and Date fields.
  - Triggers the workflow when a user submits.
- **Check OCR Readability** (`HTTP Request`)
  - POSTs the uploaded file to `/check-ocr-readability`.
  - Branches via `If1`: only proceeds if the image is deemed readable.
- **Extract Order** (`HTTP Request`)
  - POSTs `customer_name`, `order_date`, and the invoice file to `/extract-order`, kicking off the extraction and matching process (see the request sketch after this list).
- **Get Uncertain Items** (`HTTP Request`)
  - Queries `/uncertain-items` to retrieve any items that the service flagged as uncertain.
- **Decision** (`If` node)
  - Routes based on the count of uncertain items: > 0 → handle uncertain items.
- **Read/Write Files from Disk**
  - Fetches the saved invoice file (in the service's `.artifacts/uncertain_invoices` directory).
- **Slack Upload Image** (`Slack` file upload)
  - Uploads the uncertain invoice file to Slack and retrieves a permalink.
- **Slack Send Message** (`Slack` message)
  - Posts to `#all-invoice-agent` with:
    - a New Uncertain Invoices header
    - From/Date metadata
    - a list of uncertain item details (ID, input, quantity, unit)
    - a download link to the uploaded invoice image
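A minimal sketch of the Extract Order request the workflow issues, written in Python for testing outside n8n; the multipart file field name is an assumption, while `customer_name` and `order_date` come from the workflow description:

```python
# Reproduce the n8n "Extract Order" call for local testing. The file
# field name ("file") is an assumption.
import requests

BASE = "http://localhost:8000"  # inside the Docker network: http://invoice-agent:8000

with open("eval_1.png", "rb") as f:
    resp = requests.post(
        f"{BASE}/extract-order",
        data={"customer_name": "Alice", "order_date": "2024-01-01"},
        files={"file": ("eval_1.png", f, "image/png")},
    )
resp.raise_for_status()
print(resp.json())

# Then poll the uncertain items the same way the workflow does
print(requests.get(f"{BASE}/uncertain-items").json())
```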