A microservice and workflow for extracting invoice data via OCR/LLM, matching products against an internal catalog, and managing orders—including handling uncertain items for manual review.
```bash
# Install the uv package manager
make install-uv
# Install Python dependencies
make dep
# Run the service and reinitialize the index each time
make run
# Run in development mode (does not reinitialize the index)
make dev
# Manually initialize or reinitialize the index
make init
```

Bring up the Docker stack:

```bash
docker-compose up -d
```

- n8n editor & workflow: http://localhost:5678
- invoice-agent API: http://invoice-agent:8000 (only accessible inside the Docker network)
To perform product matching against our internal catalog, index each product name (and its aliases) into embedding vectors with the following metadata:
```json
{
  "original_id": "<original_id_val>",
  "original_display": "<display_text_val>",
  "indexed_keyword": "<keyword_val>"
}
```

Because the raw `product_list.xlsx` contains multiple aliases per ID (e.g. "生花生\\花生仁"), first preprocess it into this JSON-ready structure:
```json
[
  {
    "id": "S021490",
    "display_text": "炸薯(地瓜)片",
    "keywords": ["炸薯(地瓜)片"]
  },
  {
    "id": "S023200",
    "display_text": "熟花生",
    "keywords": ["熟花生"]
  },
  {
    "id": "S023220",
    "display_text": "生花生\\花生仁",
    "keywords": ["生花生", "花生仁"]
  }
]
```

This enhances embedding richness and ensures that any alias query (e.g. "花生仁") hits the correct product.
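A minimal preprocessing sketch for this step, assuming `product_list.xlsx` has `id` and `name` columns (the column names are assumptions, not confirmed by the source) and that aliases are separated by a literal backslash:

```python
# preprocess_products.py — sketch only; real column names are assumptions.
import json

import pandas as pd

def build_product_records(xlsx_path: str) -> list[dict]:
    df = pd.read_excel(xlsx_path, dtype=str)
    records = []
    for _, row in df.iterrows():
        display = row["name"]  # e.g. "生花生\花生仁"
        records.append({
            "id": row["id"],
            "display_text": display,
            # Split multi-alias names on the backslash separator
            "keywords": [kw.strip() for kw in display.split("\\") if kw.strip()],
        })
    return records

if __name__ == "__main__":
    print(json.dumps(build_product_records("product_list.xlsx"),
                     ensure_ascii=False, indent=2))
```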
Goal: keep the RAG stack simple and iteration-friendly for fast indexing & retrieval.

- RAGatouille – Not chosen: ColBERT's raw score range makes uncertainty thresholds tricky.
- txtai – Chosen: minimal API surface, straightforward indexing and search.
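A minimal indexing-and-search sketch with txtai, indexing one entry per alias keyword and keeping the metadata from above in a side lookup. The embedding model name and this layout are assumptions, not the project's actual code:

```python
# txtai sketch: one index entry per alias keyword; metadata kept in a
# side dict. Model choice is an assumption.
from txtai.embeddings import Embeddings

products = [
    {"id": "S023200", "display_text": "熟花生", "keywords": ["熟花生"]},
    {"id": "S023220", "display_text": "生花生\\花生仁", "keywords": ["生花生", "花生仁"]},
]

embeddings = Embeddings({"path": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"})

metadata, rows = {}, []
for product in products:
    for keyword in product["keywords"]:
        uid = len(rows)
        metadata[uid] = {
            "original_id": product["id"],
            "original_display": product["display_text"],
            "indexed_keyword": keyword,
        }
        rows.append((uid, keyword, None))

embeddings.index(rows)

# Any alias query should resolve to the same product
for uid, score in embeddings.search("花生仁", 3):
    print(metadata[uid], score)
```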
Quick test method: use `eval_1.png` as a baseline for validating the extraction → matching pipeline.
- Retrieval task: fuzzy matching of extracted product names.
- Theory: not every top-K result is a true match.
- Strategy: compute `sim_gap = top1_score − top2_score`; a high `sim_gap` → high confidence.
- However:
  - ColBERT's score formula `simᵢ,ⱼ = Dⱼ · Qᵢ` yields a dynamic range of `−|Q| … |Q|`, so static thresholds are hard to set.
- Solutions considered:
  - Normalize by query length.
  - Use a relative gap, `(top1 − top2) / top1`, normalized to `[0…1]` (see the sketch after this list).
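A minimal sketch of this confidence heuristic, assuming search results arrive as scores sorted in descending order; the threshold value is an illustrative assumption, not the project's tuned setting:

```python
# Confidence heuristic sketch: flag a match as uncertain when the
# relative gap between the top two scores is small.
def relative_sim_gap(scores: list[float]) -> float:
    if len(scores) < 2 or scores[0] <= 0:
        return 1.0  # a single (or degenerate) result: treat as confident
    return (scores[0] - scores[1]) / scores[0]

def is_uncertain(scores: list[float], threshold: float = 0.1) -> bool:
    return relative_sim_gap(scores) < threshold

print(is_uncertain([0.92, 0.90]))  # True  — near-tie, send to manual review
print(is_uncertain([0.92, 0.40]))  # False — clear winner
```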
-
- Flow: image → EasyOCR → raw text → LLM parse → structured data
- Cost: Free, runs locally
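A minimal sketch of the EasyOCR stage, producing (y-position, text, confidence) tuples like the sample output below; the language codes and the bbox-to-y reduction are assumptions:

```python
# EasyOCR stage sketch: read a scanned invoice and keep one
# (y-position, text, confidence) tuple per detected line.
import easyocr

reader = easyocr.Reader(["ch_tra", "en"])  # traditional Chinese + English (assumed)

def ocr_lines(image_path: str) -> list[tuple[float, str, float]]:
    lines = []
    for bbox, text, conf in reader.readtext(image_path):
        y_center = sum(point[1] for point in bbox) / len(bbox)
        lines.append((y_center, text, conf))
    return sorted(lines)  # top-to-bottom reading order

print(ocr_lines("eval_1.png"))
```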
Sample output:

```python
extracted_texts = [
    (65.0, "幅塔6兩", 0.0897),
    (81.5, "#7", 0.1738),
    (174.5, "?23付", 0.0079),
    (189.5, "契枇并", 0.00007),
    (258.0, "酯.把", 0.0110),
    (289.0, "3絲", 0.00027),
    (334.5, "嵯之?", 0.00123),
    (348.5, "(-|!32&,,)5", 0.00084)
]
```
- Model: `google/gemini-flash-1.5-8b`
- Cost:
  - Input tokens: $0.038 / 1K tokens
  - Output tokens: $0.15 / 1K tokens
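A minimal sketch of the LLM parse step via OpenRouter's OpenAI-compatible API; the prompt wording and response handling are assumptions, not the project's actual code:

```python
# LLM parse sketch: send raw OCR text to google/gemini-flash-1.5-8b via
# OpenRouter and ask for structured line items. Prompt is an assumption.
import json
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

def parse_invoice_text(raw_text: str) -> list[dict]:
    response = client.chat.completions.create(
        model="google/gemini-flash-1.5-8b",
        messages=[
            {"role": "system", "content": (
                "Extract invoice line items as a JSON array of objects "
                "with keys: name, price, quantity, unit. Reply with JSON only."
            )},
            {"role": "user", "content": raw_text},
        ],
    )
    return json.loads(response.choices[0].message.content)
```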
Sample structured output:

```python
extracted_texts = [
    {'name': '九層塔', 'price': '6雨', 'quantity': '6', 'unit': '颗'},
    {'name': '熟花生', 'price': '3斤', 'quantity': '1', 'unit': '斤'},
    {'name': '腰果', 'price': '3件', 'quantity': '1', 'unit': '件'},
    {'name': '海帶絲', 'price': '3斤', 'quantity': '1', 'unit': '斤'},
    {'name': '醋', 'price': '1', 'quantity': '1', 'unit': '锅'},
    {'name': '韭黃', 'price': '1', 'quantity': '1', 'unit': '包'},
    {'name': '不明食材', 'price': '1', 'quantity': '1', 'unit': '包'}
]
```

Decision: in this phase, the OCR module is a swappable component; we use cloud-hosted LLMs now, with room to pivot later.
Each match result carries two fields:

- `score`: raw similarity score from the embedding search.
- `relative_sim_gap`: `(highest_score − second_highest_score) / highest_score`, an uncertainty metric: a small gap → flag for manual review.
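For illustration, a hypothetical match result carrying both fields; the shape and values are invented for this example, not taken from the service:

```python
match_result = {
    "input_name": "熟花生",       # name extracted from the invoice
    "matched_id": "S023200",      # catalog hit
    "score": 0.91,                # raw embedding similarity
    "relative_sim_gap": 0.04,     # near-tie with the runner-up
    "uncertain": True,            # small gap → manual review
}
```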
```
📁 invoice_agent/
├── tools/     # External integrations (Excel, OpenRouter, OCR, DB)
├── services/  # Core business logic (init, extract, match, order)
└── api/       # FastAPI routes & CLI entrypoint
```

- tools: low-level I/O, embedding index, DB schema
- services: orchestrates indexing, extraction, matching, order creation
- api: HTTP endpoints (FastAPI) & CLI (typer)
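A minimal sketch of how the api layer can expose both interfaces; the route, command, and service-function names here are assumptions:

```python
# api layer sketch: one FastAPI app plus a typer CLI over the same
# services. Route/command names are assumptions.
import typer
from fastapi import FastAPI, File, Form, UploadFile

app = FastAPI()
cli = typer.Typer()

@app.post("/extract-order")
async def extract_order(
    customer_name: str = Form(...),
    order_date: str = Form(...),
    file: UploadFile = File(...),
):
    contents = await file.read()
    # services.extract_texts_from_input(...) would be called here
    return {"status": "accepted", "filename": file.filename}

@cli.command()
def init():
    """Initialize or reinitialize the embedding index (mirrors `make init`)."""
    # services' index initialization would be called here
    typer.echo("index initialized")

if __name__ == "__main__":
    cli()
```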
This section outlines how to evaluate the invoice-agent pipelines for each candidate solution. The `test_ocr_llm.py` pytest script:

- Sets up a temporary environment and a dummy product list, then initializes the service.
- Runs `services.extract_texts_from_input(...)` against sample files (`eval_1.png`, `eval_2.png`, `eval_3.pdf`).
- Compares extracted and matched results to ground truth (`tests/gt.json`), computing:
  - Total ground-truth items
  - Matched count
  - Correctly matched count & accuracy
  - Uncertain item count
- Asserts overall accuracy > 0.0 to catch breaking changes.
- Outputs a timestamped CSV in `tests/evaluation_reports/` for deeper analysis.

Use this test harness to benchmark and compare future OCR/LLM or pure-OCR approaches before merging into `main`.
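A minimal sketch of the accuracy bookkeeping described above; the result field names and the comparison rule are assumptions, not the script's actual internals:

```python
# Evaluation sketch: compare matched results to tests/gt.json and emit
# the counts listed above as a timestamped CSV.
import csv
import json
from datetime import datetime
from pathlib import Path

def evaluate(results: list[dict], gt_path: str = "tests/gt.json") -> dict:
    ground_truth = json.loads(Path(gt_path).read_text())
    by_name = {item["name"]: item["id"] for item in ground_truth}

    matched = [r for r in results if r.get("matched_id")]
    correct = [r for r in matched if by_name.get(r["input_name"]) == r["matched_id"]]
    uncertain = [r for r in results if r.get("uncertain")]

    report = {
        "total_gt_items": len(ground_truth),
        "matched": len(matched),
        "correct": len(correct),
        "accuracy": len(correct) / len(ground_truth) if ground_truth else 0.0,
        "uncertain": len(uncertain),
    }

    out = Path("tests/evaluation_reports") / f"report_{datetime.now():%Y%m%d_%H%M%S}.csv"
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=report.keys())
        writer.writeheader()
        writer.writerow(report)

    assert report["accuracy"] > 0.0  # catch breaking changes
    return report
```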
For full code examples and tests, see the `./tests` folder and individual modules under `./src/invoice_agent/`.
This section describes the InvoiceAgent n8n workflow (`n8n_workflow/InvoiceAgent.json`), outlining the end-to-end process from form submission to Slack notifications:

- **On form submission** (`formTrigger`)
  - Presents a form with Name, File (image/PDF), and Date fields.
  - Triggers the workflow when a user submits.
- **Check OCR Readability** (`HTTP Request`)
  - POSTs the uploaded file to `/check-ocr-readability`.
  - Branches via `If1`: only proceeds if the image is deemed readable.
- **Extract Order** (`HTTP Request`)
  - POSTs `customer_name`, `order_date`, and the invoice file to `/extract-order`, kicking off the extraction and matching process (see the request sketch after this list).
- **Get Uncertain Items** (`HTTP Request`)
  - Queries `/uncertain-items` to retrieve any items that the service flagged as uncertain.
- **Decision** (`If` node)
  - Routes based on the count of uncertain items: > 0 → handle uncertain items.
- **Read/Write Files from Disk**
  - Fetches the saved invoice file (in the service's `.artifacts/uncertain_invoices` directory).
- **Slack Upload Image** (`Slack` file upload)
  - Uploads the uncertain invoice file to Slack and retrieves a permalink.
- **Slack Send Message** (`Slack` message)
  - Posts to `#all-invoice-agent` with:
    - a New Uncertain Invoices header
    - From/Date metadata
    - a list of uncertain item details (ID, input, quantity, unit)
    - a download link to the uploaded invoice image
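A minimal sketch of the Extract Order request the workflow issues, written in Python for testing outside n8n; the multipart file field name is an assumption, while `customer_name` and `order_date` come from the workflow description:

```python
# Reproduce the n8n "Extract Order" call for local testing. The file
# field name ("file") is an assumption.
import requests

BASE = "http://localhost:8000"  # inside the Docker network: http://invoice-agent:8000

with open("eval_1.png", "rb") as f:
    resp = requests.post(
        f"{BASE}/extract-order",
        data={"customer_name": "Alice", "order_date": "2024-01-01"},
        files={"file": ("eval_1.png", f, "image/png")},
    )
resp.raise_for_status()
print(resp.json())

# Then poll the uncertain items the same way the workflow does
print(requests.get(f"{BASE}/uncertain-items").json())
```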