🧾 Agentic Invoice Parser –

This repository contains the capstone project for the Post Graduate Program in AI & Machine Learning – an Agentic Workflow-based Invoice Parser designed to extract structured data from complex Indian invoice PDFs.

🚀 Project Overview

In real-world enterprise settings, invoices often arrive as PDF files containing multiple invoices, with each invoice potentially spanning multiple pages and including tables, images, and varied layouts. This project leverages an agentic architecture to intelligently parse, segment, and extract relevant information from such documents into a structured schema.

Built using Pydantic-AI and Pydantic-Graph, the workflow uses Large Language Models (LLMs) to reason through document structure and extract key fields in a reliable, modular fashion.

Here is The workflow

📦 Features

🔍 Multi-invoice PDF support: Automatically detects and segments individual invoices from a single PDF.
📄 Multi-page invoice parsing: Handles invoices that span across several pages.
🧠 Agentic workflow: Implements modular agent steps using Pydantic-AI and Pydantic-Graph.
📑 Structured output: Extracted data is validated and output using a well-defined Pydantic schema.
📊 Table & key-value extraction: Supports varied layouts including tables, text blocks, and image-embedded sections.

🛠️ Tech Stack

Python
Pydantic / Pydantic-AI
Pydantic-Graph
LLMs (OpenAI/GPT)
pypdfium2 - (PDF to Images)
Pydantic-settings - (Config Management)
Project Management uv

📁 Folder Structure

🧪 How to Run

Install Dependencies

   uv sync --frozen --no-dev

Run the Parser

    uv run main.py --pdf path/to/your/invoices.pdf

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
src		src
.config		.config
.env.example		.env.example
.gitignore		.gitignore
.python-version		.python-version
Invoice_parser.drawio.svg		Invoice_parser.drawio.svg
README.md		README.md
main.py		main.py
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

🧾 Agentic Invoice Parser –

🚀 Project Overview

Here is The workflow

📦 Features

🛠️ Tech Stack

📁 Folder Structure

🧪 How to Run

About

Uh oh!

Releases

Packages

Uh oh!

Languages

RahulDas-dev/invoice-parser

Folders and files

Latest commit

History

Repository files navigation

🧾 Agentic Invoice Parser –

🚀 Project Overview

Here is The workflow

📦 Features

🛠️ Tech Stack

📁 Folder Structure

🧪 How to Run

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages