LLM Extractinator

(Figure: overview of the LLM Data Extractor)

⚠️ This tool is a prototype in active development and may change significantly. Always verify results!

LLM Extractinator enables efficient extraction of structured data from unstructured text using large language models (LLMs). It supports configurable task definitions, CLI or Python usage, a point‑and‑click GUI Studio, and flexible data input/output formats.

📘 Full documentation: https://DIAGNijmegen.github.io/llm_extractinator/


🔧 Installation

1. Install Ollama

On Linux

curl -fsSL https://ollama.com/install.sh | sh

On Windows or macOS

Download the installer from: https://ollama.com/download
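
Ollama provides the local model runtime that the extractor talks to. Optionally, you can pre-download a model so the first extraction run does not have to; phi4 is the model used in the examples below, but any model available in the Ollama library works:

ollama pull phi4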


2. Install the Package

Create a fresh conda environment:

conda create -n llm_extractinator python=3.11
conda activate llm_extractinator

Install the package via pip:

pip install llm_extractinator

Or from source:

git clone https://github.com/DIAGNijmegen/llm_extractinator.git
cd llm_extractinator
pip install -e .

Tip: to run the latest models, keep the Ollama client libraries up to date:

pip install --upgrade ollama langchain-ollama
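
As an optional sanity check, you can confirm that the Ollama server responds and that the package imports cleanly:

ollama list                               # should print locally available models without errors
python -c "import llm_extractinator"      # should exit without errors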

🖥️ Interactive Studio GUI

Starting with v0.5, Extractinator ships with a Streamlit‑based Studio for designing, running and monitoring extraction tasks with zero code:

(Screenshot of the Studio GUI)

🚀 To run:

launch-extractinator  # opens http://localhost:8501 in your browser

Features

🗂️ Project Manager: create / select datasets, parsers, and tasks with file previews
🔧 Parser Builder: visual Pydantic schema designer (nested models supported)
🚀 One‑click Runs: configure model, sampling & advanced flags, then watch live logs
🛠️ Task JSON Wizard: step‑by‑step helper to generate valid TaskXXX.json files
🆘 Help bubbles everywhere: inline docs so you never lose context

The Studio is fully optional: anything you configure here can still be executed from the CLI or Python API.


🚀 Quick Usage

GUI

launch-extractinator  # recommended for new users

CLI

extractinate --task_id 001 --model_name "phi4"

Python

from llm_extractinator import extractinate

extractinate(task_id=1, model_name="phi4")

📁 Task Files

Each task is defined by a JSON file stored in tasks/.

Filename format:

TaskXXX_name.json

Example (e.g., Task001_products.json):

{
  "Description": "Extract product data from text.",
  "Data_Path": "products.csv",
  "Input_Field": "text",
  "Parser_Format": "product_parser.py"
}

Parser_Format points to a .py file in tasks/parsers/ that implements a Pydantic OutputParser model used to structure the LLM output.
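
For illustration, a minimal parser file for the example task above might look like the sketch below, assuming the file simply defines a Pydantic model named OutputParser (as the wording above suggests; see the parser docs for the exact contract). The field names product_name and price are hypothetical and only mirror the "extract product data" example.

# tasks/parsers/product_parser.py (illustrative sketch, not the canonical template)
from typing import Optional

from pydantic import BaseModel, Field

class OutputParser(BaseModel):
    """Structure the LLM output is parsed into (example fields only)."""
    product_name: str = Field(description="Name of the product mentioned in the text")
    price: Optional[float] = Field(default=None, description="Listed price, if present")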


🛠️ Visual Schema Builder (optional)

If you prefer a graphical approach to designing parsers, run:

build-parser

This starts the same builder embedded in the Studio, letting you assemble nested Pydantic models visually. Save the resulting .py file in tasks/parsers/ and reference it via Parser_Format.

👉 Read the parser docs for full details.


📄 Citation

If you use this tool, please cite: https://doi.org/10.5281/zenodo.15089764


🤝 Contributing

We welcome pull requests! See the contributing guide for details.
