GitHub - C3BI-pasteur-fr/PasteurAIze

PasteurAIze

Context-Engineered AI for Secure Clinical-Biological Data Analytics

📝 Table of Contents

About
Getting Started
How to structure my data
Usage
Built Using
Authors

🧐 About

PasteurAIze uses modular AI agents and the Model Context Protocol to turn natural-language questions into reproducible biomedical analyses.

PasteurAIze aims to transform biomedical data analysis at Institut Pasteur through context-engineered large language model (LLM) agents. It converts a plain-language query (e.g. “Does age affect lymphocyte counts?”) into executable code, runs the code on institutional resources, and returns fully documented results.

PasteurAIze relies on the institutional LLM repository to keep full control over data governance and to limit model size for the sake of computational sobriety. Its architecture has three core components:

A text-to-SQL agent converting natural language to database queries.
A visualization agent using Vega-Lite MCP for graphics.
A scientific literature agent employing Google Scholar MCP with Retrieval-Augmented Generation (RAG).
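The question-to-result flow described above can be pictured with a minimal sketch. This is not the PasteurAIze implementation: `stub_text_to_sql`, the `samples` table, and its columns are hypothetical, the LLM call is replaced by a hard-coded query, and an in-memory SQLite database stands in for the institutional one.

```python
import sqlite3


def stub_text_to_sql(question: str) -> str:
    """Hypothetical stand-in for the LLM-backed text-to-SQL agent."""
    # A real agent would generate this query from the question; here it is hard-coded.
    return (
        "SELECT age_group, AVG(lymphocyte_count) AS mean_count "
        "FROM samples GROUP BY age_group ORDER BY age_group"
    )


def answer(question: str, conn: sqlite3.Connection) -> dict:
    """Turn a question into SQL, run it, and keep the query next to the rows."""
    sql = stub_text_to_sql(question)
    rows = conn.execute(sql).fetchall()
    return {"question": question, "sql": sql, "rows": rows}


# Toy data, invented for the example only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (age_group TEXT, lymphocyte_count REAL)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?)",
    [("20-39", 2.0), ("20-39", 3.0), ("60-79", 1.0), ("60-79", 2.0)],
)

result = answer("Does age affect lymphocyte counts?", conn)
print(result["sql"])
print(result["rows"])
```

Returning the generated SQL together with the rows is one simple way to keep an analysis reproducible: the exact query that produced a result can be rerun or reviewed later.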

🏁 Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

What you need before installing the software, and how to install it.

Have uv installed:

pip install uv

Have access to an LLM (base_url and api_key).
Have a local database (e.g. PostgreSQL).
Have well-structured project data (see How to structure my data).
The file extensions handled for dataframes are: .parquet, .tsv, .csv.
All data for one project must be in the same folder (including YAML files, if provided). Set FOLDER_PATH in your .env to the path of this folder.
Optional: YAML files describing your data (following the example.yaml schema in the pasteuraize folder).
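Before pointing FOLDER_PATH at a folder, it can help to check which files would qualify. The sketch below is an illustration, not part of the repository: `list_project_files` is a hypothetical helper, and it assumes that matching on the file extension alone is enough.

```python
import tempfile
from pathlib import Path

# Extensions listed in the prerequisites above.
HANDLED_EXTENSIONS = {".parquet", ".tsv", ".csv"}


def list_project_files(folder):
    """Split a project folder into data tables and optional YAML descriptions."""
    files = sorted(p for p in Path(folder).iterdir() if p.is_file())
    tables = [p.name for p in files if p.suffix.lower() in HANDLED_EXTENSIONS]
    descriptions = [p.name for p in files if p.suffix.lower() == ".yaml"]
    return tables, descriptions


# Demo on a throwaway folder; in practice the folder would be the value of FOLDER_PATH.
with tempfile.TemporaryDirectory() as demo_folder:
    for name in ("target.csv", "taxonomy.tsv", "count_table.parquet",
                 "target.yaml", "notes.txt"):
        (Path(demo_folder) / name).touch()
    tables, descriptions = list_project_files(demo_folder)

print(tables)        # data files with a handled extension
print(descriptions)  # optional YAML descriptions
```

Files with other extensions, such as `notes.txt` in the demo, are simply ignored by this check.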

Installing

A step-by-step guide to getting a development environment running.

Create the .venv and install dependencies with:

uv sync

Install vanna with:

uv pip install vanna

Create a .env file and copy into it each field from the .env.exemple file, filled in with your own values.
Add your data to the database and train Vanna by running auto_table.py. ⚠️ You can add only one project per execution and per folder. To add another project, change FOLDER_PATH and run auto_table.py again.
Give the name of the project you are inserting, then wait for the process to finish.
Once done, start chatting with:

streamlit run pasteuraize.py

📊 How to structure my data

Rules to follow so that your data is structured the way the application expects.

Follow the usual project structure: one table/dataframe named target, one taxonomy and one count_table.
Primary keys must be the first columns.
Foreign keys must be named YourName_FK.
Foreign keys must have the same name as the primary key they reference in the other table, with the _FK suffix.
Optional: to use an existing YAML file, name it name_of_the_table.yaml.
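The key rules above can be checked before running the import. The sketch below is one reading of those rules, not code from the repository: `check_structure` and the column names are hypothetical, and it assumes a single primary-key column per table.

```python
def check_structure(tables, primary_keys):
    """Return a list of rule violations.

    tables: table name -> ordered list of column names.
    primary_keys: table name -> name of its (single) primary-key column.
    """
    problems = []
    for table, columns in tables.items():
        pk = primary_keys[table]
        # Rule: the primary key must be the first column.
        if columns[0] != pk:
            problems.append(f"{table}: primary key '{pk}' is not the first column")
        for column in columns:
            if not column.endswith("_FK"):
                continue
            # Rule: a foreign key is another table's primary key plus the _FK suffix.
            referenced = column[: -len("_FK")]
            owners = [t for t, key in primary_keys.items()
                      if key == referenced and t != table]
            if not owners:
                problems.append(
                    f"{table}: '{column}' matches no primary key in another table"
                )
    return problems


# Hypothetical column layout for the usual three-table project.
primary_keys = {"target": "sample_id", "taxonomy": "taxon_id", "count_table": "count_id"}
good = {
    "target": ["sample_id", "age", "sex"],
    "taxonomy": ["taxon_id", "genus", "species"],
    "count_table": ["count_id", "sample_id_FK", "taxon_id_FK", "count"],
}
bad = dict(good, count_table=["count", "count_id", "patient_FK"])

print(check_structure(good, primary_keys))
print(check_structure(bad, primary_keys))
```

The check works on plain lists of column names, so it can be fed from any source, for example the column list of a dataframe loaded from one of the handled file types.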

🎈 Usage

demo-PasteurAIze.mp4

⛏️ Built Using

Pydantic-AI - Agent Framework
Lite-LLM - LLM Access
Streamlit - Web Framework
Pandas - Dataframe Treatment
Crawl4AI - Scraping & Crawling
Pydantic-Logfire - Debugging Agent
Scholarly - Searching papers on Google Scholar

