Context-Engineered AI for Secure Clinical-Biological Data Analytics
PasteurAIze uses modular AI agents and the Model Context Protocol to turn natural-language questions into reproducible biomedical analyses.
PasteurAIze will transform biomedical data analysis at Institut Pasteur through context-engineered large-language-model (LLM) agents. It converts a plain-language query (e.g. “Does age affect lymphocyte counts?”) into executable code, runs the code on institutional resources and returns fully documented results.
PasteurAIze relies on the institutional LLM repository to keep full control over data governance and over model size, the latter for sobriety concerns. Its architecture features three core components:
- A text-to-SQL agent converting natural language to database queries.
- A visualization agent using the Vega-Lite MCP for graphics.
- A scientific literature agent employing Google Scholar MCP with Retrieval-Augmented Generation (RAG).
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
What you need to install the software, and how to install it.
- Have UV installed:
pip install uv
- Have access to an LLM (base_url and api_key).
- Have a local database (e.g. PostgreSQL).
- Have well-structured project data (to know how to structure your data, go to How to structure my data).
- The file extensions handled for dataframes are: .parquet, .tsv, .csv.
- All data for one project must be in the same folder (including yaml if provided). Set the FOLDER_PATH in your .env with the path to this folder.
- Optional: yaml files describing your data (following the example.yaml schema in the pasteuraize folder).
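To check a project folder before loading it, the rules above can be sketched in a few lines of Python. This is a minimal sketch, not part of PasteurAIze: the function name `list_project_files` is hypothetical, and it assumes FOLDER_PATH is already exported as an environment variable (reading it from the .env file is left out).

```python
import os
from pathlib import Path

# Extensions listed above as handled for dataframes.
SUPPORTED_EXTENSIONS = {".parquet", ".tsv", ".csv"}


def list_project_files(folder_path):
    """Sort the files of one project folder into data, yaml and ignored files.

    Hypothetical helper: it only inspects file names, it does not read the data.
    """
    folder = Path(folder_path)
    if not folder.is_dir():
        raise NotADirectoryError(f"FOLDER_PATH is not a directory: {folder}")

    result = {"data": [], "yaml": [], "ignored": []}
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue  # sub-folders are skipped: all data must sit in one folder
        suffix = path.suffix.lower()
        if suffix in SUPPORTED_EXTENSIONS:
            result["data"].append(path.name)
        elif suffix == ".yaml":
            result["yaml"].append(path.name)
        else:
            result["ignored"].append(path.name)
    return result


if __name__ == "__main__":
    # Assumption: FOLDER_PATH is set in the environment; falls back to the current folder.
    print(list_project_files(os.environ.get("FOLDER_PATH", ".")))
```

Any file reported under "ignored" has an extension that is not in the supported list and would need converting first.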
A step-by-step series of examples that tell you how to get a development environment running.
- Create the .venv and install dependencies with:
uv sync
- Install vanna with:
uv pip install vanna
- Create a .env file and copy into it each field from the .env.exemple file, filled in with your own values.
- Add your data to the database and train Vanna by running auto_table.py.
⚠️ You can add only one project per execution and folder. If you want to add another project, change FOLDER_PATH and run auto_table.py again.
- Give the name of the project that you are inserting and wait.
- Once done, start chatting with:
streamlit run pasteuraize.py
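Before running auto_table.py, it can help to confirm that every field from .env.exemple has a value in your .env. The sketch below is one possible way to do that with the standard library only. The helper names `read_env_keys` and `missing_env_fields` are hypothetical, and the parser is deliberately simplified: it assumes plain KEY=value lines and ignores comments, with no support for multi-line values or variable expansion.

```python
from pathlib import Path


def read_env_keys(path):
    """Return {KEY: value} for the KEY=value lines of a dotenv-style file."""
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and malformed lines
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_env_fields(template_path=".env.exemple", env_path=".env"):
    """List the template fields that are absent or empty in the .env file."""
    template = read_env_keys(template_path)
    env = read_env_keys(env_path)
    return sorted(key for key in template if not env.get(key))
```

Calling `missing_env_fields()` from the project root returns the names of the fields still to fill in, or an empty list when the .env file is complete.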
A series of rules to follow so that your data is structured as the application expects.
- Follow this usual project structure: one table/dataframe named target, one taxonomy and one count_table.
- Primary keys must be the first columns.
- Foreign keys must be named as YourName_FK.
- Foreign keys must have the same name as the primary keys from other tables (with the _FK).
- Optional: if you want to use an already existing yaml, it should be named: name_of_the_table.yaml
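The naming rules above lend themselves to an automatic check. The following is a minimal sketch, not the validation PasteurAIze itself performs: `check_project_schema` is a hypothetical helper that works on column names only, and it assumes each table has a single primary key stored in its first column. The column names in the usage example are made up for illustration.

```python
EXPECTED_TABLES = ("target", "taxonomy", "count_table")
FK_SUFFIX = "_FK"


def check_project_schema(tables):
    """Return a list of problems found; an empty list means none were detected.

    `tables` maps a table name to its ordered list of column names.
    Assumption: the primary key of each table is its first column.
    """
    problems = []
    for name in EXPECTED_TABLES:
        if name not in tables:
            problems.append(f"missing table: {name}")

    primary_keys = {name: cols[0] for name, cols in tables.items() if cols}

    for name, cols in tables.items():
        # A foreign key must point at the primary key of another table.
        other_keys = {pk for other, pk in primary_keys.items() if other != name}
        for col in cols:
            if not col.endswith(FK_SUFFIX):
                continue
            referenced = col[: -len(FK_SUFFIX)]
            if referenced not in other_keys:
                problems.append(
                    f"{name}.{col}: no other table has primary key '{referenced}'"
                )
    return problems


if __name__ == "__main__":
    # Hypothetical column names, for illustration only.
    example = {
        "target": ["sample_id", "age"],
        "taxonomy": ["taxon_id", "genus"],
        "count_table": ["count_id", "sample_id_FK", "taxon_id_FK", "count"],
    }
    print(check_project_schema(example))
```

The check cannot detect a foreign key that lacks the _FK suffix altogether, since such a column is indistinguishable from ordinary data by name alone.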
demo-PasteurAIze.mp4
- Pydantic-AI - Agent Framework
- Lite-LLM - LLM Access
- Streamlit - Web Framework
- Pandas - Dataframe Treatment
- Crawl4AI - Scraping & Crawling
- Pydantic-Logfire - Debugging Agent
- Scholarly - Searching papers on Google Scholar