C3BI-pasteur-fr/PasteurAIze

PasteurAIze

Context-Engineered AI for Secure Clinical-Biological Data Analytics

🧐 About

PasteurAIze uses modular AI agents and the Model Context Protocol to turn natural-language questions into reproducible biomedical analyses.

PasteurAIze aims to transform biomedical data analysis at Institut Pasteur through context-engineered large-language-model (LLM) agents. It converts a plain-language query (e.g. “Does age affect lymphocyte counts?”) into executable code, runs that code on institutional resources, and returns fully documented results.

PasteurAIze relies on the institutional LLM repository to keep full control over data governance and to limit model size for the sake of computational sobriety. Its architecture features three core components:

  1. A text-to-SQL agent converting natural language to database queries.
  2. A visualization agent using the Vega-Lite MCP to produce graphics.
  3. A scientific literature agent employing Google Scholar MCP with Retrieval-Augmented Generation (RAG).
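
To make the visualization step more concrete, here is a minimal sketch of the kind of Vega-Lite specification such an agent might produce for the example question about age and lymphocyte counts. It is not taken from the PasteurAIze code base: the function name `scatter_spec`, the field names `age` and `lymphocyte_count`, and the sample rows are all illustrative assumptions.

```python
import json


def scatter_spec(rows, x_field, y_field):
    """Build a Vega-Lite scatter-plot spec as a plain dict (hypothetical helper)."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": rows},
        "mark": "point",
        "encoding": {
            "x": {"field": x_field, "type": "quantitative"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }


# Made-up placeholder rows, only here to give the spec something to plot.
rows = [
    {"age": 25, "lymphocyte_count": 2.1},
    {"age": 48, "lymphocyte_count": 1.8},
    {"age": 71, "lymphocyte_count": 1.4},
]

spec = scatter_spec(rows, "age", "lymphocyte_count")
print(json.dumps(spec, indent=2))
```

The printed JSON can be pasted into any Vega-Lite renderer. In the real application the rows would presumably come from the SQL query generated by the text-to-SQL agent.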

🏁 Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

What you need to install the software, and how to install it.

  • Have uv installed:
pip install uv
  • Have access to an LLM (base_url and api_key).
  • Have a local database (e.g. PostgreSQL).
  • Have well-structured project data (see How to structure my data).
  • The file extensions handled for dataframes are: .parquet, .tsv, .csv.
  • All data for one project must be in the same folder (including YAML files, if provided). Set FOLDER_PATH in your .env to the path of this folder.
  • Optional: YAML files describing your data (following the example.yaml schema in the pasteuraize folder).
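
Before running the import, it can help to check what a project folder actually contains. The sketch below is not part of PasteurAIze: the helper name `scan_project_folder` is hypothetical. It lists the files with a handled extension and pairs each one with a YAML description of the same name, if one exists.

```python
from pathlib import Path

# Extensions listed above as handled for dataframes.
DATA_EXTENSIONS = {".parquet", ".tsv", ".csv"}


def scan_project_folder(folder):
    """Map each data file name to its matching .yaml path, or None if absent."""
    folder = Path(folder)
    files = sorted(p for p in folder.iterdir() if p.is_file())
    yaml_by_stem = {p.stem: p for p in files if p.suffix.lower() == ".yaml"}
    return {
        p.name: yaml_by_stem.get(p.stem)
        for p in files
        if p.suffix.lower() in DATA_EXTENSIONS
    }
```

Calling `scan_project_folder(folder)` with the same path you set as FOLDER_PATH returns a dictionary such as `{"target.csv": Path(".../target.yaml"), "taxonomy.tsv": None}`, which makes missing or misnamed files easy to spot.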

Installing

A step-by-step series of examples that show how to get a development environment running.

  1. Create the .venv and install dependencies with:
uv sync
  2. Install vanna with:
uv pip install vanna
  3. Create a .env file and copy into it each field from the .env.exemple file, filled in with your own values.

  4. Add your data to the database and train Vanna by running auto_table.py. ⚠️ You can add only one project per execution and per folder. To add another project, change FOLDER_PATH and run auto_table.py again.

  5. Give the name of the project you are inserting and wait for the process to finish.

  6. Once done, start chatting with:

streamlit run pasteuraize.py
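
A missing or empty field in .env is a likely cause of a failed first run, so a quick check can save time. The following is a minimal sketch, not PasteurAIze code: `parse_env` and `missing_keys` are hypothetical helpers, and apart from FOLDER_PATH the key names in the usage example are assumptions (the real names are those in .env.exemple).

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text, required):
    """Return the required keys that are absent or have an empty value."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


# Example content; BASE_URL and API_KEY are assumed names, not confirmed ones.
example = "FOLDER_PATH=/data/my_project\nAPI_KEY=\n# BASE_URL not set yet\n"
print(missing_keys(example, ["FOLDER_PATH", "BASE_URL", "API_KEY"]))
```

To check a real file, pass it the text of your .env, for instance `missing_keys(open(".env").read(), required)`, with `required` taken from the fields in .env.exemple.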

📊 How to structure my data

A set of rules to follow so that your data is structured as the application expects.

  • Follow this usual project structure: one table/dataframe named target, one taxonomy and one count_table.

  • Primary keys must be the first columns.

  • Foreign keys must be named YourName_FK.

  • Each foreign key must have the same name as the primary key it references in the other table, with the _FK suffix.

  • Optional: to use an existing YAML file, name it name_of_the_table.yaml.
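
The naming rules above can be checked mechanically before loading anything into the database. This sketch is an illustration only, under two assumptions: each table has a single-column primary key in its first column, and the column names in the example (`sample_id`, `taxon_id`, and so on) are invented. The helper `check_foreign_keys` does not exist in PasteurAIze.

```python
def check_foreign_keys(tables):
    """Return a list of problems for *_FK columns that match no other table's key.

    `tables` maps a table name to its ordered list of column names.
    The first column of each table is treated as its primary key (assumption).
    """
    primary_keys = {name: cols[0] for name, cols in tables.items() if cols}
    problems = []
    for name, cols in tables.items():
        other_keys = {pk for table, pk in primary_keys.items() if table != name}
        for col in cols:
            if col.endswith("_FK") and col[: -len("_FK")] not in other_keys:
                problems.append(f"{name}.{col}: no other table has a matching primary key")
    return problems


# Hypothetical project layout following the target / taxonomy / count_table structure.
project = {
    "target": ["sample_id", "age"],
    "taxonomy": ["taxon_id", "genus"],
    "count_table": ["count_id", "sample_id_FK", "taxon_id_FK", "count"],
}
print(check_foreign_keys(project))
```

An empty list means every `_FK` column points at the primary key of another table. A column such as `patient_FK` with no `patient` primary key elsewhere would be reported.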

🎈 Usage

demo-PasteurAIze.mp4

⛏️ Built Using
