This repository provides code for analyzing continuous glucose monitoring (CGM) data alongside food diary entries from the Shanghai T2DM dataset (Zhao et al., 2023), with a focus on clinical stratification and dietary pattern insights.
The analyses obtained with the Jupyter notebook main_analyses.ipynb are described in the submission M5 (WP3) - EXplainable ML Model development, Sections 4 and 5.
The code integrates:

- CGM response analysis: Aligns CGM data around food events and stratifies responses by biomarker values, caloric density, and meal type (a minimal alignment-and-comparison sketch follows this list).
- Statistical comparisons: Quantifies significance across time, both between and within groups (e.g., high vs. low BMI or caloric intake).
- Data visualization: Generates interpretable plots of group differences and CGM trajectories.
- Raw diary preprocessing: Uses Pydantic models and LLMs (OpenAI / Anthropic) to transform free-text meal entries into structured nutritional data.
- Caching and parallelization: Supports efficient batch inference using local or cloud-based LLM APIs with built-in rate limiting.
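For illustration, the sketch below shows one plausible way to implement the first two points: align CGM readings to meal timestamps, then run a per-time-point group comparison. The column names (timestamp, glucose, meal_time, meal_id, bmi), window sizes, and BMI cutoff are assumptions made for this example, not the repository's actual schema or parameters.

```python
# Minimal sketch (not the repository's implementation): align CGM readings around meal
# events and test for group differences at each time offset. Column names and
# parameters are hypothetical placeholders.
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu


def align_cgm_to_meals(cgm: pd.DataFrame, meals: pd.DataFrame,
                       window_min: int = 120, step_min: int = 15) -> pd.DataFrame:
    """Return one row per (meal, offset) with the CGM reading nearest to that offset."""
    rows = []
    offsets = np.arange(-30, window_min + 1, step_min)  # minutes relative to the meal
    for _, meal in meals.iterrows():
        for off in offsets:
            target = meal["meal_time"] + pd.Timedelta(minutes=int(off))
            diffs = (cgm["timestamp"] - target).abs()
            nearest = diffs.idxmin()
            # keep the sample only if it falls within 5 minutes of the target time
            if diffs[nearest] <= pd.Timedelta(minutes=5):
                rows.append({"meal_id": meal["meal_id"], "offset_min": int(off),
                             "glucose": cgm.loc[nearest, "glucose"], "bmi": meal["bmi"]})
    return pd.DataFrame(rows)


def compare_bmi_groups(aligned: pd.DataFrame, bmi_cutoff: float = 25.0) -> pd.DataFrame:
    """Mann-Whitney U test between high- and low-BMI glucose responses at each offset."""
    results = []
    for off, grp in aligned.groupby("offset_min"):
        high = grp.loc[grp["bmi"] >= bmi_cutoff, "glucose"]
        low = grp.loc[grp["bmi"] < bmi_cutoff, "glucose"]
        if len(high) > 1 and len(low) > 1:
            _, p_value = mannwhitneyu(high, low, alternative="two-sided")
            results.append({"offset_min": off, "p_value": p_value})
    return pd.DataFrame(results)
```

The actual stratifications, plots, and significance overlays are implemented in main_analyses.ipynb.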
Key features:

- LLM-powered parsing of real-world food diaries with validation and error flagging (see the parsing sketch after this list)
- Time-aligned CGM visualizations across meals and clinical strata
- Significance heatmaps and p-value overlays to highlight meaningful effects
- Local caching + structured outputs to streamline iterative workflows
- Flexible backends for OpenAI, Anthropic, vLLM, and LiteLLM-compatible endpoints
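As a rough illustration of the parsing and backend-flexibility points, the sketch below asks an LLM for JSON and validates it against a Pydantic schema via LiteLLM. The model name, prompt, schema fields, and helper name parse_diary_entry are placeholders (and Pydantic v2 is assumed); the repository's actual prompts, schema, and pipeline live in LLM_pipeline.ipynb.

```python
# Minimal sketch (not the repository's pipeline): request JSON from an LLM and validate
# it with a Pydantic model. Assumes Pydantic v2 and the litellm package; fields, prompt,
# and model name are illustrative placeholders.
from typing import Optional

import litellm
from pydantic import BaseModel, ValidationError


class MealEntry(BaseModel):
    description: str
    meal_type: str          # e.g., "breakfast", "lunch", "dinner", "snack"
    estimated_kcal: float


def parse_diary_entry(text: str, model: str = "gpt-4o-mini") -> Optional[MealEntry]:
    """Parse one free-text diary entry into structured form, or None if invalid."""
    prompt = (
        "Convert this food diary entry into a JSON object with keys "
        "'description', 'meal_type', 'estimated_kcal'. Respond with JSON only.\n\n" + text
    )
    response = litellm.completion(model=model,
                                  messages=[{"role": "user", "content": prompt}])
    raw = response.choices[0].message.content
    try:
        return MealEntry.model_validate_json(raw)
    except ValidationError:
        return None  # flag the entry for manual review instead of silently dropping it
```

With a LiteLLM-style backend, switching between OpenAI, Anthropic, vLLM, or other compatible endpoints typically amounts to changing the model string passed to the call.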
Follow the steps below to run the project.
- Create and activate a virtual environment:

  cd path/to/your/project
  python -m venv venv
  source venv/bin/activate

- Install dependencies:

  pip install -r requirements.txt
- Download the dataset. The folder ./data currently contains the file processed_food_diary_entries.parquet, which is the categorization of food events obtained with the LLM pipeline. To run all analyses, you also need to store the Shanghai T2DM dataset there. Do so with this script:

  python ./scripts/data_download.py
- Run main_analyses.ipynb to explore CGM responses and visualizations.
If you want to obtain new classification results from an Anthropic or OpenAI LLM:

- rename the file .env.copy to .env in the root of the repository and add your API keys to the file:

  OPENAI_API_KEY=your_openai_api_key
  ANTHROPIC_API_KEY=your_anthropic_api_key

- run LLM_pipeline.ipynb.
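To check that the keys are actually picked up before running the notebook, a snippet like the one below can help. It assumes the python-dotenv package, which is a common choice but not necessarily how the notebook loads the keys.

```python
# Quick check that the API keys from .env are visible to Python.
# Assumes the python-dotenv package; the notebook may load the keys differently.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
for key in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY"):
    print(f"{key} is {'set' if os.getenv(key) else 'MISSING'}")
```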
For questions, please contact enrica.troiano -at- hk3lab.ai or tf -at- hk3lab.ai.