This repository provides code for analyzing continuous glucose monitoring (CGM) data alongside food diary entries from the Shanghai T2DM dataset (Zhao et al., 2023), with a focus on clinical stratification and dietary pattern insights.
The analyses obtained with the Jupyter notebook main_analyses.ipynb are described in the submission M5 (WP3) - EXplainable ML Model development, Sections 4 and 5.
The code integrates:

- CGM response analysis: Aligns CGM data around food events and stratifies responses by biomarker values, caloric density, and meal type (a minimal alignment-and-comparison sketch follows this list).
- Statistical comparisons: Quantifies significance across time, both between and within groups (e.g., high vs. low BMI or caloric intake).
- Data visualization: Generates interpretable plots of group differences and CGM trajectories.
- Raw diary preprocessing: Uses Pydantic models and LLMs (OpenAI / Anthropic) to transform free-text meal entries into structured nutritional data.
- Caching and parallelization: Supports efficient batch inference using local or cloud-based LLM APIs with built-in rate limiting.
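For illustration, the sketch below shows one plausible way to implement the first two points: align CGM readings to meal timestamps, then run a per-time-point group comparison. The column names (timestamp, glucose, meal_time, meal_id, bmi), window sizes, and BMI cutoff are assumptions made for this example, not the repository's actual schema or parameters.

```python
# Minimal sketch (not the repository's implementation): align CGM readings around meal
# events and test for group differences at each time offset. Column names and
# parameters are hypothetical placeholders.
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu


def align_cgm_to_meals(cgm: pd.DataFrame, meals: pd.DataFrame,
                       window_min: int = 120, step_min: int = 15) -> pd.DataFrame:
    """Return one row per (meal, offset) with the CGM reading nearest to that offset."""
    rows = []
    offsets = np.arange(-30, window_min + 1, step_min)  # minutes relative to the meal
    for _, meal in meals.iterrows():
        for off in offsets:
            target = meal["meal_time"] + pd.Timedelta(minutes=int(off))
            diffs = (cgm["timestamp"] - target).abs()
            nearest = diffs.idxmin()
            # keep the sample only if it falls within 5 minutes of the target time
            if diffs[nearest] <= pd.Timedelta(minutes=5):
                rows.append({"meal_id": meal["meal_id"], "offset_min": int(off),
                             "glucose": cgm.loc[nearest, "glucose"], "bmi": meal["bmi"]})
    return pd.DataFrame(rows)


def compare_bmi_groups(aligned: pd.DataFrame, bmi_cutoff: float = 25.0) -> pd.DataFrame:
    """Mann-Whitney U test between high- and low-BMI glucose responses at each offset."""
    results = []
    for off, grp in aligned.groupby("offset_min"):
        high = grp.loc[grp["bmi"] >= bmi_cutoff, "glucose"]
        low = grp.loc[grp["bmi"] < bmi_cutoff, "glucose"]
        if len(high) > 1 and len(low) > 1:
            _, p_value = mannwhitneyu(high, low, alternative="two-sided")
            results.append({"offset_min": off, "p_value": p_value})
    return pd.DataFrame(results)
```

The actual stratifications, plots, and significance overlays are implemented in main_analyses.ipynb.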
Key features:

- LLM-powered parsing of real-world food diaries with validation and error flagging (see the parsing sketch after this list)
- Time-aligned CGM visualizations across meals and clinical strata
- Significance heatmaps and p-value overlays to highlight meaningful effects
- Local caching + structured outputs to streamline iterative workflows
- Flexible backends for OpenAI, Anthropic, vLLM, and LiteLLM-compatible endpoints
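As a rough illustration of the parsing and backend-flexibility points, the sketch below asks an LLM for JSON and validates it against a Pydantic schema via LiteLLM. The model name, prompt, schema fields, and helper name parse_diary_entry are placeholders (and Pydantic v2 is assumed); the repository's actual prompts, schema, and pipeline live in LLM_pipeline.ipynb.

```python
# Minimal sketch (not the repository's pipeline): request JSON from an LLM and validate
# it with a Pydantic model. Assumes Pydantic v2 and the litellm package; fields, prompt,
# and model name are illustrative placeholders.
from typing import Optional

import litellm
from pydantic import BaseModel, ValidationError


class MealEntry(BaseModel):
    description: str
    meal_type: str          # e.g., "breakfast", "lunch", "dinner", "snack"
    estimated_kcal: float


def parse_diary_entry(text: str, model: str = "gpt-4o-mini") -> Optional[MealEntry]:
    """Parse one free-text diary entry into structured form, or None if invalid."""
    prompt = (
        "Convert this food diary entry into a JSON object with keys "
        "'description', 'meal_type', 'estimated_kcal'. Respond with JSON only.\n\n" + text
    )
    response = litellm.completion(model=model,
                                  messages=[{"role": "user", "content": prompt}])
    raw = response.choices[0].message.content
    try:
        return MealEntry.model_validate_json(raw)
    except ValidationError:
        return None  # flag the entry for manual review instead of silently dropping it
```

With a LiteLLM-style backend, switching between OpenAI, Anthropic, vLLM, or other compatible endpoints typically amounts to changing the model string passed to the call.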
Follow the steps below to run the project.
- Create and activate a virtual environment:

  cd path/to/your/project
  python -m venv venv
  source venv/bin/activate

- Install dependencies:

  pip install -r requirements.txt
- Download the dataset. The folder ./data currently contains the file processed_food_diary_entries.parquet, which is the categorization of food events obtained with the LLM pipeline. To run all analyses, you also need to store the Shanghai T2DM dataset there. Do so with this script:

  python ./scripts/data_download.py
- Run main_analyses.ipynb to explore CGM responses and visualizations.
If you want to obtain new classification results from an Anthropic or OpenAI LLM:

- rename the file .env.copy to .env in the root of the repository and add your API keys to the file:

  OPENAI_API_KEY=your_openai_api_key
  ANTHROPIC_API_KEY=your_anthropic_api_key

- run LLM_pipeline.ipynb.
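To check that the keys are actually picked up before running the notebook, a snippet like the one below can help. It assumes the python-dotenv package, which is a common choice but not necessarily how the notebook loads the keys.

```python
# Quick check that the API keys from .env are visible to Python.
# Assumes the python-dotenv package; the notebook may load the keys differently.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
for key in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY"):
    print(f"{key} is {'set' if os.getenv(key) else 'MISSING'}")
```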
For questions, please contact enrica.troiano -at- hk3lab.ai or tf -at- hk3lab.ai.