
IS2AI/Multitask-Food-Portion-Estimation


YOLOv12-FoodWeight

Multi-Task Real-Time Food Detection and Weight Estimation


Proposed multi-task food detection and weight estimation model based on the YOLOv12 architecture.


Overview

This project builds upon the YOLOv12 architecture to perform multi-task learning:

  • Object Detection: Detect food items.
  • Weight Prediction: Predict the weight (in grams) of each detected food item.

We introduce an additional regression head to YOLOv12 to predict weights, enabling simultaneous localization and portion estimation from a single image.

Main Features

  • Multi-task food object detection and weight (in grams) prediction.
  • Single unified model: Jointly trained for classification, localization, and regression tasks.
  • Evaluation metrics: Includes MAE (Mean Absolute Error) for weight estimation.
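For reference, the standard MAE definition applied to food weights is shown below. Here N is the number of evaluated items, ŵ_i is the predicted weight and w_i the ground-truth weight of item i, both in grams. Which items are counted (for example, only detections matched to a ground-truth item) is an assumption not stated in this README.

```latex
\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{w}_i - w_i \right| \quad \text{(grams)}
```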

Dataset Format

Our model is trained and evaluated on a specialized food dataset with annotated bounding boxes and weight labels in grams, available on Hugging Face:

➡️ Download Food Portion Benchmark Dataset on Hugging Face

Each image has an associated .txt label file containing six columns:

  • class_id (integer): ID of the food class.
  • x_center (float): Normalized x center of bounding box (0 to 1).
  • y_center (float): Normalized y center of bounding box (0 to 1).
  • width (float): Normalized width of bounding box (0 to 1).
  • height (float): Normalized height of bounding box (0 to 1).
  • weight (float): Ground truth weight of the food item in grams.

This extended label format enables simultaneous object detection and weight regression.
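A minimal sketch of reading one label row, assuming whitespace-separated columns as in standard YOLO label files. The names FoodLabel, parse_label_line and to_pixel_xyxy are hypothetical and not part of this repository. The sketch also shows how the normalized center/size box converts to absolute xmin/ymin/xmax/ymax corners for a given image size.

```python
from dataclasses import dataclass


@dataclass
class FoodLabel:
    class_id: int
    x_center: float  # normalized, 0 to 1
    y_center: float  # normalized, 0 to 1
    width: float     # normalized, 0 to 1
    height: float    # normalized, 0 to 1
    weight: float    # ground-truth weight in grams


def parse_label_line(line: str) -> FoodLabel:
    """Parse one row of the six-column label format (hypothetical helper)."""
    parts = line.split()
    if len(parts) != 6:
        raise ValueError(f"expected 6 columns, got {len(parts)}: {line!r}")
    cls, xc, yc, w, h, grams = parts
    label = FoodLabel(int(cls), float(xc), float(yc), float(w), float(h), float(grams))
    # The four box values must be normalized to [0, 1].
    for name in ("x_center", "y_center", "width", "height"):
        value = getattr(label, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name}={value} is not normalized to [0, 1]")
    return label


def to_pixel_xyxy(label: FoodLabel, img_w: int, img_h: int) -> tuple[float, float, float, float]:
    """Convert normalized center/size to absolute (xmin, ymin, xmax, ymax) corners."""
    cx, cy = label.x_center * img_w, label.y_center * img_h
    half_w, half_h = label.width * img_w / 2, label.height * img_h / 2
    return cx - half_w, cy - half_h, cx + half_w, cy + half_h


# Made-up label row for a 640x480 image:
lbl = parse_label_line("3 0.5 0.5 0.25 0.5 152.0")
print(lbl.weight, to_pixel_xyxy(lbl, 640, 480))  # → 152.0 (240.0, 120.0, 400.0, 360.0)
```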

Results


Training results comparing the different versions of the YOLOv8 and YOLOv12 models

Pretrained Weights

You can download the best-performing pretrained YOLOv12-M model weights here:

Installation

conda create -n yolov12_foodweight python=3.11
conda activate yolov12_foodweight

# Install dependencies
pip install -r requirements.txt
pip install -e .

# (Optional) For FlashAttention support
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu11torch2.2cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
pip install flash_attn-2.7.3+cu11torch2.2cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
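After installation, a quick check can confirm the interpreter version and which packages are importable without actually importing them. This is a sketch under assumptions: it assumes the editable install (pip install -e .) exposes the fork under the module name ultralytics, as Ultralytics-based repositories typically do, and that the FlashAttention wheel provides the module flash_attn. The function name check_environment is hypothetical.

```python
import importlib.util
import sys


def check_environment(modules=("ultralytics", "flash_attn")):
    """Report the Python version and whether each module can be found (nothing is imported)."""
    report = {"python_3_11": sys.version_info[:2] == (3, 11)}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, found in check_environment().items():
        print(f"{key}: {found}")
```

A missing flash_attn entry is expected if the optional FlashAttention step was skipped.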

Training

Training is handled through the train.py script. This script loads the modified YOLOv12 model configuration, prepares the dataset, and launches the training process.

  • You can train the model from scratch or fine-tune a pretrained YOLOv12 checkpoint.
  • The model is trained to perform both object detection and weight regression tasks simultaneously.
  • The training outputs include model checkpoints, loss curves, and metric evaluations over epochs.
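This README does not say how the weight-regression term is combined with the detection losses. The sketch below shows one common pattern for joint training: a weighted sum of precomputed detection loss components plus a regression term averaged over matched objects. The smooth L1 loss, the weight_gain balance factor and the function names are hypothetical choices, not taken from this repository's code. Regressing raw grams may also call for target scaling in practice, which is not shown.

```python
def smooth_l1(pred: float, target: float, beta: float = 1.0) -> float:
    """Smooth L1 loss: quadratic for |error| < beta, linear beyond."""
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


def multitask_loss(det_losses, det_gains, pred_weights, true_weights, weight_gain=1.0):
    """HYPOTHETICAL combined objective, not this repository's implementation.

    det_losses:   detection loss components computed elsewhere, e.g. {"box": ..., "cls": ...}
    det_gains:    scalar gain per component (same keys as det_losses)
    pred_weights: predicted grams for predictions matched to ground-truth objects
    true_weights: ground-truth grams, aligned with pred_weights
    weight_gain:  hypothetical balance factor for the regression term
    """
    if len(pred_weights) != len(true_weights):
        raise ValueError("pred_weights and true_weights must be aligned")
    detection = sum(det_gains[k] * v for k, v in det_losses.items())
    if pred_weights:
        regression = sum(smooth_l1(p, t) for p, t in zip(pred_weights, true_weights)) / len(pred_weights)
    else:
        regression = 0.0  # no matched objects: only the detection terms contribute
    return detection + weight_gain * regression


total = multitask_loss({"box": 1.2, "cls": 0.8}, {"box": 1.0, "cls": 1.0}, [100.0], [98.0], weight_gain=0.1)
```

Keeping the regression term as a separate, scaled addend makes it easy to tune how strongly weight estimation competes with localization and classification.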

Testing and Prediction

We provide several scripts for generating predictions:

  • calculate_weight_MAE.py: Runs inference, calculates the regression MAE for weight prediction, and optionally saves annotated images showing detections and predicted weights.
  • predict_txt.py: Runs inference and saves the predictions in .txt format.
  • predict_csv.py: Runs inference and saves the predictions in .csv format.
  • YOLOv8_version_code: Contains the code for the YOLOv8 version of this project, as described in the paper.
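Computing a weight MAE from detections requires pairing each prediction with a ground-truth item first. The README does not describe how calculate_weight_MAE.py does this, so the following is only a sketch assuming greedy, confidence-ordered, same-class matching at IoU ≥ 0.5. Unmatched predictions and missed items are left out of the MAE here (they are reflected in detection metrics instead), which is a design choice of this sketch. The names iou and matched_weight_mae are hypothetical.

```python
Box = tuple[float, float, float, float]  # xmin, ymin, xmax, ymax in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def matched_weight_mae(preds, gts, iou_thr=0.5):
    """Greedy same-class matching, then MAE in grams over the matched pairs.

    preds: list of (class_id, box, weight_g, confidence)
    gts:   list of (class_id, box, weight_g)
    Returns (mae, n_matched); mae is None when nothing matched.
    """
    used = set()
    errors = []
    # Higher-confidence predictions get first pick of the ground-truth items.
    for cls, box, w_pred, _conf in sorted(preds, key=lambda p: p[3], reverse=True):
        best_j, best_iou = None, iou_thr
        for j, (g_cls, g_box, _w) in enumerate(gts):
            if j in used or g_cls != cls:
                continue
            overlap = iou(box, g_box)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            used.add(best_j)
            errors.append(abs(w_pred - gts[best_j][2]))
    mae = sum(errors) / len(errors) if errors else None
    return mae, len(errors)


# Made-up example:
gts = [(0, (0, 0, 100, 100), 200.0), (1, (200, 200, 300, 300), 50.0)]
preds = [
    (0, (5, 5, 100, 100), 210.0, 0.9),     # IoU 0.9025 with the first item
    (1, (200, 200, 300, 300), 45.0, 0.8),  # exact box match
    (1, (400, 400, 450, 450), 30.0, 0.3),  # no overlap: unmatched false positive
]
print(matched_weight_mae(preds, gts))  # → (7.5, 2)
```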

Each prediction contains:

  • image_name, class_id, xmin, ymin, xmax, ymax, weight, confidence

Choose the format depending on your post-processing or evaluation needs.
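As an example of post-processing, the sketch below reads predictions from CSV and sums the predicted grams per image, for instance to estimate the total portion on a plate. It assumes a header row using the column names listed above; whether predict_csv.py writes exactly this header is an assumption. The function names and the 0.25 confidence threshold are hypothetical.

```python
import csv
import io
from collections import defaultdict

COLUMNS = ["image_name", "class_id", "xmin", "ymin", "xmax", "ymax", "weight", "confidence"]
FLOAT_COLUMNS = ("xmin", "ymin", "xmax", "ymax", "weight", "confidence")


def read_predictions_csv(stream):
    """Parse prediction rows (with a header row) into typed dicts."""
    rows = []
    for rec in csv.DictReader(stream):
        row = {"image_name": rec["image_name"], "class_id": int(rec["class_id"])}
        row.update({k: float(rec[k]) for k in FLOAT_COLUMNS})
        rows.append(row)
    return rows


def total_weight_per_image(rows, min_conf=0.25):
    """Sum predicted grams per image, skipping low-confidence detections."""
    totals = defaultdict(float)
    for row in rows:
        if row["confidence"] >= min_conf:
            totals[row["image_name"]] += row["weight"]
    return dict(totals)


# Usage with an in-memory file (made-up values):
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=COLUMNS)
writer.writeheader()
writer.writerow({"image_name": "meal_01.jpg", "class_id": 2, "xmin": 10, "ymin": 20,
                 "xmax": 110, "ymax": 140, "weight": 85.0, "confidence": 0.91})
writer.writerow({"image_name": "meal_01.jpg", "class_id": 5, "xmin": 150, "ymin": 30,
                 "xmax": 260, "ymax": 170, "weight": 120.5, "confidence": 0.10})
buf.seek(0)
print(total_weight_per_image(read_predictions_csv(buf)))  # → {'meal_01.jpg': 85.0}
```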

Acknowledgment

This project is based on ultralytics/ultralytics and YOLOv12. We extend the original work with an additional regression head for food weight prediction.

Citation

Please cite our work if you use the Multi-task model. (Citation will be added after publication.)

@article{,
  title={A Multitask Deep Learning Model for Food Scene Recognition and Portion Estimation—the Food Portion Benchmark (FPB) Dataset}, 
  author={Sanatbyek, Aibota and Rakhimzhanova, Tomiris and Nurmanova, Bibinur and Omarova, Zhuldyz and Rakhmankulova, Aidana and Orazbayev, Rustem and Varol, Huseyin Atakan and Chan, Mei Yen},
  journal={IEEE Access}, 
  year={2025}
}
