This project builds upon the YOLOv12 architecture to perform multi-task learning:
- Object Detection: Detect food items.
- Weight Prediction: Predict the weight (in grams) of each detected food item.
We introduce an additional regression head to YOLOv12 to predict weights, enabling simultaneous localization and portion estimation from a single image.
- Multi-task learning: food object detection and per-item weight (in grams) prediction.
- Single unified model: Jointly trained for classification, localization, and regression tasks.
- Evaluation metrics: Includes MAE (Mean Absolute Error) for weight estimation.
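For reference, MAE over weight predictions is simply the mean of absolute differences between predicted and ground-truth weights. A minimal sketch (the values below are illustrative, not from the dataset):

```python
def weight_mae(predicted, actual):
    """Mean Absolute Error between predicted and ground-truth weights (grams)."""
    assert len(predicted) == len(actual) and predicted, "need matched, non-empty lists"
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

# Illustrative example with made-up weights (grams)
pred = [150.0, 80.0, 210.0]
gt = [140.0, 95.0, 200.0]
print(weight_mae(pred, gt))  # ≈ 11.67
```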
Our model is trained and evaluated on a specialized food dataset with annotated bounding boxes and weight labels in grams, available on Hugging Face:
➡️ Download Food Portion Benchmark Dataset on Hugging Face
Each image has an associated `.txt` label file containing six columns:
- `class_id` (integer): ID of the food class.
- `x_center` (float): Normalized x center of the bounding box (0 to 1).
- `y_center` (float): Normalized y center of the bounding box (0 to 1).
- `width` (float): Normalized width of the bounding box (0 to 1).
- `height` (float): Normalized height of the bounding box (0 to 1).
- `weight` (float): Ground-truth weight of the food item in grams.
This extended label format enables simultaneous object detection and weight regression.
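A label file in this format can be parsed with a few lines of standard-library Python. This is a sketch based on the six-column layout described above (the dict keys are our own naming, not part of the dataset):

```python
from pathlib import Path

def parse_label_file(path):
    """Parse a 6-column label file: class_id x_center y_center width height weight."""
    rows = []
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        cid, xc, yc, w, h, wt = line.split()
        rows.append({
            "class_id": int(cid),
            "x_center": float(xc),   # normalized, 0..1
            "y_center": float(yc),   # normalized, 0..1
            "width": float(w),       # normalized, 0..1
            "height": float(h),      # normalized, 0..1
            "weight_g": float(wt),   # grams
        })
    return rows
```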
Training results comparing the different versions of the YOLOv8 and YOLOv12 models
You can download the best-performing pretrained YOLOv12-M model weights here:
conda create -n yolov12_foodweight python=3.11
conda activate yolov12_foodweight
# Install dependencies
pip install -r requirements.txt
pip install -e .
# (Optional) For FlashAttention support
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu11torch2.2cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
pip install flash_attn-2.7.3+cu11torch2.2cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
Training is handled through the `train.py` script, which loads the modified YOLOv12 model configuration, prepares the dataset, and launches the training process.
- You can train the model from scratch or fine-tune a pretrained YOLOv12 checkpoint.
- The model is trained to perform both object detection and weight regression tasks simultaneously.
- The training outputs include model checkpoints, loss curves, and metric evaluations over epochs.
We provide a few scripts for generating predictions:
- `calculate_weight_MAE.py`: Runs inference, computes the regression MAE metric for weight prediction, and optionally saves annotated images showing detections and predicted weights.
- `predict_txt.py`: Runs inference and saves the predictions in `.txt` format.
- `predict_csv.py`: Runs inference and saves the predictions in `.csv` format.
- `YOLOv8_version_code`: Contains the code for the YOLOv8 version of this project, as described in the paper.
Each prediction contains:
`image_name`, `class_id`, `xmin`, `ymin`, `xmax`, `ymax`, `weight`, `confidence`
Choose the format depending on your post-processing or evaluation needs.
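Note that the label files store normalized center coordinates while the prediction files use absolute corner coordinates (`xmin`, `ymin`, `xmax`, `ymax`). A small conversion helper bridges the two representations (a sketch; the function name is our own):

```python
def yolo_to_corners(x_center, y_center, width, height, img_w, img_h):
    """Convert a normalized YOLO box (center format) to absolute pixel corners."""
    xmin = (x_center - width / 2) * img_w
    ymin = (y_center - height / 2) * img_h
    xmax = (x_center + width / 2) * img_w
    ymax = (y_center + height / 2) * img_h
    return xmin, ymin, xmax, ymax

# A centered box covering 20% x 40% of a 640x480 image
print(yolo_to_corners(0.5, 0.5, 0.2, 0.4, 640, 480))  # (256.0, 144.0, 384.0, 336.0)
```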
This project is based on ultralytics/ultralytics and YOLOv12. We extend the original work with an additional regression head for food weight prediction.
Please cite our work if you use the Multi-task model. (Citation will be added after publication.)
@article{,
title={A Multitask Deep Learning Model for Food Scene Recognition and Portion Estimation—the Food Portion Benchmark (FPB) Dataset},
author={Sanatbyek, Aibota and Rakhimzhanova, Tomiris and Nurmanova, Bibinur and Omarova, Zhuldyz and Rakhmankulova, Aidana and Orazbayev, Rustem and Varol, Huseyin Atakan and Chan, Mei Yen},
journal={IEEE Access},
year={2025}
}