In this work, we propose an efficient Frame-Action Cross-attention Temporal modeling (FACT) framework that (i) performs temporal modeling on the frame and action levels in parallel and (ii) leverages this parallelism to iteratively transfer information in both directions between action and frame features and refine them.
We achieve state-of-the-art results on four datasets at a lower computational cost.
pip3 install -r requirements.txt
mkdir FACT_actseg
cd FACT_actseg
git clone https://github.com/ZijiaLewisLu/CVPR2024-FACT.git
mv CVPR2024-FACT src
mkdir data
- Download the Breakfast and GTEA data from link1 or link2, and place them in FACT_actseg/data.
- Download the EgoProceL and Epic-Kitchens data from here, and place them in FACT_actseg/data.
- Features for Epic-Kitchens can be downloaded via this script and extracted with utils/extract_epic_kitchen.py.
- After this, FACT_actseg/data should contain four folders, one for each dataset.
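Before training, it can help to confirm that the data directory has the expected shape. The following is a minimal sketch, not part of the repository: the helper name `check_data_layout` is hypothetical, and it only counts sub-folders because the exact folder names are not stated here.

```python
from pathlib import Path


def check_data_layout(data_root, expected_count=4):
    """Return (ok, names) for the sub-folders directly under data_root.

    ok is True only when exactly expected_count folders are present.
    A missing data_root is reported as (False, []) instead of raising.
    """
    root = Path(data_root)
    if not root.is_dir():
        return False, []
    # Only directories count; stray files such as archives are ignored.
    names = sorted(p.name for p in root.iterdir() if p.is_dir())
    return len(names) == expected_count, names
```

Calling `check_data_layout("FACT_actseg/data")` from the parent directory returns the folder names it found, which makes a missing or misplaced dataset easy to spot.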
Training is configured with YAML files, all of which are listed in configs. Use the following commands to run the experiments.
cd FACT_actseg
# breakfast
python3 -m src.train --cfg src/configs/breakfast.yaml --set aux.gpu 0 split "split1"
# gtea
python3 -m src.train --cfg src/configs/gtea.yaml --set aux.gpu 0 split "split1"
# egoprocel
python3 -m src.train --cfg src/configs/egoprocel.yaml --set aux.gpu 0 split "split1"
# epic-kitchens
python3 -m src.train --cfg src/configs/epic-kitchens.yaml --set aux.gpu 0 split "split1"
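The `--set` arguments in the commands above read as key/value pairs that override fields of the YAML configuration, with dotted keys such as `aux.gpu` addressing nested fields. The sketch below illustrates that idea on a plain dictionary. It is an assumption about the behavior, not the repository's implementation: `apply_overrides` and `parse_value` are hypothetical names, and the starting config values are made up for the example.

```python
import ast


def parse_value(text):
    """Interpret a CLI string as a Python literal if possible, else keep it as a string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def apply_overrides(cfg, pairs):
    """Apply [key1, value1, key2, value2, ...] to a nested dict, in place."""
    if len(pairs) % 2 != 0:
        raise ValueError("overrides must come in key/value pairs")
    for key, raw in zip(pairs[0::2], pairs[1::2]):
        *parents, leaf = key.split(".")
        node = cfg
        for name in parents:
            # Walk down the nesting, creating intermediate levels as needed.
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw)
    return cfg


# Hypothetical starting config; the shell has already stripped the quotes around split1.
cfg = {"aux": {"gpu": 1}, "split": "split2"}
apply_overrides(cfg, ["aux.gpu", "0", "split", "split1"])
print(cfg)  # {'aux': {'gpu': 0}, 'split': 'split1'}
```

Under this reading, switching to another split or GPU only requires changing the values after `--set`, leaving the YAML file untouched.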
By default, logs are saved to FACT_actseg/log/<experiment-path>. Evaluation results are saved as Checkpoint objects, which are defined in utils/evaluate.py. Loss and metrics are also visualized with wandb.
Pre-trained model weights can be downloaded from here. You can place the files under FACT_actseg/ckpts and test the models with the following command.
python3 -m src.eval
We lost the original data and model weights in a disk failure. The models below were reproduced afterward, so their performance varies slightly from the numbers reported in the paper.
- Breakfast models
- GTEA models
- EgoProceL models
- Epic-Kitchens models
@inproceedings{
lu2024fact,
title={{FACT}: Frame-Action Cross-Attention Temporal Modeling for Efficient Supervised Action Segmentation},
author={Zijia Lu and Ehsan Elhamifar},
booktitle={Conference on Computer Vision and Pattern Recognition 2024},
year={2024},
}