A PyTorch implementation of our paper "Video summarization with u-shaped transformer", published in Applied Intelligence.
This project was developed on Ubuntu 16.04 with CUDA 9.0.176.
git clone https://github.com/semchan/Uformer.git
Install the Python dependencies.
pip install -r requirements.txt
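After installation, it can be useful to compare the local environment with the one the project was developed on. The following is a minimal sketch, assuming that requirements.txt installs the torch package (the requirements file is not reproduced here). The helper name describe_environment is hypothetical and not part of this repository.

```python
import importlib.util
import platform


def describe_environment():
    """Collect basic version information without failing if torch is absent."""
    info = {
        "python": platform.python_version(),
        "torch": None,            # PyTorch version string, if installed
        "cuda_build": None,       # CUDA version PyTorch was built against
        "cuda_available": False,  # whether a usable GPU is visible
    }
    # Assumption: the project depends on the "torch" package.
    if importlib.util.find_spec("torch") is None:
        return info
    import torch

    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If cuda_available is reported as False, training may still run on the CPU, but this has not been verified against the repository.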
Download the pre-processed TVSum, SumMe, OVP, and YouTube datasets into the datasets/ folder.
- (Baidu Cloud) Link: https://pan.baidu.com/s/1a0iH8NWmwtdYKQvzWS62WQ Extraction Code: 1234
The directory structure should now look like this:
UFormer
├── datasets/
│   ├── eccv16_dataset_ovp_google_pool5.h5
│   ├── eccv16_dataset_summe_google_pool5.h5
│   ├── eccv16_dataset_tvsum_google_pool5.h5
│   ├── eccv16_dataset_youtube_google_pool5.h5
│   └── readme.txt
└── eval_models/
    ├── ab_ovp_youtube
    ├── anchor_based
    │   ├── ab_basic
    │   ├── augmented
    │   ├── canonical
    │   └── transfer
    └── anchor_free
        └── canonical
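Before evaluating or training, the dataset layout can be checked programmatically. The sketch below only tests that the four .h5 files named in the listing exist. It assumes the script is run from the repository root, and the helper name missing_datasets is hypothetical.

```python
from pathlib import Path

# File names taken from the directory listing in this README.
EXPECTED_DATASETS = [
    "eccv16_dataset_ovp_google_pool5.h5",
    "eccv16_dataset_summe_google_pool5.h5",
    "eccv16_dataset_tvsum_google_pool5.h5",
    "eccv16_dataset_youtube_google_pool5.h5",
]


def missing_datasets(root="datasets"):
    """Return the expected dataset files that are not present under root."""
    root = Path(root)
    return [name for name in EXPECTED_DATASETS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_datasets()
    if missing:
        print("Missing dataset files:", ", ".join(missing))
    else:
        print("All four dataset files found.")
```

The check does not open the files, so it cannot detect a truncated or corrupted download.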
To evaluate your anchor-based models, run
sh evaluate.sh
To train the anchor-based attention model on the TVSum and SumMe datasets with the canonical setting, run
python train.py --model anchor-based --model-dir ./models/ab_basic --splits ./splits/tvsum.yml ./splits/summe.yml
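When launching several runs from a script, the command above can be assembled as an argument list. This is a minimal sketch that uses only the flags shown in this README (--model, --model-dir, --splits). The helper build_train_command is hypothetical, and other options that train.py may accept are not covered.

```python
import shlex


def build_train_command(model, model_dir, splits):
    """Assemble the train.py invocation as an argument list."""
    if not splits:
        raise ValueError("at least one split file is required")
    return [
        "python", "train.py",
        "--model", model,
        "--model-dir", model_dir,
        "--splits", *splits,
    ]


cmd = build_train_command(
    "anchor-based",
    "./models/ab_basic",
    ["./splits/tvsum.yml", "./splits/summe.yml"],
)
# Print the command in shell form for inspection before running it.
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the repository root to start training.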
We gratefully thank the following open-source repositories, which greatly boosted our research.
- Thanks to DSNet, from which part of our code is adapted.
- Thanks to KTS for the effective shot generation algorithm.
- Thanks to DR-DSN for the pre-processed public datasets.
- Thanks to VASNet for the training and evaluation pipeline.
If you find our code or paper helpful, please consider citing:
@article{chen2022video,
title={Video summarization with u-shaped transformer},
author={Chen, Yaosen and Guo, Bing and Shen, Yan and Zhou, Renshuang and Lu, Weichen and Wang, Wei and Wen, Xuming and Suo, Xinhua},
journal={Applied Intelligence},
pages={1--17},
year={2022},
publisher={Springer}
}