## Introduction

This repo optimizes the TSM model and proposes PP-TSM. Without increasing the number of parameters, PP-TSM significantly improves the accuracy of TSM on the UCF101 and Kinetics-400 datasets. Please refer to Tricks on PP-TSM for more details.
| Version | Sampling method | Top1 |
| --- | --- | --- |
| Ours (distill) | Dense | 76.16 |
| Ours | Dense | 75.69 |
| mmaction2 | Dense | 74.55 |
| mit-han-lab | Dense | 74.1 |
| Ours (distill) | Uniform | 75.11 |
| Ours | Uniform | 74.54 |
| mmaction2 | Uniform | 71.90 |
| mit-han-lab | Uniform | 71.16 |
## Data

Please refer to the Kinetics-400 data download and preparation doc: k400-data.

Please refer to the UCF101 data download and preparation doc: ucf101-data.
## Train

Please download ResNet50_vd_ssld_v2 as the pretrained model:

```shell
wget https://videotag.bj.bcebos.com/PaddleVideo/PretrainModel/ResNet50_vd_ssld_v2_pretrained.pdparams
```

Then set the path of the downloaded weights in `MODEL.framework.backbone.pretrained` in the config file:

```yaml
MODEL:
    framework: "Recognizer2D"
    backbone:
        name: "ResNetTweaksTSM"
        pretrained: your weight path
```
- If you use ResNet101 as the backbone, please download ResNet101_vd_ssld_pretrained.pdparams as the pretrained model.
- Train PP-TSM on Kinetics-400 frame data using:

```shell
python3.7 -B -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" --log_dir=log_pptsm main.py --validate -c configs/recognition/pptsm/pptsm_k400_frames_uniform.yaml
```
- Train PP-TSM on Kinetics-400 video data using:

```shell
python3.7 -B -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" --log_dir=log_pptsm main.py --validate -c configs/recognition/pptsm/pptsm_k400_videos_uniform.yaml
```
- Automatic mixed precision (AMP) is useful for speeding up training:

```shell
export FLAGS_conv_workspace_size_limit=800 # MB
export FLAGS_cudnn_exhaustive_search=1
export FLAGS_cudnn_batchnorm_spatial_persistent=1
python3.7 -B -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" --log_dir=log_pptsm main.py --amp --validate -c configs/recognition/pptsm/pptsm_k400_frames_uniform.yaml
```
- Train PP-TSM on Kinetics-400 with dense sampling:

```shell
python3.7 -B -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" --log_dir=log_pptsm main.py --validate -c configs/recognition/pptsm/pptsm_k400_frames_dense.yaml
```
- Train PP-TSM on Kinetics-400 with ResNet101 as the backbone, using dense sampling:

```shell
python3.7 -B -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" --log_dir=log_pptsm main.py --validate -c configs/recognition/pptsm/pptsm_k400_frames_dense_r101.yaml
```
## Test

- For uniform sampling, the test accuracy can be found in the training logs by searching for the keyword `best`, for example:

```
Already save the best model (top1 acc)0.7454
```
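The log scan above can also be scripted. A minimal sketch, assuming the log lines follow the `Already save the best model (top1 acc)0.7454` format shown above (the message format is an assumption; adjust the pattern if your logs differ):

```python
import re

# Scan a training log for "best model (top1 acc)" messages and return the
# last reported top-1 accuracy, or None if no such line exists.
def best_top1(log_path):
    pattern = re.compile(r"best model \(top1 acc\)([\d.]+)")
    best = None
    with open(log_path) as f:
        for line in f:
            match = pattern.search(line)
            if match:
                best = float(match.group(1))
    return best
```

For a multi-GPU run, point it at the worker log under the `--log_dir` you passed to `paddle.distributed.launch`, e.g. `best_top1("log_pptsm/workerlog.0")`.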
- For dense sampling, the test accuracy can be obtained using:

```shell
python3 main.py --test -c configs/recognition/pptsm/pptsm_k400_frames_dense.yaml -w output/ppTSM/ppTSM_best.pdparams
```
Accuracy on Kinetics-400:
backbone | distill | Sampling method | num_seg | target_size | Top-1 | checkpoints |
---|---|---|---|---|---|---|
ResNet50 | False | Uniform | 8 | 224 | 74.54 | ppTSM_k400_uniform.pdparams |
ResNet50 | False | Dense | 8 | 224 | 75.69 | ppTSM_k400_dense.pdparams |
ResNet50 | True | Uniform | 8 | 224 | 75.11 | ppTSM_k400_uniform_distill.pdparams |
ResNet50 | True | Dense | 8 | 224 | 76.16 | ppTSM_k400_dense_distill.pdparams |
ResNet101 | True | Uniform | 8 | 224 | 76.35 | ppTSM_k400_uniform_distill_r101.pdparams |
ResNet101 | False | Dense | 8 | 224 | 77.15 | ppTSM_k400_dense_r101.pdparams |
## Inference

To get the model architecture file `ppTSM.pdmodel` and the parameters file `ppTSM.pdiparams`, use:

```shell
python3.7 tools/export_model.py -c configs/recognition/pptsm/pptsm_k400_frames_uniform.yaml \
                                -p data/ppTSM_k400_uniform.pdparams \
                                -o inference/ppTSM
```
- For argument usage, please refer to Model Inference.
```shell
python3.7 tools/predict.py --input_file data/example.avi \
                           --config configs/recognition/pptsm/pptsm_k400_frames_uniform.yaml \
                           --model_file inference/ppTSM/ppTSM.pdmodel \
                           --params_file inference/ppTSM/ppTSM.pdiparams \
                           --use_gpu=True \
                           --use_tensorrt=False
```
Example of logs:

```
Current video file: data/example.avi
    top-1 class: 5
    top-1 score: 0.9907386302947998
```
We can get the class name from the class id using the map file data/k400/Kinetics-400_label_list.txt. The top-1 prediction of data/example.avi is archery.
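The id-to-name lookup can be done with a small helper. A minimal sketch, assuming each line of the label file is a `<class_id> <class_name>` pair (the line format is an assumption; check your copy of data/k400/Kinetics-400_label_list.txt):

```python
# Map a predicted class id to its class name using the label map file.
# Assumes one "<class_id> <class_name>" pair per line (an assumption).
def class_name(label_file, class_id):
    with open(label_file) as f:
        for line in f:
            parts = line.strip().split(maxsplit=1)
            if len(parts) == 2 and int(parts[0]) == class_id:
                return parts[1]
    return None

# e.g. class_name("data/k400/Kinetics-400_label_list.txt", 5)
```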
- Note: For models that combine N and T during calculation (such as TSN and TSM), when `use_tensorrt=True` you need to set the `batch_size` argument to `batch_size * num_seg * num_crop`:

```shell
python3.7 tools/predict.py --input_file data/example.avi \
                           --config configs/recognition/pptsm/pptsm_k400_frames_uniform.yaml \
                           --model_file inference/ppTSM/ppTSM.pdmodel \
                           --params_file inference/ppTSM/ppTSM.pdiparams \
                           --batch_size 8 \
                           --use_gpu=True \
                           --use_tensorrt=True
```
## Reference

- TSM: Temporal Shift Module for Efficient Video Understanding, Ji Lin, Chuang Gan, Song Han
- Distilling the Knowledge in a Neural Network, Geoffrey Hinton, Oriol Vinyals, Jeff Dean