SiamRPN++: Evolution of Siamese Visual Tracking With Very Deep Networks

Abstract

Siamese network based trackers formulate tracking as convolutional feature cross-correlation between a target template and a search region. However, Siamese trackers still have an accuracy gap compared with state-of-the-art algorithms and they cannot take advantage of features from deep networks, such as ResNet-50 or deeper. In this work we prove the core reason comes from the lack of strict translation invariance. By comprehensive theoretical analysis and experimental validations, we break this restriction through a simple yet effective spatial aware sampling strategy and successfully train a ResNet-driven Siamese tracker with significant performance gain. Moreover, we propose a new model architecture to perform layer-wise and depth-wise aggregations, which not only further improves the accuracy but also reduces the model size. We conduct extensive ablation studies to demonstrate the effectiveness of the proposed tracker, which obtains currently the best results on five large tracking benchmarks, including OTB2015, VOT2018, UAV123, LaSOT, and TrackingNet.
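
As a rough companion to this formulation, the following minimal PyTorch sketch (an illustration, not this repository's implementation) shows the depth-wise cross-correlation that such trackers compute: the template feature map serves as a per-channel convolution kernel slid over the search-region feature map.

import torch
import torch.nn.functional as F

def depthwise_xcorr(search, template):
    """search: (B, C, Hs, Ws), template: (B, C, Ht, Wt) -> per-channel response map."""
    b, c, h, w = search.shape
    # Fold the batch into the channel axis so each sample is correlated only
    # with its own template (grouped convolution, one group per channel).
    search = search.view(1, b * c, h, w)
    kernel = template.reshape(b * c, 1, template.size(2), template.size(3))
    out = F.conv2d(search, kernel, groups=b * c)
    return out.view(b, c, out.size(2), out.size(3))

# A 7x7 template correlated over a 31x31 search region gives a 25x25 response map.
response = depthwise_xcorr(torch.randn(2, 256, 31, 31), torch.randn(2, 256, 7, 7))
print(response.shape)  # torch.Size([2, 256, 25, 25])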

Citation

@inproceedings{li2019siamrpn++,
  title={Siamrpn++: Evolution of siamese visual tracking with very deep networks},
  author={Li, Bo and Wu, Wei and Wang, Qiang and Zhang, Fangyi and Xing, Junliang and Yan, Junjie},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={4282--4291},
  year={2019}
}

Results and models

LaSOT

Note that the checkpoints from the 10th to the 20th epoch will be evaluated during training. You can find the best checkpoint in the log file, for example with a small script like the one below.
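
A hedged helper for picking the best epoch out of the log. It assumes a JSON-lines log whose validation records contain 'epoch' and 'success' fields; these keys and the log path are hypothetical, so adapt them to the actual log format.

import json

def best_epoch(log_path, metric='success'):
    """Return the validation record with the highest `metric` value."""
    best = None
    with open(log_path) as f:
        for line in f:
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip non-JSON lines
            if metric in record and 'epoch' in record:
                if best is None or record[metric] > best[metric]:
                    best = record
    return best

print(best_epoch('work_dirs/siamese_rpn/20220420_181845.log.json'))  # hypothetical path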

We provide the best model with its configuration and training log.

| Method | Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | Success | Norm precision | Precision | Config | Download |
| :----: | :------: | :---: | :-----: | :------: | :------------: | :-----: | :------------: | :-------: | :----: | :------: |
| SiamRPN++ | R-50 | - | 20e | 7.54 | 50.0 | 50.4 | 59.6 | 49.7 | config | model \| log |
| SiamRPN++ (FP16) | R-50 | - | 20e | - | - | 50.4 | 59.6 | 49.2 | config | model \| log |

Note:

  • FP16 means Mixed Precision (FP16) is adopted in training; the sketch below shows a typical way to enable it.
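
For reference, in MMEngine-style configs mixed precision is usually enabled by switching the optimizer wrapper to AmpOptimWrapper. A minimal sketch, assuming the LaSOT config as base; the FP16 config actually shipped here may set this up differently.

# Enable automatic mixed precision on top of the base config (assumed base file).
_base_ = ['./siamese-rpn_r50_8xb28-20e_imagenetvid-imagenetdet-coco_test-lasot.py']
optim_wrapper = dict(type='AmpOptimWrapper')  # MMEngine's AMP optimizer wrapper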

UAV123

The checkpoints from the 10th to the 20th epoch will be evaluated during training. You can find the best checkpoint in the log file.

If you want to get better results, you can use the best checkpoint to search the hyperparameters on UAV123 following here. Experimentally, the hyperparameter search on UAV123 brings around a 1.0 Success gain; a rough illustration of the search is sketched below.
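
As a rough illustration of what that search does (not the actual script), one can grid-search the test-time tracker parameters and keep the setting with the best Success. The parameter names below (penalty_k, window_influence, lr) are typical SiamRPN-style test-time hyperparameters, and evaluate_fn is a hypothetical callback that runs one full evaluation.

import itertools

def grid_search(evaluate_fn):
    """evaluate_fn maps a dict of test-time params to a Success score on the benchmark."""
    best_params, best_success = None, float('-inf')
    for pk, wi, lr in itertools.product([0.0, 0.05, 0.1, 0.2],
                                        [0.3, 0.4, 0.5],
                                        [0.3, 0.4, 0.5]):
        params = dict(penalty_k=pk, window_influence=wi, lr=lr)
        success = evaluate_fn(params)  # run the tracker once per setting
        if success > best_success:
            best_params, best_success = params, success
    return best_params, best_success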

The results below are achieved without hyperparameters search.

| Method | Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | Success | Norm precision | Precision | Config | Download |
| :----: | :------: | :---: | :-----: | :------: | :------------: | :-----: | :------------: | :-------: | :----: | :------: |
| SiamRPN++ | R-50 | - | 20e | 7.54 | - | 60 | 77.3 | 80.3 | config | model \| log |

TrackingNet

The results of SiamRPN++ on TrackingNet are reimplemented by ourselves. The best model on LaSOT is submitted to the evaluation server of the TrackingNet Challenge. We provide the best model with its configuration and training log.

| Method | Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | Success | Norm precision | Precision | Config | Download |
| :----: | :------: | :---: | :-----: | :------: | :------------: | :-----: | :------------: | :-------: | :----: | :------: |
| SiamRPN++ | R-50 | - | 20e | 7.54 | - | 68.8 | 75.9 | 63.2 | config | model \| log |

OTB100

The checkpoints from the 10th to the 20th epoch will be evaluated during training. You can find the best checkpoint in the log file.

If you want to get better results, you can use the best checkpoint to search the hyperparameters on OTB100 following here. Experimentally, the hyperparameter search on OTB100 brings around a 1.0 Success gain.

Note: The results reported in the paper are 69.6 Success and 91.4 Precision. We trained SiamRPN++ in the official pysot codebase and could not reproduce the same results. Following the training and hyperparameter-search instructions of pysot, we only get 66.1 Success and 86.7 Precision, which are lower than the paper's results by 3.5 Success and 4.7 Precision respectively. Without hyperparameter search, we get 65.3 Success and 85.8 Precision. In our codebase, the results below are likewise achieved without hyperparameter search and are close to the results reproduced in pysot under the same setting.

| Method | Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | Success | Norm precision | Precision | Config | Download |
| :----: | :------: | :---: | :-----: | :------: | :------------: | :-----: | :------------: | :-------: | :----: | :------: |
| SiamRPN++ | R-50 | - | 20e | - | - | 64.9 | 82.4 | 86.3 | config | model \| log |

VOT2018

The checkpoints from the 10th to the 20th epoch will be evaluated during training. You can find the best checkpoint in the log file.

If you want to get better results, you can use the best checkpoint to search the hyperparameters on VOT2018 following here.

Note: The result reported in the paper is 0.414 EAO. We trained SiamRPN++ in the official pysot codebase and could not reproduce the same result. Following the training and hyperparameter-search instructions of pysot, we only get 0.364 EAO, which is lower than the paper's result by 0.05 EAO. Without hyperparameter search, we get 0.346 EAO. In our codebase, the result below is likewise achieved without hyperparameter search and is close to the result reproduced in pysot under the same setting.

| Method | Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | EAO | Accuracy | Robustness | Config | Download |
| :----: | :------: | :---: | :-----: | :------: | :------------: | :-: | :------: | :--------: | :----: | :------: |
| SiamRPN++ | R-50 | - | 20e | - | - | 0.348 | 0.588 | 0.295 | config | model \| log |

Get started

1. Training

Because parameters in the default configuration file, such as the learning rate, are tuned for 8 GPUs, we recommend training with 8 GPUs in order to reproduce the reported accuracy. You can use the following command to start the training.

# Train SiamRPN++ on the ImageNet VID, ImageNet DET and COCO datasets with the following command.
# The number after the config file represents the number of GPUs used. Here we use 8 GPUs.
./tools/dist_train.sh \
    configs/sot/siamese_rpn/siamese-rpn_r50_8xb28-20e_imagenetvid-imagenetdet-coco_test-lasot.py 8

The models tested on LaSOT, TrackingNet, UAV123 and VOT2018 share the same training settings; OTB100 uses some unique training settings.

For more detailed usage of train.py/dist_train.sh/slurm_train.sh, please refer to this document.

2. Testing and evaluation

2.1 Example on LaSOT, UAV123, OTB100 and VOT2018 datasets

# Example 1: Test on LaSOT testset
# The number after the config file represents the number of GPUs used. Here we use 8 GPUs.
./tools/dist_test.sh \
    configs/sot/siamese_rpn/siamese-rpn_r50_8xb28-20e_imagenetvid-imagenetdet-coco_test-lasot.py 8 \
    --checkpoint ./checkpoints/siamese_rpn_r50_20e_lasot_20220420_181845-dd0f151e.pth

2.2 Example on TrackingNet dataset

If you want to get the results on the TrackingNet test set, please use the following command to generate a result file that can be used for submission. The file will be stored in ./results/siamese_rpn_trackingnet.zip; you can modify the save path in the test_evaluator of the config (see the sketch after the command below).

# Example 1: Test on TrackingNet testset
# We use the best checkpoint on LaSOT to test on TrackingNet.
# The number after the config file represents the number of GPUs used. Here we use 8 GPUs.
./tools/dist_test.sh \
    configs/sot/siamese_rpn/siamese-rpn_r50_8xb28-20e_imagenetvid-imagenetdet-coco_test-trackingnet.py 8 \
    --checkpoint ./checkpoints/siamese_rpn_r50_20e_lasot_20220420_181845-dd0f151e.pth
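
A hedged sketch of overriding the save path in the config. The evaluator type and key names below are assumptions, not confirmed by this README; check the test_evaluator entry in the actual config for the exact keys.

# Override where the submission file is written (assumed evaluator type and keys).
test_evaluator = dict(
    type='SOTMetric',                                    # assumed evaluator type
    format_only=True,                                    # only write the submission file
    outfile_prefix='./results/siamese_rpn_trackingnet')  # assumed path key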

3. Inference

Use a single GPU to run inference on a video and save the result as a video.

python demo/demo_sot.py \
    configs/sot/siamese_rpn/siamese-rpn_r50_8xb28-20e_imagenetvid-imagenetdet-coco_test-lasot.py \
    --checkpoint ./checkpoints/siamese_rpn_r50_20e_lasot_20220420_181845-dd0f151e.pth \
    --input demo/demo.mp4 \
    --output sot.mp4
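
Alternatively, roughly the same thing can be done through the Python API. The sketch below uses init_model and inference_sot from mmtrack.apis; verify those names against your installed version, and note that the initial bounding box is a hypothetical example.

import cv2
from mmtrack.apis import inference_sot, init_model

config = 'configs/sot/siamese_rpn/siamese-rpn_r50_8xb28-20e_imagenetvid-imagenetdet-coco_test-lasot.py'
checkpoint = './checkpoints/siamese_rpn_r50_20e_lasot_20220420_181845-dd0f151e.pth'
model = init_model(config, checkpoint, device='cuda:0')

init_bbox = [100, 100, 300, 300]  # hypothetical [x1, y1, x2, y2] of the target in frame 0
cap = cv2.VideoCapture('demo/demo.mp4')
frame_id = 0
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # result holds the predicted target box for this frame
    result = inference_sot(model, frame, init_bbox, frame_id=frame_id)
    frame_id += 1
cap.release()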

For more detailed usage of demo_sot.py, please refer to this document.