Applying PoolFormer to Object Detection

Our detection implementation is based on MMDetection v2.19.0 and PVT detection. We thank the authors for their wonderful work.

For details, see MetaFormer is Actually What You Need for Vision.

Note

Please note that we simply follow the hyper-parameters of PVT, which may not be optimal for PoolFormer. Feel free to tune them to get better performance.

Bibtex

@article{yu2021metaformer,
  title={MetaFormer is Actually What You Need for Vision},
  author={Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng},
  journal={arXiv preprint arXiv:2111.11418},
  year={2021}
}

Usage

Install MMDetection v2.19.0 from source code,

or

pip install mmdet==2.19.0 --user

Apex (optional):

git clone https://github.com/NVIDIA/apex
cd apex
python setup.py install --cpp_ext --cuda_ext --user

If you would like to disable apex, change the runner type to EpochBasedRunner and comment out the following code block in the configuration files:

fp16 = None
optimizer_config = dict(
    type="DistOptimizerHook",
    update_interval=1,
    grad_clip=None,
    coalesce=True,
    bucket_size_mb=-1,
    use_fp16=True,
)
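As a rough illustration of that change, the relevant part of a config file might look like the sketch below once apex is disabled. The `max_epochs=12` value is an assumption based on the 1x (12-epoch) schedule used here; keep whatever value your config already sets, and only change the runner `type`.

```python
# Sketch of a config with apex disabled (assumption: the file otherwise
# matches the repository's configs; only the runner type changes).
runner = dict(type="EpochBasedRunner", max_epochs=12)  # max_epochs assumed from the 1x schedule

# Apex-specific settings, commented out:
# fp16 = None
# optimizer_config = dict(
#     type="DistOptimizerHook",
#     update_interval=1,
#     grad_clip=None,
#     coalesce=True,
#     bucket_size_mb=-1,
#     use_fp16=True,
# )
```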

Dockerfile_mmdetseg is the Dockerfile that I use to set up the environment for detection and segmentation. You can refer to it as well.

Data preparation

Prepare COCO according to the guidelines in MMDetection v2.19.0.
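Before launching a job, it can help to confirm that the dataset is where the configs expect it. The sketch below assumes the default layout used by MMDetection's COCO configs (`data/coco/` containing `annotations/`, `train2017/`, and `val2017/`); the helper name `missing_coco_entries` is hypothetical, and the paths should be adjusted if your configs set a different `data_root`.

```python
from pathlib import Path

# Assumed layout, following MMDetection's default COCO configs
# (data_root = "data/coco/"). Adjust if your configs differ.
EXPECTED = [
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
    "train2017",
    "val2017",
]


def missing_coco_entries(data_root="data/coco"):
    """Return the expected files/directories that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_coco_entries()
    if missing:
        print(f"Missing under data/coco: {missing}")
    else:
        print("COCO layout looks complete")
```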

Results and models on COCO

| Method | Backbone | Pretrain | Lr schd | Aug | box AP | mask AP | Config | Download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RetinaNet | PoolFormer-S12 | ImageNet-1K | 1x | No | 36.2 | - | config | log & model |
| RetinaNet | PoolFormer-S24 | ImageNet-1K | 1x | No | 38.9 | - | config | log & model |
| RetinaNet | PoolFormer-S36 | ImageNet-1K | 1x | No | 39.5 | - | config | log & model |
| Mask R-CNN | PoolFormer-S12 | ImageNet-1K | 1x | No | 37.3 | 34.6 | config | log & model |
| Mask R-CNN | PoolFormer-S24 | ImageNet-1K | 1x | No | 40.1 | 37.0 | config | log & model |
| Mask R-CNN | PoolFormer-S36 | ImageNet-1K | 1x | No | 41.0 | 37.7 | config | log & model |

All the models can also be downloaded from BaiDu Yun (password: esac).

Evaluation

To evaluate PoolFormer-S12 + RetinaNet on COCO val2017 on a single node with 8 GPUs, run:

FORK_LAST3=1 dist_test.sh configs/retinanet_poolformer_s12_fpn_1x_coco.py /path/to/checkpoint_file 8 --out results.pkl --eval bbox

To evaluate PoolFormer-S12 + Mask R-CNN on COCO val2017, run:

dist_test.sh configs/mask_rcnn_poolformer_s12_fpn_1x_coco.py /path/to/checkpoint_file 8 --out results.pkl --eval bbox segm
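The `--out results.pkl` option in the commands above saves the raw predictions. A minimal sketch for inspecting that file follows, assuming the MMDetection 2.x output format: one entry per image, where each entry is a list of per-class arrays with rows `[x1, y1, x2, y2, score]`, or a `(bbox_results, segm_results)` tuple for Mask R-CNN. The helper names and the 0.3 score threshold are illustrative choices, not part of this repository.

```python
import pickle


def load_results(path):
    """Load the predictions written by --out (unpickling needs numpy installed)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def count_detections(results, score_thr=0.3):
    """Return, per image, the number of boxes with score >= score_thr."""
    counts = []
    for per_image in results:
        # Mask R-CNN style outputs are (bbox_results, segm_results) tuples;
        # only the box part is counted here.
        if isinstance(per_image, tuple):
            per_image = per_image[0]
        n = 0
        for class_boxes in per_image:   # one array of boxes per class
            for box in class_boxes:     # box = [x1, y1, x2, y2, score]
                if box[4] >= score_thr:
                    n += 1
        counts.append(n)
    return counts
```

For example, `count_detections(load_results("results.pkl"))` gives a quick per-image sanity check before looking at the COCO metrics printed by `--eval`.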

Training

To train PoolFormer-S12 + RetinaNet on COCO train2017 on a single node with 8 GPUs for 12 epochs, run:

FORK_LAST3=1 dist_train.sh configs/retinanet_poolformer_s12_fpn_1x_coco.py 8

To train PoolFormer-S12 + Mask R-CNN on COCO train2017:

dist_train.sh configs/mask_rcnn_poolformer_s12_fpn_1x_coco.py 8