Our detection implementation is based on MMDetection v2.19.0 and PVT detection. We thank the authors for their wonderful work.
For details, see "MetaFormer is Actually What You Need for Vision".
Please note that we simply follow the hyper-parameters of PVT, which may not be optimal for PoolFormer. Feel free to tune them to get better performance.
@article{yu2021metaformer,
title={MetaFormer is Actually What You Need for Vision},
author={Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng},
journal={arXiv preprint arXiv:2111.11418},
year={2021}
}
Install MMDetection v2.19.0 from source code,
or
pip install mmdet==2.19.0 --user
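After either install route, it can help to confirm that the version on your path is the one this README targets. The snippet below is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and not part of MMDetection.

```python
from importlib import metadata

EXPECTED_MMDET = "2.19.0"  # version named in this README


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("mmdet")
    if found is None:
        print("mmdet is not installed")
    elif found != EXPECTED_MMDET:
        print(f"mmdet {found} found, but this README targets {EXPECTED_MMDET}")
    else:
        print(f"mmdet {found} OK")
```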
Apex (optional):
git clone https://github.com/NVIDIA/apex
cd apex
python setup.py install --cpp_ext --cuda_ext --user
If you would like to disable apex, change the runner type to EpochBasedRunner
and comment out the following code block in the configuration files:
fp16 = None
optimizer_config = dict(
type="DistOptimizerHook",
update_interval=1,
grad_clip=None,
coalesce=True,
bucket_size_mb=-1,
use_fp16=True,
)
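For orientation, here is a hedged sketch of what the relevant config entries might look like once apex is disabled. It is not copied from the repository: `max_epochs=12` is assumed from the 1x schedule used here, and the plain `optimizer_config` is the form used in MMDetection's standard schedules. Check both against your actual config file before relying on them.

```python
# Hypothetical config fragment with apex disabled (an assumption, not the repo's file).

# Use the standard MMCV epoch-based runner; 12 epochs corresponds to the 1x schedule.
runner = dict(type="EpochBasedRunner", max_epochs=12)

# With the apex-specific block commented out, fall back to the plain optimizer hook
# settings used by MMDetection's default schedules (no gradient clipping).
optimizer_config = dict(grad_clip=None)
```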
Dockerfile_mmdetseg is the Dockerfile that I use to set up the environment for detection and segmentation. You can also refer to it.
Prepare COCO according to the guidelines in MMDetection v2.19.0.
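Before launching a long run, a quick check of the dataset layout can save time. The sketch below assumes the `data/coco` layout described in the MMDetection dataset guide (an `annotations` folder plus `train2017` and `val2017` image folders); the helper `missing_coco_entries` is hypothetical, and the paths should be adjusted if your config points elsewhere.

```python
from pathlib import Path

# Layout assumed from the MMDetection dataset guide; adjust if your config differs.
EXPECTED = [
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
    "train2017",
    "val2017",
]


def missing_coco_entries(root="data/coco"):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_coco_entries()
    if missing:
        print(f"Missing under data/coco: {missing}")
    else:
        print("COCO layout looks complete")
```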
Method | Backbone | Pretrain | Lr schd | Aug | box AP | mask AP | Config | Download |
---|---|---|---|---|---|---|---|---|
RetinaNet | PoolFormer-S12 | ImageNet-1K | 1x | No | 36.2 | - | config | log & model |
RetinaNet | PoolFormer-S24 | ImageNet-1K | 1x | No | 38.9 | - | config | log & model |
RetinaNet | PoolFormer-S36 | ImageNet-1K | 1x | No | 39.5 | - | config | log & model |
Mask R-CNN | PoolFormer-S12 | ImageNet-1K | 1x | No | 37.3 | 34.6 | config | log & model |
Mask R-CNN | PoolFormer-S24 | ImageNet-1K | 1x | No | 40.1 | 37.0 | config | log & model |
Mask R-CNN | PoolFormer-S36 | ImageNet-1K | 1x | No | 41.0 | 37.7 | config | log & model |
All the models can also be downloaded by BaiDu Yun (password: esac).
To evaluate PoolFormer-S12 + RetinaNet on COCO val2017 on a single node with 8 GPUs, run:
FORK_LAST3=1 dist_test.sh configs/retinanet_poolformer_s12_fpn_1x_coco.py /path/to/checkpoint_file 8 --out results.pkl --eval bbox
To evaluate PoolFormer-S12 + Mask R-CNN on COCO val2017, run:
dist_test.sh configs/mask_rcnn_poolformer_s12_fpn_1x_coco.py /path/to/checkpoint_file 8 --out results.pkl --eval bbox segm
To train PoolFormer-S12 + RetinaNet on COCO train2017 on a single node with 8 GPUs for 12 epochs, run:
FORK_LAST3=1 dist_train.sh configs/retinanet_poolformer_s12_fpn_1x_coco.py 8
To train PoolFormer-S12 + Mask R-CNN on COCO train2017, run:
dist_train.sh configs/mask_rcnn_poolformer_s12_fpn_1x_coco.py 8