
SAP-DETR

This is the official PyTorch implementation of our CVPR 2023 paper SAP-DETR.

Authors: Yang Liu, Yao Zhang, Yixin Wang, Yang Zhang, Jiang Tian, Zhongchao Shi, Jianping Fan, Zhiqiang He

Update

[2023/3/17] Other model variants will be released soon.

[2023/2/28] Our work has been accepted by CVPR 2023.

Introduction

Abstract: Recently, the dominant DETR-based approaches apply central-concept spatial prior to accelerate Transformer detector convergency. These methods gradually refine the reference points to the center of target objects and imbue object queries with the updated central reference information for spatially conditional attention. However, centralizing reference points may severely deteriorate queries' saliency and confuse detectors due to the indiscriminative spatial prior. To bridge the gap between the reference points of salient queries and Transformer detectors, we propose SAlient Point-based DETR (SAP-DETR) by treating object detection as a transformation from salient points to instance objects. In SAP-DETR, we explicitly initialize a query-specific reference point for each object query, gradually aggregate them into an instance object, and then predict the distance from each side of the bounding box to these points. By rapidly attending to query-specific reference region and other conditional extreme regions from the image features, SAP-DETR can effectively bridge the gap between the salient point and the query-based Transformer detector with a significant convergency speed. Our extensive experiments have demonstrated that SAP-DETR achieves 1.4 times convergency speed with competitive performance. Under the standard training scheme, SAP-DETR stably promotes the SOTA approaches by 1.0 AP. Based on ResNet-DC-101, SAP-DETR achieves 46.9 AP.
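
To make the parameterization concrete: the sketch below is our illustration of the idea, not the repository's code. Each query keeps a reference point and predicts the distances from that point to the four box sides.

# A minimal sketch of decoding a box from a salient reference point and
# per-side distances (our illustration, not the repository's code).
import torch

def point_to_box(points: torch.Tensor, dists: torch.Tensor) -> torch.Tensor:
    # points: (N, 2) reference points (cx, cy) in normalized coordinates
    # dists:  (N, 4) distances (left, top, right, bottom) to the box sides
    cx, cy = points.unbind(-1)
    l, t, r, b = dists.unbind(-1)
    return torch.stack([cx - l, cy - t, cx + r, cy + b], dim=-1)  # (x1, y1, x2, y2)

# A point at (0.5, 0.5) with distance 0.1 to each side gives a centered box:
print(point_to_box(torch.tensor([[0.5, 0.5]]), torch.full((1, 4), 0.1)))
# tensor([[0.4000, 0.4000, 0.6000, 0.6000]])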


Main Results

We provide SAP-DETR models with R50 and R101 backbones.

The pre-trained models of SAP-DINO-DETR and SAP-Group-DETR are expected to be released soon.

| Method | Epochs | Params (M) | FLOPs (G) | AP | APS | APM | APL | URL |
|--------|--------|------------|-----------|-----|------|------|------|-----|
| SAP-DETR-R50 | 50 | 47 | 92 | 43.1 | 22.9 | 47.1 | 62.1 | Baidu Netdisk / Google Drive |
| SAP-DETR-DC5-R50 | 50 | 47 | 197 | 46.0 | 26.4 | 50.2 | 62.6 | Baidu Netdisk / Google Drive |
| SAP-DETR-R101 | 50 | 67 | 158 | 44.4 | 24.1 | 48.7 | 63.1 | Baidu Netdisk / Google Drive |
| SAP-DETR-DC5-R101 | 50 | 67 | 266 | 46.9 | 27.9 | 51.3 | 64.3 | Baidu Netdisk / Google Drive |

Notes:

  • "DC5" means removing the stride in C5 stage of ResNet and add a dilation of 2 instead.
  • The results are sensitive to the batch size. We use 16 (4 images each GPU x 4 A100 GPUs) for ResNet-C5 and 8 (1 image each GPU x 8 A100 GPUs) for ResNet-C5.
  • The Params and FLOPs are measured by Detectron2 on NVIDIA A100 GPU.
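
For reference, parameters and FLOPs can be counted with fvcore, the analysis library Detectron2 builds on. This is a minimal sketch under assumptions (a built model and a typical COCO-sized input); the torchvision resnet50 below is only a stand-in:

# Count Params/FLOPs with fvcore (the library Detectron2 uses for analysis);
# the torchvision resnet50 is a stand-in for the built SAP-DETR model.
import torch
from fvcore.nn import FlopCountAnalysis, parameter_count
from torchvision.models import resnet50

model = resnet50().eval()             # replace with the built SAP-DETR model
image = torch.randn(1, 3, 800, 1333)  # a typical COCO-sized input
print(f"FLOPs (G): {FlopCountAnalysis(model, image).total() / 1e9:.1f}")
print(f"Params (M): {parameter_count(model)[''] / 1e6:.1f}")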

Installation

We use the great DETR project as our codebase, so no extra dependencies are needed for SAP-DETR.

We tested our models with python=3.8, pytorch=1.9.0, cuda=11.4. Other versions may work as well.

  1. Clone this repo
git clone https://github.com/liuyang-ict/SAP-DETR.git
cd SAP-DETR
  2. Install PyTorch and torchvision

Follow the instructions at https://pytorch.org/get-started/locally/.

# an example:
conda install -c pytorch pytorch torchvision
  3. Install pycocotools

Follow the instructions at https://github.com/cocodataset/cocoapi.

# an example:
conda install -c conda-forge pycocotools
  4. Install Detectron2

Follow the instructions at https://detectron2.readthedocs.io/en/latest/tutorials/install.html.

# an example:
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
  5. Install other needed packages
pip install -r requirements.txt
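
After these steps, a quick import check can confirm the environment is ready. This is a convenience snippet of ours, not part of the repository:

# Quick environment sanity check (not part of the repository).
import torch, torchvision, detectron2, pycocotools  # noqa: F401
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("torchvision:", torchvision.__version__)
print("detectron2:", detectron2.__version__)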

Data

Please download the COCO 2017 dataset and organize it as follows:

COCODIR/
  ├── images/
  │   ├── train2017/
  │   └── val2017/
  └── annotations/
      ├── instances_train2017.json
      └── instances_val2017.json
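
Once the files are in place, the annotations can be sanity-checked with pycocotools (a convenience snippet of ours; adjust the path to your COCODIR):

# Verify that the COCO annotations load (convenience snippet; adjust the path).
from pycocotools.coco import COCO

coco = COCO("COCODIR/annotations/instances_val2017.json")
print("images:", len(coco.getImgIds()), "| categories:", len(coco.getCatIds()))
# Expected for val2017: 5000 images, 80 categories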

Run

We use the standard SAP-DETR-R50 and SAP-DETR-DC5-R50 as examples for training and evaluation.

Replace the '--coco_path' argument with your COCO path before running.

Evaluate our pretrained models

Download our model checkpoints (links in the table above) and run the commands below.

# for sap_detr_r50: 43.1 AP
sh ./scripts/test/r50-50epoch.sh

# for sap_detr_dc5_r50: 46.0 AP
sh ./scripts/test/r50-dc-50epoch.sh

Training your own models

You can train our model with a single process:

# for sap_detr
python main.py -m sap_detr \
  --output_dir outputs/R50 \
  --batch_size 1 \
  --epochs 50 \
  --lr_drop 40 \
  --num_select 300 \
  --num_queries 306 \
  --warmup_iters 1000 \
  --meshgrid_refpoints_xy \
  --bbox_embed_diff_each_layer \
  --newconvinit \
  --sdg \
  --coco_path /path/to/your/COCODIR  # replace with your COCO path
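
For intuition on the --meshgrid_refpoints_xy flag: it suggests that the query-specific reference points are initialized on a uniform mesh grid. The sketch below is our illustration, not the repository's code, and the 18 x 17 = 306 grid shape matching --num_queries 306 is our assumption.

# Our illustration of mesh-grid reference point initialization (not the repo's
# code); the 18 x 17 = 306 grid shape is an assumption matching --num_queries.
import torch

def meshgrid_reference_points(nx: int = 18, ny: int = 17) -> torch.Tensor:
    xs = (torch.arange(nx) + 0.5) / nx  # uniform grid centers along x in (0, 1)
    ys = (torch.arange(ny) + 0.5) / ny  # uniform grid centers along y in (0, 1)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    return torch.stack([gx, gy], dim=-1).reshape(-1, 2)  # one (x, y) per query

print(meshgrid_reference_points().shape)  # torch.Size([306, 2])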

Distributed Run

As training is time-consuming, we suggest training the model on multiple devices.

Training with multi-processes on a single node:

# for sap_detr: 43.1 AP
sh ./scripts/train/r50-50epoch.sh

# for sap_detr_dc5_r50: 46.0 AP
sh ./scripts/train/r50-dc-50epoch.sh

Training with multi-processes on multiple nodes:

For example, the command for training SAP-DETR on 2 nodes with 4 GPUs each is as follows:

On node 0:

MASTER_ADDR=<IP address of node 0> NODE_RANK=0 GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 8 ./scripts/train-mn/r50-50epoch.sh

On node 1:

MASTER_ADDR=<IP address of node 0> NODE_RANK=1 GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 8 ./scripts/train-mn/r50-50epoch.sh

Citation

@inproceedings{liu2023sap,
  title={{SAP-DETR}: Bridging the gap between salient points and queries-based transformer detector for fast model convergency},
  author={Liu, Yang and Zhang, Yao and Wang, Yixin and Zhang, Yang and Tian, Jiang and Shi, Zhongchao and Fan, Jianping and He, Zhiqiang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={15539--15547},
  year={2023}
}
