Pre-training

The models are summarized in the following table. Note that the performance reported is "raw", without any fine-tuning. For each dataset, we report the class-agnostic box AP@50, which measures how well the model finds the boxes mentioned in the text. All performances are reported on the respective validation sets of each dataset.

| # | Backbone | GQA AP | Flickr AP | Flickr R@1 | Refcoco AP | Refcoco R@1 | Refcoco+ R@1 | Refcocog R@1 | Url & size |
|---|----------|--------|-----------|------------|------------|-------------|--------------|--------------|------------|
| 1 | R101 | 58.9 | 75.6 | 82.5 | 60.3 | 72.1 | 58.0 | 55.7 | model (3GB) |
| 2 | ENB3 | 59.5 | 76.6 | 82.9 | 57.6 | 70.2 | 56.7 | 53.8 | model (2.4GB) |
| 3 | ENB5 | 59.9 | 76.4 | 83.7 | 61.8 | 73.4 | 58.8 | 57.1 | model (2.7GB) |

The config file for pre-training is configs/pretrain.json and looks like this:

{
    "combine_datasets": ["flickr", "mixed"],
    "combine_datasets_val": ["flickr", "gqa", "refexp"],
    "coco_path": "",
    "vg_img_path": "",
    "flickr_img_path": "",
    "refexp_ann_path": "mdetr_annotations/",
    "flickr_ann_path": "mdetr_annotations/",
    "gqa_ann_path": "mdetr_annotations/",
    "num_queries": 100,
    "refexp_dataset_name": "all",
    "GT_type": "separate",
    "flickr_dataset_path": ""
}
  • Download the original Flickr30k image dataset from the Flickr30K webpage and update flickr_img_path to the folder containing the images.
  • Download the original Flickr30k Entities annotations from Flickr30k annotations and update flickr_dataset_path to the folder containing these annotations.
  • Download the GQA images from GQA images and update vg_img_path to point to the folder containing the images.
  • Download the COCO images from Coco train2014 and update coco_path to the folder containing the downloaded images.
  • Download our pre-processed annotations, which are converted to COCO format (all datasets ship in the same zip of MDETR annotations), from Pre-processed annotations, and update flickr_ann_path, gqa_ann_path and refexp_ann_path to this folder of pre-processed annotations. A quick sanity check of the resulting paths is sketched after this list.
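
Once the paths are filled in, a minimal sanity-check sketch, assuming the config lives at configs/pretrain.json (this helper script is not part of the codebase, just an illustration):

import json
from pathlib import Path

# Verify that every dataset path filled into the pre-training config exists
# on disk before launching a multi-node job.
cfg = json.load(open("configs/pretrain.json"))
keys = ("coco_path", "vg_img_path", "flickr_img_path", "refexp_ann_path",
        "flickr_ann_path", "gqa_ann_path", "flickr_dataset_path")
for key in keys:
    path = Path(cfg[key])
    print(f"{key:20s} {'ok' if path.is_dir() else 'MISSING':8s} {path}")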

Script to run pre-training

ResNet-101

This command reproduces the pre-training of the ResNet-101 model on a SLURM cluster via submitit, using 4 nodes of 8 GPUs each:

python run_with_submitit.py --dataset_config configs/pretrain.json  --ngpus 8 --nodes 4 --ema

To run on a single node with 8 GPUs:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --dataset_config configs/pretrain.json --ema
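
Note that on recent PyTorch releases (1.10 and later), torch.distributed.launch is deprecated in favor of torchrun. Since --use_env is already passed above, the equivalent single-node invocation should be:

torchrun --nproc_per_node=8 main.py --dataset_config configs/pretrain.json --ema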

Timm backbones

We provide an interface to the timm library. Most stride-32 models that support the "features_only" mode should work out of the box; that includes ResNet variants as well as EfficientNets. Simply pass --backbone timm_modelname, where modelname is taken from the list of timm models here.
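
To check whether a given timm model exposes the features_only interface (and is therefore a candidate for the timm_ prefix), a minimal sketch, assuming timm is installed; note that newer timm releases may alias the model name used below:

import timm

# List candidate EfficientNet model names available in this timm version.
print(timm.list_models("*efficientnet*"))

# A usable backbone must support features_only mode; create_model raises
# an error for architectures that do not implement it.
model = timm.create_model("tf_efficientnet_b3_ns", features_only=True, pretrained=False)
print(model.feature_info.reduction())  # output strides of the returned feature maps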

In the paper, we train an EfficientNet-B3 with Noisy Student pre-training as follows (note that the backbone learning rate is slightly different):

python run_with_submitit.py --dataset_config configs/pretrain.json  --ngpus 8 --nodes 4 --ema --backbone timm_tf_efficientnet_b3_ns --lr_backbone 5e-5

To run on a single node with 8 GPUs:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --dataset_config configs/pretrain.json --ema --backbone timm_tf_efficientnet_b3_ns --lr_backbone 5e-5