GluonCV 0.7.0 Release

@Jerryzcn Jerryzcn released this 22 Apr 00:16
· 197 commits to master since this release
18f8ab5

Highlights

GluonCV 0.7 adds our latest backbone network, ResNeSt, along with derived models for semantic segmentation and object detection. We achieve significant performance improvements on all three tasks: image classification, object detection, and semantic segmentation.

Image Classification

GluonCV now provides state-of-the-art image classification backbones that can be used by various downstream tasks. Our ResNeSt outperforms EfficientNet in the accuracy-speed trade-off, as shown in the following figure. You can now swap our new ResNeSt into your research or product to get an immediate performance improvement. Check out the details in our paper: ResNeSt: Split-Attention Networks

Here is a comparison between ResNeSt and EfficientNet. The average latency is computed using a single V100 on a p3dn.24xlarge machine with a batch size of 16.

[Figure: resnest_vs_efficientnet, accuracy vs. latency for ResNeSt and EfficientNet]

| Model | Input size | Top-1 acc (%) | Avg latency (ms) | Release |
|---|---|---|---|---|
| SENet_154 | 224x224 | 81.26 | 5.07 | previous |
| ResNeSt50 | 224x224 | 81.13 | 1.78 | v0.7 |
| ResNeSt101 | 256x256 | 82.83 | 3.43 | v0.7 |
| ResNeSt200 | 320x320 | 83.90 | 9.49 | v0.7 |
| ResNeSt269 | 416x416 | 84.54 | 19.50 | v0.7 |
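As a rough illustration of swapping ResNeSt into existing code, the sketch below loads a backbone through the GluonCV model zoo and builds an input shape at the resolution listed in the table. It assumes the zoo identifiers are `resnest50`, `resnest101`, `resnest200` and `resnest269`; the helper names `load_resnest` and `dummy_input_shape` are hypothetical and exist only for this example.

```python
# Input resolutions copied from the table above.
# The keys are assumed to be the model zoo identifiers.
RESNEST_INPUT_SIZE = {
    "resnest50": 224,
    "resnest101": 256,
    "resnest200": 320,
    "resnest269": 416,
}


def load_resnest(name="resnest50", pretrained=True):
    """Return a ResNeSt classifier from the GluonCV model zoo (hypothetical helper)."""
    if name not in RESNEST_INPUT_SIZE:
        raise ValueError(f"unknown ResNeSt variant: {name}")
    # Imported lazily so this module can be loaded without MXNet/GluonCV installed.
    from gluoncv.model_zoo import get_model
    return get_model(name, pretrained=pretrained)


def dummy_input_shape(name, batch_size=1):
    """NCHW shape of a batch at the resolution listed for `name`."""
    size = RESNEST_INPUT_SIZE[name]
    return (batch_size, 3, size, size)
```

With MXNet and GluonCV installed, something like `load_resnest("resnest50")(mx.nd.zeros(dummy_input_shape("resnest50")))` should run a forward pass; the larger variants expect the larger resolutions shown in the table.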

Object Detection

We add two new ResNeSt-based Faster R-CNN models. Note that our models are trained using a 2x learning rate schedule instead of the 1x schedule used in our paper. Our two new models are roughly 2 to 4 points higher on COCO mAP than our previous best model “faster_rcnn_fpn_resnet101_v1d_coco”. Notably, our ResNeSt-50 based model has a 1.9 point higher mAP than our previous ResNet-101 based model, and the ResNeSt-101 based model is 4.1 points higher.

| Model | Backbone | mAP | Release |
|---|---|---|---|
| Faster R-CNN | ResNet-101 | 40.8 | previous |
| Faster R-CNN | ResNeSt-50 | 42.7 | v0.7 |
| Faster R-CNN | ResNeSt-101 | 44.9 | v0.7 |
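The gains over the previous model can be recomputed directly from the table above. The snippet below is only a small arithmetic check; the dictionary and the `gain_over` helper are names made up for this example, and the numbers are copied from the table.

```python
# COCO mAP values copied from the table above.
COCO_MAP = {
    "ResNet-101": 40.8,   # previous best
    "ResNeSt-50": 42.7,   # v0.7
    "ResNeSt-101": 44.9,  # v0.7
}


def gain_over(baseline, scores):
    """Absolute mAP difference of every other entry relative to `baseline`."""
    base = scores[baseline]
    return {k: round(v - base, 1) for k, v in scores.items() if k != baseline}


gains = gain_over("ResNet-101", COCO_MAP)
print(gains)  # {'ResNeSt-50': 1.9, 'ResNeSt-101': 4.1}
```

These are absolute differences in mAP points, not relative percentages.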

Semantic Segmentation

We add ResNeSt-50 and ResNeSt-101 based DeepLabV3 models for the semantic segmentation task on the ADE20K dataset. Our new models are 1 to 2.8 points higher in mIoU than our previous best. Similar to our detection results, the ResNeSt-50 based model performs better than the ResNet-101 based model. DeepLabV3 with a ResNeSt-101 backbone achieves a new state of the art of 46.9 mIoU on the ADE20K validation set, which outperforms the previous best by more than 1%.

| Model | Backbone | Pixel accuracy | mIoU | Release |
|---|---|---|---|---|
| DeepLabV3 | ResNet-101 | 81.1 | 44.1 | previous |
| DeepLabV3 | ResNeSt-50 | 81.2 | 45.1 | v0.7 |
| DeepLabV3 | ResNeSt-101 | 82.1 | 46.9 | v0.7 |

Bug fixes and Improvements

  • Instructions for achieving 25.7 min Mask R-CNN training.
  • Fix export of R-CNN models.