GluonCV 0.5.0 Release

Highlights

GluonCV v0.5.0 adds Video Action Recognition models, AlphaPose, MobileNetV3, VPLR semantic segmentation models for driving scenes, and more Int8 quantized models for deployment, along with multiple usability improvements.

New Models released in 0.5

Model	Metric	0.5
vgg16_ucf101	UCF101 Top-1	83.4
inceptionv3_ucf101	UCF101 Top-1	88.1
inceptionv3_kinetics400	Kinetics400 Top-1	72.5
alpha_pose_resnet101_v1b_coco	OKS AP (with flip)	76.7/92.6/82.9
MobileNetV3_Large	ImageNet Top-1	75.32
MobileNetV3_Small	ImageNet Top-1	67.72
deeplab_v3b_plus_wideresnet_citys	Cityscapes mIoU	83.5

New application: Video Action Recognition

https://gluon-cv.mxnet.io/model_zoo/action_recognition.html

Video Action Recognition in GluonCV is a complete application set that includes model definitions, training scripts, and useful loss and metric functions. Pre-trained models and usage tutorials are also included.

Model	Pre-Trained Dataset	Clip Length	Num of Segments	Metric	Dataset	Accuracy
vgg16_ucf101	ImageNet	1	1	Top-1	UCF101	81.5
vgg16_ucf101	ImageNet	1	3	Top-1	UCF101	83.4
inceptionv3_ucf101	ImageNet	1	1	Top-1	UCF101	85.6
inceptionv3_ucf101	ImageNet	1	3	Top-1	UCF101	88.1
inceptionv3_kinetics400	ImageNet	1	3	Top-1	Kinetics400	72.5
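
To make the "Clip Length" and "Num of Segments" columns concrete, here is a minimal sketch of segment-based sampling. It assumes the common scheme in which a video is split into equal segments and one clip is taken from the middle of each; the function name `sample_segment_starts` is hypothetical and this is not the GluonCV implementation.

```python
from typing import List


def sample_segment_starts(num_frames: int, num_segments: int, clip_len: int = 1) -> List[int]:
    """Return the start frame of one clip per segment, taken at each segment's center.

    Illustrative only: an assumed sampling scheme, not GluonCV's actual code.
    """
    if num_segments < 1 or clip_len < 1:
        raise ValueError("num_segments and clip_len must be positive")
    usable = num_frames - clip_len + 1  # number of valid clip start positions
    if usable < 1:
        raise ValueError("video is shorter than one clip")
    seg_len = usable / num_segments  # (possibly fractional) length of each segment
    return [int(seg_len * i + seg_len / 2) for i in range(num_segments)]


# A 90-frame video with clip length 1, as in the table rows above.
print(sample_segment_starts(90, 1))  # [45]
print(sample_segment_starts(90, 3))  # [15, 45, 75]
```

Under this assumption, the 3-segment rows see three frames spread across the video instead of a single frame, which is one plausible reading of why they score higher than the 1-segment rows.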

Tutorials on preparing the UCF101 and Kinetics400 datasets: https://gluon-cv.mxnet.io/build/examples_datasets/ucf101.html and https://gluon-cv.mxnet.io/build/examples_datasets/kinetics400.html

Demo of predicting human actions with a pre-trained model: https://gluon-cv.mxnet.io/build/examples_action_recognition/demo_ucf101.html

Tutorial on training your own action recognition model: https://gluon-cv.mxnet.io/build/examples_action_recognition/dive_deep_ucf101.html

More state-of-the-art models (I3D, SlowFast, etc.) are coming in the next release. Stay tuned.

New model: AlphaPose

https://gluon-cv.mxnet.io/model_zoo/pose.html#alphapose

Model	Dataset	OKS AP	OKS AP (with flip)
alpha_pose_resnet101_v1b_coco	COCO Keypoint	74.2/91.6/80.7	76.7/92.6/82.9

Demo of using the pre-trained AlphaPose model: https://gluon-cv.mxnet.io/build/examples_pose/demo_alpha_pose.html

New model: MobileNetV3

https://gluon-cv.mxnet.io/model_zoo/classification.html#mobilenet

Model	Dataset	Top-1	Top-5	Top-1 (original paper)
MobileNetV3_Large	ImageNet	75.3	92.3	75.2
MobileNetV3_Small	ImageNet	67.7	87.5	67.4

New model: Semantic Segmentation VPLR

https://gluon-cv.mxnet.io/model_zoo/segmentation.html#cityscapes-dataset

Model	Pre-Trained Dataset	Dataset	mIoU	iIoU
deeplab_v3b_plus_wideresnet_citys	ImageNet, Mapillary Vistas	Cityscapes	83.5	64.4

"Improving Semantic Segmentation via Video Propagation and Label Relaxation" (VPLR) has been ported to GluonCV. The method is state of the art on several driving-scene semantic segmentation benchmarks (Cityscapes, CamVid, and KITTI) and generalizes well to other scenes.

New model: More Int8 quantized models

https://gluon-cv.mxnet.io/build/examples_deployment/int8_inference.html
The CPU performance below was benchmarked on an AWS EC2 C5.12xlarge instance with 24 physical cores.
Note that you need a nightly build of MXNet to use these new features properly.

Model	Dataset	Batch Size	C5.12xlarge FP32	C5.12xlarge INT8	Speedup	FP32 Acc	INT8 Acc
FCN_resnet101	VOC	1	5.46	26.33	4.82	97.97%	98.00%
PSP_resnet101	VOC	1	3.96	10.63	2.68	98.46%	98.45%
Deeplab_resnet101	VOC	1	4.17	13.35	3.20	98.36%	98.34%
FCN_resnet101	COCO	1	5.19	26.22	5.05	91.28%	90.96%
PSP_resnet101	COCO	1	3.94	10.60	2.69	91.82%	91.88%
Deeplab_resnet101	COCO	1	4.15	13.56	3.27	91.86%	91.98%
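
The Speedup column can be reproduced from the two performance columns. The sketch below assumes those columns are throughput figures (higher is better), which is consistent with the reported ratios; the units are not stated in these notes, and the names `ROWS` and `speedup` are illustrative.

```python
# (model, dataset, FP32, INT8, reported speedup), copied from the table above.
ROWS = [
    ("FCN_resnet101", "VOC", 5.46, 26.33, 4.82),
    ("PSP_resnet101", "VOC", 3.96, 10.63, 2.68),
    ("Deeplab_resnet101", "VOC", 4.17, 13.35, 3.20),
    ("FCN_resnet101", "COCO", 5.19, 26.22, 5.05),
    ("PSP_resnet101", "COCO", 3.94, 10.60, 2.69),
    ("Deeplab_resnet101", "COCO", 4.15, 13.56, 3.27),
]


def speedup(fp32: float, int8: float) -> float:
    """Ratio of INT8 to FP32 performance (assumes higher values are better)."""
    return int8 / fp32


for model, dataset, fp32, int8, reported in ROWS:
    # Print the recomputed ratio next to the value reported in the table.
    print(f"{model:<18} {dataset:<5} computed={speedup(fp32, int8):.2f} reported={reported:.2f}")
```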

For segmentation models, the accuracy metric is pixAcc. Using an Int8 quantized model is identical to using a standard GluonCV model: simply append the suffix _int8 to the model name.
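
As a rough sketch of the suffix convention, the helper below derives the quantized model name and loads it through `gluoncv.model_zoo.get_model`. The helper names are hypothetical, and the lowercase zoo name `fcn_resnet101_voc` is an assumed mapping of the FCN_resnet101 / VOC row in the table; check the model zoo page for the exact names.

```python
INT8_SUFFIX = "_int8"


def int8_model_name(fp32_name: str) -> str:
    """Return the model-zoo name of the quantized variant (illustrative helper)."""
    if fp32_name.endswith(INT8_SUFFIX):
        return fp32_name
    return fp32_name + INT8_SUFFIX


def load_int8_model(fp32_name: str):
    """Load the pre-trained quantized model; needs MXNet and GluonCV installed."""
    # Imported lazily so the naming helper works without GluonCV present.
    from gluoncv.model_zoo import get_model
    return get_model(int8_model_name(fp32_name), pretrained=True)


# Assumed zoo name for the FCN_resnet101 / VOC row.
print(int8_model_name("fcn_resnet101_voc"))  # fcn_resnet101_voc_int8
```

Calling `load_int8_model("fcn_resnet101_voc")` would then download the quantized weights, provided that name exists in the installed GluonCV version and the MXNet build supports it.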

Bug fixes and Improvements

RCNN added automatic mixed precision (AMP) and Horovod integration, giving close to a 4x improvement in training throughput on 8 V100 GPUs.
RCNN added support for multiple images per device.
