Use case: Semantic Segmentation
DeepLabV3 was introduced in the paper "Rethinking Atrous Convolution for Semantic Image Segmentation" by Google. It is composed of a backbone (encoder), for example a MobileNet V2 (with width multiplier alpha) or a ResNet-50/101, followed by an ASPP (Atrous Spatial Pyramid Pooling) module as described in the paper.
ASPP applies several parallel dilated convolutions with various dilation rates to the encoder outputs. This technique helps capture longer-range context without increasing the number of parameters too much. The multi-scale design of the ASPP has proved to be sensitive both to fine details and to broader contextual information.
So far, we have only considered the MobileNet V2 encoder.
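To illustrate the idea behind ASPP, here is a minimal single-channel NumPy sketch of dilated convolution (the real module uses multi-channel Conv2D branches plus image-level pooling and a 1x1 projection; the function names below are illustrative, not from the model zoo):

```python
import numpy as np

def dilated_conv2d(x, kernel, rate):
    """Single-channel 'same' convolution with dilation `rate`.

    A 3x3 kernel with rate r samples a (2r+1)x(2r+1) window, so the
    receptive field grows without adding any parameters.
    """
    kh, kw = kernel.shape
    pad_h, pad_w = rate * (kh // 2), rate * (kw // 2)
    xp = np.pad(x, ((pad_h, pad_h), (pad_w, pad_w)))
    out = np.zeros_like(x, dtype=np.float64)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * xp[i * rate : i * rate + x.shape[0],
                                     j * rate : j * rate + x.shape[1]]
    return out

def aspp_single_channel(x, kernel, rates=(6, 12, 18)):
    """Toy ASPP: run parallel dilated convolutions and fuse them (here by
    averaging; the real module concatenates branches and applies a 1x1 conv)."""
    return np.mean([dilated_conv2d(x, kernel, r) for r in rates], axis=0)
```

Each branch sees the same feature map at a different effective scale, which is what makes the module responsive to both local detail and wider context.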
| Network Information | Value |
|---|---|
| Framework | TensorFlow Lite |
| Quantization | int8 |
| Provenance | https://www.tensorflow.org/lite/examples/segmentation/overview |
| Paper | https://arxiv.org/pdf/1706.05587 |
The models are quantized using the TensorFlow Lite converter.
For an image resolution of NxM and P classes:
| Input Shape | Description |
|---|---|
| (1, N, M, 3) | Single NxM RGB image with UINT8 values between 0 and 255 |
| Output Shape | Description |
|---|---|
| (1, N, M, 21) | Per-class confidence for P=21 classes in FLOAT32 |
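In practice, the (1, N, M, 21) confidence tensor is turned into a segmentation mask by taking the arg-max class at each pixel. A short NumPy sketch (the random values below stand in for real model output):

```python
import numpy as np

# Stand-in for the model output: (1, N, M, P) float32 per-class confidences.
N, M, P = 8, 8, 21
rng = np.random.default_rng(42)
confidences = rng.random((1, N, M, P)).astype(np.float32)

# Each pixel is assigned the class with the highest confidence.
mask = np.argmax(confidences[0], axis=-1)  # shape (N, M), values in 0..P-1
```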
| Platform | Supported | Recommended |
|---|---|---|
| STM32L0 | [] | [] |
| STM32L4 | [] | [] |
| STM32U5 | [] | [] |
| STM32H7 | [] | [] |
| STM32MP1 | [] | [] |
| STM32MP2 | [x] | [x] |
To train the DeepLabV3 model with a MobileNet V2 backbone, whether with pretrained weights, from scratch, or fine-tuned on your own dataset, you need to configure the user_config.yaml file following the tutorial under the src section.
As an example, the deeplab_v3_mobilenetv2_05_16_512_fft.yaml file is used to train on the COCO 2017 + PASCAL VOC 2012 dataset. You can copy its content into the user_config.yaml file provided under the src section to reproduce the results presented below.
To deploy your trained model, you need to configure the same user_config.yaml file following the tutorial.
Measurements are made with the default STM32Cube.AI configuration, with the input/output allocated option enabled.
Reference MPU inference times based on the COCO 2017 + PASCAL VOC 2012 segmentation dataset with 21 classes (see Accuracy for details on the dataset):
| Model | Dataset | Format | Resolution | Quantization | Board | Execution Engine | Frequency | Inference time (ms) | %NPU | %GPU | %CPU | X-LINUX-AI version | Framework |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DeepLabV3 per tensor (no ASPP) | COCO 2017 + PASCAL VOC 2012 | Int8 | 257x257x3 | per-tensor | STM32MP257F-DK2 | NPU/GPU | 1500 MHz | 52.75 | 99.2 | 0.80 | 0 | v5.1.0 | OpenVX |
| DeepLabV3 per channel | COCO 2017 + PASCAL VOC 2012 | Int8 | 512x512x3 | per-channel ** | STM32MP257F-DK2 | NPU/GPU | 1500 MHz | 806.12 | 8.73 | 91.27 | 0 | v5.1.0 | OpenVX |
| DeepLabV3 mixed precision | COCO 2017 + PASCAL VOC 2012 | Int8 & float32 | 512x512x3 | per-channel ** | STM32MP257F-DK2 | NPU/GPU | 1500 MHz | 894.56 | 7.67 | 92.33 | 0 | v5.1.0 | OpenVX |
** To get the most out of the MP25 NPU hardware acceleration, please use per-tensor quantization.
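The accuracy/latency trade-off between the two schemes can be seen on a toy weight tensor: per-channel quantization fits a scale to each output channel's range, while per-tensor quantization uses a single scale (which maps best onto the MP25 NPU's accelerated int8 path). A hedged NumPy sketch with made-up weights:

```python
import numpy as np

def fake_quant_int8(w, scale):
    """Quantize to int8 with the given scale(s), then dequantize."""
    q = np.clip(np.round(w / scale), -127, 127)
    return q * scale

rng = np.random.default_rng(0)
# Made-up conv kernel in TFLite layout (H, W, in_ch, out_ch).
w = rng.normal(size=(3, 3, 8, 16)).astype(np.float32)
w[..., 0] *= 50.0  # one output channel with a much wider range

# Per-tensor: a single scale for the whole tensor.
err_per_tensor = np.abs(w - fake_quant_int8(w, np.abs(w).max() / 127.0)).mean()

# Per-channel: one scale per output channel (broadcast over the last axis).
scales = np.abs(w).max(axis=(0, 1, 2)) / 127.0
err_per_channel = np.abs(w - fake_quant_int8(w, scales)).mean()
# err_per_channel comes out much smaller here, illustrating why per-channel
# preserves accuracy better, even though per-tensor runs fastest on the NPU.
```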
Dataset details: link, license: Database Contents License (DbCL) v1.0, number of classes: 21, number of images: 11530.
Please note that the following accuracies are evaluated on the Pascal VOC 2012 validation set (val.txt), with a preprocessing resize using the 'bilinear' interpolation method. Moreover, IoU is averaged over all classes, including the background.
| Model Description | Resolution | Format | Accuracy | Averaged IoU |
|---|---|---|---|---|
| DeepLabV3 per tensor (no ASPP) | 257x257x3 | Int8 | 88.6% | 59.33% |
| DeepLabV3 float precision | 512x512x3 | Float | 93.29% | 73.44% |
| DeepLabV3 per channel | 512x512x3 | Int8 | 91.3% | 67.32% |
| DeepLabV3 mixed precision | 512x512x3 | Int8/Float | 92.83% | 71.93% |
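The averaged IoU metric used above can be sketched in NumPy from the confusion matrix (a hypothetical helper for illustration, not the evaluation service's exact code):

```python
import numpy as np

def mean_iou(y_true, y_pred, num_classes):
    """IoU averaged over all classes (background included).

    Built from the confusion matrix; classes absent from both the ground
    truth and the prediction are left out of the average.
    """
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true.ravel(), y_pred.ravel()), 1)
    intersection = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection
    valid = union > 0
    return (intersection[valid] / union[valid]).mean()
```

For example, with ground truth `[[0, 0], [1, 1]]` and prediction `[[0, 1], [1, 1]]`, class 0 scores IoU 1/2 and class 1 scores 2/3, so the mean is 7/12.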
- DeepLabV3 per tensor: This model, which does not include ASPP (Atrous Spatial Pyramid Pooling), was downloaded from the TensorFlow DeepLabV3 page on Kaggle.
- DeepLabV3 float precision: This model is the result of using the deeplab_v3_mobilenetv2_05_16_512_fft.yaml configuration file to train the model on the COCO 2017 + PASCAL VOC 2012 dataset.
- DeepLabV3 per channel: This model is a per-channel quantized version of DeepLabV3 float precision. It is generated using the quantization service with the quantization_config.yaml configuration file.
- DeepLabV3 mixed precision: This model is a mixed-precision version of DeepLabV3 float precision. The backbone is fully quantized to 8 bits, while the ASPP head remains partially in float precision. Some layers were too sensitive to 8-bit quantization, resulting in unacceptable accuracy degradation, so we instructed TFLite not to quantize those specific layers.
Please refer to the generic guideline here