Semantic Segmentation
The third deep learning capability highlighted in this tutorial is semantic segmentation. Semantic segmentation builds on image recognition, except that classification occurs at the pixel level rather than for the entire image. This is accomplished by convolutionalizing a pre-trained image recognition model (like AlexNet), which turns it into a fully-convolutional segmentation model capable of per-pixel labelling. Segmentation is useful for environmental sensing and collision avoidance because it yields dense per-pixel classification of many different potential objects per scene, including scene foregrounds and backgrounds.
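To make "classification at the pixel level" concrete, here is a minimal sketch in plain Python. It assumes a network has already produced one score per class for every pixel; the class names, the score values, and the `classify_pixels` helper are all made up for illustration and are not part of the tutorial's code.

```python
# Hypothetical class names; the tutorial's dataset separates terrain from sky.
CLASSES = ["sky", "terrain"]

def classify_pixels(scores):
    """Return a 2D grid of class indices from per-pixel class scores.

    scores[c][y][x] is the (made-up) score of class c at pixel (x, y).
    """
    num_classes = len(scores)
    height = len(scores[0])
    width = len(scores[0][0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Pick the class with the highest score at this pixel.
            best = max(range(num_classes), key=lambda c: scores[c][y][x])
            row.append(best)
        mask.append(row)
    return mask

# Toy 2x3 "image": sky scores high on the top row, terrain on the bottom row.
scores = [
    [[0.9, 0.8, 0.7], [0.2, 0.1, 0.4]],  # class 0: sky
    [[0.1, 0.2, 0.3], [0.8, 0.9, 0.6]],  # class 1: terrain
]
mask = classify_pixels(scores)
for row in mask:
    print(" ".join(CLASSES[c] for c in row))
```

An image recognition model would emit a single label for the whole picture; here every pixel gets its own label, which is what the segmentation mask stores.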
The segNet object accepts a 2D image as input and outputs a second image with the per-pixel classification mask overlaid. Each pixel of the mask corresponds to the class of the object that was classified at that location.
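The overlay step can be pictured as blending each input pixel with the color assigned to its class. The sketch below is an illustration only, not segNet's actual implementation: the `overlay_mask` helper, the `alpha` parameter, and the two colors are assumptions chosen for the example.

```python
# Hypothetical RGB color per class index; real colors come from the dataset's color map.
CLASS_COLORS = {0: (0, 0, 200), 1: (0, 200, 0)}

def overlay_mask(image, mask, alpha=0.5):
    """Blend every pixel with its class color; alpha is the mask opacity (0..1)."""
    out = []
    for img_row, mask_row in zip(image, mask):
        out_row = []
        for pixel, cls in zip(img_row, mask_row):
            color = CLASS_COLORS[cls]
            blended = tuple(
                round((1 - alpha) * p + alpha * c) for p, c in zip(pixel, color)
            )
            out_row.append(blended)
        out.append(out_row)
    return out

# A 1x2 image of RGB tuples and its per-pixel class mask.
image = [[(100, 100, 100), (200, 200, 200)]]
mask = [[0, 1]]
result = overlay_mask(image, mask)
print(result)
```

With `alpha=0` the input image is returned unchanged, and with `alpha=1` only the class colors remain.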
note: see the DIGITS semantic segmentation example for more background info on segmentation.
As an example of image segmentation, we'll work with an aerial drone dataset that separates ground terrain from the sky. The dataset is captured in First Person View (FPV) to emulate the vantage point of a drone in flight, and is used to train a network that functions as an autopilot guided by the terrain it senses.
To download and extract the dataset, run the following commands from the host PC running the DIGITS server:
$ wget --no-check-certificate https://nvidia.box.com/shared/static/ft9cc5yjvrbhkh07wcivu5ji9zola6i1.gz -O NVIDIA-Aerial-Drone-Dataset.tar.gz
HTTP request sent, awaiting response... 200 OK
Length: 7140413391 (6.6G) [application/octet-stream]
Saving to: ‘NVIDIA-Aerial-Drone-Dataset.tar.gz’
NVIDIA-Aerial-Drone-Datase 100%[======================================>] 6.65G 3.33MB/s in 44m 44s
2017-04-17 14:11:54 (2.54 MB/s) - ‘NVIDIA-Aerial-Drone-Dataset.tar.gz’ saved [7140413391/7140413391]
$ tar -xzvf NVIDIA-Aerial-Drone-Dataset.tar.gz
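Because the archive is several gigabytes, a truncated download is worth ruling out if extraction fails. The sketch below compares the file size on disk with the byte count that wget reported in the log above. The `archive_is_complete` helper is hypothetical, and a size match is only a quick sanity check, not an integrity guarantee.

```python
import os

# "Length" reported by wget in the download log above.
EXPECTED_BYTES = 7140413391

def archive_is_complete(path, expected_bytes=EXPECTED_BYTES):
    """Return True if the file exists and its size equals the expected byte count."""
    return os.path.isfile(path) and os.path.getsize(path) == expected_bytes

if __name__ == "__main__":
    name = "NVIDIA-Aerial-Drone-Dataset.tar.gz"
    if archive_is_complete(name):
        print(f"{name}: size matches")
    else:
        print(f"{name}: missing or incomplete, consider downloading it again")
```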
The dataset includes various clips captured from flights of drone platforms, but the one we'll focus on in this tutorial is under FPV/SFWA. Next we'll create the training database in DIGITS before training the model.
First, navigate your browser to your DIGITS server instance and choose to create a new Segmentation Dataset from the drop-down in the Datasets tab:
In the dataset creation form, specify the following options and paths to the image and label folders under the location where you extracted the aerial dataset:
- Feature image folder: NVIDIA-Aerial-Drone-Dataset/FPV/SFWA/720p/images
- Label image folder: NVIDIA-Aerial-Drone-Dataset/FPV/SFWA/720p/labels
- Set % for validation to 1%
- Class labels: NVIDIA-Aerial-Drone-Dataset/FPV/SFWA/fpv-labels.txt
- Color map: From text file
- Feature Encoding: None
- Label Encoding: None
Name the dataset whatever you choose and click the Create button at the bottom of the page to launch the importing job. Next we'll create the new segmentation model and begin training.
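Before launching the import, it can help to confirm that the feature and label folders line up. The sketch below assumes each image and its label share the same file name stem, which is an assumption about the dataset layout rather than something the tutorial states. The `find_unpaired` helper is hypothetical, and the path should be adjusted to wherever you extracted the archive.

```python
from pathlib import Path

def find_unpaired(images_dir, labels_dir):
    """Return (image stems without a label, label stems without an image)."""
    image_stems = {p.stem for p in Path(images_dir).iterdir() if p.is_file()}
    label_stems = {p.stem for p in Path(labels_dir).iterdir() if p.is_file()}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)

if __name__ == "__main__":
    # Folder names taken from the dataset creation form above.
    root = Path("NVIDIA-Aerial-Drone-Dataset/FPV/SFWA/720p")
    if (root / "images").is_dir() and (root / "labels").is_dir():
        no_label, no_image = find_unpaired(root / "images", root / "labels")
        print(f"{len(no_label)} images lack a label, {len(no_image)} labels lack an image")
    else:
        print(f"Dataset not found under {root}; adjust the path to your extraction folder.")
```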
Next | Generating Pretrained FCN-Alexnet
Back | Running the Live Camera Detection Demo
© 2016-2019 NVIDIA | Table of Contents