Pytorch-Segmentation-Detection
is a library for dense inference and training of Convolutional Neural Networks (CNNs) on images for segmentation and detection.
The aim of the library is to provide a simplified way to:
- Convert some popular general/medical/other image segmentation and detection datasets into a format that is easy to use for training (PyTorch dataloaders).
- Train with on-the-fly data augmentation (scaling, color distortion).
- Use training routines that are proven to work for particular model/dataset pairs.
- Evaluate the accuracy of trained models with common accuracy measures: Mean IOU, Mean pix. accuracy, Pixel accuracy, Mean AP (a minimal sketch of the Mean IOU computation is given after this list).
- Download model files that were trained on a particular dataset with reported accuracy (models trained using this library with the reported training routine, not models converted from Caffe or another framework).
- Use model definitions (like FCN-32s and others) that initialize their weights from image classification models like VGG that are officially provided by the Pytorch/Vision library.
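For reference, here is a minimal sketch of how the Mean IOU measure can be computed from a confusion matrix. This is an illustration only, not the library's own evaluation code:

```python
import numpy as np

def mean_iou(confusion):
    """confusion[i, j] counts pixels of true class i predicted as class j."""
    intersection = np.diag(confusion).astype(np.float64)
    # union for class i = all pixels labeled i plus all pixels predicted i,
    # minus the intersection, which would otherwise be counted twice
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - np.diag(confusion)
    with np.errstate(divide='ignore', invalid='ignore'):
        iou = intersection / union
    # classes absent from both ground truth and prediction yield NaN; skip them
    return np.nanmean(iou)
```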
So far, the library contains implementations of the FCN-32s (Long et al.), Resnet-18-8s, and Resnet-34-8s (Chen et al.) image segmentation models in Pytorch and the Pytorch/Vision library, with training routines, reported accuracy, and trained models for the PASCAL VOC 2012 dataset. To train these models on your own data, you will have to write a dataloader for your dataset, as sketched below.
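As a starting point, here is a minimal sketch of such a dataset, assuming images and per-pixel class-index masks live in two parallel folders with matching file names; the class name and folder layout are placeholders to adapt to your data:

```python
import os
from PIL import Image
import torch.utils.data as data

class MySegmentationDataset(data.Dataset):

    def __init__(self, images_dir, masks_dir, joint_transform=None):
        self.images_dir = images_dir
        self.masks_dir = masks_dir
        self.filenames = sorted(os.listdir(images_dir))
        self.joint_transform = joint_transform

    def __len__(self):
        return len(self.filenames)

    def __getitem__(self, index):
        name = self.filenames[index]
        image = Image.open(os.path.join(self.images_dir, name)).convert('RGB')
        # masks are assumed to store per-pixel class indices
        mask = Image.open(os.path.join(self.masks_dir, name))
        if self.joint_transform is not None:
            # joint transforms keep image and mask spatially aligned
            image, mask = self.joint_transform(image, mask)
        return image, mask

# placeholder folder names -- adjust to your data layout
loader = data.DataLoader(MySegmentationDataset('imgs/', 'masks/'), batch_size=1)
```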
Models for Object Detection will be released soon.
This code requires:

- Some libraries, which can be acquired by installing the Anaconda package. Alternatively, you can install scikit-image, matplotlib, and numpy using pip.

- Clone the library:

  ```
  git clone --recursive https://github.com/warmspringwinds/pytorch-segmentation-detection
  ```
  And use this code snippet before you start to use the library:

  ```python
  import sys
  # update with your path
  # All the jupyter notebooks in the repository already have this
  sys.path.append("/your/path/pytorch-segmentation-detection/")
  sys.path.insert(0, '/your/path/pytorch-segmentation-detection/vision/')
  ```

  Here we use our fork of pytorch/vision, which might be merged upstream in the future. We have added it as a submodule to our repository.
- Manually download the segmentation or detection models that you want to use (links can be found below); a minimal loading sketch follows this list.
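Once a model file is downloaded, it can be loaded and run roughly like this. The import path and class name below follow the repository's notebooks but should be treated as assumptions, and the checkpoint and image file names are placeholders:

```python
import torch
from torch.autograd import Variable
from PIL import Image
import torchvision.transforms as transforms

# assumed import path -- adjust to your checkout of the repository
from pytorch_segmentation_detection.models.resnet_dilated import Resnet18_8s

model = Resnet18_8s(num_classes=21)  # 21 classes for PASCAL VOC 2012
# 'resnet_18_8s.pth' is a placeholder -- use the file you downloaded
model.load_state_dict(torch.load('resnet_18_8s.pth'))
model.eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    # ImageNet statistics, matching the classification weights used for init
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = Image.open('example.jpg')  # placeholder image path
batch = Variable(preprocess(img).unsqueeze(0))

logits = model(batch)               # shape: (1, num_classes, H, W)
prediction = logits.data.max(1)[1]  # per-pixel class indices
```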
Implemented models were tested on the Restricted PASCAL VOC 2012 Validation dataset (RV-VOC12) and trained on the PASCAL VOC 2012 training data plus the additional Berkeley segmentation data for PASCAL VOC 12. It was important to test the models on this restricted validation dataset to make sure that none of its images were seen by the models during training.
The code for training and validating the models is also provided in the library.
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
Here you can find the models that were described in the paper "DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs" by Chen et al. We trained and tested Resnet-18-8s and Resnet-34-8s on the PASCAL VOC 2012 dataset.
You can find all the scripts that were used for training and evaluation here.
Qualitative results:
This code has been used to train networks with this performance:
Model | Test data | Mean IOU | Mean pix. accuracy | Pixel accuracy | Inference time (512x512 px. image) | Model Download Link |
---|---|---|---|---|---|---|
Resnet-18-8s (ours) | RV-VOC12 | 59.0 | in prog. | in prog. | 28 ms. | Dropbox |
Resnet-34-8s (ours) | RV-VOC12 | 68.0 | in prog. | in prog. | 50 ms. | Dropbox |
Resnet-50-8s (ours) | RV-VOC12 | in prog. | in prog. | in prog. | in prog | in prog. |
Resnet-101-8s (ours) | RV-VOC12 | in prog. | in prog. | in prog. | in prog | in prog. |
Resnet-101-16s (orig) | RV-VOC11 | 69.0 | n/a | n/a | 180 ms. | n/a |
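The inference times in the tables are per 512x512 image. For context, here is a minimal sketch of how such a timing can be measured, assuming a model loaded as in the snippet above; GPU execution is asynchronous, so explicit synchronization is needed:

```python
import time
import torch
from torch.autograd import Variable

# Resnet18_8s as in the loading snippet above (an assumed import path)
from pytorch_segmentation_detection.models.resnet_dilated import Resnet18_8s

model = Resnet18_8s(num_classes=21).cuda().eval()
batch = Variable(torch.randn(1, 3, 512, 512).cuda(), volatile=True)

for _ in range(10):        # warm-up runs exclude one-time setup costs
    model(batch)

torch.cuda.synchronize()   # wait for pending GPU work before starting the clock
start = time.time()
for _ in range(100):
    model(batch)
torch.cuda.synchronize()   # wait for the timed work to actually finish
print('%.1f ms per image' % ((time.time() - start) / 100 * 1000))
```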
Implemented models were trained on the Endovis 2017 segmentation dataset; sequence number 3 was used for validation and was not included in the training dataset.
The code for training and validating the models is also provided in the library.
Additional qualitative results can be found in this youtube playlist.
Model | Test data | Mean IOU | Mean pix. accuracy | Pixel accuracy | Inference time (512x512 px. image) | Model Download Link |
---|---|---|---|---|---|---|
Resnet-9-8s | Seq # 3 * | 96.1 | in prog. | in prog. | 13.3 ms. | Dropbox |
Resnet-18-8s | Seq # 3 | 96.0 | in prog. | in prog. | 28 ms. | Dropbox |
Resnet-34-8s | Seq # 3 | in prog. | in prog. | in prog. | 50 ms. | in prog. |
* The Resnet-9-8s network was tested at 0.5 reduced resolution (512 x 640).
Qualitative results (on validation sequence):
Model | Test data | Mean IOU | Mean pix. accuracy | Pixel accuracy | Inference time (512x512 px. image) | Model Download Link |
---|---|---|---|---|---|---|
Resnet-18-8s | Seq # 3 | 81.0 | in prog. | in prog. | 28 ms. | Dropbox |
Resnet-34-8s | Seq # 3 | in prog. | in prog. | in prog. | 50 ms. | in prog. |
Qualitative results (on validation sequence):
The Cityscapes dataset contains video sequences recorded in street scenes from 50 different cities, with high-quality pixel-level annotations of 5,000 frames. The annotations cover 19 classes representing cars, roads, traffic signs, and so on.
Model | Test data | Mean IOU | Mean pix. accuracy | Pixel accuracy | Inference time (512x512 px. image) | Model Download Link |
---|---|---|---|---|---|---|
Resnet-18-8s | Validation set | 60.0 | in prog. | in prog. | 28 ms. | Dropbox |
Resnet-34-8s | Validation set | 69.1 | in prog. | in prog. | 50 ms. | Dropbox |
Qualitative results (on validation sequence):
The whole sequence can be viewed here.
We demonstrate applications of our library for certain tasks which are being ported or have already been ported to mobile devices:

- Surgical Robotic Tools Segmentation (see below)
If you use the code for your research, please cite the paper:
```
@article{pakhomov2017deep,
  title={Deep Residual Learning for Instrument Segmentation in Robotic Surgery},
  author={Pakhomov, Daniil and Premachandran, Vittal and Allan, Max and Azizian, Mahdi and Navab, Nassir},
  journal={arXiv preprint arXiv:1703.08580},
  year={2017}
}
```
During implementation, some preliminary experiments and notes were reported: