Self-supervisedly train potentially well-performing segmentation networks in minutes:
The figure above illustrates segmentation performance on a subset of the MoNuSeg test set. In the bottom row, blue indicates correctly segmented nuclei, green indicates false negatives, and red indicates false positives.
Training is done by contrasting patches against each other, iteratively increasing the separation between positive and negative patches. Notice the divergence in patch confidences in the following figure; we conjecture that these trends illustrate the fundamental mechanisms that make this framework work.
Check the preprint for further details about the behaviour, performance, and structure of the framework.
This repository contains the source code for the framework. It can self-supervisedly train convolutional neural networks to segment images, or at least to learn to recognise features. It was primarily developed for, and tested on, nuclei segmentation in histopathological images.
The framework consists of a couple of components, and for each of these, multiple implementations can be used interchangeably. This project has some "magic" behind the scenes to facilitate fast and easy development, exploration, and testing of new implementations of these components. This has made development more efficient, but it requires you to follow some conventions when researching new implementations.
Following these procedures automatically makes the implementations selectable on the command line. For instance, when training, the sampler to use can be specified by the `--sampler` command line option, where any available sampler can be selected by its class name, e.g., `--sampler EntropySampler`. Furthermore, any arguments to its `__init__` method can be specified on the command line as well, without any additional code. Note that this requires using type hints.
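As a hedged illustration of this mechanism, consider the sketch below; only the class name `EntropySampler` comes from the text above, while the parameters and their types are invented for the example:

```python
class EntropySampler(Sampler):
    def __init__(self, temperature: float = 1.0, num_patches: int = 100):
        # Because the parameters are type-hinted, the framework can expose
        # them as --temperature and --num_patches on the command line.
        self.temperature = temperature
        self.num_patches = num_patches
```

With a signature like this, an invocation such as `train --sampler EntropySampler --temperature 0.5 --num_patches 200` would be expected to work without any further code.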
A couple of dataset implementations are provided. To use this framework for your own data, create a new `Dataset` in a file inside the `src/data/` directory, and decorate the class with

```python
@register_dataset(DatasetType.UNLABALLED_DATASET)
class MoNuSegDataset(Dataset):
    ...
```

or with `DatasetType.LABALLED_DATASET` for datasets including labels (used for e.g. validation).
Any such decorated `Dataset` class in a file inside the `src/data/` directory will automatically be selectable on the command line by its class name on the `--dataset` and `--validation_dataset` options, respectively. Any parameters to their `__init__` methods also automatically become settable on the command line, such as e.g. `--image_directory` and `--crop_size` for the `MoNuSegDataset`.
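As a hedged sketch of what such a class might look like, the following uses the parameter names mentioned above (`image_directory`, `crop_size`); the base class, import paths, and the body are assumptions for illustration, not the repository's actual implementation:

```python
import os

from PIL import Image
from torch.utils.data import Dataset  # assumption: standard PyTorch Dataset

# Assumed import path for the registration helpers described above:
from src.data import DatasetType, register_dataset


@register_dataset(DatasetType.UNLABALLED_DATASET)
class MyImageDataset(Dataset):
    """Hypothetical unlabelled dataset over a directory of images."""

    def __init__(self, image_directory: str, crop_size: int = 300):
        # The type hints make --image_directory and --crop_size
        # settable on the command line.
        self.paths = sorted(
            os.path.join(image_directory, name)
            for name in os.listdir(image_directory)
        )
        self.crop_size = crop_size

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, index: int):
        image = Image.open(self.paths[index]).convert("RGB")
        # Cropping to self.crop_size x self.crop_size would happen here.
        return image
```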
Note that, when using both a `--dataset` and a `--validation_dataset`, any parameters to their `__init__` methods with identical names will get the same value. To bypass this issue, just name the parameters distinctly; that is the reason `MoNuSegDataset` has an `image_directory` parameter while `MoNuSegValidationDataset` has a `directory` parameter. This is not ideal, but it is the current behaviour.
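To make the workaround concrete, here is a minimal sketch of the two signatures (bodies omitted; only the parameter names are taken from the text above, while the type hints and decorators follow the pattern already shown):

```python
@register_dataset(DatasetType.UNLABALLED_DATASET)
class MoNuSegDataset(Dataset):
    def __init__(self, image_directory: str, crop_size: int):
        ...


@register_dataset(DatasetType.LABALLED_DATASET)
class MoNuSegValidationDataset(Dataset):
    # Named `directory` rather than `image_directory` so the training and
    # validation datasets can be pointed at different locations.
    def __init__(self, directory: str):
        ...
```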
Sampling of patches is a fundamental step in this framework. To create a new sampler, simply add a new class in `src/sampling.py` which inherits from `Sampler`. Any such class will automatically be selectable on the command line by its class name on the `--sampler` option.
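As a hedged sketch, a new sampler could look like the following, defined inside `src/sampling.py`; the `Sampler` base-class interface is not documented here, so the `sample` method name and its contract are assumptions for illustration only:

```python
import torch


class UniformSampler(Sampler):
    """Hypothetical sampler drawing patch locations uniformly at random."""

    def __init__(self, num_patches: int = 100):
        # Type-hinted, hence settable as --num_patches on the command line.
        self.num_patches = num_patches

    def sample(self, image: torch.Tensor) -> torch.Tensor:
        # Assumed contract: return (row, column) coordinates of patch centres.
        _, height, width = image.shape
        rows = torch.randint(0, height, (self.num_patches,))
        columns = torch.randint(0, width, (self.num_patches,))
        return torch.stack((rows, columns), dim=1)
```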
To create a new similarity measure, simply add a new class in `src/similarity` which inherits from `SimilarityMeasure`. Any such class will automatically be selectable on the command line by its class name on the `--similarity_measure` option.
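For illustration, here is a hedged sketch of a possible similarity measure; `MeanSquaredError` appears in the training example below, and the `__call__` contract assumed here (two batches of patch features in, similarity scores out) is a guess at the base-class interface:

```python
import torch
import torch.nn.functional as F


class CosineSimilarity(SimilarityMeasure):
    """Hypothetical measure comparing two batches of patch feature vectors."""

    def __call__(self, features_a: torch.Tensor, features_b: torch.Tensor) -> torch.Tensor:
        # Assumed contract: higher values indicate more similar patches.
        return F.cosine_similarity(features_a, features_b, dim=-1)
```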
To add another confidence network, simply add its class to the `AVAILABLE_CONFIDENCE_NETWORKS` tuple. These will be selectable on the command line by their class name on the `--confidence_network` option.
To add another featuriser network, simply add its class to the `AVAILABLE_FEATURISER_NETWORKS` tuple. These will be selectable on the command line by their class name on the `--featuriser_network` option.
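A hedged sketch of this registration step; the module containing the tuples and the existing entries are not specified in this section, so the names below are placeholders:

```python
# Hypothetical: wherever AVAILABLE_FEATURISER_NETWORKS is defined,
# ExistingFeaturiser stands in for the networks already registered,
# and MyFeaturiser is the new class being added.
AVAILABLE_FEATURISER_NETWORKS = (
    ExistingFeaturiser,
    MyFeaturiser,  # now selectable as --featuriser_network MyFeaturiser
)
```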
Clone the repository, and install the package with

```
pip install -e .
```

inside a virtual environment. The `-e` (editable) flag is optional. Do include the dot.
The `openslide-python` package requires `openslide-tools`, installable with e.g. `apt install openslide-tools` (on Debian/Ubuntu based systems). Alternatively, comment that line out if you are not going to be using such datasets anyway.
Alternatively, and probably preferably, use the Docker image; see the instructions below.
Installation adds the `train` and `evaluate` commands, whose usage is described below.
Train models using the framework with the `train` command, specifying options such as e.g.

```
train --accelerator gpu --in_channels 3 --out_channels 4 --sampler EntropySampler --batch_size 10 --crop_size 300 --dataset MoNuSegDataset --image_directory data/monuseg/ --similarity_measure MeanSquaredError --max_epochs 25 --patch_size 50 --validation_dataset MoNuSegValidationDataset --directory data/monuseg_test/
```
Evaluate trained models with the `evaluate` command

```
evaluate --dataset MoNuSegValidationDataset --directory data/monuseg_test/ --versions 0 1 2 3
```
where `--dataset` specifies the test set, and `--versions` lists the version numbers to evaluate. These version numbers correspond to the numbers in the `logs/lightning_logs/version_X` directories. Specifying the numbers is enough; the corresponding models are loaded and evaluated automatically.
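Assuming the default PyTorch Lightning layout, this directory looks roughly like the following (a sketch, not verbatim output):

```
logs/
└── lightning_logs/
    ├── version_0/
    ├── version_1/
    └── version_2/
```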
To be able to train on NVIDIA GPUs, make sure that `nvidia-container-toolkit` is installed.
Build the image with

```
sudo docker build .
```
Run the container interactively with e.g.

```
sudo docker run -it -v $(pwd)/data:/home/data -v $(pwd)/logs:/home/logs --gpus all --shm-size 8G <image ID>
```

where the directories containing data and logs are mounted as volumes, eliminating the need to copy them into the container.
Contributions, including new datasets and novel samplers, similarity measures, and so on, are welcome. Just please adhere, at least somewhat, to `flake8` style when creating PRs.