This repository contains the source code for getting started with the UniToPatho dataset using pyEDDL/pyECVL.
Preliminary KPI measurements are available at
Besides pyEDDL/pyECVL, install the other dependencies:
pip3 install pandas numpy opencv-python pyyaml scikit-learn matplotlib wandb tqdm openslide_python scikit-image gdown seaborn
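A quick way to confirm the installation is to check that each module can be located. This is a minimal sketch, not part of the repository. The pip-name-to-module mapping below is the usual one for these packages (for example, opencv-python imports as cv2 and scikit-image as skimage). Finding the openslide module does not guarantee that the native OpenSlide library it wraps is also installed.

```python
import importlib.util

# pip package name -> module name used in `import` statements.
# Several packages from the pip3 command import under a different name.
PIP_TO_MODULE = {
    "pandas": "pandas",
    "numpy": "numpy",
    "opencv-python": "cv2",
    "pyyaml": "yaml",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "wandb": "wandb",
    "tqdm": "tqdm",
    "openslide_python": "openslide",
    "scikit-image": "skimage",
    "gdown": "gdown",
    "seaborn": "seaborn",
}


def missing_packages(mapping=PIP_TO_MODULE):
    """Return the pip names whose top-level module cannot be found."""
    return [pip for pip, module in mapping.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing: " + " ".join(missing) if missing else "All dependencies found.")
```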
Download and extract UniToPatho
This script generates all the files needed in the DeepHealth Toolkit Dataset Format.
python3 --folder <dataset_path>/unitopath-public/ --val_set --balance --gen_adenoma
usage: [-h] --folder FOLDER [--trainlist TRAINLIST]
[--testlist TESTLIST] [--balance] [--val_set]
[--gen_800_224] [--gen_HG_LG] [--gen_adenoma] [--seed SEED] [--bal_idx BAL_IDX]
optional arguments:
-h, --help show this help message and exit
--folder FOLDER UniToPatho folder
--trainlist TRAINLIST specific WSI set for train (default empty)
--testlist TESTLIST specific WSI set for test (default test_wsi.txt)
--balance balance training set
--val_set create validation set
--gen_800_224 create a 224px version of the 800 μm dataset (it takes some time)
--gen_HG_LG create yml for HG/LG only, 2 classes, for 800 μm
--gen_adenoma create yml for adenoma type only, 3 classes, for 7000 μm
--seed SEED seed for data balancing
--bal_idx BAL_IDX least represented class index for dataset balancing (default 3)
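The balancing and YAML-generation options can be pictured with the following hypothetical sketch. It assumes samples are (path, class index) pairs and that --balance undersamples every class down to the size of class --bal_idx, drawing with --seed; the repository may balance differently. The resulting dict follows the general shape of a DeepHealth Toolkit dataset description (name, classes, images with location/label, split indices); check the ECVL documentation for the exact schema. All function names here are illustrative.

```python
import random
from collections import defaultdict


def balance_by_undersampling(samples, bal_idx=3, seed=0):
    """Hypothetical --balance: undersample every class to the size of class `bal_idx`."""
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append((path, label))
    if bal_idx not in by_class:
        raise ValueError(f"class {bal_idx} has no samples")
    target = len(by_class[bal_idx])
    rng = random.Random(seed)  # fixed seed -> reproducible subset
    balanced = []
    for label in sorted(by_class):
        items = by_class[label]
        # Classes already at or below the target size are kept whole.
        balanced.extend(items if len(items) <= target else rng.sample(items, target))
    return balanced


def to_deephealth_dict(name, classes, train, test, val=()):
    """Build a DeepHealth-style dataset dict; splits are indices into `images`."""
    images = list(train) + list(val) + list(test)
    n_tr, n_va = len(train), len(val)
    split = {
        "training": list(range(n_tr)),
        "test": list(range(n_tr + n_va, len(images))),
    }
    if val:  # only present when a validation set is requested (cf. --val_set)
        split["validation"] = list(range(n_tr, n_tr + n_va))
    return {
        "name": name,
        "classes": list(classes),
        "images": [{"location": p, "label": classes[c]} for p, c in images],
        "split": split,
    }


def write_yaml(dataset, path):
    import yaml  # pyyaml; imported lazily so the rest of the sketch runs without it
    with open(path, "w") as f:
        yaml.safe_dump(dataset, f, sort_keys=False)
```

Undersampling is used here only because it is the simplest strategy that matches a "least represented class index" parameter; oversampling the minority classes would be an equally plausible reading.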
Only needed if you want to generate the 'onnx_models' directory yourself.
Inference pipeline proposed in "UniToPatho, a labeled histopathological dataset for colorectal polyps classification and adenoma dysplasia grading".
It will generate 3 different .csv files by following the inference pipeline in the paper.
python3 --folder <dataset_path>/unitopath-public/
python3 -u <dataset_path>/unitopath-public/ --gpu 1 --temp_folder ''
usage: [-h] [--batch-size INT]
[--fullres-batch-size INT]
[--gpu GPU [GPU ...]]
[--temp_folder TEMP_FOLDER]
colorectal polyp classification inference for UniToPatho.
positional arguments:
INPUT_DATASET path to UnitoPatho
optional arguments:
-h, --help show this help message and exit
--batch-size INT batch-size for 224x224 resolution images
--fullres-batch-size batch-size for full resolution images
--gpu GPU [GPU ...] `--gpu 1 1` to use two GPUs
--temp_folder temporary folder to speed up inference (slows down the first run, high storage demand), default none
--lsb multi-gpu update frequency, default 1
--mem memory mode; allows full_mem, mid_mem, low_mem
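The multi-GPU flags shared by these scripts (--gpu, --lsb, --mem) can be reproduced with argparse as below. This is a hypothetical re-creation from the help text, not the repository's parser, and the single-GPU default is an assumption. In pyEDDL, values like these are normally handed to the GPU computing-service constructor (eddl.CS_GPU); consult the pyEDDL API reference for its exact signature.

```python
import argparse

MEM_MODES = ("full_mem", "mid_mem", "low_mem")


def add_device_args(parser):
    """Hypothetical mirror of the documented device flags."""
    # One int per device: `--gpu 1 1` marks two GPUs as enabled.
    parser.add_argument("--gpu", nargs="+", type=int, default=[1])
    # How often (in batches) weights are synchronised across GPUs.
    parser.add_argument("--lsb", type=int, default=1)
    parser.add_argument("--mem", choices=MEM_MODES, default="full_mem")
    return parser


def describe_devices(args):
    """Return (number of enabled GPUs, sync frequency, memory mode)."""
    return sum(1 for g in args.gpu if g), args.lsb, args.mem
```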
It will generate the resulting confusion matrix image (.pdf) and the metric results from the .csv files.
python3 -u <dataset_path>/unitopath-public/
usage: [-h] [--threshold THRESHOLD]
positional arguments:
INPUT_DATASET path to UnitoPatho
optional arguments:
-h, --help show this help message and exit
--threshold THRESHOLD
threshold for high-grade dysplasia inference
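The help text does not say exactly how --threshold is applied. One plausible reading, shown here purely as an assumption, is that a sample is labelled high-grade when its HG score reaches the threshold. The sketch then builds a confusion matrix (rows are true classes, columns predicted) and balanced accuracy in plain Python; scikit-learn's sklearn.metrics.confusion_matrix and balanced_accuracy_score compute the same quantities.

```python
def threshold_hg(p_hg, threshold=0.5):
    """Hypothetical rule: predict 1 (high-grade) when the HG score >= threshold."""
    return [1 if p >= threshold else 0 for p in p_hg]


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the true class, columns the predicted class."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def balanced_accuracy(cm):
    """Mean per-class recall, skipping classes with no true samples."""
    recalls = [row[i] / sum(row) for i, row in enumerate(cm) if sum(row)]
    return sum(recalls) / len(recalls)
```

Raising the threshold trades HG recall for LG recall, which is why balanced accuracy is a natural metric to report alongside it.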
python3 -u --gpu 1 --name 'my_first_run' --pretrain 18 <dataset_path>/unitopath-public/7000_224
usage: [-h] [--epochs INT]
[--batch-size INT]
[--momentum MOMENTUM]
[--lr LR]
[--weight-decay WEIGHT_DECAY]
[--val_epochs INT]
[--gpu GPU [GPU ...]]
[--checkpoints DIR]
[--name NAME]
[--pretrain PRETRAIN]
[--input-size INPUT_SIZE]
[--seed SEED]
[--yml-name YML_NAME]
[--ckpts RESUME_PATH]
colorectal polyp classification training example.
positional arguments:
INPUT_DATASET path to the dataset
optional arguments:
-h, --help show this help message and exit
--epochs INT number of training epochs
--batch-size INT batch-size
--momentum SGD momentum
--lr LR learning rate
--weight-decay weight-decay
--val_epochs INT run validation set inference every INT epochs (default=1)
--gpu GPU [GPU ...] `--gpu 1 1` to use two GPUs
--checkpoints DIR if set, save checkpoints in this directory
--name NAME run name
--pretrain PRETRAIN use a pretrained ResNet: default=18, also allows 50, and -1 (ResNet-18 not pretrained)
--input-size 224 px or original size
--seed SEED training seed
--yml-name YML_NAME yml name (default=deephealth-uc2-7000_224_balanced_adenoma.yml)
--ckpts RESUME_PATH resume training from a checkpoint
--wandb enable wandb logs
--lsb multi-gpu update frequency, default 1
--mem memory mode; allows full_mem, mid_mem, low_mem
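For reference, --lr, --momentum and --weight-decay are the usual SGD hyperparameters. The sketch below shows one step of textbook SGD with classical momentum and L2 weight decay on a flat list of parameters. EDDL's optimizer may arrange the terms differently, so treat this as an illustration of what the flags control rather than the library's implementation.

```python
def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9, weight_decay=0.0):
    """One step of SGD with classical momentum and L2 weight decay.

    v <- momentum * v - lr * (g + weight_decay * w)
    w <- w + v
    """
    new_v = [momentum * v - lr * (g + weight_decay * x)
             for x, g, v in zip(w, grad, velocity)]
    new_w = [x + v for x, v in zip(w, new_v)]
    return new_w, new_v
```

With lr=0.1, momentum=0.9 and a constant gradient of 0.5, a weight starting at 1.0 moves to 0.95 after one step and 0.855 after two: the velocity accumulates, so successive steps grow while the gradient direction stays the same.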
python3 -u --gpu 1 --ckpts checkpoints/<checkpoint_name>.bin --pretrain 18 <dataset_path>/unitopath-public/7000_224
usage: [-h]
[--batch-size INT]
[--gpu GPU [GPU ...]]
[--pretrain PRETRAIN]
[--yml-name YML_NAME]
[--input-size INPUT_SIZE]
colorectal polyp classification inference example.
positional arguments:
INPUT_DATASET path to the dataset
optional arguments:
-h, --help show this help message and exit
--ckpts checkpoint path
--batch-size INT batch-size
--gpu GPU [GPU ...] `--gpu 1 1` to use two GPUs
--pretrain PRETRAIN use a pretrained ResNet: default=18, also allows 50, and -1 (ResNet-18 not pretrained)
--yml-name YML_NAME yml name (default=deephealth-uc2-7000_224_balanced_adenoma.yml)
--input-size 224 px or original size
--lsb multi-gpu update frequency, default 1
--mem memory mode; allows full_mem, mid_mem, low_mem
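The example inference command passes the same --pretrain 18 used for training, presumably so that the network is rebuilt with the architecture the checkpoint was saved from. The hypothetical helper below only decodes the documented --pretrain values into an architecture depth and a flag for whether pretrained weights are used; how the repository actually builds the model is not shown here.

```python
def resolve_pretrain(pretrain):
    """Map documented --pretrain values to (resnet_depth, use_pretrained_weights).

    18 (default) and 50 select pretrained ResNets; -1 selects a
    ResNet-18 trained from scratch.
    """
    if pretrain == -1:
        return 18, False
    if pretrain in (18, 50):
        return pretrain, True
    raise ValueError(f"unsupported --pretrain value: {pretrain}")
```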
Annotation files are associated with a slide only if they are named <slide_name>.ndpi.ndpa.
python3 -u --jobs 4 --size 4000 --pxsize 224 --extension png --ROIs <annotation_path> <slides_path> <output_path>
Note: the --ROIs argument is optional; if it is not provided, slides will be cropped where tissue is detected.
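The naming rule for annotation files can be checked before running the script. This hypothetical helper, not part of the repository, pairs each .ndpi slide with <slide_name>.ndpi.ndpa in the annotation folder and reports None when no matching file exists.

```python
from pathlib import Path


def pair_slides_with_annotations(slides_dir, annotations_dir):
    """Map each '<slide_name>.ndpi' to its '<slide_name>.ndpi.ndpa' path, or None."""
    ann_dir = Path(annotations_dir)
    pairs = {}
    for slide in sorted(Path(slides_dir).glob("*.ndpi")):
        ann = ann_dir / (slide.name + ".ndpa")  # e.g. 'a.ndpi' -> 'a.ndpi.ndpa'
        pairs[slide.name] = ann if ann.is_file() else None
    return pairs
```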
usage: [-h] [--ROIs ROIS] [--extension EXTENSION]
[--jobs JOBS] [--size SIZE] [--pxsize PXSIZE]
[--subset SUBSET]
data output
positional arguments:
data dataset path
output Output path
optional arguments:
-h, --help show this help message and exit
--ROIs ROIS Path to the metadata folder
--extension output image format (default='png')
--jobs JOBS Number of parallel jobs
--size SIZE Crop size (in μm)
--pxsize PXSIZE Crop size (in px) (-1 for full resolution)
--subset SUBSET subset of the slides (text file descriptor, default test_wsi.txt)
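--size is given in micrometres, so the crop side in pixels depends on the slide resolution. Assuming the microns-per-pixel value at level 0 is known (OpenSlide exposes it through the 'openslide.mpp-x' property, constant openslide.PROPERTY_NAME_MPP_X), the sketch below converts the physical size to pixels, computes the resize factor towards --pxsize, and lays out a simple non-overlapping grid. The grid is an illustration only: the script restricts crops to ROIs or detected tissue. For example, on a hypothetical 0.5 μm/px slide a 4000 μm crop spans 8000 px and is downscaled by 224/8000 to reach 224 px.

```python
def crop_geometry(size_um, mpp, pxsize):
    """Return (side_px at level 0, resize factor) for a square crop.

    size_um: crop side in micrometres (--size).
    mpp: micrometres per pixel at level 0 (assumed known, e.g. from OpenSlide).
    pxsize: output side in pixels (--pxsize); -1 keeps full resolution.
    """
    side = round(size_um / mpp)
    scale = 1.0 if pxsize == -1 else pxsize / side
    return side, scale


def grid_origins(width, height, side):
    """Top-left corners of non-overlapping side x side tiles fully inside the slide."""
    return [(x, y)
            for y in range(0, height - side + 1, side)
            for x in range(0, width - side + 1, side)]
```

Each origin could then be read with OpenSlide's read_region((x, y), 0, (side, side)) and resized by the returned factor.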