document nnunet; add if guard
Karl5766 committed Dec 10, 2024
1 parent 25dc0fd commit 1289d4b
Showing 5 changed files with 120 additions and 20 deletions.
101 changes: 101 additions & 0 deletions docs/GettingStarted/nnunet.rst
@@ -0,0 +1,101 @@
.. _nnunet:

nn-UNet
#######

Overview
********

nn-UNet is a UNet-based library designed to segment medical images. For details, refer to the
`GitHub repository <https://github.com/MIC-DKFZ/nnUNet>`_ and the following citation:

- Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring
  method for deep learning-based biomedical image segmentation. Nature Methods, 18(2), 203-211.

nn-UNet is easiest to use through its command line interface, which consists of three commands:
:code:`nnUNetv2_plan_and_preprocess`, :code:`nnUNetv2_train` and :code:`nnUNetv2_predict`.
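
As a rough illustration of how the three commands fit together, the sketch below only builds the argument
lists and prints them; it does not run nn-UNet. The dataset id, folder names and the helper name
:code:`nnunet_commands` are hypothetical, and nn-UNet additionally expects its :code:`nnUNet_raw`,
:code:`nnUNet_preprocessed` and :code:`nnUNet_results` environment variables to be set before any of these
commands is run.

```python
import shlex


def nnunet_commands(dataset_id, input_dir, output_dir, configuration="2d", fold=0):
    """Build the three nn-UNet CLI calls as argument lists (hypothetical helper)."""
    d = str(dataset_id)
    return [
        # 1. fingerprint the dataset, plan the experiment and preprocess the data
        ["nnUNetv2_plan_and_preprocess", "-d", d, "--verify_dataset_integrity"],
        # 2. train one fold of the chosen configuration
        ["nnUNetv2_train", d, configuration, str(fold)],
        # 3. predict on a folder of images with the trained fold
        ["nnUNetv2_predict", "-i", input_dir, "-o", output_dir,
         "-d", d, "-c", configuration, "-f", str(fold)],
    ]


# Example values only: dataset 1, hypothetical input/output folders
for cmd in nnunet_commands(1, "imagesTs", "predictions"):
    print(shlex.join(cmd))
```

Each list can be passed to :code:`subprocess.run(cmd, check=True)` once the environment is configured.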

For :code:`cvpl_tools`, :code:`cvpl_tools/nnunet/cli.py` provides two wrapper command line interface
commands, :code:`train` and :code:`predict`, that condense the three commands into two and hide parameters
not used by the SPIMquant workflow.

:code:`cvpl_tools/nnunet` requires the torch library and :code:`pip install nnunetv2`. If a GPU is available
on the computer, it is used automatically whenever :code:`nnUNetv2_train` and :code:`nnUNetv2_predict` are
called, either directly or indirectly through :code:`train` and :code:`predict`.
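
Before training, it can help to confirm that both packages are importable and whether a GPU is visible. The
following is a minimal sketch, not part of :code:`cvpl_tools`; the function name
:code:`check_nnunet_environment` is hypothetical.

```python
import importlib.util


def check_nnunet_environment():
    """Report whether torch and nnunetv2 are installed and whether a CUDA GPU is visible."""
    status = {
        "torch": importlib.util.find_spec("torch") is not None,
        "nnunetv2": importlib.util.find_spec("nnunetv2") is not None,
        "gpu": False,
    }
    if status["torch"]:
        # Import lazily so the check itself works on machines without torch
        import torch
        status["gpu"] = bool(torch.cuda.is_available())
    return status


print(check_nnunet_environment())
```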

For those unfamiliar, nn-UNet has the following quirks:

- A residual encoder is available in nnunetv2, but we do not use it because it costs more to train

- Because training data is limited, :code:`cvpl_tools` uses the 2d mode instead of 3d_fullres

- It trains on image pairs of input size (C, Y, X) and output size (Y, X), where C is the number of color channels
  (1 in our case) and Y, X are spatial coordinates. Specifically, N pairs of images are provided as the training
  set, and nnUNet automatically performs an 80%-20% train-validation split. Note that in our case we draw the Z
  images from a single scan volume (C, Z, Y, X), so with a random split the training set distribution is
  correlated with the validation set generated by nnUNet; this is hard to avoid

- The algorithm is not scale-invariant: if we zoom the input image by a factor of 2x or 0.5x during prediction,
  the output is much worse. For best results, use the same input/output image sizes as in the training
  phase. In our mouse-brain lightsheet dataset, we downsample the original >200GB dataset by a factor of (4, 8, 8)
  before running nnUNet for training or prediction.

- The algorithm supports only the numbers of epochs listed on
  `this page <https://github.com/MIC-DKFZ/nnUNet/blob/master/nnunetv2/training/nnUNetTrainer/variants/training_length/nnUNetTrainer_Xepochs.py>`_,
  which is useful for small-scale training in our case. If you pass the :code:`predict` command a number of
  epochs not listed on that page, an error will occur

- nn-UNet supports 5-fold ensembling: the :code:`nnUNetv2_train` command is run 5 times, each on a different
  80%-20% split, to obtain 5 models whose predictions are ensembled. This does not require rerunning
  :code:`nnUNetv2_plan_and_preprocess`, and it is supported by the :code:`--fold` argument of
  :code:`cvpl_tools`' :code:`train` command, so you don't need to run it 5 times yourself. Once all folds are
  trained, set the :code:`--fold` argument of :code:`cvpl_tools`' :code:`predict` command to :code:`all` for
  better accuracy through ensembling, or to :code:`0` to use only the first trained fold for comparison.

- Running nn-UNet's :code:`nnUNetv2_train` command or :code:`cvpl_tools`' :code:`train` generates one
  :code:`nnUNet_results` folder, which contains a model (a few hundred MB in size) and a folder of results,
  including a loss/Dice graph and a log file with the training losses per epoch and per class. The
  same model file is used later for prediction.
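
To make the 80%-20% splits behind the 5-fold ensemble concrete, here is a minimal sketch that partitions N
training cases into 5 folds, each serving once as the validation set. It illustrates the idea only and is not
nn-UNet's own split code; the function name :code:`five_fold_splits` and the seed are assumptions.

```python
import random


def five_fold_splits(n_cases, n_folds=5, seed=0):
    """Return a list of (train_indices, val_indices), one pair per fold."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled cases into n_folds groups of (nearly) equal size
    folds = [sorted(indices[k::n_folds]) for k in range(n_folds)]
    splits = []
    for k in range(n_folds):
        val = folds[k]
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        splits.append((train, val))
    return splits


splits = five_fold_splits(100)
for k, (train, val) in enumerate(splits):
    print(f"fold {k}: {len(train)} training cases, {len(val)} validation cases")
```

Because our slices come from one volume, neighbouring Z slices can land on both sides of such a random split,
which is the correlation issue noted in the list above.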


Negative Masking for Mouse-brain Lightsheet
*******************************************

In this section, we focus primarily on the usage of nn-UNet within :code:`cvpl_tools`. This part of the
library is designed with mouse-brain lightsheet scans in mind. These scans are large (>200GB)
volumes stored as 4d arrays of data type np.uint16 and shape (C, Z, Y, X). An
example is in the Google Cloud Storage bucket
"gcs://khanlab-lightsheet/data/mouse_appmaptapoe/bids/sub-F4A1Te3/micr/sub-F4A1Te3_sample-brain_acq-blaze4x_SPIM.ome.zarr"
with an image shape of (3, 1610, 9653, 9634).
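
A quick back-of-the-envelope calculation shows why these volumes are downsampled before nn-UNet sees them. The
sketch assumes the (4, 8, 8) factor applies to the (Z, Y, X) axes and that downsampling keeps every 4th or 8th
sample (hence the rounding up); the actual resampling method may differ. Sizes are uncompressed, so they differ
from on-disk sizes.

```python
import math


def volume_stats(shape, bytes_per_voxel=2, factors=(4, 8, 8)):
    """Return (raw_bytes, downsampled_shape, downsampled_bytes) for a (C, Z, Y, X) volume."""
    c, z, y, x = shape
    raw_bytes = c * z * y * x * bytes_per_voxel  # np.uint16 takes 2 bytes per voxel
    # Assumption: strided sampling, so each axis length is rounded up
    down_shape = (c,) + tuple(math.ceil(n / f) for n, f in zip((z, y, x), factors))
    down_bytes = math.prod(down_shape) * bytes_per_voxel
    return raw_bytes, down_shape, down_bytes


raw, down_shape, down = volume_stats((3, 1610, 9653, 9634))
print(f"uncompressed: {raw / 1e9:.0f} GB, downsampled shape: {down_shape}, {down / 1e9:.1f} GB")
```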

The objective of our algorithm is to quantify the locations and sizes of beta-amyloid plaques in a
lightsheet scan volume like the one above. Plaques appear as small, round, bright spots in the image
volume and can be detected using a simple thresholding method.

A problem arises, however: the edge areas of the scanned mouse brain are as bright as the plaques, so
thresholding marks them as false positives. These edges are relatively easy for a UNet algorithm to detect,
which leads to the following segmentation workflow:

1. For the N mouse-brain scans M1, ..., MN we have at hand, apply bias correction to smooth out within-image
   brightness differences caused by imaging artifacts

2. Select one of the N scans, say M1, downsample it, and use a GUI to paint a binary mask that contains 1 on
   edge regions and 0 on plaques and elsewhere

3. Split the M1 volume and its binary mask annotation along the Z axis into slices, and train an nnUNet model on
   these slices

4. This produces a model that can predict negative masks on any mouse-brain scan of the same format. The
   remaining N-1 mouse-brain scans are downsampled, and the model predicts their corresponding negative
   masks

5. These masks are used to remove the edge areas of the image before we apply thresholding to find plaque objects.
   Algorithmically, we compute M' where :code:`M'[z, y, x] = M[z, y, x] * (1 - NEG_MASK[z, y, x])` for each
   voxel location (z, y, x). We then threshold M' and take each connected component of value 1 as an individual
   plaque object; the centroid locations and sizes (in number of voxels) are summarized in a numpy table and
   reported
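
The last step of the workflow can be sketched on a toy volume as follows. This is a minimal illustration, not
the :code:`cvpl_tools` implementation: the function name :code:`find_plaques`, the threshold value and the
choice of 6-connectivity (voxels joined through shared faces) are assumptions, and a plain breadth-first search
stands in for an optimized connected-component routine.

```python
from collections import deque

import numpy as np


def find_plaques(vol, neg_mask, threshold):
    """Return an (n_objects, 4) table with rows (z, y, x centroid, size in voxels)."""
    # M' = M * (1 - NEG_MASK): zero out voxels marked as brain edge
    masked = vol * (1 - neg_mask.astype(vol.dtype))
    binary = masked > threshold
    visited = np.zeros(binary.shape, dtype=bool)
    # Assumed 6-connectivity: neighbours share a face
    offsets = [(-1, 0, 0), (1, 0, 0), (0, -1, 0), (0, 1, 0), (0, 0, -1), (0, 0, 1)]
    rows = []
    for seed in zip(*np.nonzero(binary)):
        if visited[seed]:
            continue
        visited[seed] = True
        queue = deque([seed])
        voxels = []
        while queue:  # breadth-first search over one connected component
            z, y, x = queue.popleft()
            voxels.append((z, y, x))
            for dz, dy, dx in offsets:
                nb = (z + dz, y + dy, x + dx)
                inside = all(0 <= c < s for c, s in zip(nb, binary.shape))
                if inside and binary[nb] and not visited[nb]:
                    visited[nb] = True
                    queue.append(nb)
        centroid = np.mean(voxels, axis=0)
        rows.append([*centroid, len(voxels)])
    return np.array(rows, dtype=float).reshape(-1, 4)


# Toy data: one single-voxel plaque and two bright edge voxels
vol = np.zeros((3, 5, 5), dtype=np.uint16)
vol[1, 1, 1] = 900
vol[1, 3, 3:5] = 800
neg_mask = np.zeros_like(vol)
neg_mask[1, 3, 3:5] = 1  # the edge voxels are covered by the negative mask

table = find_plaques(vol, neg_mask, threshold=500)
table_unmasked = find_plaques(vol, np.zeros_like(vol), threshold=500)
print(table)
print(table_unmasked)
```

With the mask applied, only the plaque remains in the table; without it, the edge voxels show up as a second,
false-positive object.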

In the next part, we discuss annotation (step 2), training (step 3) and prediction (step 4).

TODO
****
6 changes: 2 additions & 4 deletions docs/GettingStarted/result_caching.rst
@@ -66,10 +66,8 @@ A cache directory can be a child directory of a cache root directory or other ca
 Tips
 ****
-- when writing a process function that cache to a single location, receive a cache_url object as a keyed
-  item :code:`context_args["cache_url"]` which can be None if we don't want to write to disk
-- Dask duplicates some computation twice because it does not support on-disk caching directly, using cache
-  files in each step can avoid this issue and help speedup computation.
+- when writing a process function that cache to a single location, pass a cache_url object via
+  :code:`context_args["cache_url"]`, or pass None if we don't want to write to disk
 - cache the images in a viewer-readable format. For OME-ZARR a flat image chunking scheme is
   suitable for 2D viewers like Napari. Re-chunking when loading back to memory may be slower but is usually
   not a big issue.
6 changes: 3 additions & 3 deletions docs/GettingStarted/setting_up_the_script.rst
@@ -125,12 +125,12 @@ log_stderr.txt files under your working directory.
 CacheDirectory
 **************

-Different from Dask's temporary directory, cvpl_tool.tools.fs provides intermediate result
+Different from Dask's temporary directory, cvpl_tools.tools.fs provides intermediate result
 caching APIs. A multi-step segmentation pipeline may produce many intermediate results, for some of them we
 may discard once computed, and for the others (like the final output) we may want to cache them on the disk
 for access later without having to redo the computation. In order to cache the result, we need a fixed path
-that do not change across program executions. The :code:`cvpl_tool.tools.fs.cdir_init` and
-:code:`cvpl_tool.tools.fs.cdir_commit` and ones used to commit and check if the result exist or needs to be
+that do not change across program executions. The :code:`cvpl_tools.tools.fs.cdir_init` and
+:code:`cvpl_tools.tools.fs.cdir_commit` and ones used to commit and check if the result exist or needs to be
 computed from scratch.

 In a program, we may cache hierarchically, where there is a root cache directory that is created or loaded
Binary file added docs/assets/image_to_list_of_centroids.png
27 changes: 14 additions & 13 deletions src/cvpl_tools/examples/mousebrain_processing.py
@@ -98,19 +98,20 @@ def main(run_nnunet: bool = True, run_coiled_process: bool = True):
     if run_nnunet is False:
         return

-    pred_args = {
-        "cache_url": NNUNET_CACHE_DIR,
-        "test_im": SECOND_DOWNSAMPLE_CORR_PATH,
-        "test_seg": None,
-        "output": NNUNET_OUTPUT_TIFF_PATH,
-        "dataset_id": 1,
-        "fold": '0',
-        "triplanar": False,
-        "penalize_edge": False,
-        "weights": None,
-        "use_cache": False,
-    }
-    triplanar.predict_triplanar(pred_args)
+    if not RDirFileSystem(NNUNET_OUTPUT_TIFF_PATH).exists(''):
+        pred_args = {
+            "cache_url": NNUNET_CACHE_DIR,
+            "test_im": SECOND_DOWNSAMPLE_CORR_PATH,
+            "test_seg": None,
+            "output": NNUNET_OUTPUT_TIFF_PATH,
+            "dataset_id": 1,
+            "fold": '0',
+            "triplanar": False,
+            "penalize_edge": False,
+            "weights": None,
+            "use_cache": False,
+        }
+        triplanar.predict_triplanar(pred_args)

     if run_coiled_process is False:
         return
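
The guard added in this hunk skips the prediction step when its output already exists, so rerunning the script
does not repeat the expensive computation. Below is a minimal local-filesystem sketch of the same pattern using
:code:`pathlib` instead of :code:`RDirFileSystem`; the helper name :code:`run_once` and the file name are
hypothetical.

```python
import tempfile
from pathlib import Path


def run_once(output_path, compute):
    """Call compute(output_path) only if output_path does not exist yet."""
    output_path = Path(output_path)
    if output_path.exists():
        return False  # result from an earlier run found, skip the expensive step
    compute(output_path)
    return True


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "prediction.tiff"
    first = run_once(out, lambda p: p.write_bytes(b"fake prediction"))
    second = run_once(out, lambda p: p.write_bytes(b"never written"))
    print(first, second)
```

One limitation of an existence check like this is that a partially written output from an interrupted run
would also be treated as finished.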