update docs: proper inline codeblocks
Karl5766 committed Aug 20, 2024
1 parent 1a09548 commit 993b2ba
Showing 7 changed files with 101 additions and 90 deletions.
4 changes: 2 additions & 2 deletions docs/API/napari_zarr.rst
@@ -5,9 +5,9 @@ cvpl_tools/napari/zarr.py

View source at `zarr.py <https://github.com/khanlab/cvpl_tools/blob/main/src/cvpl_tools/napari/zarr.py>`_.

For OME ZARR images, :code:`add_ome_zarr_array_from_path` can be used generally. If an image has
associated label OME ZARR file(s) in the "[image_ome_zarr]/labels/label_name" path, then the
image and label(s) can be opened together with a single :code:`add_ome_zarr_group_from_path` call.

.. rubric:: APIs

Expand Down
6 changes: 3 additions & 3 deletions docs/API/ome_zarr_io.rst
@@ -5,10 +5,10 @@ cvpl_tools/ome_zarr/io.py

View source at `io.py <https://github.com/khanlab/cvpl_tools/blob/main/src/cvpl_tools/ome_zarr/io.py>`_.

Read and Write: For reading an OME ZARR image, use :code:`load_zarr_group_from_path` to open a zarr group in
read mode and then use :code:`dask.array.from_zarr` to create a dask array from the group. For writing an OME
ZARR image, we assume you have a dask array and would like to write it as a .zip or a directory. In
such cases, :code:`write_ome_zarr_image` directly writes the dask array onto disk.

.. rubric:: APIs

Expand Down
9 changes: 5 additions & 4 deletions docs/API/seg_process.rst
@@ -5,12 +5,13 @@ cvpl_tools/im/seg_process.py

View source at `seg_process.py <https://github.com/khanlab/cvpl_tools/blob/main/src/cvpl_tools/im/seg_process.py>`_.

Q: Why are there two base classes :code:`SegProcess` and :code:`BlockToBlockProcess`? When I define my own pipeline,
which class should I subclass from?

A: :code:`BlockToBlockProcess` is a wrapper around :code:`SegProcess` for code whose input and output block sizes
are the same.
For general processing whose output is a list of centroids, or when the input shape of a block is not the same as
the output shape of that block, subclass :code:`SegProcess` directly.

.. rubric:: APIs

Expand Down
32 changes: 17 additions & 15 deletions docs/GettingStarted/ome_zarr.rst
@@ -8,12 +8,12 @@ Viewing of ome_zarr file

Viewing of ome_zarr in a directory or as a zip file.

1. Open Napari with the command :code:`napari`

2. Open the command widget with the button at the bottom left corner of the window.

3. After that, type in the command window to invoke functions that add more images as layers.
To view an ome-zarr file this way with :code:`cvpl_tools`, use the command

::

@@ -40,10 +40,10 @@ To view an ome-zarr file this way with **cvpl_tools**, use the command
zarr_group = zarr.open(store, mode='r')
cvpl_zarr.add_ome_zarr_group(viewer, zarr_group, dict(name="displayed_name_in_ui"))
- An extra argument :code:`is_label` can be passed into the function via the :code:`kwargs` dictionary.
This is a boolean value that specifies whether to use the :code:`viewer.add_labels`
(if :code:`True`) or :code:`viewer.add_image` (if :code:`False`) function. This is useful for
displaying instance segmentation masks, where each segmented object has a distinct color.

Similarly, you can open a zip, or an image with multiple labels this way.

@@ -79,20 +79,22 @@ Above + denotes collapsed folder and - denotes expanded folder. A few things to
is not a standard ZARR directory and contains no **.zarray** meta file. Loading an OME ZARR
image as ZARR will crash if you forget to specify the **0/** subfolder as the path to load
- When saved as a zip file instead of a directory, the directory structure is the same except that
the root is zipped. When loading a zipped OME ZARR, cvpl_tools uses :code:`ZipStore`'s features to
directly read individual chunks without having to unpack
the entire zip file. However, writing to a :code:`ZipStore` is not supported, due to lack of
support by either Python's :code:`zarr` or the :code:`ome-zarr` library.
- An HPC system like Compute Canada may work better with one large file than many small files,
thus the result should be zipped. This can be done by first writing the folder to somewhere
that allows creating many small files and then zipping the result into a single zip in the target
directory.
- As of the time of writing (2024.8.14), the ome-zarr library's :code:`Writer` class has a
`double computation issue <https://github.com/ome/ome-zarr-py/issues/392>`_. To temporarily patch
this for our use case, I've added a :code:`write_ome_zarr_image`
function to write a dask array as an OME ZARR
file. This function also adds support for reading images stored as a **.zip** file.
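
The write-then-zip workflow suggested in the bullet above can be sketched with only the standard library (all paths and file names here are made up for illustration; a real OME ZARR folder would contain actual zarr metadata and chunk files):

```python
import os
import shutil
import tempfile
import zipfile

# Stand-in for an OME ZARR folder first written to scratch space
# that tolerates many small files.
scratch = tempfile.mkdtemp()
image_dir = os.path.join(scratch, "image.ome.zarr")
os.makedirs(os.path.join(image_dir, "0"))
with open(os.path.join(image_dir, ".zattrs"), "w") as f:
    f.write("{}")
with open(os.path.join(image_dir, "0", "0.0"), "wb") as f:
    f.write(b"\x00" * 16)

# Zip the whole folder into a single file for the target directory.
zip_path = os.path.join(scratch, "image.ome.zarr.zip")
shutil.make_archive(zip_path[: -len(".zip")], "zip", root_dir=image_dir)

# Individual members can later be read without unpacking everything,
# which is what ZipStore-based loading relies on.
with zipfile.ZipFile(zip_path) as zf:
    names = set(zf.namelist())
```

This keeps the many-small-files workload off the HPC filesystem while producing a single artifact to transfer.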

See the API page for cvpl_tools.ome_zarr.io.py for how to read and write OME
ZARR files if you want to use :code:`cvpl_tools` for such tasks. This file provides two functions,
:code:`load_zarr_group_from_path` and :code:`write_ome_zarr_image`, which allow you to read and write OME
ZARR files, respectively.

71 changes: 37 additions & 34 deletions docs/GettingStarted/segmentation_pipeline.rst
@@ -27,8 +27,9 @@ that are hard to debug and require tens of minutes or hours if we need to rerun
The SegProcess Class
********************

The :code:`SegProcess` class in module :code:`cvpl_tools.im.seg_process` provides a convenient way for us
to define a step in a multi-step image processing pipeline for distributed, interpretable and cached image
data analysis.

Consider a function that counts the number of cells in a 3d-block of brightness map:

@@ -57,21 +58,22 @@ results makes sure computation is done only once, which is necessary when we work
on hundreds of GBs of data.

SegProcess is designed to address these issues, with the basic idea to integrate visualization as
part of the cell_count function, and cache the result of each step into a file in a :code:`CacheDirectory`.

The class supports the following use cases:

1. dask-support. Inputs are expected to be either numpy array, dask array, or
:code:`cvpl_tools.im.ndblock.NDBlock` objects. In particular, :code:`dask.Array` and :code:`NDBlock` are
suitable for parallel or distributed image processing workflows.

2. integration of Napari. The :code:`forward()` function of a :code:`SegProcess` object has a viewer parameter that
defaults to None. By passing a Napari viewer to this parameter, the forward process will add intermediate
images or centroids to the Napari viewer for easier debugging. Then after the forward process finishes, we
call :code:`viewer.show()` to display all added images.

3. intermediate result caching. :code:`CacheDirectory` class provides a hierarchical caching directory,
where each :code:`forward()` call will either create a new directory or load from existing cache directory
based on the :code:`cid` parameter passed to the function.
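
Putting the three use cases together, the interface of such a step can be sketched in plain Python (a simplified stand-in for illustration; the real :code:`SegProcess` API in :code:`cvpl_tools` may differ in names and signatures):

```python
class SegProcessSketch:
    """A pipeline step: forward() takes the input plus the optional
    cid (cache id) and viewer arguments described above."""

    def forward(self, im, cid=None, viewer=None):
        raise NotImplementedError


class ThresholdSketch(SegProcessSketch):
    """Toy IN -> BS step: threshold a 2d list-of-lists brightness map."""

    def __init__(self, threshold):
        self.threshold = threshold

    def forward(self, im, cid=None, viewer=None):
        mask = [[1 if v > self.threshold else 0 for v in row] for row in im]
        if viewer is not None:
            # Visualize the intermediate result for debugging.
            viewer.add_labels(mask, name=f"threshold_{cid}")
        return mask


step = ThresholdSketch(threshold=0.5)
bs = step.forward([[0.2, 0.9], [0.7, 0.1]])
print(bs)  # [[0, 1], [1, 0]]
```

Passing :code:`viewer=None` skips visualization entirely, which matches the behavior described in point 2.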

Now we discuss how to define such a pipeline.

@@ -81,10 +83,10 @@ Extending the Pipeline
The first step of building a pipeline is to break a segmentation algorithm down to steps that process the
image in different formats. As an example, we may implement a pipeline as IN -> BS -> OS -> CC, where:

- IN - Input Image (:code:`np.float32`) between min=0 and max=1, this is the brightness dask image as input
- BS - Binary Segmentation (3d, :code:`np.uint8`), this is the binary mask single class segmentation
- OS - Ordinal Segmentation (3d, :code:`np.int32`), labels 0-N where 0 is background and each of 1-N denotes one object; also single class
- CC - Cell Count Map (3d, :code:`np.float32`), a cell count number (estimate, can be float) for each block

Mapping from IN to BS comes in two choices. One is to simply take a threshold > some number as cells and the
rest as background. Another is to use a trained machine learning model to do binary segmentation. Mapping
@@ -104,15 +106,16 @@ We can then plan the processing steps we need to define as follows:
4. watershed_inst_segmentation (BS -> OS)
5. cell_cnt_from_inst (OS -> CC)
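
To make the data formats in this plan concrete, here is a dependency-free sketch of the BS -> OS -> CC portion using 4-connected component labeling on a tiny 2d example (the actual pipeline uses watershed instance segmentation over dask blocks; this toy version only illustrates the formats):

```python
from collections import deque

def label_components(bs):
    """BS -> OS: label 4-connected foreground regions 1..N (0 = background)."""
    h, w = len(bs), len(bs[0])
    os_ = [[0] * w for _ in range(h)]
    n = 0
    for i in range(h):
        for j in range(w):
            if bs[i][j] and os_[i][j] == 0:
                n += 1  # start a new object label
                q = deque([(i, j)])
                os_[i][j] = n
                while q:  # breadth-first flood fill of this object
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and bs[ny][nx] and os_[ny][nx] == 0:
                            os_[ny][nx] = n
                            q.append((ny, nx))
    return os_, n

bs = [[1, 1, 0],
      [0, 0, 0],
      [0, 1, 1]]
os_map, cc = label_components(bs)  # OS map, and the CC count for this block
print(cc)  # 2
```

Here :code:`os_map` is the ordinal segmentation and :code:`cc` the per-block cell count estimate.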

How do we go from this plan to actually code these steps? Subclassing :code:`SegProcess` is the recommended way
(although one may argue we don't need OOP here).
This means to create a subclass that defines the :code:`forward()` method, which takes arbitrary inputs
and two optional parameters: :code:`cid` and :code:`viewer`.

- cid specifies the subdirectory under the cache directory (set by the :code:`set_tmpdir` method of the base
class) to save intermediate files. If not provided (:code:`cid=None`),
then the cache will be saved in a temporary directory that will be removed when the :code:`CacheDirectory` is
closed. If provided, this cache file will persist. Within the :code:`forward()` method, you should use
:code:`self.tmpdir.cache()` and :code:`self.tmpdir.cache_im()` to create cache files:

.. code-block:: Python
@@ -126,13 +129,12 @@ and two optional parameters: cid and viewer.
result = compute_result(im)
save(cache_path.path, result)
result = load(cache_path.path)
# ...
return result
- The viewer parameter specifies the napari viewer to display the intermediate results. If not provided
(:code:`viewer=None`), then no computation will be done to visualize the image. Within the :code:`forward()` method, you
should use :code:`viewer.add_labels()`, :code:`lc_interpretable_napari()` or :code:`temp_directory.cache_im()`
while passing in :code:`viewer_args` argument to display your results:

.. code-block:: Python
@@ -147,19 +149,20 @@ and two optional parameters: cid and viewer.
))
return result
:code:`viewer_args` is a parameter that allows us to visualize the saved results as part of the caching
function. The reason we need this is that displaying the saved result often requires a different (flatter)
chunk size for fast loading of cross-sectional images, and also requires downsampling for zooming in/out of
larger images.
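
The cid-based caching behavior described above can be mimicked with a small stdlib-only sketch (class and method names here are simplified stand-ins, not the real :code:`CacheDirectory` API):

```python
import json
import os
import shutil
import tempfile

class CacheDirectorySketch:
    """Toy hierarchical cache: cid=None -> temporary, removed on close();
    a string cid -> persistent subdirectory, reused if it already exists."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)
        self._tmp = []

    def cache(self, cid=None):
        """Return (path, hit): a cache subdirectory and whether it existed."""
        if cid is None:
            path = tempfile.mkdtemp(dir=self.root)
            self._tmp.append(path)
            return path, False
        path = os.path.join(self.root, cid)
        existed = os.path.isdir(path)
        os.makedirs(path, exist_ok=True)
        return path, existed

    def close(self):
        # Only unnamed (cid=None) caches are removed; named ones persist.
        for path in self._tmp:
            shutil.rmtree(path, ignore_errors=True)

cdir = CacheDirectorySketch(tempfile.mkdtemp())

# First forward() call: cache miss -> compute and save.
path, hit = cdir.cache(cid="step1")
if not hit:
    with open(os.path.join(path, "result.json"), "w") as f:
        json.dump([1, 2, 3], f)

# Second call with the same cid: cache hit -> load instead of recompute.
path2, hit2 = cdir.cache(cid="step1")
with open(os.path.join(path2, "result.json")) as f:
    result = json.load(f)
cdir.close()
print(hit, hit2, result)  # False True [1, 2, 3]
```

This is the property the text relies on: rerunning a pipeline with the same :code:`cid` values loads cached results instead of recomputing them.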

Running the Pipeline
********************

See `Setting Up the Script <GettingStarted/setting_up_the_script>`_ to understand boilerplate code used below.
It's required to understand the following example.

Now that we have defined an :code:`ExampleSegProcess` class, the next step is to write a script that uses the
pipeline to segment an input dataset. Note we need a dask cluster and a temporary directory set up before running
the :code:`forward()` method.

.. code-block:: Python