.. _result_caching:

Result Caching
##############

Overview
********
In many cases, it is useful to cache intermediate results instead of discarding them once the computation is
done. Consider the following situations, which you may have encountered when writing a long-running image
processing workflow:

1. The cell density for each region in the scan is computed, but the numbers do not match what you expected,
   so you want to display a heatmap of cell density in a graphical viewer. However, the final result you got
   is text output in the console, so displaying it requires redoing the computation.

2. An error occurs and you need to find out why a step in the computation causes it, but it is difficult to
   understand what went wrong without displaying some intermediate results to aid debugging.

3. Graphically showing how the algorithm works step by step would be very helpful in identifying the causes
   of issues, but this requires saving all the results to disk, chunked in a viewer-friendly format.

In all of the cases above, caching the intermediate results reduces headaches and the risk of errors that are
hard to trace, since debugging is difficult in an image processing and distributed computing environment. The
basic strategy we use to overcome these problems is to cache all results inside a directory tree. Each step
saves its intermediate and final results into one node of the tree, and the node's children are the
directories saved by its sub-steps.

Here, the outputs of a processing step (function) may include intermediate images (such as .ome.zarr), log
files (.txt) and plots generated by plotting libraries.

We describe the CacheDirectory interface in detail below.

CacheRootDirectory
******************
Every cache directory tree has a CacheRootDirectory node at its root, which is the only node of that class in
the tree. To create a cache directory tree, create a CacheRootDirectory node as follows:

.. code-block:: Python

    with imfs.CacheRootDirectory(
            'path/to/root',
            remove_when_done=False,
            read_if_exists=True) as temp_directory:
        cache_dir = temp_directory.cache_subdir(cid='test')

On the first run, this creates two directories, 'path/to/root' and 'path/to/root/dir_cache_test'. The name of
the subfolder indicates that it is a directory (:code:`dir`) and a persistent cache (:code:`cache`) rather
than a temporary folder. The next time the program runs, it does not create new folders but reads directly
from the existing ones.

When :code:`remove_when_done=True` and :code:`read_if_exists=False`, we get a purely temporary cache directory
that is deleted when the program finishes. The next time the program runs, a new one is always created.
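
To make the two lifecycle flags concrete, here is a minimal pure-Python sketch that mimics them with
:code:`pathlib` and :code:`shutil`. :code:`ToyCacheRoot` is a hypothetical class written for illustration, not
the imfs implementation, and it assumes that :code:`read_if_exists=False` discards any folder left over from
an earlier run.

.. code-block:: Python

    import shutil
    import tempfile
    from pathlib import Path


    class ToyCacheRoot:
        """Hypothetical stand-in for CacheRootDirectory (not the imfs implementation)."""

        def __init__(self, path, remove_when_done=False, read_if_exists=True):
            self.path = Path(path)
            self.remove_when_done = remove_when_done
            self.read_if_exists = read_if_exists
            self.exists = False  # set on entry: True if an earlier cache is reused

        def __enter__(self):
            if self.path.exists() and not self.read_if_exists:
                # Assumption: an old folder is discarded instead of being read
                shutil.rmtree(self.path)
            self.exists = self.path.exists()
            self.path.mkdir(parents=True, exist_ok=True)
            return self

        def __exit__(self, exc_type, exc_value, traceback):
            if self.remove_when_done:
                shutil.rmtree(self.path, ignore_errors=True)
            return False  # let exceptions raised inside the block propagate


    with tempfile.TemporaryDirectory() as base:
        root = Path(base) / 'root'
        # Persistent cache: the first run creates it, the second run reuses it
        with ToyCacheRoot(root) as d:
            print(d.exists)  # False
        with ToyCacheRoot(root) as d:
            print(d.exists)  # True
        # Purely temporary cache: always fresh and deleted on exit
        with ToyCacheRoot(root, remove_when_done=True, read_if_exists=False) as d:
            print(d.exists)  # False
        print(root.exists())  # False
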

CacheDirectory
**************
A CacheDirectory is a node in the cache directory tree that can contain zero or more CacheDirectory and
CachePath instances as its children. CacheRootDirectory is a subclass of CacheDirectory.

When a CacheDirectory object is created, the directory is created if it does not exist; otherwise, the cache
is read from disk. To know whether the directory was created anew, check the attribute
:code:`cache_dir.exists`. To create a sub-path or a sub-directory, use the following format:

.. code-block:: Python

    sub_cache_path = cache_dir.cache_subpath(cid='subpath1')  # leaf node
    sub_cache_dir = cache_dir.cache_subdir(cid='subdir1')  # non-leaf node

Similarly, use :code:`sub_cache_path.exists` to determine whether the path exists. Note that even though the
CachePath class is named "path" rather than "directory", it represents the location of a leaf node, which
most often points to a directory rather than a file in the file system.
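
The :code:`exists` flag is what lets a step skip work it has already done. Below is a minimal sketch of that
check-then-compute pattern in plain Python. :code:`ToyCacheDirectory`, :code:`ToyCachePath`,
:code:`slow_square` and the JSON file format are hypothetical choices for illustration, not the imfs API, and
the leaf here is a single file rather than a directory to keep the example short.

.. code-block:: Python

    import json
    import tempfile
    from pathlib import Path


    class ToyCachePath:
        """Hypothetical leaf node: a location that may or may not exist yet."""

        def __init__(self, path):
            self.abs_path = Path(path)
            self.exists = self.abs_path.exists()


    class ToyCacheDirectory:
        """Hypothetical non-leaf node (not the imfs implementation)."""

        def __init__(self, path):
            self.path = Path(path)
            self.exists = self.path.exists()  # False means the node is created anew
            self.path.mkdir(parents=True, exist_ok=True)

        def cache_subdir(self, cid):
            return ToyCacheDirectory(self.path / cid)

        def cache_subpath(self, cid):
            return ToyCachePath(self.path / cid)  # nothing is created on disk


    calls = []


    def slow_square(x):
        calls.append(x)  # record every real computation
        return x * x


    def run(root):
        leaf = ToyCacheDirectory(root).cache_subdir('step1').cache_subpath('result.json')
        if not leaf.exists:  # compute only on a cache miss
            leaf.abs_path.write_text(json.dumps(slow_square(7)))
        return json.loads(leaf.abs_path.read_text())


    with tempfile.TemporaryDirectory() as base:
        print(run(base), run(base), len(calls))  # 49 49 1
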

CachePointer
************
CachePointer is a struct containing two attributes: a parent directory and a cid indicating where under that
directory the pointer points. Both CachePath and CachePointer reference a location where a file or directory
may or may not exist, but CachePointer is designed to be flexible: it lets you decide later whether to create
a CacheDirectory node or a non-CacheDirectory (leaf) node. The following are equivalent ways to create cache
files and folders:

.. code-block:: Python

    sub_cache_path = cache_dir.cache_subpath(cid='subpath2')
    # Equivalently
    cptr = cache_dir.cache(cid='subpath2')
    sub_cache_path = cptr.subpath()

    sub_cache_dir = cache_dir.cache_subdir(cid='subdir2')
    # Equivalently
    cptr = cache_dir.cache(cid='subdir2')
    sub_cache_dir = cptr.subdir()

It may seem unnecessary to create a CachePointer instance just to defer the decision of whether to create a
CachePath or a CacheDirectory child, but it comes in handy when you design the interface of a function: the
caller does not need to care whether the function needs a leaf node or a non-leaf node.

.. code-block:: Python

    # Implementation 1: cache the final result in a single leaf node
    def compute(im, cptr):
        cache_path = cptr.subpath()
        if not cache_path.exists:
            result = (im + 1) * 3
            save(result, cache_path.abs_path)  # save/load: your own I/O helpers (not shown)
        return load(cache_path.abs_path)

    # Implementation 2 (functionally equivalent, but caches each sub-step
    # in its own child node under a sub-directory)
    def compute(im, cptr):
        cache_dir = cptr.subdir()
        # plus_one and times_three are steps with the same (im, cptr) interface
        im2 = plus_one(im=im, cptr=cache_dir.cache('plus_one'))
        im3 = times_three(im=im2, cptr=cache_dir.cache('times_three'))
        return im3

    # Inside the `with imfs.CacheRootDirectory(...) as temp_directory:` block
    result = compute(im=input_im, cptr=temp_directory.cache(cid='compute'))
    # DISPLAY RESULT...

Tips
****
- When writing a compute function that caches to a single location, accept a CachePointer object instead of a
  CachePath or CacheDirectory object. This brings flexibility, since it is up to the callee to decide whether
  a sub-path or a sub-directory is needed; the callee may even decide not to create the directory at all if
  no cache is needed. This separates the function's implementation from its interface.
- Dask may repeat some computations because it does not support on-disk caching directly. Using cache files
  in each step avoids this issue and can speed up computation.
- Cache the images in a viewer-readable format. For OME-ZARR, a flat image chunking scheme is suitable for 2D
  viewers like Napari. Rechunking when loading the images back into memory may be slower, but is usually not
  a big issue.
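
The first tip can be illustrated with a small, self-contained sketch. :code:`ToyCachePointer` below is a
hypothetical stand-in for CachePointer (a parent directory plus a cid), and the helper :code:`cached`, the JSON
file format and the use of plain lists instead of images are assumptions made so the example runs with the
standard library only. Both versions of :code:`compute` accept a pointer, so the calling code is identical
whether the function caches into a leaf or into a sub-directory.

.. code-block:: Python

    import json
    import tempfile
    from dataclasses import dataclass
    from pathlib import Path


    @dataclass
    class ToyCachePointer:
        """Hypothetical stand-in for CachePointer: a parent directory plus a cid."""
        parent: Path
        cid: str

        def subpath(self):
            # Leaf node: reserve the location; nothing is created on disk
            return self.parent / self.cid

        def subdir(self):
            # Non-leaf node: create a directory that children are cached under
            path = self.parent / self.cid
            path.mkdir(parents=True, exist_ok=True)
            return path


    def cached(values, cptr, fn):
        """Store fn(values) as JSON at a leaf node, or reuse the stored copy."""
        path = cptr.subpath()
        if not path.exists():
            path.write_text(json.dumps(fn(values)))
        return json.loads(path.read_text())


    def compute_v1(values, cptr):
        # One leaf node holds the final result
        return cached(values, cptr, lambda v: [(x + 1) * 3 for x in v])


    def compute_v2(values, cptr):
        # A sub-directory with one child per sub-step
        cache_dir = cptr.subdir()
        plus_one = cached(values, ToyCachePointer(cache_dir, 'plus_one'),
                          lambda v: [x + 1 for x in v])
        return cached(plus_one, ToyCachePointer(cache_dir, 'times_three'),
                      lambda v: [x * 3 for x in v])


    with tempfile.TemporaryDirectory() as base:
        root = Path(base)
        print(compute_v1([1, 2, 3], ToyCachePointer(root, 'v1')))  # [6, 9, 12]
        print(compute_v2([1, 2, 3], ToyCachePointer(root, 'v2')))  # [6, 9, 12]

Because :code:`cached` only checks whether a file exists, it returns a stale result if the inputs change; in
that case the cache node has to be cleared or given a new cid.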