
Comparison of OME-Zarr libs #407

Open
will-moore opened this issue Nov 25, 2024 · 2 comments
will-moore commented Nov 25, 2024

Some discussion about potential changes to ome-zarr-py at #402 inspired me to check out other OME-Zarr libs to understand alternative ways of structuring things...

Work in progress...

ngff-zarr

https://github.com/thewtex/ngff-zarr
Testing example at https://ngff-zarr.readthedocs.io/en/latest/quick_start.html

import ngff_zarr as nz
import numpy as np
data = np.random.randint(0, 256, int(1e6)).reshape((1000, 1000))
multiscales = nz.to_multiscales(data)
nz.to_ngff_zarr('example.ome.zarr', multiscales)
  • Pyramid generation is separate from writing to zarr 👍 Pyramid shapes are (1000,1000) and (500,500).
  • 1 line to generate pyramid, 1 line to write to zarr
  • We get array at example.ome.zarr/scale0/image/.zarray with example.ome.zarr/scale0/.zattrs for xarray _ARRAY_DIMENSIONS
  • nz.to_multiscales(image, scale_factors=[2,4,8], chunks=64) generates a Multiscales data object with data as dask delayed pyramid.
  • Can't pass in e.g. a 4D image with shape (1, 512, 512, 512) since it fails to downsample - trying to downsample all dimensions?
  • No support for multi-C or multi-T images??
  • Automatic axes metadata for zyx (all space) no units etc.
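Re the 4D failure: that would be consistent with isotropic downsampling applied to every axis, where a size-1 axis gets truncated away. A pure-numpy illustration of the failure mode (this is a sketch of the hypothesis, not ngff-zarr's actual code):

```python
import numpy as np

def block_mean_all_axes(a, factor=2):
    # naive isotropic downsample: trim EVERY axis to a multiple of `factor`,
    # then average non-overlapping blocks
    trimmed = a[tuple(slice(0, (s // factor) * factor) for s in a.shape)]
    shape = []
    for s in trimmed.shape:
        shape += [s // factor, factor]
    return trimmed.reshape(shape).mean(axis=tuple(range(1, 2 * a.ndim, 2)))

ok = block_mean_all_axes(np.ones((512, 512)))     # -> shape (256, 256)
bad = block_mean_all_axes(np.ones((1, 8, 8, 8)))  # -> shape (0, 4, 4, 4): the size-1 axis is lost
```

So a shape like (1, 512, 512, 512) collapses to nothing on the first axis unless downsampling skips non-spatial / singleton dimensions.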

pydantic-ome-ngff

https://github.com/janeliascicomp/pydantic-ome-ngff

from pydantic_ome_ngff.v04.multiscale import MultiscaleGroup
from pydantic_ome_ngff.v04.axis import Axis
import numpy as np
import zarr

axes = [
    Axis(name='y', unit='nanometer', type='space'),
    Axis(name='x', unit='nanometer', type='space')
]
arrays = [np.zeros((512, 512)), np.zeros((256, 256))]

group_model = MultiscaleGroup.from_arrays(
    axes=axes,
    paths=['s0', 's1'],
    arrays=arrays,
    scales=[ [1.25, 1.25], [2.5, 2.5] ],
    translations=[ [0.0, 0.0], [1.0, 1.0] ],
    chunks=(64, 64),
    compressor=None)

store = zarr.DirectoryStore('min_example2.zarr', dimension_separator='/')
stored_group = group_model.to_zarr(store, path="")
# no data (chunks) has been written to these arrays, you must do that separately.
stored_group['s0'][:] = arrays[0]
stored_group['s1'][:] = arrays[1]
  • We have full control over metadata - e.g. Axis types and downsampling by different factors in various dimensions etc.
  • No help with actually downsampling arrays - lib just helps with metadata creation & validation
  • But flexible in how we write the data to arrays. E.g. could do a plane at a time etc.

ome-zarr-py

import numpy as np
import zarr
from ome_zarr.io import parse_url
from ome_zarr.writer import write_image

data = np.random.default_rng(0).poisson(lam=10, size=(10, 256, 256)).astype(np.uint8)
store = parse_url("test_ngff_image.zarr", mode="w").store
root = zarr.group(store=store)
write_image(image=data, group=root, axes="zyx", storage_options=dict(chunks=(1, 64, 64)))
  • write_image() automatically does pyramid generation -> multiscales, down to "thumbnail" 👍
  • But only downsamples in 2D (x and y) 👎
  • Not easy to write pixel sizes. Scale starts at [1, 1, 1, 1, 1]
  • Axes created automatically: 'type' inferred by name. No units.
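On the pixel-size point: as far as I can tell write_image does accept a coordinate_transformations argument, but you have to build the per-dataset scale lists yourself. A sketch of hand-building v0.4-style multiscales metadata with explicit pixel sizes (all values hypothetical; only y and x are downsampled per level, matching the 2D-only scaler):

```python
# Hypothetical pixel sizes in micrometers for a zyx image
pixel_size = {"z": 0.5, "y": 0.1, "x": 0.1}
paths = ["0", "1", "2"]

datasets = []
for level, path in enumerate(paths):
    factor = 2 ** level  # each level doubles the effective xy pixel size
    scale = [pixel_size["z"], pixel_size["y"] * factor, pixel_size["x"] * factor]
    datasets.append(
        {"path": path, "coordinateTransformations": [{"type": "scale", "scale": scale}]}
    )

multiscales = [{
    "version": "0.4",
    "axes": [{"name": n, "type": "space", "unit": "micrometer"} for n in "zyx"],
    "datasets": datasets,
}]
```

If the writer doesn't accept this directly, the same dict can be written into the group's .zattrs after the fact.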

ngff-writer

https://github.com/aeisenbarth/ngff-writer/
Not up to date. Supports OME-Zarr v0.3

import dask.array as da
import numpy as np
from dask_image.imread import imread
from ngff_writer.array_utils import to_tczyx
from ngff_writer.writer import open_ngff_zarr

with open_ngff_zarr(
    store="output_minimum.zarr",
    dimension_separator="/",
    overwrite=True,
) as f:
    channel_paths = ["well0.ome.tiff", "well1.ome.tiff", "well2.ome.tiff"]
    collection = f.add_collection(name="well1")
    collection.add_image(
        image_name="microscopy1",
        array=to_tczyx(da.concatenate(imread(p) for p in channel_paths), axes_names=("c", "y", "x")),
        channel_names=["brightfield", "GFP", "DAPI"],
    )
  • transformation is stored as custom attribute in JSON - Doesn't support OME-Zarr v0.4.
  • Saves 5D data.
  • Good dask support for resizing. NB: ngff_writer/dask_utils resize() is copied into ome-zarr-py.
  • Non-standard 'collection' etc.
  • Generates omero section for channel names.

Others

https://github.com/CBI-PITT/stack_to_multiscale_ngff - Python-based command-line tool, e.g. TIFFs to OME-Zarr

python ~/stack_to_multiscale_ngff/stack_to_multiscale_ngff/builder.py '/path/to/tiff/stack/channel1' '/path/to/tiff/stack/channel2' '/path/to/tiff/stack/channel3' '/path/to/output/multiscale.omehans' --scale 1 1 0.280 0.114 0.114 --origionalChunkSize 1 1 1 1024 1024 --finalChunkSize 1 1 64 64 64 --fileType tif

https://github.com/bioio-devs/bioio - uses https://github.com/bioio-devs/bioio-ome-zarr which uses ome-zarr-py.

forum.image.sc discussions

Useful to see what the community needs and the solutions they find. Searching image.sc:
https://forum.image.sc/search?q=write%20ome-zarr

will-moore (Member Author) commented:

Thinking about what ome-zarr-py should look like, following release of zarr-python v3...
(NB: looking at updating ome-zarr-py to use zarr v3 and support OME-Zarr v0.5 at #404)

Some random thoughts:
Store creation: previously, parse_url() created stores that were format-specific (which was mostly about dimension separators, I think). But now, dimension separators are specified at array creation.
I don't think we should wrap our own store creation inside parse_url(). Just let users work with vanilla zarr to create their own stores. Otherwise we duplicate zarr's handling of which store to create: Local vs. Remote, zip store, memory store etc.
Docs at https://github.com/zarr-developers/zarr-python/blob/main/docs/guide/storage.rst#implicit-store-creation encourage implicit store creation.

We need to address scaling - we have some scaling that supports dask and 3D downsampling and others that don't. Also, python-based validation is something we need to support (several requests from the community) - Do we include pydantic-ome-ngff/ome-zarr-models-py as a dependency?

How do we define the "API" that is (for example) consumed by napari-ome-zarr? It's kinda based on the napari reader API but with a few differences (I think)?

What are the prime functions of ome-zarr-py? (and what alternatives exist)

  • Generating metadata (ome-zarr-models-py)
  • Validation (ome-zarr-models-py)
  • Writing / manipulating arrays (ngff-zarr - only 3D support?, ngff-writer - not maintained)
  • Graph traversal - e.g. handling bioformats2raw or Plate structure or Image -> labels
    • Use this "graph traversal" logic for providing a list of nodes -> layers for napari-ome-zarr

It seems most of the "solutions" for OME-Zarr creation from image.sc above are based on using ome-zarr-py for metadata generation, but handling array writing themselves. (similar strategy in omero-cli-zarr). If we adopt ome-zarr-models-py for metadata creation then we don't need ome-zarr-py so much.

Validation should be handled by ome-zarr-models-py.

We do need some fully n-dimensional, dask-compatible tool for scaling: E.g. Take a single-dataset OME-Zarr and build the pyramid, downsampling in x,y,z (not c, t etc).
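As a strawman for what such a tool needs to do, a numpy sketch (names hypothetical) that block-averages only the spatial axes and leaves c/t untouched; a real implementation would use dask (e.g. dask.array.coarsen) to stay lazy:

```python
import numpy as np

def downsample_spatial(a, axes, factor=2):
    """Downsample only spatial axes (x, y, z) by block-averaging.

    `axes` is a string like "tczyx" naming each dimension of `a`;
    non-spatial axes (t, c) get a factor of 1, i.e. are left alone.
    """
    factors = tuple(factor if ax in "xyz" else 1 for ax in axes)
    # trim each axis to a multiple of its factor, then average blocks
    trimmed = a[tuple(slice(0, (s // f) * f) for s, f in zip(a.shape, factors))]
    shape = []
    for s, f in zip(trimmed.shape, factors):
        shape += [s // f, f]
    return trimmed.reshape(shape).mean(axis=tuple(range(1, 2 * a.ndim, 2)))

level1 = downsample_spatial(np.ones((2, 3, 16, 16, 16)), "tczyx")
# -> shape (2, 3, 8, 8, 8): t and c preserved, zyx halved
```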

What are the "graph traversal" functionalities / API that we need? Is this mostly needed for napari-ome-zarr or are there other consumers of this?
In our various docs at the moment, we mostly just show how to grab the first item:

reader = Reader(parse_url(url))
nodes = list(reader())
image_node = nodes[0]
dask_data = image_node.data

Every time I come back to ome-zarr-py and need to refresh my memory, it takes a while to grok how all the Node, Spec, Reader, ZarrLocation classes etc. work together. Either we need to document this better or maybe it can be simplified in some way?

cc @joshmoore @dstansby

will-moore commented Dec 17, 2024

Discussion with @joshmoore @jburel notes at https://docs.google.com/document/d/13dmZLaozQ6VOu41bJROfDhmmsbfSCYtVx_sWScKdMhk/edit?tab=t.0

Summary:

  • OME-Zarr 0.5

    • Assume that Zarr v3 #404 (and napari-ome-zarr) can be made to work without too much effort. I.e., little change
  • OME-Zarr 0.6 and beyond

    • Temporarily (?) downprioritizing ome-zarr-py
    • Try to get napari-ome-zarr using ome-zarr-models-py (also look at the ergonomic transform classes from SpatialData which should be extracted to a new library)
    • Evaluate ngff-zarr for internal purposes
    • If a method is missing:
      • Either suggest it for ngff-zarr
      • Or: start building helpers
    • If ngff-zarr is 3D-only, suggest supporting the full API.
