
Image labeling at FOV level #64

Closed

tcompa opened this issue Jun 6, 2022 · 52 comments

tcompa commented Jun 6, 2022

Hi @gusqgm and @jluethi, let's discuss here the details of the image labeling task.

jluethi commented Jun 7, 2022

To provide a simple first implementation (more complicated, custom versions will follow later from @gusqgm):

Let's first implement nuclear segmentation using the cellpose library. It can be installed with pip install cellpose; this pulls in some dependencies like PyTorch, but it should just work via pip.

It works well to segment nuclei in the Pelkmans lab test sets I provided. Here is example code on how to use it:

import h5py
from cellpose import utils
from cellpose import models

def segment_nuclei(fn, input_channel, cellprob_th=0, d=40, pyr_level=1, anisotropy=3):
    use_GPU = models.use_gpu()
    # Get the relevant channel (make this something the user can select).
    # In our case, it's the DAPI channel (e.g. C01 in the original data).
    # This example loads it from an hdf5 file saved at fn, reading the channel
    # input_channel at a given pyramid level
    with h5py.File(fn, 'a') as f:
        dapi_dset = f[input_channel][str(pyr_level)]
        model = models.Cellpose(gpu=use_GPU, model_type='nuclei')
        mask, flows, styles, diams = model.eval(dapi_dset, channels=[0, 0],
                                                diameter=d, anisotropy=anisotropy, do_3D=True,
                                                net_avg=False, augment=False,
                                                cellprob_threshold=cellprob_th)

    # TODO: Save mask to wherever the output should go

Probably best to load e.g. pyramid level 1 (let's make this an option, but start with 1 for the moment). Processing full-resolution images through cellpose places quite a high memory demand on the GPU, which is why we typically used downsampled images here.

And the label mask saved under mask above should then be saved to the OME-Zarr labels folder, as described in the NGFF spec: https://ngff.openmicroscopy.org/latest/
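
For the OME-Zarr case, lazily loading a given channel and pyramid level could look like this minimal sketch (the path layout and the DAPI-as-channel-0 choice are assumptions about how our plates are structured):

import dask.array as da

pyr_level = 1
zarrurl = "/PATH/TO/PLATE/plate.zarr/B/03/0"  # hypothetical path to one field of view
# Lazily open the chosen pyramid level and select channel 0 (assumed to be DAPI)
dapi_dset = da.from_zarr(f"{zarrurl}/{pyr_level}")[0]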

jluethi commented Jun 14, 2022

Regarding saving images: I've tested this and it works well to save images site by site. If I have a label image, I can save it to an existing OME-Zarr file by doing this:

import zarr
from ome_zarr.io import parse_url
from ome_zarr.writer import write_labels

# Create label image somehow
label_img = mask  # e.g. the mask from above

# Path to the field of view we want to save to
file_path = f'/PATH/TO/PLATE/20200812-CardiomyocyteDifferentiation14-Cycle1.zarr/B/03/{fov}'

store = parse_url(file_path, mode="w").store
root = zarr.group(store=store)
label_name = "label_image"
label_axes = ["z", "y", "x"]  # could change if e.g. a 2D image was processed => can we infer this?

write_labels(label_img, group=root, name=label_name, axes=label_axes)

This creates the necessary folder in the zarr file and writes the label image to disk, with pyramids already included. We may want to pass some additional parameters about the pyramids (e.g. the coarsening factors), but the ome-zarr writer function for labels seems like a good start overall.

jluethi commented Jun 14, 2022

We can view label images in the napari viewer using the napari-ome-zarr plugin:
[Screenshot from 2022-06-14: label image displayed in napari]

Unfortunately, the visualization of labels does not seem to be working for HCS plates at the moment (see the ongoing discussion at ome/ome-zarr-py#65 (comment)). But we can probably proceed as planned for the moment and check the label images by looking at single FOVs.

jluethi commented Jun 14, 2022

Also, all of this will be quite a bit simplified once we use multi-FOVs (each site saved to an individual FOV); see #36, and #66 for the visualization issues.
=> Let's make sure we push this through first, to avoid building an overly complex labeling task that needs to infer sites and save labels in chunks or such.

jluethi moved this from Ready to In Progress (current sprint) in Fractal Project Management, Jun 28, 2022

tcompa commented Jun 29, 2022

Quick question: which versions of cellpose (1 or 2) and python are you using?

jluethi commented Jun 29, 2022

Let's go for version 2 now. We mostly used version 1 before, but version 2 has great improvements and we're certainly more interested in this going forward :)

tcompa commented Jun 29, 2022

Ok, thanks.

I guess the bit of code in this issue was working with v1, right? For instance, models.use_GPU() doesn't seem to exist (probably replaced by core.use_gpu(); I don't know whether that's related to v1/v2 or to other updates on their side).
@jluethi, is this code something that you can quickly run, to make sure it has no errors?

jluethi commented Jun 29, 2022

@tcompa Ah, yes, it could be that there were some changes in that setting. If you have code that runs, I can test it over lunch quickly or later tonight. Otherwise, I can look into providing an updated example, verified separately, by tomorrow. Checking that it runs on a GPU is trickier, as I can't do that with my local machine; the same goes for whether the "run on GPU" option works.

tcompa commented Jun 29, 2022

@tcompa Ah, yes, it could be that there were some changes in that setting. If you have code that runs, I can test it over lunch quickly or later tonight. Otherwise, I can look into providing an updated example, verified separately, by tomorrow.

At the moment I'm only dealing with installing cellpose correctly, and making sure it runs on the GPU partition and sees the GPU. Anything related to code actually doing something is for later.

tcompa commented Jun 29, 2022

It works well to segment nuclei in the Pelkmans lab test sets I provided. Here is example code on how to use it:

import h5py
from cellpose import utils
from cellpose import models

def segment_nuclei(fn, input_channel, cellprob_th=0, d=40, pyr_level=1, anisotropy=3):
    use_GPU = models.use_gpu()
    # Get the relevant channel (make this something the user can select).
    # In our case, it's the DAPI channel (e.g. C01 in the original data).
    # This example loads it from an hdf5 file saved at fn, reading the channel
    # input_channel at a given pyramid level
    with h5py.File(fn, 'a') as f:
        dapi_dset = f[input_channel][str(pyr_level)]
        model = models.Cellpose(gpu=use_GPU, model_type='nuclei')
        mask, flows, styles, diams = model.eval(dapi_dset, channels=[0, 0],
                                                diameter=d, anisotropy=anisotropy, do_3D=True,
                                                net_avg=False, augment=False,
                                                cellprob_threshold=cellprob_th)

    # TODO: Save mask to wherever the output should go

Probably best to load e.g. pyramid level 1 (let's make this an option, but start with 1 for the moment). Processing full-resolution images through cellpose places quite a high memory demand on the GPU, which is why we typically used downsampled images here.

We'd need some more details:

  1. What is the expected shape of dapi_dset? 2D or 3D? Single image or any size (e.g. merged FOVs)?
  2. Could you please provide some meaningful values for the arguments of model.eval (i.e. d, anisotropy, cellprob_th)?

tcompa commented Jun 29, 2022

(No worries.)

We'd need some more details:

1. What is the expected shape of `dapi_dset`? 2D or 3D? Single image or any size (e.g. merged FOVs)?

2. Could you please provide some meaningful values for the arguments of `model.eval` (i.e. `d`, `anisotropy`, `cellprob_th`)?

I am just sticking with the defaults for the moment.

The next question is where to store label metadata in the zarr file. If we segment level 1, for instance, we should associate a coordinateTransformation (notably a scale) with the output mask. I need to look into the specs to see how to do it; a sketch of what I expect is below.
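
To make this concrete, here is a sketch of the kind of multiscales entry I would expect in the label's .zattrs (all scale values are placeholders): the key point is that the label's level-0 scale must equal the intensity image's scale at the level we segmented.

label_zattrs = {
    "multiscales": [{
        "axes": [
            {"name": "z", "type": "space"},
            {"name": "y", "type": "space"},
            {"name": "x", "type": "space"},
        ],
        "datasets": [
            # Segmentation ran at intensity level 1, so the label's first
            # level already carries that coarser XY scale (placeholder values)
            {"path": "0",
             "coordinateTransformations": [{"type": "scale", "scale": [1.0, 0.325, 0.325]}]},
            {"path": "1",
             "coordinateTransformations": [{"type": "scale", "scale": [1.0, 0.65, 0.65]}]},
            # ... and so on for the remaining pyramid levels
        ],
    }]
}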

tcompa commented Jun 29, 2022

Just reporting my observations: write_labels actually creates a pyramid (for the labels) by itself, and the relevant scale transformations are defined in plate.zarr/B/03/0/labels/label_image/.zattrs. The calculation of these shapes is in https://github.com/ome/ome-zarr-py/blob/313e12690fd95d4f199ef3aba9d9412730012c53/ome_zarr/format.py#L256-L266.

We should understand:

  1. Whether this is in line with our pyramids, or something is different.
  2. How to propagate a prefactor from write_labels down to generate_coordinate_transformations, so that the final scales match those of the image at the corresponding level. This matters if we segment a pyramid level which is not the highest-resolution one.

For now, I modified the .zattrs file by hand, and obtained this:

[Screenshot from 2022-06-29: label image displayed with the hand-edited scales]

tcompa added a commit that referenced this issue Jun 29, 2022
tcompa commented Jun 29, 2022

I'm adding a prototype task image_labeling, with 5d45e9a.

To run it (outside Fractal) I use this script

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --partition=gpu
#SBATCH --time=10:00:00

date
echo
poetry run python image_labeling.py -z /data/active/fractal/tests/Temporary_data_UZH_1_well_2x2_sites_singlefov/20200812-CardiomyocyteDifferentiation14-Cycle1.zarr/B/03/0/
echo
date

The segmentation of the highest-resolution level (a single 2x2 well, single channel, shape (10, 4320, 5120)) took around 10 minutes on the GPU.

tcompa commented Jun 29, 2022

We should now get started with the discussion of what should be wrapped around this skeleton code. What inputs from the user? What possible outputs? What other behaviors should be supported?

jluethi commented Jun 30, 2022

We'd need some more details:

  1. What is the expected shape of dapi_dset? 2D or 3D? Single image or any size (e.g. merged FOVs)?
  2. Could you please provide some meaningful values for the arguments of model.eval (i.e. d, anisotropy, cellprob_th)?
  1. Both interesting questions, where the answer may vary. We'd like to be able to use those networks both on 3D data and on 2D data like MIPs. Typically, the single cells would be segmented in 3D for each original FOV. But we would use the same (or a similar) model with slightly different parameters to segment organoids (larger structures) in the MIPs across sites.
  2. d (diameter), cellprob_th (cell probability threshold) and potentially 1 or 2 extra factors (will investigate) should definitely be user-exposed inputs, and we'd run with the defaults if we don't get anything. Anisotropy is the ratio of the z pixel size to the x & y pixel sizes, and we can get it from the metadata (scale); a minimal sketch follows below. Not sure though if it needs to be rounded to an integer or can be a float (will investigate)
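
A minimal sketch of reading it from the NGFF metadata (assuming ZYX scales in the coordinateTransformations and equal X/Y pixel sizes; which direction cellpose wants the ratio in is exactly what I'd double-check):

import zarr

zarrurl = "/PATH/TO/PLATE/plate.zarr/B/03/0"  # hypothetical path
pyr_level = 1
multiscales = zarr.open(zarrurl, mode="r").attrs["multiscales"]
scale = multiscales[0]["datasets"][pyr_level]["coordinateTransformations"][0]["scale"]
pixel_size_z, pixel_size_y, pixel_size_x = scale[-3:]
anisotropy = pixel_size_z / pixel_size_x  # e.g. z sampled 3x coarser than xy -> 3.0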

The question of at which resolution level the network should be run is a very good one! I actually ran the 3D model per site at pyramid level 1, because level 0 was too large for the GPU I had back then. This is something we may want to do for GPU-memory or performance reasons in different instances, because we often don't need too fine a segmentation anyway, and pyramid level 1 or even 2 may be detailed enough. => Interesting that you ran at level 1; can we expose this as an option?

Also, I think we may just save the labels at the lowest pyramid level that we have (e.g. at level 1 if we don't have labels for level 0). For visualization, that probably works with the scale parameters set correctly. Not sure though if this will be a headache during analysis or if that is easily solved, so we may get back to this later.

write_labels actually creates a pyramid

Let's see if that is good enough for what we need, or whether we'll need our own implementation.
If we do it ourselves, let's be aware that creating a pyramid is different (and trickier) for label images than for intensity images: the downscaling function for an image of distinct integer labels shouldn't be the same as for an intensity image. So if we can rely on the built-in functionality, that may be useful.
We would need to pass our pyramid parameters to it, though (i.e. downscaling factor, number of pyramid levels); I'm not fully sure whether they are directly taken as an input.

Your output already looks very promising! Did you run this on the MIP or on a single slice?

tcompa commented Jun 30, 2022

Your output already looks very promising! Did you run this on the MIP or on a single slice?

It ran on the 3D image, with shape (10, 4320, 5120), via:

import dask.array as da
from cellpose import core, models

# Select ZYX data for level 0 and channel 0
dapi_dset = da.from_zarr(zarrurl + "/0")[0]

use_gpu = core.use_gpu()
model = models.Cellpose(gpu=use_gpu, model_type="nuclei")
mask, flows, styles, diams = model.eval(
    dapi_dset, channels=[0, 0], do_3D=True, net_avg=False, augment=False
)

jluethi commented Jun 30, 2022

Neat! Impressive that the current cluster GPUs can handle this :)
Generally, e.g. for the 9x8 cases, we'll need to run per FOV for 3D models.

I have a meeting now, but I'll look into the remaining details in the afternoon, so you should have them by tomorrow.

tcompa commented Jun 30, 2022

Ok for the rest.

Briefly:

  1. We should expose the choice of which channel/level to use.
  2. We should support both 3D or 2D input.
  3. We should expose the cellpose parameters.
  4. We may compute anisotropy ourselves.

We should study the choice between custom and ome_zarr writing a bit further. If it weren't for the trickier coarsening of label images, I'd be in favor of writing the pyramid ourselves (which gives us full control over the scale transformations). In that case, there is no problem in doing labeling on (say) level 2, and then propagating it up and down in the pyramid with the correct scales; a sketch of this custom route is below.
In the ome_zarr version I don't see any control we could have, so we may have to write first and then modify the scale parameters by hand. I think they don't even let us specify a coarsening factor, meaning that as soon as it is different from 2 their writer could be wrong.
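
For reference, a bare-bones custom version could be as small as this sketch (coarsening factor, level count and paths are illustrative; plain XY subsampling is one valid nearest-neighbor scheme for integer labels):

import dask.array as da

def write_label_pyramid(label_img, zarrurl, coarsening_xy=2, num_levels=5):
    # Write the label image level by level; subsampling by slicing preserves
    # the integer dtype by construction and never mixes label values
    level = da.asarray(label_img)  # ZYX
    for ind in range(num_levels):
        level.to_zarr(f"{zarrurl}/labels/label_image/{ind}", overwrite=True)
        level = level[:, ::coarsening_xy, ::coarsening_xy]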

More later.

jluethi commented Jun 30, 2022

Great summary @tcompa !

My first look at the ome_zarr pyramid code doesn't actually show any special treatment for label-image downsampling, so maybe they don't handle it specially either. I'll have a look at this later.

tcompa commented Jun 30, 2022

My first look at the ome_zarr pyramid code doesn't actually show any special treatment for label-image downsampling, so maybe they don't handle it specially either. I'll have a look at this later.

write_labels

write_labels(labels, .., scaler, .., coordinate_transformations, ..) calls:

  1. _create_mip(labels, fmt, scaler, axes);
  2. write_multiscale_labels(mip, .., coordinate_transformations, ..).

_create_mip

The first call, _create_mip, is unclear to me at the moment. I think it creates a pyramid, rather than a MIP. Probably it's just a naming issue, and this is what I would rather call create_pyramid. Let's see what it looks like:

The scale.nearest call within _create_mip (which creates a pyramid via nearest-neighbor resizing, using skimage.transform.resize in the style of cv2.INTER_NEAREST) uses the __nearest and _by_plane methods of the Scaler class. The Scaler class also has the attributes

  • self.downscale (coarsening factor in XY plane);
  • self.max_layer (number of pyramid levels);

Thus a possible way to proceed is to define an ome_zarr scaler with the correct attributes:

import ome_zarr.scaler

our_scaler = ome_zarr.scaler.Scaler()
our_scaler.downscale = 3  # for instance
our_scaler.max_layer = 4  # for instance

write_multiscale_labels

Once the parameters for pyramid creation are set correctly, we still have to check coordinate_transformations.
write_multiscale_labels calls

  1. write_multiscale(pyramid, .., coordinate_transformations=coordinate_transformations, ..);
  2. write_label_metadata(.., **({} if label_metadata is None else label_metadata)).

write_multiscale has an argument:

    :type coordinate_transformations: 2Dlist of dict, optional
    :param coordinate_transformations:
        List of transformations for each path.
        Each list of dicts are added to each datasets in order and must include a
        'scale' transform.

which we could use if our segmentation was not performed at the highest-resolution level. The question is: how would it mix with the default scale transformations? Would it just override them?
Answer:

    if coordinate_transformations is None:
        shapes = [data.shape for data in pyramid]
        coordinate_transformations = fmt.generate_coordinate_transformations(shapes)

Thus it seems that in our image_labeling task we can construct the correct set of scale transformations (taking into account the level at which we did the segmentation), and let them propagate through the ome_zarr functions. This is to be checked, of course; a concrete sketch follows.
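
Concretely, constructing such a list could look like this sketch (starting level, coarsening factor and base pixel sizes are placeholder values):

# Sketch: per-level scale transformations for the label pyramid, taking into
# account that segmentation ran at `starting_level` of the intensity pyramid
starting_level = 1
coarsening_xy = 2
base_scale_zyx = [1.0, 0.1625, 0.1625]  # placeholder level-0 pixel sizes
num_levels = 5

coordinate_transformations = []
for ind in range(num_levels):
    factor = coarsening_xy ** (starting_level + ind)
    coordinate_transformations.append([{
        "type": "scale",
        "scale": [base_scale_zyx[0],
                  base_scale_zyx[1] * factor,
                  base_scale_zyx[2] * factor],
    }])

# ... and then pass it through, e.g.
# write_labels(label_img, group=root, name=label_name, axes=label_axes,
#              coordinate_transformations=coordinate_transformations)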

Links:
https://github.com/ome/ome-zarr-py/blob/master/ome_zarr/writer.py
https://github.com/ome/ome-zarr-py/blob/master/ome_zarr/scale.py

First impression

It seems that we could achieve everything while sticking with ome_zarr, but it's not clear whether they have a special pyramid-creation procedure for labels. I think the last line of this method

    def __nearest(self, plane: np.ndarray, sizeY: int, sizeX: int) -> np.ndarray:
        """Apply the 2-dimensional transformation."""
        return resize(
            plane,
            output_shape=(sizeY // self.downscale, sizeX // self.downscale),
            order=0,
            preserve_range=True,
            anti_aliasing=False,
        ).astype(plane.dtype)

is the only one that matters: if labels are encoded in a very small int type, the resulting downsampled levels will be cast back to the same type. This we could obviously also enforce in our own pyramids.

If this is correct, then let's pick the one that is simpler or that gives us more control on what's happening.

tcompa commented Jun 30, 2022

(previous comment went through too early, now updated)

jluethi commented Jun 30, 2022

Thanks for the detailed analysis @tcompa

First thoughts on this:
In the end, the ome-zarr implementation runs nearest-neighbor resizing (cv2.INTER_NEAREST-style) with anti_aliasing=False via scikit-image's resize. If we can easily do this or something equivalent, great :) Let's get to a first implementation we can test; we will notice very fast if labels are misassigned between pyramid levels!

If you think we can quickly have our own version, I have nothing against going that route. Would make it easier to scale eventually, if we want to write very large label images.

If our custom version with a similar setup is hard to implement, the way you describe passing the parameters to ome-zarr-py also sounds like a valid way to go so we can test the label image display :)

tcompa commented Jul 4, 2022

As of 8e2998a, there is a (very preliminary) version of Fractal integration. In principle it works (see examples/example_uzh_1_well_2x2_sites.sh), but there are ongoing discussions concerning scalability (both for memory and GPU resources).

I would not test this on the 23-well dataset yet, but probably one well with 9x8 sites should work.

In principle the task works both for 3D and 2D (MIP) data, but at the moment Fractal always runs on the 3D images.

More updates later.

tcompa commented Jul 5, 2022

Since memory usage for the labeling task is going to be an important issue, let's keep some reference info here.
A line which is quite memory-expensive in cellpose is https://github.com/MouseLand/cellpose/blob/main/cellpose/core.py#L599:

yf = np.zeros((3, self.nclasses, imgs.shape[0], imgs.shape[1], imgs.shape[2]), np.float32)

As a reference, this could look like this:

$ python
Python 3.9.12 (main, Apr  5 2022, 06:56:58) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np

>>> x = np.random.randint(0, 100000, size=(3, 3, 10, 2160, 2560)).astype(np.float32)
>>> print(f"dtype={x.dtype}, size={x.size}, size*itemsize={x.size * x.itemsize / 1e9} GB")
dtype=float32, size=497664000, size*itemsize=1.990656 GB

>>> x = np.random.randint(0, 100000, size=(3, 3, 15, 2160, 2560)).astype(np.float32)
>>> print(f"dtype={x.dtype}, size={x.size}, size*itemsize={x.size * x.itemsize / 1e9} GB")
dtype=float32, size=746496000, size*itemsize=2.985984 GB

---- EDIT: more info

While running a single-well 9x8 case, I'm observing moderate use of GPU memory and intensive use of CPU memory. This is a bit confusing and requires more thought.

CPU memory:
[Screenshot from 2022-07-05: CPU memory usage]

GPU memory:
[Screenshot from 2022-07-05: GPU memory usage]

jluethi commented Jul 5, 2022

@tcompa Are you processing a single original FOV (i.e. a single 2160x2560 chunk) and get those results?
That is indeed interesting. It's not too worrying if cellpose also needs CPU memory, though. I think it does quite a bit of computation to turn the different network outputs, like flow maps and probability maps, into label images. If that requires memory on the CPU side, so be it :)

Plus, the memory usage may be quite cellpose-specific. So as long as it runs, that should do the trick for the moment, I'd think.

If we heavily rely on it afterwards in production, we can think about optimizing it further.

tcompa commented Jul 5, 2022

Quick update on labeling/parallelization/memory (probably best to discuss this tomorrow)

TL;DR

  1. We are currently running cellpose sequentially over sites, which is possibly slower but safer memory-wise;
  2. We are currently storing the array of labels for a whole well, non-lazily. Memory-wise, this scales badly for larger wells, but at the moment we don't have a better way (due to a mix of relabeling needs and the previous point).

We can probably live with point 1, but point 2 doesn't look good. More work is needed.

More details

We are testing a single well with 9x8 sites in a single-FOV scheme.
[Note: the single-FOV approach is the only one that we are currently supporting, as of recent decisions, and things won't change unless we explicitly change our mind (that is, we are not running anything multi-FOV).]

For labeling, at the moment we go through the 72 sites sequentially, i.e. there are never two cellpose calculations running at the same time.
Early experiments with more flexible approaches (letting dask organize the 72 parallel executions, through map_blocks or similar, including dask/dask#2768) led to memory errors within cellpose (for instance in the line mentioned above), and our obvious guess is that this was due to simultaneous execution of more than one cellpose calculation.

At the moment, we are not able to generate the array of labels of a whole well in a lazy way, and that's the reason for the large CPU-memory usage. For a well of 72 sites with 19 Z planes, this array takes around 30 GB (for uint32 labels). That is still doable with our current RAM (64 GB), but not really scalable. A quick mitigation is to switch back to uint16 labels, and only use uint32 when the number of labels grows beyond 65k. But we would just be pushing the problem a bit further, not solving it.
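
(For the record, the ~30 GB figure checks out:)

>>> 19 * (9 * 2160) * (8 * 2560) * 4 / 1e9  # voxels * 4 bytes (uint32)
30.2579712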

If we dropped relabeling (which is not an option, in our understanding), we could try to construct this array lazily, by just:

  1. Execute cellpose for chunk 1/72 (first column of 3D data), and store it on disk;
  2. Execute cellpose for chunk 2/72 (second column of 3D data), and store it on disk;
  3. ...

This would still have two problems (1. concurrent execution of cellpose calculations can lead to memory errors; 2. lack of relabeling), but at least we would not be building the whole 9x8x19x2160x2560 array in memory. A sketch of this scheme follows.
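
A sketch of that sequential scheme (the region bookkeeping and paths are hypothetical):

import dask.array as da
import zarr
from cellpose import core, models

# Pre-created on-disk label array with the full-well shape, so the complete
# 9x8x19x2160x2560 array never has to live in memory at once
label_dset = zarr.open(f"{zarrurl}/labels/label_image/0")
model = models.Cellpose(gpu=core.use_gpu(), model_type="nuclei")

for z_sl, y_sl, x_sl in fov_regions:  # hypothetical list of per-FOV slices
    img = da.from_zarr(zarrurl + "/0")[0][z_sl, y_sl, x_sl].compute()
    mask, flows, styles, diams = model.eval(img, channels=[0, 0], do_3D=True)
    label_dset[z_sl, y_sl, x_sl] = mask  # note: labels are NOT relabeled across FOVs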

jluethi commented Jul 5, 2022

Great summary @tcompa, let's discuss in detail tomorrow. Some short notes:

If we dropped relabeling (which is not an option, in our understanding)

Dropping relabeling would be a pity; the relabeling makes this quite useful. But maybe it's still worth saving non-relabeled label values first, per chunk, potentially running on multiple nodes, and then having a collection task that converts the label images into relabeled ones? That would add I/O overhead, but label images are typically fairly small on disk.

In that case, we can then either run this job on CPU nodes with high memory (much cheaper than blocking GPU nodes, and we would only have the high memory demand for a short time) or eventually figure out ways to do lazy relabeling, e.g. based on the logging information about counts of objects per FOV; see the sketch below.
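
A minimal sketch of that offset-based relabeling (the per-FOV object counts are hypothetical logged values):

import numpy as np

counts = [137, 152, 98]  # hypothetical: number of objects per FOV, in processing order
offsets = np.cumsum([0] + counts[:-1])

def relabel_fov(fov_labels, offset):
    # Shift all labels of one FOV by the cumulative count of the previous
    # FOVs, keeping background (0) untouched; FOVs can be processed one at a time
    out = fov_labels.copy()
    out[out > 0] += offset
    return out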

Let's discuss tomorrow whether that's worth the complexity.

tcompa commented Jul 7, 2022

Some thoughts as of yesterday's Fractal meeting:

  1. A first useful update would be to separate labeling and relabeling.
  2. The next required use case is to run segmentation on a whole well, for 2D data (e.g. MIP). This will require:
    • A different structure of the task (re: what we parallelize over). See also point 3.
    • The possibility of working at lower-resolution levels (most likely).
  3. If/when we implement the ROI definition as in fractal-tasks-core#24 (Field of view parallelization via OME-NGFF ROI tables), the task can be restructured in a way that covers both use cases (parallelized over sites, or directly on the whole well, at whatever resolution level), and opens the door to parallelization over arbitrary ROIs.

tcompa commented Jul 7, 2022

  • Missing: we still need to limit the number of simultaneous Cellpose calls on the same GPU, but how to do it from dask is currently unclear.

Probably useful:
https://docs.dask.org/en/stable/scheduler-overview.html#configuring-the-schedulers

Here's a minimal working example on a CPU:

import time
import numpy as np
import dask
import dask.array as da
from concurrent.futures import ThreadPoolExecutor


n_threads = 4


def heavy_cellpose_function(block, block_info=None):
    print(f"Now running heavy cellpose function for chunk {block_info[None]['chunk-location']}")
    time.sleep(10)
    return np.ones_like(block, dtype=np.uint16) * np.random.randint(10)


x = da.zeros((8, 12), chunks=(4, 3), dtype=np.uint16)

# Extract total number of chunks
n_chunks = (x.shape[0] // x.chunksize[0]) * (x.shape[1] // x.chunksize[1])
print(f"Number of chunks: {n_chunks}")
print(f"Number of threads: {n_threads}")
print()

t0 = time.perf_counter()
with dask.config.set(pool=ThreadPoolExecutor(n_threads)):
    y = x.map_blocks(heavy_cellpose_function, chunks=(4, 3), meta=np.array((), dtype=np.uint16))
    y.to_zarr("data.zarr")
t1 = time.perf_counter()
print()
print(f"Elapsed time: {t1 - t0}")
print()

with output:

Number of chunks: 8
Number of threads: 4

Now running heavy cellpose function for chunk (0, 0)
Now running heavy cellpose function for chunk (0, 1)
Now running heavy cellpose function for chunk (0, 2)
Now running heavy cellpose function for chunk (0, 3)
Now running heavy cellpose function for chunk (1, 0)
Now running heavy cellpose function for chunk (1, 1)
Now running heavy cellpose function for chunk (1, 2)
Now running heavy cellpose function for chunk (1, 3)

Elapsed time: 20.044721081001626

Now we should check whether it also works on the GPU.

tcompa commented Jul 7, 2022

Still on "running cellpose for several sites at the same time on the GPU":

We consider sites of shape (10,2160,2560), and we limit the number of concurrent cellpose executions as in #64 (comment).

We run the test on a 2x2 well, but we only count the sites which are actually run in parallel (i.e. if we have 4 = 3+1 sites and run them in batches of 3, we only look at the timing for the first three and ignore the last one). This is the only relevant measure when scaling to large wells.

batch_size | time_per_site | time_per_site / batch_size | speed-up factor
---------- | ------------- | -------------------------- | ---------------
1          | ~4 min        | ~4 min                     | 1
2          | ~5 min        | ~2.5 min                   | ~1.6
3          | ~6 min        | ~2 min                     | ~2
4          | MemoryError   | -                          | -

This suggests that:

  1. It is beneficial to add a little bit of same-GPU parallelism.
  2. This won't scale to anything more than small batches of 2-3 sites. And for larger datasets (i.e. more Z planes), the maximum allowed batch_size could even be one.

tcompa added a commit that referenced this issue Jul 7, 2022
* Temporarily remove relabeling;
* Switch to dask.array.map_blocks;
* Limit number of simultaneous cellpose executions;
* Make sure that cellpose is not called more than once per site during pyramid creation (ref #97);
* Use np.max as aggregation function (ref #97).
jluethi commented Jul 7, 2022

Great summary of the discussion at yesterday's meeting @tcompa :)

This suggests that:

  1. It is beneficial to add a little bit of same-GPU parallelism.
  2. This won't scale to anything more than small batches of 2-3 sites. And for larger datasets (i.e. more Z planes), the maximum allowed batch_size could even be one.

To me, this suggests that parallelization is interesting, but it needs either to be a user parameter (easy) or we need to be able to estimate the load (hard). I'd go for a user parameter to start with. Then we can collect some experience with what works well and maybe come up with good heuristics for a given model or such.
Also, it's really good if this is something that can be varied in this task, because a user may e.g. decide to run cellpose at pyramid level 1 (once this becomes possible, e.g. using the ROI approach described above) and could then potentially run ~10 sites at once (given 10 Z layers), etc.

=> The flexibility is conceptually interesting; when and how we'll use it remains a bit of an open question that we'll figure out later, I'd assume.

tcompa changed the title from Image labeling task to Image labeling task - first (per FOV) usecase on Jul 13, 2022
tcompa changed the title from Image labeling task - first (per FOV) usecase to Image labeling at FOV level on Jul 13, 2022
tcompa commented Jul 15, 2022

As of 60a2722, the first image-labeling use case is roughly complete. It works for 2D or 3D images, always at the per-FOV level, and it has a relabeling=True/False argument. The labeling happens lazily over all FOV columns, and we can limit the number of concurrent cellpose executions with a num_threads argument. Relabeling is not done in a memory-efficient way, at the moment.

There are limits related to memory usage (different for labeling and relabeling) and runtimes (related to how many FOVs can be treated in parallel at the same time), which are better discussed in a new issue:

Issues related to the next use cases (whole-well segmentation in 2D and ROI-based scheme) are:

I'm closing this one.

tcompa closed this as completed on Jul 15, 2022
Repository owner moved this from In Progress (current sprint) to Done in Fractal Project Management, Jul 15, 2022
tcompa added a commit that referenced this issue Jul 18, 2022
jluethi moved this from Done to Done Archive in Fractal Project Management, Oct 5, 2022
jacopo-exact pushed a commit that referenced this issue Nov 16, 2022
Better failure when task name already exists (closes #64)