Image labeling at FOV level #64
To provide a simple first implementation of how to do this (more complicated, custom versions will follow later from @gusqgm): let's first implement nuclear segmentation using the cellpose library, which can be installed via pip. It works well to segment nuclei in the Pelkmans-lab test sets I provided. Here is example code on how to use it:
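The example snippet is missing from the export above; a minimal sketch of such a call, assuming cellpose's `models.Cellpose` API, could look like this (the `diameter` value and function name are illustrative, not taken from this thread):

```python
import numpy as np


def segment_fov(img: np.ndarray, use_gpu: bool = False) -> np.ndarray:
    """Run cellpose nuclear segmentation on a single (z, y, x) FOV.

    cellpose is imported lazily, so this module can be loaded on
    machines without cellpose or a GPU available.
    """
    from cellpose import models  # pip install cellpose

    model = models.Cellpose(gpu=use_gpu, model_type="nuclei")
    # eval returns (masks, flows, styles, estimated diameters)
    masks, _, _, _ = model.eval(
        img,
        channels=[0, 0],  # grayscale input
        do_3D=True,       # segment the whole z-stack at once
        diameter=30.0,    # illustrative: expected nucleus diameter in pixels
    )
    return masks
```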
Probably best to load e.g. pyramid level 1 (let's make this an option, but start with 1 for the moment). Processing full-resolution images through cellpose puts quite a high memory demand on the GPU, which is why we typically used downsampled images here. And the label mask saved under
Regarding saving images: I've tested this and it works well to save images site by site. If I have a label image, I can save it to an existing OME-Zarr file by doing this:
This creates the necessary folder in the zarr file and writes the label image to disk, with pyramids already built. We may want to pass some additional parameters about the pyramids (e.g. the coarsening factors), but the ome-zarr writer function for labels seems like a good start overall.
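The snippet referenced above was lost in export; a sketch of how this could look with `ome_zarr.writer.write_labels` (exact arguments depend on the ome-zarr-py version, and `pyramid_shapes` is just a hypothetical helper to illustrate the expected pyramid sizes):

```python
import numpy as np


def save_labels(label_img: np.ndarray, zarr_url: str, name: str = "nuclei") -> None:
    """Write a label image into an existing OME-Zarr image, letting
    ome-zarr-py create the labels/ subgroup and the pyramid."""
    import zarr
    from ome_zarr.writer import write_labels  # pip install ome-zarr

    group = zarr.open_group(zarr_url, mode="r+")
    # Creates <zarr_url>/labels/<name>/ with multiscale levels + metadata
    write_labels(labels=label_img, group=group, name=name)


def pyramid_shapes(shape, coarsening: int = 2, num_levels: int = 3):
    """Expected shapes of an xy-coarsened label pyramid (z left untouched)."""
    z, y, x = shape
    return [(z, y // coarsening**i, x // coarsening**i) for i in range(num_levels)]
```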
We can view label images in the napari viewer using the napari-ome-zarr plugin. Unfortunately, the visualization of labels does not seem to be working for HCS plates at the moment (see the ongoing discussion in ome/ome-zarr-py#65 (comment)). But we can probably proceed as planned for the moment and check the label images by looking at single FOVs.
Also, all of this will be simplified quite a bit once we use multi-FOVs (each site saved to an individual FOV); see #36, and #66 for the visualization issues.
Quick question: which versions of cellpose (1 or 2) and Python are you using?
Let's go for version 2 now. We mostly used version 1 before, but version 2 has great improvements and we're certainly more interested in it going forward :)
Ok, thanks. I guess the bit of code in this issue was working with v1, right? Because, for instance,
@tcompa Ah, yes, it could be that there were some changes in that setting. If you have code that runs, I can test it over lunch quickly, or later tonight. Otherwise, I can look into providing an updated example, verified separately, by tomorrow. Verifying that it runs on a GPU is trickier, as I can't do that with my local GPU; the same goes for checking whether the "runs on GPU" option works.
At the moment I'm only dealing with installing cellpose correctly, and making sure it runs on the GPU partition and sees the GPU. Anything related to code actually doing something is for later.
We'd need some more details:
(no worries,
I am just sticking with the defaults for the moment. The next question is where to store label metadata in the zarr file. If we segment level
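As a reference for the metadata question, here is a sketch of what the per-label `.zattrs` could contain under NGFF 0.4 if the label pyramid starts at pyramid level 1 of the image (all paths and pixel sizes here are illustrative, not taken from this thread); the scale transformation is what tells viewers that the labels are downsampled relative to the intensity image:

```json
{
  "image-label": {"version": "0.4", "source": {"image": "../../"}},
  "multiscales": [
    {
      "version": "0.4",
      "axes": [
        {"name": "z", "type": "space"},
        {"name": "y", "type": "space"},
        {"name": "x", "type": "space"}
      ],
      "datasets": [
        {"path": "0",
         "coordinateTransformations": [{"type": "scale", "scale": [1.0, 0.65, 0.65]}]},
        {"path": "1",
         "coordinateTransformations": [{"type": "scale", "scale": [1.0, 1.3, 1.3]}]}
      ]
    }
  ]
}
```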
Just reporting my observations. We should understand:
By now I modified the
I'm adding a prototype task. To run it (outside Fractal) I use this script:
The segmentation of the highest-resolution level (of a single 2x2 well, single channel --> shape
We should now get started with the discussion of what should be wrapped around this skeleton code. What inputs from the user? What possible outputs? What other behaviors should be supported?
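To seed that discussion, a hypothetical sketch of the knobs such a task could expose (every name and default here is an assumption, not settled API):

```python
def image_labeling(
    zarr_url: str,
    labeling_level: int = 1,   # pyramid level to segment (GPU-memory trade-off)
    channel: int = 0,          # index of the nuclear channel
    diameter: float = 30.0,    # expected object size in pixels (cellpose hint)
    label_name: str = "nuclei",
    use_gpu: bool = True,
) -> None:
    """Hypothetical task signature collecting the inputs discussed here;
    the body (load level, run cellpose, write labels) is omitted."""
    ...
```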
The question of at which resolution level the network should be run is a very good one! I actually ran the 3D model per site at pyramid level 1, because level 0 was too large for the GPU I had back then. This is something we may want to do for GPU-memory or performance reasons in different instances, because we often don't need too fine a segmentation anyway, and pyramid level 1 or even 2 may be detailed enough. => Interesting that you ran at level 1; can we expose this as an option? Also, I think we may just save the labels at the lowest pyramid level that we have (e.g. at level 1 if we don't have labels for level 0). For visualization, that probably works if the scale parameters are set correctly. I'm not sure, though, whether this will be a headache during analysis or whether it is easily solved, so we may get back to this later.
Let's see if that is good enough for what we need, or whether we'll need our own implementation. Your output already looks very promising! Did you run this on the MIP or on a single slice?
It ran on the 3D image, with shape
Neat! Impressive that the current cluster GPUs can handle this :) I have a meeting now, but will check for more of the details in the afternoon, so you should have more details on this by tomorrow.
Ok for the rest. Briefly:
We should study the choice between a custom implementation or the library writer. More later.
Great summary @tcompa! My first view of
write_labels

_create_mip

The first call of

Thus a possible way to proceed is to define an

write_multiscale_labels

Once the parameters for pyramid creation are correctly set, we still have to check

which we could use if our segmentation was not performed at the highest-resolution level. The question is: how would it mix with the default

Thus it seems that in our

Links:

First impression

It seems that we could achieve everything while sticking with

is the only one that matters: if labels are encoded in a very small

If this is correct, then let's pick the one that is simpler or gives us more control over what's happening.
(previous comment went through too early, now updated)
Thanks for the detailed analysis @tcompa. First thoughts on this: if you think we can quickly have our own version, I have nothing against going that route; it would make it easier to scale eventually, if we want to write very large label images. If our custom version with a similar setup is hard to implement, the way you describe passing the parameters to
As of 8e2998a, there is a (very preliminary) version of Fractal integration. In principle it works (see

I would not test this on the 23 wells yet, but one well with 9x8 sites should probably work. In principle the task works for both 3D and 2D (MIP) data, but at the moment Fractal always runs on the 3D images. More updates later.
Since memory usage for the labeling task is going to be an important issue, let's keep some reference info in this thread.
As a reference, this could look like this:
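The reference numbers themselves are missing above; for the CPU side, a minimal way to log peak memory usage from within the task could be the following (Linux-specific: `ru_maxrss` is in KB there; for the GPU side one would query e.g. `nvidia-smi` instead):

```python
import resource


def peak_rss_mb() -> float:
    """Peak resident-set size of the current process, in MB (Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024


# Log memory around the expensive call
before = peak_rss_mb()
# ... run the labeling step here ...
after = peak_rss_mb()
print(f"peak RSS: {after:.0f} MB (delta: {after - before:.0f} MB)")
```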
EDIT: more info

While running a single-well 9x8 case, I'm observing moderate use of GPU memory and intensive use of CPU memory. This is a bit confusing and requires more thought.
@tcompa Are you processing a single original FOV (i.e. a single 2160x2560 chunk) and getting those results? Also, the memory usage may be quite cellpose-specific. So as long as it runs, that should do the trick for the moment, I'd think. If we heavily rely on it afterwards in production, we can think about optimizing it further.
Quick update on labeling/parallelization/memory (probably best to discuss this tomorrow). TL;DR:
We can probably live with point 1, but point 2 doesn't look good. More work is needed.

More details

We are testing a single well with 9x8 sites in a single-FOV scheme. For labeling, at the moment we go through the 72 sites sequentially, i.e. there are never two cellpose calculations running at the same time. At the moment we are not able to generate the array of labels of a whole well in a lazy way, and that is the reason for the large CPU-memory usage. For a well of 72 sites with 19 Z planes, this array takes around 30 GB (for uint32 labels). That is still doable with our current RAM (64 GB), but not really scalable. A quick mitigation is to switch back to uint16 labels and only use uint32 when the number of labels grows beyond 65k, but that would just push the problem a bit further, not solve it. If we dropped relabeling (which is not an option, in our understanding), we could try to construct this array lazily, by just:
This would still have two problems (1. concurrent execution of cellpose calculations can lead to memory errors; 2. lack of relabeling), but at least we would not be building the whole
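For concreteness, the sequential relabeling discussed above boils down to adding a running offset to each FOV's labels (a minimal numpy sketch; the function name is made up):

```python
import numpy as np


def relabel_sequentially(fov_labels):
    """Shift each FOV's labels by a running offset so that labels are
    unique across the whole well (0 stays background)."""
    offset = 0
    out = []
    for labels in fov_labels:
        out.append(np.where(labels > 0, labels + offset, 0))
        offset += int(labels.max())  # labels are assumed to start at 1 per FOV
    return out, offset  # offset == total label count, decides uint16 vs uint32


fovs = [np.array([[0, 1], [2, 2]]), np.array([[1, 0], [1, 3]])]
relabeled, total = relabel_sequentially(fovs)
```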
Great summary @tcompa, let's discuss in detail tomorrow. Some short notes:
Dropping relabeling would be a pity; the relabeling is what makes this quite useful. But maybe it's still worth it to first save non-relabeled label values per chunk, potentially running on multiple nodes, and then have a collection task that rewrites the label images with relabeled values? That would be IO overhead, but label images are typically fairly small on disk. In that case, we could then either run this job on CPU nodes with high memory (much cheaper than blocking GPU nodes, and we would only have the high memory demand for a short time), or eventually figure out ways to do lazy relabeling, e.g. based on the logged counts of objects per FOV. Let's discuss tomorrow whether that's worth the complexity.
Some thoughts as of yesterday's Fractal meeting:
Probably useful: Here's a minimal working example on a CPU:
with output:
Now we should check whether it also works on the GPU. |
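The MWE itself did not survive the export; its structure was presumably along these lines, with a placeholder standing in for the per-site cellpose call (`fake_segment` and the batch size here are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor


def fake_segment(site_index: int) -> int:
    """Stand-in for running cellpose on one site; the real version
    would return a label array instead of an int."""
    return site_index * 100


sites = [0, 1, 2, 3]
# max_workers bounds how many sites hit the GPU at the same time
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(fake_segment, sites))
print(results)
```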
Still on "running cellpose for several sites at the same time on the GPU": We consider sites of shape

We run the test on a 2x2 well, but we only count the sites which are actually run in parallel (i.e. if we have 4=3+1 sites and run them in batches of 3, we only look at the timing for the first three and ignore the last one). This is the only relevant measure when scaling to large wells.
This suggests that:
Great summary of the discussion at yesterday's meeting @tcompa :)
To me, this suggests that parallelization is interesting, but it needs to either be a user parameter (easy) or we need to be able to estimate the load (hard). I'd go for a user parameter to start with. Then we can collect some experience with what works well and maybe come up with good heuristics for a given model or such. => The flexibility is conceptually interesting; when and how we'll use it remains a bit of an open question, though, which we'll figure out later, I'd assume.
As of 60a2722, the first image-labeling use case is roughly complete. It works for 2D or 3D images, always at the per-FOV level, and it has a

There are limits related to memory usage (different for labeling and relabeling) and runtimes (related to how many FOVs can be treated in parallel at the same time), which are better discussed in a new issue.

Issues related to the next use cases (whole-well segmentation in 2D and a ROI-based scheme) are:
I'm closing this one. |
Better failure when task name already exists (closes #64)
Hi @gusqgm and @jluethi, let's discuss here the details of the image labeling task.