-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ZarrReader performance improvements #64
Conversation
Conflicting PR. Removed from build BIOFORMATS-push#94. See the console output for more details.
--conflicts |
Conflicting PR. Removed from build BIOFORMATS-push#22. See the console output for more details.
|
Updated ZarrReader on idr-testing as described at IDR/idr-metadata#672 (comment) (after having used |
Today's build took 4 mins for setId on idr0025 Plate 3: IDR/idr-metadata#672 (comment) |
Conflicting PR. Removed from build BIOFORMATS-push#99. See the console output for more details.
|
Testing "black" images OMEZarrReader using build from 11th Sept when I first noticed it (today's didn't include this PR)... Testing on
Tried viewing a Plate Then I tried viewing a different Plate where I'd run mkngff at the same time but not viewed the images (no memo file created). Plate Memo file created just now...
But this image is black, even with the rendering settings adjusted to be bright: Also, the black images are rendering very slowly - ~25 seconds, compared to Plate |
Discussed this morning - improved logging, comparison between memo file generation on import and after mkngff. Update today's build of ZarrReader on Then tried viewing 3rd Plate from idr0025: http://localhost:1080/webclient/?show=image-3260835
|
First attempt to view the Image above gave this in webclient...
But on next attempt it worked and Image is NOT black! |
712022 ms is 12 minutes. Previous plates on idr0125 have taken 95 minutes for Plate 1 and 4 minutes also on Plate 3 on idr-testing server. |
Comparing regular import (no chunks) and timing of memo file creation with after mkngff... On idr0125-pilot (we already have NGFF data without chunks) - update ZarrReader as above...
But err logs have:
|
Reverted ZarrReader jar to e.g.
Blank images (no chunks) are viewable. I see several "saved memo" events in the logs. Times are 1.5 - 3.5 seconds.
Rendering of image planes in webclient takes 200-300 ms. Now update symlink as omero-server user... using IDR/idr-metadata#647 (comment)
Image can be viewed as normal - "Save to All" generates thumbs for the Plate. Then removed the memo file...
Trying to view the Image in webclient:
|
EDIT - Ignore this failure. See #64 (comment) Will try
Viewing image...
Blitz.log
Checked that we only have a single unique original file path/name:
|
EDIT - Ignore this failure. See #64 (comment) Try on Plate 2
(took 8 minutes)...
Not working...
|
EDIT - Ignore this failure. See #64 (comment)
On idr0125-pilot... still trying with Try with first 4 Plates from first screen... idr0011.csv
Something wrong with
Try with 2 more plates from idr0011...
|
EDIT - Ignore this failure. See #64 (comment) Looking at the Blitz log for the last exception above (viewing http://localhost:1080/webclient/?show=well-1227572 from
Looking at the code... Here, |
Discussed Friday's testing at IDR meeting today - I realise that the released So I will return to testing |
Using todays merge-ci build of ZarrReader: https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-push/617/console
Testing memo file generation by viewing previously unviewed mkngff plate from idr0033: |
Viewing image from 2nd plate: http://localhost:1080/webclient/?show=image-3230447
|
Updated the proxy |
Testing whether that last commit fixes issue with mismatching pixel values on multi-well plates (that contain OME/METADATA.ome.xml) as reported IDR/idr-metadata#675 (comment) |
The Channels issue above seems to be fixed 👍 However, ManagedRepo paths issue remains (see #64 (comment))
|
@will-moore, with that last commit I believe the file path issue should be resolved. It seems that |
Updated ZarrReader on idr-testing to today's build:
In webclient, checking managed repo paths I see eg:
This removed the |
@dgault @joshmoore @sbesson - What's the next steps towards getting this PR merged? If we are ready to get this merged and released then it would be great to get that ticked off the TODO list. |
It was the build from 13th Nov, if that helps. But I'll import a few images with the current build again. |
Based on the latest state of the conversation, it looks like no more changes are planned, is that correct? If so, any reason not to follow the standard approach i.e. assign a set of reviewers, get the PR approved, merge and release. My understanding is that between yourself and @dominikl, there has been a lot of functional review (and you should probably be reviewers). Do we also want to run some code review? Proposing we add this discussion to the upcoming planning meeting as I would also consider a ZarrReader tag as essential if we decide to spin up |
Yeah I believe this should be ready for a final review now. The one area that I can think of changing is the naming of the new options (https://github.com/ome/ZarrReader/pull/64/files#diff-71931eb54d3ee0b5a673044b8cdcdb8fa1bee03c583e3fc916e1b6b47a7dd7bbR93-R98). Currently as they have been added rather ad-hoc there is a mixture zarrreader and omezarr prefixes. So we probably want to devcide which we are keeping. Personally I would opt for OMEZarr as the preferred prefix. |
Following the discussion today I have updated the options names to all use the @will-moore, this will mean you will be required to add the following the the bfoptions file:
|
Updated the Then updated the ZarrReader to today's build to include last commits above. Regenerated memo files for regular plate and Looks good! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functionally this is working for me. LGTM 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performed a review of the latest state of the changes proposed here. I found no conceptual issue that would constitute a blocker but a few questions emerged as I was reading through the code:
- the signature of several private utility methods (
parseResolutionCount
,parseOmeroMetadata
,parseLabels
,parseImageLabels
) is updated to take the an attributes map as an extra input. For all these APIs, it looks like the first argumentroot
is no longer used by the API and could be completed removed from the signature? Since these are internal methods, this can be handled as a follow-up cleanup / patch release if necessary - the
parsePlate
utility also takes anattr
map as an extra argument but this is now the first argument rather than the last, is this done on purpose? - the extra
openZarr
boolean added to the newsetSeries/setResolution
APIs conceptually looks similar toisThisType(String id, boolean open)
but I share some of the wider concerns in https://github.com/ome/ZarrReader/pull/64/files#r1350073451. Given the nature of the format, I would expect the reader to internalize the decision of whether to open attribute files or not to be able to fetch additional metadata while trying to keep the number of reads minimal. In other terms I think theomezarr.quick_read
is a requirement in order to have this used at all for IDR but I wonder whether the mid-term goal is to have this option removed completely and just let the reader do the right thing - looking at the new options and their default value, my undertanding is that the two largest backwards-incompatible changes in the default behavior of the reader are that XML annotations for the JSON metadata are not created and label images are not treated as series anymore. Is that correct? I think both of these assumptions are completely acceptable to meet the IDR requirements but they should probably be communicated in the release notes to avoid surprises.
Yeah that parameter is no longer needed and can be safely removed from those method signatures
No, there is no specific reason for that, it can safely be moved to be the last parameter
I agree that the handling here is certainly not ideal and Im open to alternative suggestions. Unfortunately even in the quick read scenario we still need some of the calls to open the zarr and some to ignore it, so there has to be some logic handling that decision. The number of calls to When you say have the option removed, do you mean
Yes that is correct, the handling of the labels is probably the biggest functional change and should definitely be covered in the release notes |
Yes that's what I was referring to. Definitely not a blocker given the IDR timeline but I think that post-release, it might be useful to come back to this use case and understand what makes Should this alongside the API feedback be captured as issues and addressed in follow-up releases? |
Opened an Issue to capture the feedback here: #68 |
Getting this merged as it seems everyone is happy with the testing and handling the API feedback as part of the follow up issue. |
Initial performance improvements for ZarrReader related to #63
This PR is an initial PR for testing purposes, in its current state it looks to reduce the initFile time by about 50%, hopefully further improvements can be made.
Based on a subset of idr0036 the initial profiling looked as below,
The updated profiling should look more like this (the Zarrutils.toJSON should also be gone). If we can manage to remove some of the remaining JZarr calls then this may be reduced further:
The initial fixes here target the below areas: