-
Notifications
You must be signed in to change notification settings - Fork 227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add Zeiss CZI format #396
add Zeiss CZI format #396
Conversation
DCO signed off ✔️All commits have been signed off. You have certified to the terms of the Developer Certificate of Origin, version 1.1. In particular, you certify that this contribution has not been developed using information obtained under a non-disclosure agreement or other license terms that forbid you from contributing it under the GNU Lesser General Public License, version 2.1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! This is a large one, and will take multiple rounds of detailed review before it's ready to merge. You may want to consider removing the JXR support from this PR to keep things simple, and then adding JXR in a followup PR once the initial code lands. It's okay to stick with combining them into one PR if you'd like, but note that larger PRs are harder to land.
Also, OpenSlide doesn't currently have any drivers that pack fluorescence data into ARGB. We may eventually define new API for fluorescence support (#42) but for now I think I'd prefer not to add special-case behavior for a single vendor driver. If you'd like, you can move that code to a followup PR as well, though I can't promise it'll merge.
I've done an initial pass to point out some structural and formatting things I noticed, but haven't yet tried to read the code in detail. As a next step, please go through your code and systematically address those comments. In particular, its error handling and memory management will need to be adapted to consistently use OpenSlide's conventions. Please post a comment in the PR when it's ready for another round of review.
Can you provide some sample slides (at least one each of uncompressed and JXR) that we can redistribute as part of our test suite? I'd strongly prefer not to merge code this complex without test cases.
For the record, there's additional context about format documentation in this thread. |
Thank you for the review!
Can we keep JXR? Because CZI is not practical to use without JXR. As you can see in the submitted slide examples, a 66MB slide with JXR grows beyond 1GB if save as uncompressed.
I updated the PR to remove processing grayscale image. A side effect is it no longer reads the Plate1-Blue-A-xx.czi in openslide-testdata.
I uploaded few slides, their name starts with 10x_two_scenes. The uploader's name may be different because it requires google account. |
Yes, that's the plan. I'm just saying that you could split the PR into two, to make the reviews easier. I'm okay either way though.
Yes, but in the last question on the first page of the form, you didn't give us permission to redistribute the samples. Without that permission, we can't add the samples to openslide-testdata and can't use them in test cases. |
I re-submitted the same set of slides, this time with permission to redistribute. |
Great, thanks. I'll try to get those uploaded within the next few days, and then you'll be able to use them to create test cases. |
I've now uploaded the samples here as How are the changes going? Remember that the code will need to be converted to |
I have added 'GError' error handling. The newly added test cases failed to pass check because the build machine does not have libjxr. Is there a way to conditionally bypass zeiss driver test? |
I'm still seeing a number of places where functions are returning
#407 is the right way to handle this. Thanks for submitting it. |
Oh, sorry, I didn't read this properly. Yes, for optional libraries, you'll need to set |
Just FYI, I gathered a list of publicly available CZI files here if you want to test potential issues while reading czi format. |
Is this still in active development, or abandoned? |
It's been a while since I last looked at it. The last pull request was to add JPEG XR to GitHub CI, and it somehow got stuck. Two labs are using this patch, including support for fluorescence, and it works as expected. So if anyone needs to work with Zeiss scanner, they can build it independently. |
This driver add support for CZI format generated by Zeiss microscope. CZI format stores whole slide in many smaller tiles, or subblocks in CZI's term. The size of these tiles can exceed 2000x2000 pixels. Each tile has a associated directory entry, which describes its location, level 0 size, real tile size, the channel it was taken etc. A CZI file can be pyramid or non-pyramid. Openslide can read non-pyramid CZI, albeit much slower than the same slide in pyramid format. A CZI file can embed other files, such as CZI file or JPG. CZI call them attachments. This driver reads three of them: SlidePreview attachment as macro image in openslide associated images, Label attachment as label, and Thumbnail attachment as thumbnail. CZI stores image tile in JPEG XR or uncompressed. One can save CZI as uncompressed, which is simply stream of pixel bytes. The size is more than ten times larger than its JPEG XR encoded counterpart. The SlidePreview attachment somehow is stored in uncompressed format. CZI may use JPEG, LZW or ZSTD, however, none of files I saw uses any of them, therefor these decoders are not included. Images pixel can be: - BGR24(8bits per RGB color): used by bright field - BGR48(16 bits per RGB color): SlidePreview is BGR48 uncompressed - GRAY16: 16 bits gray image, used by fluorescence and TIE - GRAY8: Zeiss may have an option to generate 8 bits gray image but I haven't tested it. This driver convert BGR48 and GRAY16 into 8 bits per color (or channel) by keeping the most significant 8 bits. Zeiss may use 12 or 14 bits in GRAY16, this driver reads the effective pixel bits from XML metadata and convert pixels accordingly. At most three Gray channels can be combined into a pseudo ARGB image, the alpha channel is unused. This driver follows fluorescence microscopy convention when combine gray channels, i.e. the first gray channel to blue color, second gray channel to green, third to red. After detect samples on a slide, Zeiss captures each sample as separated scene. Because each image tile has a start x and y, openslide can show these multi-scenes whole slide even without knowing which scene a tile belongs. Nevertheless, this driver records the scene id when read the subblock directory entry. The JPEG XR decoder is from jxrlib. It is included in CentOS, Debian and Ubuntu. Because CentOS7, Debian 10 and 11, Ubuntu < 22 are all missing pkg-config file, an autogen.sh script is included to generate libjxr.pc for configure. jxrlib may be unavailable on some platforms. The configure script generates Makefile based on presence of jxrlib. It only builds JPEG XR decoder and zeiss driver when jxrlib is found. Signed-off-by: Wei Chen <[email protected]>
Signed-off-by: Wei Chen <[email protected]>
Signed-off-by: Wei Chen <[email protected]>
Signed-off-by: Wei Chen <[email protected]>
Signed-off-by: Wei Chen <[email protected]>
Signed-off-by: Wei Chen <[email protected]>
Signed-off-by: Wei Chen <[email protected]>
Signed-off-by: Wei Chen <[email protected]>
With range properties, user can skip empty area on a multi-scenes slide. Signed-off-by: Wei Chen <[email protected]>
Deepzoom may miss some scenes at max zoom out if scenes on a slide have different pyramid levels. Signed-off-by: Wei Chen <[email protected]>
Signed-off-by: Wei Chen <[email protected]>
Input of quickhash-1 is the first 544 bytes of CZI file. It is the file header, which has primary file GUID, file GUID, file part number, attachment directory position, metadata position, subblock directory position etc.al. The file header is quite unique. Signed-off-by: Wei Chen <[email protected]>
- test openslide_open(). - test openslide_read_region() on both scenes and on highest and lowest resolution. - test quickhash-1. Signed-off-by: Wei Chen <[email protected]>
Signed-off-by: Wei Chen <[email protected]>
They're only used on the bottom level, where they're equal to tw/th. Signed-off-by: Benjamin Gilbert <[email protected]>
Signed-off-by: Benjamin Gilbert <[email protected]>
Prep for next commit. Signed-off-by: Benjamin Gilbert <[email protected]>
Call it directly from create_czi(), rather than mutating the subblocks after create_czi() returns them. Signed-off-by: Benjamin Gilbert <[email protected]>
Always g_strndup() fixed-length strings from on-disk structures to avoid potential read overruns when including those strings in error messages. In particular, do this for att.name even though we currently don't reference it from a format string. Exclude magic checks from this rule. check_magic() already does safe in-place string comparison and it's called from many places. Signed-off-by: Benjamin Gilbert <[email protected]>
Add XML attributes and text nodes from the AttachmentInfos, DisplaySetting, Information, and Scaling elements of ImageDocument.Metadata. The Experiment and HardwareSetting elements are too verbose, and pertain more to the environment and its configuration than to the slide itself. The CustomAttributes element doesn't seem all that useful, and contains multi-line matrix values that aren't a good fit for OpenSlide properties. Omit "ImageDocument.Metadata" from property names, since it would make the names longer without disambiguating anything. All XML metadata elements are under ImageDocument.Metadata. Detect lists of identically-named elements using an heuristic and one special case; rename these elements using a unique identifier selected from their attributes by another heuristic. If we can't find a unique identifier, or if we find any identically-named elements after doing the renames, don't add properties for the offending elements. We can modify the heuristics if problems arise, and this avoids properties from multiple elements clobbering each other, or properties being added under unsupportable names. Signed-off-by: Benjamin Gilbert <[email protected]>
g_ascii_strto[u]ll is hard to use correctly: callers have to clear errno beforehand, check it afterward, and also pass and check endptr. Add wrappers to handle all that. While _openslide_parse_double() can return NAN on parse failure, integer parsers don't have a nice sentinel value, so return a boolean and write the parsed value through an out-argument. Currently all callers that want signed also want base 10, but one caller wants unsigned base 16, and alternate bases make more sense for unsigned anyway. Support alternate bases for unsigned but not for signed. Hamamatsu VMS depended on the ability to ignore trailing garbage, so we now need some extra code to explicitly reject the garbage. Signed-off-by: Benjamin Gilbert <[email protected]>
Read standard properties and metadata values from vendor properties. This is documented to be the preferred approach, since it ensures the underlying raw data is available from properties. Don't assume that the first objective in the Objectives list is the relevant one; dereference Information.Image.ObjectiveSettings.ObjectiveRef instead. Drop the XML XPath helpers we added. We don't use them anymore, and they're probably only useful for parsing values directly out of XML metadata, which we want to discourage. Signed-off-by: Benjamin Gilbert <[email protected]>
For openslide/openslide#396. Co-authored-by: Wei Chen <[email protected]> Signed-off-by: Benjamin Gilbert <[email protected]>
Prep commits cherry-picked into #597. |
Make it clearer that they're byte arrays, not strings. Signed-off-by: Benjamin Gilbert <[email protected]>
Signed-off-by: Benjamin Gilbert <[email protected]>
Other dimensions might plausibly be missing without loss of functionality, but subblocks need at least a width and a height. Signed-off-by: Benjamin Gilbert <[email protected]>
Don't do it as a side-effect of create_levels(). Signed-off-by: Benjamin Gilbert <[email protected]>
Signed-off-by: Benjamin Gilbert <[email protected]>
Signed-off-by: Benjamin Gilbert <[email protected]>
Signed-off-by: Benjamin Gilbert <[email protected]>
Signed-off-by: Benjamin Gilbert <[email protected]>
Signed-off-by: Benjamin Gilbert <[email protected]>
I think this is ready to go in. @iewchen, can you give it a final check? |
For openslide/openslide#396. Co-authored-by: Wei Chen <[email protected]> Signed-off-by: Wei Chen <[email protected]> Signed-off-by: Benjamin Gilbert <[email protected]>
Great! I can confirm it works. And squash merge sounds good. |
Thanks for the driver! And thanks for your patience through the long review/revision process. Now that the basic structure is in place, additional functionality should hopefully be easier to land. I'd suggest Zstandard next, since JXR has external dependencies which will need some additional work. I'm still interested in SIMD as well, though that may also be a more involved process. |
For openslide/openslide#396. Co-authored-by: Wei Chen <[email protected]> Signed-off-by: Wei Chen <[email protected]> Signed-off-by: Benjamin Gilbert <[email protected]>
For openslide/openslide#396. Co-authored-by: Wei Chen <[email protected]> Signed-off-by: Wei Chen <[email protected]> Signed-off-by: Benjamin Gilbert <[email protected]>
Add support for the CZI format generated by Zeiss microscopes. CZI stores a whole slide in many smaller tiles, or subblocks in CZI's term. The size of these tiles can exceed 2000x2000 pixels. Each tile has a associated directory entry, which describes its location, level 0 size, real tile size, color channel etc. A CZI file can be pyramidal or non-pyramidal, and OpenSlide can read both. A CZI file can embed other files, such as a CZI file or JPEG. CZI calls these attachments. This driver reads three of them as associated images: SlidePreview as the macro image, Label as the label, and Thumbnail as the thumbnail. CZI stores image tiles in JPEG XR, Zstandard, or uncompressed. This PR only supports uncompressed, which is simply a stream of pixel samples. The resulting size is more than ten times larger than its JPEG XR encoded counterpart. CZI may also use JPEG or LZW, but none of the files I saw use those. Image pixels can be: - BGR24 (8 bits per RGB color): used by brightfield - BGR48 (16 bits per RGB color): SlidePreview is BGR48 uncompressed - GRAY16: 16 bits gray image, used by fluorescence and TIE - GRAY8: Zeiss may have an option to generate 8 bits gray image but I haven't tested it. This driver supports BGR24 and BGR48, converting BGR48 into 8 bits per color by keeping the most significant 8 bits. After detecting samples on a slide, Zeiss captures each sample as a separated scene. Because each image tile has a start x and y, OpenSlide can show these multi-scene slides even without knowing to which scene a tile belongs. Nevertheless, this driver records the scene ID when reading the subblock directory entry. Scenes may have different numbers of pyramid levels, so this driver only exposes levels where image data is available in all scenes. Because decoded tiles can be around 20 MiB, the default cache size is only large enough for one tile at a time, which isn't enough to prevent thrashing. This driver resizes the default cache to hold at least two tiles. Signed-off-by: Wei Chen <[email protected]> Signed-off-by: Benjamin Gilbert <[email protected]> Co-authored-by: Benjamin Gilbert <[email protected]>
This driver add support for CZI format generated by Zeiss microscope.
CZI format stores whole slide in many smaller tiles, or subblocks in CZI's term. The size of these tiles can exceed 2000x2000 pixels. Each tile has a associated directory entry, which describes its location, level 0 size, real tile size, the channel it was taken etc. A CZI file can be pyramid or non-pyramid. Openslide can read non-pyramid CZI, albeit much slower than the same slide in pyramid format.
A CZI file can embed other files, such as CZI file or JPG. CZI call them attachments. This driver reads three of them: SlidePreview attachment as macro image in openslide associated images, Label attachment as label, and Thumbnail attachment as thumbnail.
CZI stores image tile in JPEG XR or uncompressed. One can save CZI as uncompressed, which is simply stream of pixel bytes. The size is more than ten times larger than its JPEG XR encoded counterpart. The SlidePreview attachment somehow is stored in uncompressed format. CZI may use JPEG, LZW or ZSTD, however, none of files I saw uses any of them, therefor these decoders are not included. Images pixel can be:
BGR24(8bits per RGB color): used by bright field
BGR48(16 bits per RGB color): SlidePreview is BGR48 uncompressed
GRAY16: 16 bits gray image, used by fluorescence and TIE
GRAY8: Zeiss may have an option to generate 8 bits gray image but I haven't tested it.
This driver convert BGR48 and GRAY16 into 8 bits per color (or channel) by keeping the most significant 8 bits. Zeiss may use 12 or 14 bits in GRAY16, this driver reads the effective pixel bits from XML metadata and convert pixels accordingly.
At most three Gray channels can be combined into a pseudo ARGB image, the alpha channel is unused. This driver follows fluorescence microscopy convention when combine gray channels, i.e. the first gray channel to blue color, second gray channel to green, third to red.
After detect samples on a slide, Zeiss captures each sample as separated scene. Because each image tile has a start x and y, openslide can show these multi-scenes whole slide even without knowing which scene a tile belongs. Nevertheless, this driver records the scene id when read the subblock directory entry.
The JPEG XR decoder is from jxrlib. It is included in CentOS, Debian and Ubuntu. Because CentOS7, Debian 10 and 11, Ubuntu < 22 are all missing pkg-config file, an autogen.sh script is included to generate libjxr.pc for configure. jxrlib may be unavailable on some platforms. The configure script generates Makefile based on presence of jxrlib. It only builds JPEG XR decoder and zeiss driver when jxrlib is found.