Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow reading and writing compressed tiles #3992

Merged
merged 30 commits into from
Dec 1, 2023

Conversation

melissalinkert
Copy link
Member

Includes changes in #3911, for easier testing. https://github.com/melissalinkert/bioformats/compare/dicom-dual-personality...melissalinkert:bioformats:precompressed-tiles?expand=1 shows difference between what is here and what is in #3911.

This adds interfaces (ICompressedTileReader, ICompressedTileWriter) and limited implementation (SVSReader, DicomWriter) so that compressed data stored in an input dataset can be directly transferred to an output dataset with decompressing and recompressing. The -precompressed option in bfconvert turns this feature on, when supported by the input and output formats. components/formats-gpl/utils/PrecompressedExample.java and components/formats-gpl/utils/PrecompressedConversionExample.java are also simpler examples of how the API is meant to work.

Most of the reader wrappers explicitly prohibit use of precompressed tiles, as most perform operations on the pixel data.

Right now, the only supported input format is SVS. I have only tested with JPEG input data so far, JPEG-2000 is next. The only supported output format is DICOM, with sequential writing (TILED_FULL) and JPEG compression. Attempting to convert any other combination of formats using precompressed tiles should throw an exception.

I would expect both of these commands to produce a readable DICOM dataset:

  • bfconvert -noflat -precompressed -compression JPEG CMU-1.svs test.dcm
  • javac PrecompressedConversionExample.java && java -mx1g PrecompressedConversionExample CMU-1.svs test-example.dcm

More testing with a wider range of SVS datasets is required, and on my to-do list along with JPEG-2000 support as noted above.

There are a number of rough edges and places for improvement, for which opinions or suggestions are welcome:

  • The checks for compatibility between the input data/reader options and output data/writer options should be improved. See primarily FormatTools.canUsePrecompressedTiles(...) and DicomWriter.saveCompressedBytes(...).
  • ICompressedTileReader.openCompressedBytes(...) takes XY indexes in terms of the tile column and row index - not pixel XY coordinates. This was meant to indicate that the reader's native tile layout has to be respected, and an arbitrary region of the image can't be requested. This is inconsistent with all other tile handling methods in IFormatReader and IFormatWriter, though.
  • Variable compressed tile dimensions aren't handled very nicely at the moment. bfconvert and the examples should switch between precompressed and "normal" tile conversion as needed (provided the reader/writer allow for precompressed tiles in the first place), but there is definitely room for improvement.
    • DicomWriter.saveCompressedBytes(...) has a note about this, as label/macro/other extra images aren't meant to be saved tile-wise.
    • DicomWriter.setTileSizeX(int) also has a note about this, as the tile size can only be set once for the whole dataset right now. We probably want to be able to pick different tile sizes for each pyramid resolution.
    • bfconvert -precompressed ... should warn if a particular image or tile was converted using normal decompression/recompression
  • There is a big assumption that the compressed tiles provided to saveCompressedBytes(...) match what the writer expects, both in terms of compression type and dimensions. I don't know of a great way to defend against storing tiles that are the wrong size, JPEG-2000 when they should be JPEG, etc. There is also a potential issue of the last row and column being a bit weird - if the reader handles tile padding differently from how the writer expects it, that could result in converted images that look wrong.
    • For JPEG/JPEG-2000 we can probably introspect the tile width and height without a full tile decompression. That wouldn't be true if we were to implement this feature for something like TIFF + LZW, though.
    • If, for example, the reader does not pad the last column of tiles to the full tile width, should the writer throw an exception? Should conversion fall back to normal decompression/recompression for those tiles only?

As the draft status suggests, this is not ready for merging. I expect to add more commits here based on the notes above and any comments.

/cc @fedorov, @dclunie for feedback

Also required update to TiffCompression API to get the underlying Codec instance.
This doesn't work quite yet, but gives a general idea of how the API currently works.
Needs some additional compatibility checks to make sure input data's compression
type matches output compression type.
This updates the DICOM writer to enforce the same tile size for the whole dataset
at the {set,get}TileSize{X,Y} level. setId and save*Bytes are not currently set up
to allow different tile sizes for different images (though we could change that).

If bfconvert encounters an input image whose tile size does not match the output tile size,
then precompressed tiles are not used.
Seems to work in initial test with bfconvert on a JPEG-2000-compresed SVS,
but may require additional changes following further testing.
@dclunie
Copy link

dclunie commented Jun 18, 2023

Sometimes the assumption that compressed tiles are valid is not reliable - some SVS files seem to have zero length frames for some tiles (which is illegal TIFF IMHO, but it happens) and this causes DICOM readers to get out of sync, so the zero-length JPEG bitstream for that tile needs to be replaced with a genuine "blank" JPEG bitstream (e.g., a totally white tile synthesized with the right frame size).

If not corrected this results in tearing of the image as one zooms into higher resolution.

See this example:

https://storage.cloud.google.com/dac-wsi-conversion-source-v2/TCGA/TCGA-GBM/0de078d1-27d2-43fc-8173-ee9cf8e42e42/TCGA-02-0001-01Z-00-DX1.83fce43e-42ac-4dcd-b156-2908e75f2e47.svs

([email protected] should have access to that bucket)

PS. I have also occasionally seen in SVS files some invalid JPEG bitstreams that don't start with an SOI marker segment, but this doesn't seem to bother your converter, at least when the output is viewed in QuPath. See for example:

https://storage.cloud.google.com/idc-source-icdc-glioma/GLIOMA01-i_8A0A.svs

Not sure how you are handling the insertion of missing JPEG tables for those bitstreams for such frames.

Apart from those issues I have not encountered other scenarios where the compressed bitstreams in the TIFF SVS) tiles do not match expectations.

@melissalinkert
Copy link
Member Author

Thanks for looking at this, @dclunie.

See this example:

https://storage.cloud.google.com/dac-wsi-conversion-source-v2/TCGA/TCGA-GBM/0de078d1-27d2-43fc-8173-ee9cf8e42e42/TCGA-02-0001-01Z-00-DX1.83fce43e-42ac-4dcd-b156-2908e75f2e47.svs

([email protected] should have access to that bucket)

I don't have access there, or to https://storage.cloud.google.com/idc-source-icdc-glioma/GLIOMA01-i_8A0A.svs. As with previous datasets, you will need to add individual email addresses (we can't actually log in as [email protected]).

@dclunie
Copy link

dclunie commented Jun 20, 2023

Oops ... try [email protected] now ...

@melissalinkert
Copy link
Member Author

That's great, thank you - I am able to see both files now.

@snoopycrimecop
Copy link
Member

Conflicting PR. Removed from build BIOFORMATS-push#69. See the console output for more details.
Possible conflicts:

--conflicts

Copy link
Member

@sbesson sbesson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Carried out some functional testing using the standard CMU-1.svs sample file from OpenSlide as the input and the command-line bfconvert utility primarily.

Confirmed that using any other format than DICOM as the output e.g. TIFF, OME-TIFF fails with a meaningful exception when passing -precompressed. The same behavior happens when selecting any other input format which does not support precompressed tile reading.

The following table reports the conversion time and DICOM filesets size when converting CMU-1.svs into a WSI DICOM with various options

Execution time Size
-precompressed -noflat -compression JPEG 21.32s 181M
-precompressed -noflat -compression JPEG-2000 16.601s 182M
-noflat -compression JPEG -tilex 256 -tiley 256 160.937s 122M
-noflat -compression JPEG-2000 -tilex 256 -tiley 256 566.435s 549M

A few issues noted while running and validating the conversion workflows above:

  • omitting -tilex and -tiley when converted without the -precompressed fails with java.lang.ArithmeticException: / by zero as the tile size is reported as Tile size = 0 x 0
  • the DICOM file created with JPEG-2000 and pre-compression fails to open with the showinf utility. The utility seems to be stuck at the Calculating image offsets step

I will run additional validation and comparative tests tomorrow using the last released version of Bio-Formats to identify whether these are regressions.

byte[] buf = getTile(reader, writer.getResolution(), index,
xCoordinate, yCoordinate, width, height);

// if we asked for precompressed tiles, but that wasn't possible,
// then log that decompression/recompression happened
// TODO: decide if an exception is better here?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least in my testing of the CMU-1.svs sample file from OpenSlide, there was only plane where this was an issue. I don't think throwing an exception is a better alternative since it might significantly restrict the usage of this feature and might happen quite late in the conversion as well which is frustrating for the end-user.

Additionally, there is still some significant gain associated to using precompressed tiles for the highest resolutions even if the lowest resolutions are decompressed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the risk of adding too much complexity, I could see having an option (flag or -option) to also provide whichever of warning or error is not the default.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for now having it logged is a better option than throwing the exception

// TODO: decide if an exception is better here?
if (precompressed && !tryPrecompressed) {
LOGGER.warn("Decompressed tile: series={}, resolution={}, x={}, y={}",
writer.getSeries(), writer.getResolution(), x, y);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This warning message is currently written for every single tile of the resolution but it applies to the entire resolution. Would it make sense to compute and display it outside the loop?
Otherwise we might want to align the display with the rest of the utility as this reads a bit confusing if you don't know what is happening

Tile size = 256 x 256
Decompressed tile: series=0, resolution=3, x=0, y=0
Decompressed tile: series=0, resolution=3, x=1, y=0
Decompressed tile: series=0, resolution=3, x=2, y=0
Decompressed tile: series=0, resolution=3, x=3, y=0
Decompressed tile: series=0, resolution=3, x=0, y=1
Decompressed tile: series=0, resolution=3, x=1, y=1
Decompressed tile: series=0, resolution=3, x=2, y=1
Decompressed tile: series=0, resolution=3, x=3, y=1
Decompressed tile: series=0, resolution=3, x=0, y=2
Decompressed tile: series=0, resolution=3, x=1, y=2
Decompressed tile: series=0, resolution=3, x=2, y=2
Decompressed tile: series=0, resolution=3, x=3, y=2
	Series 0: converted 1/1 planes (100%)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the whole resolution needs to be decompressed/recompressed, this should now only warn once. If some of the tiles in the resolution work with -precompressed and some do not, it will still warn once for each tile that doesn't work. Mostly I'd expect that to happen with right and bottom edge tiles, which could be smaller and need to be padded to the correct size.

boolean sameTileWidth = reader.getOptimalTileWidth() == writer.getTileSizeX();
boolean sameTileHeight = reader.getOptimalTileHeight() == writer.getTileSizeY();

// TODO: this needs to check for compatible compression options
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would these be additional arguments to the signature? Or is there some mechanism to introspect the compression of a given resolution?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

30f1c3d should address this, by allowing the Codec (not string compression type) to be retrieved from the writer. If the reader and writer Codec are not instances of the same class, then using precompressed tiles is not allowed. This should solve the issues with -precompressed -compression JPEG-2000 with CMU-1.svs and -precompressed -compression JPEG with JPEG-2000 data.

@snoopycrimecop
Copy link
Member

Conflicting PR. Removed from build BIOFORMATS-push#689. See the console output for more details.
Possible conflicts:

--conflicts

@sbesson
Copy link
Member

sbesson commented Nov 24, 2023

Retested conversion of the same CMU-1.svs file with the latest released Bio-Formats (7.0.1). Omitting -tilex and -tiley, the bfconvert command reports a correct tile size but fails with a different error message:

Tile size = 256 x 256
Exception in thread "main" loci.formats.FormatException: Tile too small, expected 46000x32914. Setting the tile size to 46000x32914 or smaller may work.
 at loci.formats.out.DicomWriter.saveBytes(DicomWriter.java:207)
 at loci.formats.ImageWriter.saveBytes(ImageWriter.java:260)

This error comes from the work introduced in #4097. However passing explicitly -tilex 256 -tiley 256 allows to convert so at minimum the error message was misleading.

@@ -384,6 +388,10 @@ private void printUsage() {
" -no-sequential: do not assume that planes are written in sequential order",
" -swap: override the default input dimension order; argument is f.i. XYCTZ",
" -fill: byte value to use for undefined pixels (0-255)",
" -precompressed: transfer compressed bytes from input dataset directly to output.",
" Most input and output formats do not support this option.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two quick brainstormy thoughts:

  • I could see an option to formatlist that would print out a table of which of these interfaces is supported by each format.
  • It would be interesting to see (OME-)TIFF as another supported writer format for this to have an N>2.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree these would be nice targets for future work, but are well outside the scope of this PR and not realistic for 7.1.0. OK to open new issues for each, so that we can schedule for future releases?

@snoopycrimecop
Copy link
Member

Conflicting PR. Removed from build BIOFORMATS-push#690. See the console output for more details.
Possible conflicts:

--conflicts

@snoopycrimecop
Copy link
Member

Conflicting PR. Removed from build BIOFORMATS-push#691. See the console output for more details.
Possible conflicts:

--conflicts

@snoopycrimecop
Copy link
Member

snoopycrimecop commented Nov 27, 2023

Conflicting PR. Removed from build BIOFORMATS-push#692. See the console output for more details.
Possible conflicts:

--conflicts Conflict resolved in build BIOFORMATS-push#693. See the console output for more details.

Copy link
Member

@sbesson sbesson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went through a more comprehensive testing using the following sample files

  • the CC0 CMU-1.svs sample mentioned above where all original IFDs use JPEG compression except for the macro image which is LZW-compressed
  • the CC-BY 77917.svs sample file where all IFDs associated with the whole slide image use JPEG-2000 compression, the macro image uses LZW compression and the label image uses JPEG compression
    Conversion was performed:
  • using Bio-Formats 7.0.1 with JPEG and JPEG-2000 compression (-noflat -compression JPEG -tilex 256 -tiley 256 and -noflat -compression JPEG-2000 -tilex 256 -tiley 256)
  • using Bio-Formats 7.1.0-SNAPSHOT with this PR included with JPEG and JPEG-2000 compression (-noflat -compression JPEG -tilex 256 -tiley 256 and -noflat -compression JPEG-2000 -tilex 256 -tiley 256)
  • using Bio-Formats 7.1.0-SNAPSHOT with this PR included with JPEG and JPEG-2000 compression using pre-compressed tiles (-precompressed -noflat -compression JPEG and -precompressed -noflat -compression JPEG-2000)

The result of the conversion is output below

Source Bio-Formats Compression Pre-compressed Execution time Size setId time
CMU-1.svs 7.0.1 JPEG no 237.062 109M 1.613s
CMU-1.svs 7.0.1 JPEG-2000 no 735.462s 539M 4.749s
77917.svs 7.0.1 JPEG no 1279.184s 815M 11.341s
77917.svs 7.0.1 JPEG-2000 no 3248.119s 5.0G 39.501s
CMU-1.svs 7.1.0-SNAPSHOT JPEG no 238.611s 109M 1.622s
CMU-1.svs 7.1.0-SNAPSHOT JPEG-2000 no 683.246s 539M 4.637s
77917.svs 7.1.0-SNAPSHOT JPEG no 1232.993s 815M 7.805s
77917.svs 7.1.0-SNAPSHOT JPEG-2000 no 3084.838s 5.0G 36.061s
CMU-1.svs 7.1.0-SNAPSHOT JPEG yes 85.187s 176M 2.352s
CMU-1.svs 7.1.0-SNAPSHOT JPEG-2000 yes 87.122s 178M N/A
77917.svs 7.1.0-SNAPSHOT JPEG yes 264.604s 673M N/A
77917.svs 7.1.0-SNAPSHOT JPEG-2000 yes 257.726s 674M 4.733s

All datasets include valid TIFF tag as per the work of #3911 The DICOM datasets listed with N/A in the initialization time above hang at the Calculating image offsets stage. Running tiffinfo also displays a warning

$ tiffinfo 7.1.0-SNAPSHOT/jpeg_precompressed/77917/77917_0_0.dcm 
JPEGFixupTagsSubsampling: Warning, Unable to auto-correct subsampling values, likely corrupt JPEG compressed data in first strip/tile; auto-correcting skipped.
TIFF Directory at offset 0x2424ab62 (606382946)
  Image Width: 96999 Image Length: 45667
  Tile Width: 256 Tile Length: 256
  Resolution: 39525.7, 39525.7 pixels/cm
  Bits/Sample: 8
  Sample Format: unsigned integer
  Compression Scheme: JPEG
  Photometric Interpretation: YCbCr
  Samples/Pixel: 3
  Planar Configuration: single image plane
  Software: OME Bio-Formats 7.1.0-SNAPSHOT

All valid datasets were imported into OMERO alongside the originals for visual validation

Screenshot 2023-11-28 at 11 05 15
Screenshot 2023-11-28 at 11 04 46

Overall, a large portion of the logic works as expected and the precompressed transfer results in a noticeable speed-up of the conversion time in some scenarios.
From an API perspective, I think the API additions make sense. Essentially, the openBytes/saveBytes APIs of the readers/writers are supplemented by openCompressedBytes/saveCompressedBytes equivalent as well as a few utility functions through two new top-level interfaces. The usage of Java 8+ default implementation means the changes should remain backwards compatible.

The primary issue that came up from the functional testing happens when using the -precompressed flag and mismatching compression schemes between the source and the target. Ideally this scenario should be detected and fall back to decompressing/recompressing the tiles as done for mismatching compressions e.g. LZW -> JPEG.

In addition to the above:

  • the tile size seems to be reported as 0x0 in the command-line utility
  • the tile size needs to be specified as an input but this seems to have been the case before this PR
  • the file size of the DICOM datasets created with pre-compressed tiles is consistent with the original files but differs from the files created with decompression/recompression. I assume this is mostly a reflection of the difference in different codec implementations?

Adding one final usability/RFE thought which might be outside the scope of the original scoping of this work. The conversion of 77917.svs provides an interesting example as the original file (SVS) uses several compression schemes with a subset of them being available in the target format (DICOM). For an optimal conversion, one could almost imagine a more adaptive scenario where each series would be copied using either the original compression if the target format supports it or fallback to the compression specified by -compression xxx if the target format does not support it.

@melissalinkert
Copy link
Member Author

Thanks for the detailed write-up, @sbesson.

The primary issue that came up from the functional testing happens when using the -precompressed flag and mismatching compression schemes between the source and the target. Ideally this scenario should be detected and fall back to decompressing/recompressing the tiles as done for mismatching compressions e.g. LZW -> JPEG.

As noted above, 30f1c3d should fix this.

the tile size seems to be reported as 0x0 in the command-line utility

I think that's fixed with 2b1d661 and 3b63973, but let me know if you still see a problem when re-testing.

the tile size needs to be specified as an input but this seems to have been the case before this PR

bfconvert -noflat -precompressed -compression JPEG CMU-1.svs CMU-1.dcm at least should work now, but again, if I've missed something let me know.

the file size of the DICOM datasets created with pre-compressed tiles is consistent with the original files but differs from the files created with decompression/recompression. I assume this is mostly a reflection of the difference in different codec implementations?

Codec implementation and quality settings are going to be a factor; in this case, channel interleaving (or lack thereof) may be the biggest difference. SVSReader will report interleaved = false, so another possible direction for future work is to either change that reader or make it so that bfconvert can transform between interleaved and non-interleaved data.

Adding one final usability/RFE thought which might be outside the scope of the original scoping of this work. The conversion of 77917.svs provides an interesting example as the original file (SVS) uses several compression schemes with a subset of them being available in the target format (DICOM). For an optimal conversion, one could almost imagine a more adaptive scenario where each series would be copied using either the original compression if the target format supports it or fallback to the compression specified by -compression xxx if the target format does not support it.

Definitely not comfortable with trying to do that for 7.1.0, but an issue to look at that in a future release would be completely fine.

Copy link
Member

@sbesson sbesson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the last commit, -tilex 256 -tiley 256 can now be ommitted from the conversion commands (both with and without -precompressed) with this PR included and the conversion numbers look as follows:

Source Bio-Formats Compression Pre-compressed Execution time Size setId time
CMU-1.svs 7.0.1 JPEG no 238.378s 109M 1.613s
CMU-1.svs 7.0.1 JPEG-2000 no 665.698s 539M 4.749s
77917.svs 7.0.1 JPEG no 1252.47s 815M 11.341s
77917.svs 7.0.1 JPEG-2000 no 3097.387s 5.0G 39.501s
CMU-1.svs 7.1.0-SNAPSHOT JPEG no 239.225s 109M 1.516s
CMU-1.svs 7.1.0-SNAPSHOT JPEG-2000 no 709.188s 539M 5.152s
77917.svs 7.1.0-SNAPSHOT JPEG no 1279.364s 815M 9.395s
77917.svs 7.1.0-SNAPSHOT JPEG-2000 no 3191.433s 5.0G 43.525s
CMU-1.svs 7.1.0-SNAPSHOT JPEG yes 85.224s 176M 2.422s
CMU-1.svs 7.1.0-SNAPSHOT JPEG-2000 yes 684.453s 539M 5.892s
77917.svs 7.1.0-SNAPSHOT JPEG yes 1247.511s 815M 11.32s
77917.svs 7.1.0-SNAPSHOT JPEG-2000 yes 259.626s 674M 10.589s

All generated DICOM datasets can now successfully be loaded into OMERO and visually validated.

The conversion log only displays a single warning per resolution

[Aperio SVS] -> 7.1.0-SNAPSHOT/jpeg_precompressed/77917/77917.dcm [DICOM]
More than 4GB of pixel data, compression will need to be used
Tile size = 256 x 256
Decompressing resolution: series=0, resolution=0
        Series 0: converted 1/1 planes (100%)
Tile size = 256 x 256
Decompressing resolution: series=0, resolution=1
        Series 0: converted 1/1 planes (100%)
Tile size = 256 x 256
Decompressing resolution: series=0, resolution=2
        Series 0: converted 1/1 planes (100%)
Tile size = 256 x 256
Decompressing resolution: series=0, resolution=3
        Series 0: converted 1/1 planes (100%)
Tile size = 256 x 256
Decompressing resolution: series=0, resolution=4
        Series 0: converted 1/1 planes (100%)
Decompressed tile: series=1, resolution=0, x=0, y=0
        Series 1: converted 1/1 planes (100%)
Decompressed tile: series=2, resolution=0, x=0, y=0
        Series 2: converted 1/1 planes (100%)
[done]
1247.511s elapsed (43.714287+177604.58ms per plane, 719ms overhead)
...
[Aperio SVS] -> 7.1.0-SNAPSHOT/j2k_precompressed/77917/77917.dcm [DICOM]
More than 4GB of pixel data, compression will need to be used
Tile size = 256 x 256
        Series 0: converted 1/1 planes (100%)
Tile size = 256 x 256
        Series 0: converted 1/1 planes (100%)
Tile size = 256 x 256
        Series 0: converted 1/1 planes (100%)
Tile size = 256 x 256
        Series 0: converted 1/1 planes (100%)
Tile size = 256 x 256
Decompressing resolution: series=0, resolution=4
        Series 0: converted 1/1 planes (100%)
Decompressed tile: series=1, resolution=0, x=0, y=0
        Series 1: converted 1/1 planes (100%)
Decompressed tile: series=2, resolution=0, x=0, y=0
        Series 2: converted 1/1 planes (100%)
[done]
259.626s elapsed (29.428572+36507.715ms per plane, 613ms overhead)

Also confirmed that combining any of the incompatible bfconvert options throw an informative error as expected.

Also successfully executed the two example under formats-gpl/utils usign CMU-1.svs as the input file.

Barring the minor documentation issue found in bfconvert, these changes look ready to merge based on the functional review above.

Following up on the conversation in #3992 (comment), I would propose the following progression:

  • assuming approval from @dgault and @joshmoore, include these changes i.e. new APIs + reader implementation + writer implementation in preparation of the upcoming 7.1.0 release

  • review the Bio-Formats developer documentation to also include these APIs when developers new readers/writers

  • capture the various RFEs that appear as part of the testing as issues and discuss their prioritisation in upcoming patch/minor releases at upcoming Formats meeting:

    • discovery of the format with compressed tile reading/writing support
    • extension to other reader/writers. Immediate candidates from my side might be: OMETIFF{Reader,Writer} and possibly FakeReader to facilitate the testing
    • depending on the above, possibly migration of the examples of the API to https://github.com/ome/bio-formats-examples where they are regularly executed
    • add support for writing series with different compressions and use the source codec when possible
  • also include these new compressed reading/writing API as part of the investigation in Chunk API: Add new API and functionality for reading and writing chunks #3794

@melissalinkert
Copy link
Member Author

👍 to the roadmap outlined in #3992 (review)

Copy link
Member

@dgault dgault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im happy enough with all of the new API additions here. The reader and writer implementations also look fine from my side.

I did some manual sanity testing similar to Seb's more comprehensive tests, using our public sample SVS files (https://downloads.openmicroscopy.org/images/SVS/) along with the same OpenSlide files as Seb had used (https://openslide.cs.cmu.edu/download/openslide-testdata/Aperio/). Using bfconvert with the new option and the execution times with the new precompressed option were much faster as expected. The resulting files were also valid and open and displayed correctly.

The PrecompressedConversionExample The new preCompressedExample also looks good and runs as expected with a suitable SVS input file. The compressed tile times look to faster and after decompression the lengths match that of the rawTile. These may be better suited in the examples repo but that can happen at a later date.

Agreed with the conclusion of #3992 (review) that the documentation will need reviewed and updated to cover this functionality and the options for ImageConverter. But overall this looks good for inclusion with 7.1.0

byte[] buf = getTile(reader, writer.getResolution(), index,
xCoordinate, yCoordinate, width, height);

// if we asked for precompressed tiles, but that wasn't possible,
// then log that decompression/recompression happened
// TODO: decide if an exception is better here?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for now having it logged is a better option than throwing the exception

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants