OIRReader: Major performance improvement when reading a ROI within a plane #4205

NicoKiaru · 2024-07-01T07:49:00Z

In the implementation before this PR, each time a small ROI was read from an image, a full OIR PixelBlock was read. Combined with the fact that the optimal tile size was not overridden and defaulted to a thin stripe in y (128 pixels), this could lead to a particularly bad scenario for big images:

Imagine a 10000 by 10000 pixel plane read by tiles of 10000x128 pixels -> 79 tiles have to be read to read the full plane
For big tiled OIR images, a PixelBlock typically spans the full plane
This means that, to acquire all tiles of this plane, the full plane is actually read 79 times
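
The arithmetic behind this scenario can be checked with a few lines of Java. This is only a sketch of the cost model described above, assuming that every stripe request decodes one block spanning the full plane; the class and method names are hypothetical and not part of Bio-Formats.

```java
/** Back-of-the-envelope cost of reading a plane stripe by stripe. */
final class StripeReadCost {

  /** Number of tiles needed to cover `size` pixels (ceiling division). */
  static int tileCount(int size, int tileSize) {
    return (size + tileSize - 1) / tileSize;
  }

  /**
   * Total pixels decoded when each stripe request decodes the whole plane,
   * which is the assumed worst case (one block spanning the full plane).
   */
  static long pixelsDecoded(int width, int height, int tileHeight) {
    return (long) tileCount(height, tileHeight) * width * height;
  }

  public static void main(String[] args) {
    int width = 10000;
    int height = 10000;
    int tileHeight = 128;
    System.out.println("stripes per plane: " + tileCount(height, tileHeight));
    System.out.println("pixels in plane:   " + ((long) width * height));
    System.out.println("pixels decoded:    " + pixelsDecoded(width, height, tileHeight));
  }
}
```

With 128-pixel stripes, the ceiling of 10000 / 128 is 79, so under this assumption the amount of decoded data grows by the same factor.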

The proposed new implementation of this PR:

avoids reading entire PixelBlocks when only a ROI within them is needed
overrides the optimal tile size methods (getOptimalTileWidth, getOptimalTileHeight) to match the underlying PixelBlock size (capped at 2048 pixels)
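
To make the two points above concrete, here is a minimal standalone sketch of the geometry involved. It is not the code from this PR: the class, method names and in-memory block layout (row-major, one block described by its origin and size) are assumptions chosen for illustration. It shows a tile size capped at 2048 pixels and a copy that touches only the rows and columns where a block overlaps the requested ROI.

```java
/** Illustrative geometry only; names and block layout are hypothetical. */
final class RoiBlockSketch {

  static final int MAX_TILE = 2048;

  /** Tile size that follows the block size but never exceeds the cap. */
  static int optimalTileSize(int blockSize) {
    return Math.min(blockSize, MAX_TILE);
  }

  /**
   * Copies the overlap between a block and the ROI (x, y, w, h) into dest,
   * which holds the ROI in row-major order. Returns the number of rows copied.
   */
  static int copyOverlap(byte[] block, int blockX, int blockY, int blockW, int blockH,
                         byte[] dest, int x, int y, int w, int h, int bytesPerPixel) {
    int x0 = Math.max(x, blockX);
    int y0 = Math.max(y, blockY);
    int x1 = Math.min(x + w, blockX + blockW);
    int y1 = Math.min(y + h, blockY + blockH);
    if (x0 >= x1 || y0 >= y1) {
      return 0; // the block does not intersect the ROI at all
    }
    int rowBytes = (x1 - x0) * bytesPerPixel;
    for (int row = y0; row < y1; row++) {
      int src = ((row - blockY) * blockW + (x0 - blockX)) * bytesPerPixel;
      int dst = ((row - y) * w + (x0 - x)) * bytesPerPixel;
      System.arraycopy(block, src, dest, dst, rowBytes);
    }
    return y1 - y0;
  }

  public static void main(String[] args) {
    byte[] block = new byte[16];
    for (int i = 0; i < block.length; i++) {
      block[i] = (byte) i; // a 4x4 block holding the values 0..15
    }
    byte[] roi = new byte[4];
    int rows = copyOverlap(block, 0, 0, 4, 4, roi, 1, 1, 2, 2, 1);
    System.out.println("rows copied: " + rows);
    System.out.println("tile size for a 10000-pixel block: " + optimalTileSize(10000));
  }
}
```

In a real reader the same offsets would presumably be used to skip data on disk rather than to index an array that is already in memory; that part depends on how the blocks are stored and is not shown here.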

When requesting a cropped region within big images, the speedup is massive.

Here's a real time display with big dataviewer with the current implementation:

Here's a real time display with big dataviewer with the implementation in this PR:

imagesc-bot · 2024-07-01T10:48:07Z

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/why-is-bioformats-reading-large-oir-files-so-much-slower-than-the-microscope-companion-software-provided-by-olympus/66922/6

dgault · 2024-07-01T10:51:09Z

Thanks again @NicoKiaru for the hard work investigating and getting the PR opened. I have marked it for inclusion for the repository tests to check for any side effects in other datasets.

NicoKiaru · 2024-07-01T10:53:58Z

Thanks, I have tested it on very few files, so while I tried to keep the changes to the minimum, I'm not sure I did not break something! Let's see.

jburel · 2024-07-03T11:40:57Z

Failed with the following errors

  [testng] java.lang.NullPointerException: Cannot assign field "yStart" because "block" is null
   [testng] 	at loci.formats.in.OIRReader.initFile(OIRReader.java:432)
   [testng] 	at loci.formats.FormatReader.setId(FormatReader.java:1480)

   [testng] java.lang.ArrayIndexOutOfBoundsException: Index 1023 out of bounds for length 1023
at loci.formats.in.OIRReader.initFile(OIRReader.java:420)

melissalinkert · 2024-07-03T13:38:42Z

Note that some of the test failures occur on public data:

NicoKiaru · 2024-07-07T12:57:43Z

So I've fixed what I could test with the public files.

dgault · 2024-07-15T15:45:32Z

Added a commit to fix a NullPointerException in the nightly tests (https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-test-folder/18844/console)

dgault · 2024-07-16T12:59:53Z

Added a further commit to address failures in the nightly tests: https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-test-folder/19008/console

These last 2 failures related to spectral images using modulo C. I have updated the indexing for the blocks to match the previous behaviour.

dgault · 2024-07-18T14:00:19Z

I am still seeing failures for the modulo C files in the nightly tests, though from testing it looks as though the original implementation currently in the reader may also be incorrect. In both cases a single set of pixel blocks is being used for each of the modulo C frames; the tests fail because this PR defaults to a different frame.

The real issue seems to be that, in the case of the Lambda files as set here, I am unsure how to identify from the pixel block UIDs which index each block belongs to. Currently both implementations match on the channel ID, which is the same for each step, so the same frames are returned each time. I will dig further to see if I can identify the correct blocks.
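
The collision described here can be illustrated with a small standalone sketch. It is not the reader's actual key class: `PlaneKey`, its fields and the `lambda` index are hypothetical names, and how the lambda step would be derived from the block UIDs is exactly the open question above. The sketch only shows that a lookup key built from channel, Z and T alone maps every lambda step to the same entry, while a key that also carries the step keeps them apart.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical lookup key for pixel blocks; not the class used in the PR. */
final class PlaneKey {
  final int c;
  final int z;
  final int t;
  final int lambda; // spectral step, assumed to be known for each block

  PlaneKey(int c, int z, int t, int lambda) {
    this.c = c;
    this.z = z;
    this.t = t;
    this.lambda = lambda;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof PlaneKey)) {
      return false;
    }
    PlaneKey k = (PlaneKey) o;
    return c == k.c && z == k.z && t == k.t && lambda == k.lambda;
  }

  @Override
  public int hashCode() {
    return Objects.hash(c, z, t, lambda);
  }
}

final class PlaneKeyDemo {

  /** Registers one block per lambda step and returns how many entries survive. */
  static int distinctEntries(int lambdaSteps, boolean useLambda) {
    Map<PlaneKey, String> blocks = new HashMap<>();
    for (int step = 0; step < lambdaSteps; step++) {
      // Same channel, Z and T for every step, as in the lambda scans discussed here.
      blocks.put(new PlaneKey(0, 0, 0, useLambda ? step : 0), "block-" + step);
    }
    return blocks.size();
  }

  public static void main(String[] args) {
    System.out.println("keyed without lambda: " + distinctEntries(5, false));
    System.out.println("keyed with lambda:    " + distinctEntries(5, true));
  }
}
```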

NicoKiaru · 2024-07-18T14:12:29Z

Ok, good luck with that. If you have a public lambda file, I could also investigate. But if not I can also try to create one and make it public.

NicoKiaru · 2024-07-18T15:23:21Z

I've submitted some example files to the bio-formats QA community on Zenodo. They cover:

lambda scan
lambda scan + t
lambda scan + z
lambda scan + z + t

EDIT: Zenodo repo: https://zenodo.org/records/12773657

And it looks like none of these files are properly opened by the current reader (and I'm pretty sure this PR does not improve the situation).

NicoKiaru · 2024-07-19T15:43:00Z

I've made a small modification, which, in the files I could test, fixes the reading of spectral acquisition.

NicoKiaru · 2024-08-26T08:08:14Z

Hello, do you think you would be able to give it a (new) go? I've shared some sample spectral datasets, and the previous version was not able to read spectral datasets from the FV4000. Of course there may still be issues with private files, but I can't fix anything in that regard.

melissalinkert · 2024-09-02T14:17:11Z

Thanks for the updates, @NicoKiaru, and apologies for the delay. This will now be included in tonight's builds, so we'll know the status in the morning.

melissalinkert · 2024-09-03T15:56:26Z

This appears to have passed all tests (https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-test-folder/27045/console), so is on my list to review for inclusion in 8.0.0.

NicoKiaru · 2024-09-03T19:42:18Z

Thanks! Before doing that, please wait a day or two: I want to upload a big 2D dataset of this kind, since I'm not sure you have one among the demo files.

NicoKiaru · 2024-09-04T07:21:14Z

Ok, I've submitted a tiled image on Zenodo bio-formats submission community.

I think that, to demo the performance increase, I should have acquired a single very large plane rather than a z-stack. But when timing the opening with 'crop on import', the performance increase should still be visible, I hope.

joshmoore · 2024-09-04T07:43:59Z

see https://zenodo.org/records/13680725

melissalinkert

I am generally fine with this; there are just a few small things that should be fixed before merging, the null check on lines 213-214 being the most important.

Several tests of showinf -crop ... with https://zenodo.org/records/13680725 in particular show a definite performance improvement with these changes compared to the current state of develop. I also tried reading the matl.omp2info (see OlympusTileReader); while that wasn't dramatically faster, it certainly wasn't any worse, and did demonstrate that the optimal tile size changes here are correctly propagated to OlympusTileReader.

As previously indicated, automated tests are passing. Since those tests don't thoroughly exercise reading cropped planes, I also ran a simple test to compare plane hashes and read times across all of our .oir data:

import java.security.MessageDigest;
import java.util.ArrayList;
import loci.common.DataTools;
import loci.common.DebugTools;
import loci.tests.testng.*;
import loci.formats.in.OIRReader;

public class PixTest {
  public static void main(String[] args) throws Exception {
    DebugTools.enableLogging("WARN");
    MessageDigest md = MessageDigest.getInstance("MD5");
    try (OIRReader reader = new OIRReader()) {
      ArrayList<String> testFiles = new ArrayList<String>();
      TestTools.getFiles(args[0], testFiles, new ConfigurationTree(args[0], args[1], null), null, null, "");
      for (String testFile : testFiles) {
        if (!reader.isThisType(testFile)) {
          continue;
        }
        System.out.println(testFile);
        reader.setId(testFile);

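        // Crop region: starts a quarter of the way into the plane and spans
        // a third of each dimension, capped at 2048 pixels.
        int x = reader.getSizeX() / 4;
        int y = reader.getSizeY() / 4;
        int w = Math.min(reader.getSizeX() / 3, 2048);
        int h = Math.min(reader.getSizeY() / 3, 2048);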

        long t0 = System.currentTimeMillis();
        for (int plane=0; plane<reader.getImageCount(); plane++) {
          byte[] buf = reader.openBytes(plane, x, y, w, h);
          md.reset();
          md.update(buf);
          String md5 = DataTools.bytesToHex(md.digest());
          System.out.println("plane #" + plane + " MD5: " + md5);
        }
        long t1 = System.currentTimeMillis();
        System.out.println("total plane read time: " + (t1 - t0) + " ms");
        reader.close();
      }
    }
  }
}

Without this change, that test against all of our existing .oir data took 75m22s locally. With this change and the addition of a null check between lines 213/214, it takes 58m58s. There are MD5 differences for the following files, but these appear to be correct and a result of better lambda handling:

olympus/spectral_scan_XYL.oir
qa-17409/Stitch_A01_G001.oir

For both files, showinf with develop shows the same data for all planes, which is not the case with this change.
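
For anyone who wants to repeat this kind of comparison without the test harness, the hashing and diffing steps can be sketched with JDK classes only. This is an illustrative helper, not part of Bio-Formats or of the test above: `PlaneHashDiff` and its methods are hypothetical names, and the per-plane hash lists are assumed to have been collected beforehand from two runs (for example develop and this branch).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for comparing per-plane hashes between two runs. */
final class PlaneHashDiff {

  /** Lowercase hexadecimal MD5 digest of a buffer. */
  static String md5Hex(byte[] data) {
    try {
      MessageDigest md = MessageDigest.getInstance("MD5");
      StringBuilder sb = new StringBuilder();
      for (byte b : md.digest(data)) {
        sb.append(String.format("%02x", b));
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      // Every Java platform implementation is required to provide MD5.
      throw new IllegalStateException(e);
    }
  }

  /** Indexes of the planes whose hash changed between the two runs. */
  static List<Integer> differingPlanes(List<String> before, List<String> after) {
    if (before.size() != after.size()) {
      throw new IllegalArgumentException("plane counts differ");
    }
    List<Integer> changed = new ArrayList<>();
    for (int i = 0; i < before.size(); i++) {
      if (!before.get(i).equals(after.get(i))) {
        changed.add(i);
      }
    }
    return changed;
  }

  /** True when every plane has the same hash, the symptom seen with develop. */
  static boolean allPlanesIdentical(List<String> hashes) {
    for (String hash : hashes) {
      if (!hash.equals(hashes.get(0))) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    byte[] sample = "abc".getBytes(StandardCharsets.UTF_8);
    System.out.println("MD5 of sample buffer: " + md5Hex(sample));
  }
}
```

A file where `allPlanesIdentical` holds for one build but not for the other is a candidate for the lambda handling difference mentioned above, though identical planes can of course also be legitimate data.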

So, once the few suggestions here have been addressed and we have one more passing build that includes the updates, I am happy to see this merged.

components/formats-gpl/src/loci/formats/in/OIRReader.java

NicoKiaru · 2024-09-06T14:17:45Z

Thanks a lot @melissalinkert for this thorough review! I'll get back to it next week.

NicoKiaru · 2024-09-09T08:14:50Z

I've done the modifications you suggested. Thanks again @melissalinkert !

melissalinkert

Merging as https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-test-repo/182/ and https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-test-repo/183/ passed with the latest changes included.

Thanks again for the work here, @NicoKiaru.

imagesc-bot · 2024-10-27T19:41:26Z

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/fiji-friends-weekly-dev-update-thread/103718/9

NicoKiaru added 3 commits June 29, 2024 16:28

Changes an exhaustive listing through all OIR PixelBlock with a map containing a CZT Key object

27db3ea

Major performance improvement when reading a ROI within a plane in the OIRReader:
- prevents reading fully all PixelBlocks when one needs to read a ROI within them
- overrides getOptimalTile size methods to match the underlying PixelBlock size (or crops at 2048 pixels)

e266c08

OIRReader: changes List<PixelBlock> to PixelBlock[] in order to fill the array in a random order

7f52992

dgault added the include label Jul 1, 2024

melissalinkert removed the include label Jul 3, 2024

NicoKiaru added 2 commits July 7, 2024 14:54

Merge branch 'ome:develop' into oir-crop-pixel-block

f3cc4b5

Update OIRReader.java

c72a71b

Fix wrong max number of blocks counts

Merge branch 'ome:develop' into oir-crop-pixel-block

cde782c

melissalinkert added the include label Jul 14, 2024

OIRReader: Add null check for block

7fbe98d

OIRReader: Update zct block indexing for modulo C

0bd9fa9

dgault removed the include label Jul 18, 2024

NicoKiaru added a commit to BIOP/ijp-kheops that referenced this pull request Jul 19, 2024

Temporary OIR fix (related to ome/bioformats#4205)

ec8888f

OIRReader: fix reading of oir spectral (lambda) images

d7ccc56

sbesson added the include label Sep 2, 2024

sbesson added this to the 8.0.0 milestone Sep 2, 2024

NicoKiaru mentioned this pull request Sep 3, 2024

Commits an alternative Zeiss CZI Reader #4092

Open

melissalinkert self-requested a review September 3, 2024 15:53

melissalinkert requested changes Sep 6, 2024

OIRReader - improve safety:

26efbd1

* set optimalTileHeight to 0 in close
* check reader initialisation in optimal tile size method calls
* checks and warns in case of null PixelBlocks
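
The first two safety points in this commit follow a common pattern for stateful readers, which can be sketched in isolation. The class below is a hypothetical stand-in and not OIRReader itself: the field names, the `setId` signature and the exception type are assumptions made for the example.

```java
/** Hypothetical stand-in showing state reset on close and an initialisation check. */
final class TileSizeStateSketch {

  private String currentId;      // null until a file has been opened
  private int optimalTileHeight; // 0 means "not computed yet"

  /** Simulates opening a file whose blocks are blockHeight pixels tall. */
  void setId(String id, int blockHeight) {
    currentId = id;
    optimalTileHeight = Math.min(blockHeight, 2048);
  }

  /** Refuses to answer before a file has been opened or after close. */
  int getOptimalTileHeight() {
    if (currentId == null) {
      throw new IllegalStateException("reader is not initialised");
    }
    return optimalTileHeight;
  }

  /** Resets cached state so a later file cannot inherit a stale tile size. */
  void close() {
    currentId = null;
    optimalTileHeight = 0;
  }

  public static void main(String[] args) {
    TileSizeStateSketch reader = new TileSizeStateSketch();
    reader.setId("example.oir", 512);
    System.out.println("tile height: " + reader.getOptimalTileHeight());
    reader.close();
  }
}
```

Resetting the cached value in close matters when one reader instance is reused across files, as in the hash comparison test earlier in this thread.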

melissalinkert approved these changes Sep 10, 2024

melissalinkert merged commit 1f03e09 into ome:develop Sep 10, 2024
18 checks passed

NicoKiaru deleted the oir-crop-pixel-block branch September 11, 2024 08:08

melissalinkert mentioned this pull request Sep 11, 2024

Consider more thorough comparison of image data in automated tests #4238

Open

NicoKiaru mentioned this pull request Oct 29, 2024

Abysmal perf converting OIR with LZW compression tile size 512 on a windows shared drive BIOP/ijp-kheops#26

Closed
