Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't fail on non-4:4:4 JPEG #4061

Merged
merged 8 commits into from
Jan 31, 2024
Merged

Don't fail on non-4:4:4 JPEG #4061

merged 8 commits into from
Jan 31, 2024

Conversation

CGDogan
Copy link
Contributor

@CGDogan CGDogan commented Jul 30, 2023

The previous pull request description was hard to read, so I'm rewriting it: MCU size is 8x8 for 4:4:4. We should read relative subsampling "numbers" from SOF0 and calculate MCU size accordingly. This way, we won't get IndexOutOfBounds when we read 4:2:0 etc.Closes #3616

@CGDogan CGDogan changed the title Don't fail on subsampled JPEG Don't fail on non-4:4:4 JPEG Jul 31, 2023
@dgault dgault added the include label Aug 2, 2023
@dgault
Copy link
Member

dgault commented Aug 2, 2023

Hi @CGDogan, thank you for opening the PR. I don't believe we currently have any 4:2:2 samples that I could find. When testing the 2 sample files you linked, I was able to open both them without any issues. Did you expect that the 4:2:0 file you attached would cause failures? If so, can you provide me with the exact steps to reproduce the exception.

@CGDogan
Copy link
Contributor Author

CGDogan commented Aug 2, 2023

Sorry @dgault, 4:4:4 opens as expected before and after, I included it for testcase, but for the 4:2:0 I somehow attached a wrong file. Here's a file that fails before this pull request with the ImageJ Bioformats importer.
8193-8193.tar.gz

To reproduce in java, it's enough to

try {
    reader.setId("8193-8193.jpg");
    var y =reader.openBytes(0);
} catch (Exception e) {
    System.out.println("couldnt open file");
}

@CGDogan CGDogan closed this Aug 2, 2023
@CGDogan
Copy link
Contributor Author

CGDogan commented Aug 2, 2023

Sorry I finally managed to find a 4:2:2 but with it I made up a 4:2:0 that still doesnt work and I need to investigate this before I reopen this

@CGDogan CGDogan reopened this Aug 2, 2023
@CGDogan
Copy link
Contributor Author

CGDogan commented Aug 2, 2023

OK it works fine with 4:2:2,
out422new16b.jpeg.zip

verified that it reads until the end of the array, I encountered another bug related to integer divisibility that isn't introduced by this pull request, so I think it's ready but let me know if you have any comments. That's why suddenly it couldn't decode 4:2:0

@CGDogan
Copy link
Contributor Author

CGDogan commented Aug 2, 2023

Opened a separate issue for that #4064 - happens even with 4:4:4

@dgault dgault added this to the 7.1.0 milestone Sep 4, 2023
@dgault
Copy link
Member

dgault commented Nov 17, 2023

Without this PR and testing the sample file 8193-8193.jpg results in an exception:

java.lang.IndexOutOfBoundsException: Index: 513, Size: 513
	at java.util.ArrayList.rangeCheck(ArrayList.java:659)
	at java.util.ArrayList.get(ArrayList.java:435)
	at loci.formats.services.JPEGTurboServiceImpl.getTile(JPEGTurboServiceImpl.java:299)
	at loci.formats.services.JPEGTurboServiceImpl.getTile(JPEGTurboServiceImpl.java:253)
	at loci.formats.in.TileJPEGReader.openBytes(TileJPEGReader.java:81)

With this PR included the file can be opened and displayed without exception.

With and without this PR, testing the sample file out422new16b.jpeg results in the file opening and displaying without exception.

I will add these new samples to our repository of test images and configure them for the daily tests.

Copy link
Member

@melissalinkert melissalinkert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not comfortable with this being merged without testing NDPI specifically, as that's the primary consumer of JPEGTurboServiceImpl. The automated tests will verify the upper-left most tile, but we know there have been issues with the bottom row of tiles in recent past (#3526, #4083). An import test as in #3526 should be sufficient, but I will not have time to do that this week.

MCU sizes will be read as long as SOF0 is read. SOF0 is assumed to be read since it contains the dimensions as well. We should either fail on not finding a SOF0 or finding multiple of SOF0 tags. So if there should be an assertion, it should be "exactly one SOF0 found", that would be more sensible.
@CGDogan
Copy link
Contributor Author

CGDogan commented Nov 23, 2023

Meanwhile I updated it now to work not only for standard 4:4:4, 4:2:0 and 4:2:2 but weirder ones as well. I'm attaching a zip folder containing unusual test cases, these all work as I tested with bfconvert.
Old test files not so useful jpegtestfiles.zip

New test files exhaustive (I have verified that GitHub didn't compress or convert these after upload):

magick2x2
magick2x1
magick1x2
magick1x4
magick2x4
magick2x3
magick1x3
magick4x2
magick4x1
magick3x2
magick3x1
magick1x1.jpg.ZIP ARCHIVE
ffmpeg444
ffmpeg422
ffmpeg420
orig

To test this, please not only apply the pull request locally but also apply: Not needed, test files are large now.

formats-bsd/src/loci/formats/in/JPEGReader.java


  	super.setId(currentId + ".fixed");
	}
-	if (getSizeX() > MAX_SIZE && getSizeY() > MAX_SIZE &&
+System.out.println("disabled AWT reader");
+	if (true || getSizeX() > MAX_SIZE && getSizeY() > MAX_SIZE &&
  	!legacyReaderInitialized)
	{

because the test images I've attached are small which would normally be read with AWT and not with this tiled reader

Nothing left for me to commit / improve on this Pull Request now. Comments welcome

@CGDogan
Copy link
Contributor Author

CGDogan commented Nov 23, 2023

Addressing the comment:

I am not comfortable with this being merged without testing NDPI specifically, as that's the primary consumer of JPEGTurboServiceImpl. The automated tests will verify the upper-left most tile, but we know there have been issues with the bottom row of tiles in recent past (#3526, #4083). An import test as in #3526 should be sufficient, but I will not have time to do that this week.

MCU sizes inferred from SOF0 are the same for every single tile. This Pull Request changes the MCU size from 8x8 to "something more correct" if that exists. So in short, for 4:4:4 images nothing has changed (MCU size was 8x8 and still 8x8). And for images of 4:2:0 etc., even the first few tiles weren't decoded correctly, and the decoder ended up in giving out-of-bounds errors once all the bytes were tried to be read

In short, after this pull request, if the first tile is decoded the same as before, then it means this was a 4:4:4 image, so the rest will be decoded the same way (same mcuHeight mcuWidth params), and they will be correct.

@melissalinkert
Copy link
Member

As far as I can see, only one of the individual samples linked in #4061 (comment) actually tests this use case. All can be opened with this PR, but most will use the ImageIO code path as this exception is thrown when initializing JPEGTurboServiceImpl (I assume this is what #4064 references):

Initialization with TileJPEGReader failed
java.io.IOException: Restart interval and markers invalid
	at loci.formats.services.JPEGTurboServiceImpl.initialize(JPEGTurboServiceImpl.java:229)
	at loci.formats.in.TileJPEGReader.reopenFile(TileJPEGReader.java:144)
	at loci.formats.in.TileJPEGReader.initFile(TileJPEGReader.java:121)
	at loci.formats.FormatReader.setId(FormatReader.java:1480)
	at loci.formats.DelegateReader.setId(DelegateReader.java:277)
	at loci.formats.in.JPEGReader.setId(JPEGReader.java:149)
	at loci.formats.ImageReader.setId(ImageReader.java:865)
	at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:692)
	at loci.formats.tools.ImageInfo.testRead(ImageInfo.java:1043)
	at loci.formats.tools.ImageInfo.main(ImageInfo.java:1129)

Only https://github.com/ome/bioformats/assets/126820728/99c0d588-72fc-47a2-8fa7-08614aa2f2cc fails as indicated in the description above without this PR, and can be opened using JPEGTurboServiceImpl with this PR.

@CGDogan, could you please concisely describe in a new comment which files are meant to demonstrate this specific fix?

Otherwise, I am now reasonably satisfied that this doesn't break existing NDPI data and conversion workflows. Including these changes in a build of bioformats2raw allows the public NDPI data in https://downloads.openmicroscopy.org/images/Hamamatsu-NDPI/ to be converted to OME-TIFF without error, and the resulting files display as expected in QuPath.

@CGDogan
Copy link
Contributor Author

CGDogan commented Dec 3, 2023

Firstly about the error message: "Restart interval and markers invalid" actually should say "this jpeg file was encoded with the encoder choosing not to add restart markers. This reader assumes restart markers". I updated this error message

So I added sane restart markers to the dataset with jpegtran -restart 1. It turns out that this works well! The old TurboJpeg can only decode the super-common ones: 1x1, 2x1, 1x2, 2x2. 1x1 is the standard 4:4:4. Zip file containing one example each for the other three: testfiles.zip

EDIT: I have now edited my patch considering libjpeg-turbo/libjpeg-turbo#92 . You might want to rerun the test cases but this is a very rare case anyway so you should see everything passing. Because an old version of turbojpeg is being used, the number of images that can be decoded correctly is unchanged.

@melissalinkert
Copy link
Member

I don't have immediate concerns with the latest changes, but I would like more time to finish repeating tests with the latest commit. Pushing to 7.2.0 for now (as discussed with @ome/formats earlier in the week); this would probably be fine for a patch release if we want to do something sooner.

@melissalinkert melissalinkert modified the milestones: 7.1.0, 7.2.0 Dec 6, 2023
Copy link
Member

@melissalinkert melissalinkert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirm that the files linked in #4061 (comment) fail to open without this change, and successfully open with this change. The files have now been added to our test repository.

As this has been included in builds for some time without issue, happy to merge this for inclusion in 7.2.0. Thanks again, @CGDogan.

@melissalinkert melissalinkert merged commit 320e35d into ome:develop Jan 31, 2024
17 checks passed
@dgault
Copy link
Member

dgault commented Feb 1, 2024

Some odd failures in the tests last night: https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-push/761/console

This PR has been included as part of merge-ci for quite some time so it is strange that these failures haven't been seen before. I have tried some manual tests this morning on the files and have not been able to reproduce.

@melissalinkert
Copy link
Member

Thanks @dgault - I should have waited to merge for one more passing build after including new samples. I can reproduce the failures when running the whole jpeg folder locally (which includes the new samples for this PR), but the individual failing sub-folders pass when run on their own. That suggests that something isn't being reset properly when the reader/service is closed. I'll open a separate PR as soon as there is a fix.

melissalinkert added a commit to melissalinkert/bioformats that referenced this pull request Feb 1, 2024
This problem was exposed by ome#4061, but likely not directly caused by it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JPEG: IndexOutOfBoundsException
3 participants