-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't fail on non-4:4:4 JPEG #4061
Conversation
Hi @CGDogan, thank you for opening the PR. I don't believe we currently have any |
Sorry @dgault, 4:4:4 opens as expected before and after, I included it for testcase, but for the 4:2:0 I somehow attached a wrong file. Here's a file that fails before this pull request with the ImageJ Bioformats importer. To reproduce in java, it's enough to
|
Sorry I finally managed to find a 4:2:2 but with it I made up a 4:2:0 that still doesnt work and I need to investigate this before I reopen this |
OK it works fine with 4:2:2, verified that it reads until the end of the array, I encountered another bug related to integer divisibility that isn't introduced by this pull request, so I think it's ready but let me know if you have any comments. That's why suddenly it couldn't decode 4:2:0 |
Opened a separate issue for that #4064 - happens even with 4:4:4 |
Without this PR and testing the sample file 8193-8193.jpg results in an exception:
With this PR included the file can be opened and displayed without exception. With and without this PR, testing the sample file out422new16b.jpeg results in the file opening and displaying without exception. I will add these new samples to our repository of test images and configure them for the daily tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not comfortable with this being merged without testing NDPI specifically, as that's the primary consumer of JPEGTurboServiceImpl
. The automated tests will verify the upper-left most tile, but we know there have been issues with the bottom row of tiles in recent past (#3526, #4083). An import test as in #3526 should be sufficient, but I will not have time to do that this week.
MCU sizes will be read as long as SOF0 is read. SOF0 is assumed to be read since it contains the dimensions as well. We should either fail on not finding a SOF0 or finding multiple of SOF0 tags. So if there should be an assertion, it should be "exactly one SOF0 found", that would be more sensible.
Meanwhile I updated it now to work not only for standard 4:4:4, 4:2:0 and 4:2:2 but weirder ones as well. I'm attaching a zip folder containing unusual test cases, these all work as I tested with bfconvert. New test files exhaustive (I have verified that GitHub didn't compress or convert these after upload): magick2x2
Nothing left for me to commit / improve on this Pull Request now. Comments welcome |
components/formats-bsd/src/loci/formats/services/JPEGTurboServiceImpl.java
Show resolved
Hide resolved
Addressing the comment:
MCU sizes inferred from SOF0 are the same for every single tile. This Pull Request changes the MCU size from 8x8 to "something more correct" if that exists. So in short, for 4:4:4 images nothing has changed (MCU size was 8x8 and still 8x8). And for images of 4:2:0 etc., even the first few tiles weren't decoded correctly, and the decoder ended up in giving out-of-bounds errors once all the bytes were tried to be read In short, after this pull request, if the first tile is decoded the same as before, then it means this was a 4:4:4 image, so the rest will be decoded the same way (same mcuHeight mcuWidth params), and they will be correct. |
As far as I can see, only one of the individual samples linked in #4061 (comment) actually tests this use case. All can be opened with this PR, but most will use the ImageIO code path as this exception is thrown when initializing
Only https://github.com/ome/bioformats/assets/126820728/99c0d588-72fc-47a2-8fa7-08614aa2f2cc fails as indicated in the description above without this PR, and can be opened using @CGDogan, could you please concisely describe in a new comment which files are meant to demonstrate this specific fix? Otherwise, I am now reasonably satisfied that this doesn't break existing NDPI data and conversion workflows. Including these changes in a build of bioformats2raw allows the public NDPI data in https://downloads.openmicroscopy.org/images/Hamamatsu-NDPI/ to be converted to OME-TIFF without error, and the resulting files display as expected in QuPath. |
Firstly about the error message: "Restart interval and markers invalid" actually should say "this jpeg file was encoded with the encoder choosing not to add restart markers. This reader assumes restart markers". I updated this error message So I added sane restart markers to the dataset with EDIT: I have now edited my patch considering libjpeg-turbo/libjpeg-turbo#92 . You might want to rerun the test cases but this is a very rare case anyway so you should see everything passing. Because an old version of turbojpeg is being used, the number of images that can be decoded correctly is unchanged. |
I don't have immediate concerns with the latest changes, but I would like more time to finish repeating tests with the latest commit. Pushing to 7.2.0 for now (as discussed with @ome/formats earlier in the week); this would probably be fine for a patch release if we want to do something sooner. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I confirm that the files linked in #4061 (comment) fail to open without this change, and successfully open with this change. The files have now been added to our test repository.
As this has been included in builds for some time without issue, happy to merge this for inclusion in 7.2.0. Thanks again, @CGDogan.
Some odd failures in the tests last night: https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-push/761/console This PR has been included as part of merge-ci for quite some time so it is strange that these failures haven't been seen before. I have tried some manual tests this morning on the files and have not been able to reproduce. |
Thanks @dgault - I should have waited to merge for one more passing build after including new samples. I can reproduce the failures when running the whole |
This problem was exposed by ome#4061, but likely not directly caused by it.
The previous pull request description was hard to read, so I'm rewriting it: MCU size is 8x8 for 4:4:4. We should read relative subsampling "numbers" from SOF0 and calculate MCU size accordingly. This way, we won't get IndexOutOfBounds when we read 4:2:0 etc.Closes #3616