corrupted image converting .tiff #503

Open
lsh-0 opened this issue May 8, 2020 · 9 comments

@lsh-0

lsh-0 commented May 8, 2020

We have a .tiff image that is producing corrupted output.

Original: https://prod-elife-published.s3.amazonaws.com/digests/55692/digest-55692.tif

Corrupted: https://iiif.elifesciences.org/digests/55692%2Fdigest-55692.tif/full/full/0/default.webp

The same type of corruption appears in the .png and .jpg output formats. The corruption is different on each generation; however, we have caching turned on, so you'll need to add request parameters to see that (/digests/55692%2Fdigest-55692.tif/full/full/0/default.webp?a=z).
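A minimal sketch of the cache-bypass trick described above, for anyone who wants to check whether the output really changes between generations. The helper names (`cache_busting_url`, `fetch_digest`, `count_distinct_outputs`) are hypothetical, and it assumes the cache keys on the full URL including the query string, as the `?a=z` example suggests.

```python
import hashlib
import uuid
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import urlopen


def cache_busting_url(url, param="a"):
    """Return `url` with a unique throwaway query parameter appended.

    The path is left untouched, so an encoded identifier such as
    `55692%2Fdigest-55692.tif` is not re-encoded.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, uuid.uuid4().hex))
    return urlunsplit(parts._replace(query=urlencode(query)))


def fetch_digest(url):
    """Download `url` and return the SHA-256 hex digest of the body."""
    with urlopen(url) as response:
        return hashlib.sha256(response.read()).hexdigest()


def count_distinct_outputs(url, requests=3):
    """Request `url` several times past the cache; count distinct bodies."""
    return len({fetch_digest(cache_busting_url(url)) for _ in range(requests)})
```

If `count_distinct_outputs` returns more than 1 for a derivative URL, the server produced different bytes for the same source image, which matches the behaviour reported here.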

We run Loris from a Docker container, so its state is fairly fixed. You can pull the image from here: https://hub.docker.com/r/elifesciences/loris

And see its definition here (forked from loris-docker): https://github.com/elifesciences/loris-docker

Version info:

  • Ubuntu 18.04
  • libtiff5/bionic-updates,bionic-security,now 4.0.9-5ubuntu0.3 amd64
  • libtiff5-dev/bionic-updates,bionic-security,now 4.0.9-5ubuntu0.3 amd64
  • libtiffxx5/bionic-updates,bionic-security,now 4.0.9-5ubuntu0.3 amd64
  • loris==3.0.0
  • Pillow==6.2.0
@bcail
Contributor

bcail commented May 8, 2020

$ mediainfo elife-digest-55692.tif
General
Complete name : elife-digest-55692.tif
Format : TIFF
File size : 214 KiB

Image
Title : ...
Format : LZW
Width : 540 pixels
Height : 523 pixels
Color space : RGB
Bit depth : 8 bits
Compression mode : Lossless

I wonder if the LZW format is causing issues. @lsh-0 do you know if other LZW images are working for you, or are they all failing?

@lsh-0
Author

lsh-0 commented May 12, 2020

Thanks for getting back to me. Here is another LZW-compressed .tif; it appears to be working fine:

Original: https://prod-elife-published.s3.amazonaws.com/digests/53232/digest-53232.tif

IIIF: https://iiif.elifesciences.org/digests/53232%2Fdigest-53232.tif/full/full/0/default.webp

I have about 50 candidates I'm going to write a wee script to run through. All these images go through a review process and are then scrutinised on the site itself. If any are corrupted now it's likely they were working fine in the previous version of Loris (circa v2.2.0).
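For running through a batch of candidates, something like the following could map each original's S3 key to its IIIF derivative URL. This is only a sketch: the mapping rule (first path segment kept as a prefix, the rest percent-encoded as one identifier) is an assumption inferred from the two Original/IIIF pairs in this thread, and `iiif_url` and `IIIF_BASE` are hypothetical names.

```python
from urllib.parse import quote

# Host taken from the example URLs in this thread.
IIIF_BASE = "https://iiif.elifesciences.org"


def iiif_url(s3_key, fmt="webp"):
    """Build the full-size IIIF URL for an S3 key like 'digests/55692/x.tif'.

    Assumption: the first path segment stays as a literal prefix and the
    remainder becomes a single percent-encoded identifier ('/' -> '%2F').
    """
    prefix, _, identifier = s3_key.partition("/")
    encoded = quote(identifier, safe="")
    # IIIF Image API order: region / size / rotation / quality.format
    return f"{IIIF_BASE}/{prefix}/{encoded}/full/full/0/default.{fmt}"
```

Calling `iiif_url("digests/53232/digest-53232.tif")` reproduces the IIIF link given above for the working image.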

@lsh-0
Author

lsh-0 commented May 12, 2020

None of the other candidates exhibited corruption, which is good.

I have a side-project using ImageMagick that will compare two images for differences with a fuzz factor and a threshold for passing. I intend to run through all of our images and check for corruption that way. It would be good for our peace of mind to run this whenever we upgrade Loris, and it may even shake out more examples of this specific corruption.
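A rough sketch of what such a comparison might look like, driving ImageMagick's `compare` tool from Python. This is not the side-project's actual code: the function names, the 5% fuzz default, and the 100-pixel threshold are all illustrative assumptions, and it assumes `compare` is on the PATH and reports the AE metric on stderr.

```python
import subprocess


def build_compare_command(reference, candidate, fuzz_percent=5):
    """Build an ImageMagick `compare` call that counts differing pixels."""
    return [
        "compare",
        "-metric", "AE",               # AE: count of pixels that differ
        "-fuzz", f"{fuzz_percent}%",   # colours this close count as equal
        str(reference),
        str(candidate),
        "null:",                       # discard the visual diff image
    ]


def parse_differing_pixels(stderr_text):
    """Read the first number `compare` printed for the metric."""
    return float(stderr_text.split()[0])


def images_match(reference, candidate, fuzz_percent=5, max_differing_pixels=100):
    """Return True if at most `max_differing_pixels` pixels differ."""
    result = subprocess.run(
        build_compare_command(reference, candidate, fuzz_percent),
        capture_output=True,
        text=True,
    )
    # `compare` exits 0 (similar) or 1 (dissimilar); anything else is an error.
    if result.returncode > 1:
        raise RuntimeError(f"compare failed: {result.stderr.strip()}")
    return parse_differing_pixels(result.stderr) <= max_differing_pixels
```

A fuzz factor matters here because the derivatives are lossy (.jpg, .webp), so an exact pixel match against the source .tif would fail even for healthy output. Both images need the same dimensions for the pixel count to be meaningful.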

Please let me know if you have any further suggestions you'd like me to investigate.

@alexwlchan
Contributor

alexwlchan commented May 12, 2020

Pulling different versions of the image from https://hub.docker.com/r/elifesciences/loris/tags:

  • I could repro the issue with 45b9edd23a9132fc67a8c71ded30836089a63b00
  • I couldn't repro with 89a0ed19ee5a86f9b2afa864dfa27b7addaaf7b3

So something about this commit introduced the issue to your build: elifesciences/loris-docker@984cb31

These are all the changes in Loris between 2.3.3 and 3.0: v2.3.3...v3.0.0

@lsh-0
Author

lsh-0 commented May 12, 2020

This is helpful, thank you. I'll try upgrading the container to Ubuntu 20.04 on Thurs or Fri, which has a newer version of libtiff in it (4.1 vs 4.0). I looked at its changelog the other day but nothing jumped out at me.

@alexwlchan
Contributor

I think the issue isn’t libtiff; it might be the Python library Pillow.

Your working image had Pillow 4.3.0; your current image has 6.2.0. We used to pin the version of Pillow to avoid an issue with JPEG-compressed TIFs (see #407, python-pillow/Pillow#2926, #485).

If I process your image with Pillow 7.0, I see the same corruption:

from PIL import Image

im = Image.open("digest-55692.tif")
im.save("digest-55692.jpg", quality=90)

So a short-term fix would be for you to pin the version of Pillow you use in your image (I can’t see where you install it in your Dockerfiles?).

I’ll have a look to see if the Pillow maintainers are aware of the issue.

@lsh-0
Author

lsh-0 commented May 12, 2020

> Your working image

Ah, it wasn't actually working. That was a new 2.3.3 installation. I was migrating Loris to a containerised installation. It seemed to go well until we started seeing corruption in a handful of new images resulting in this rushed upgrade to 3.0.0.

Our previous working version was a patched release from around the 2.0 era, though I'm not sure of the exact version. We relied on it crashing to re-request a different source format.

@lsh-0
Author

lsh-0 commented May 14, 2020

Thanks for raising the issue with Pillow, @alexwlchan, it's appreciated.

@lsh-0
Author

lsh-0 commented Jul 6, 2021

I can confirm that running Loris 3.0.0, 3.2.0 or 3.2.1 with pillow==8.2.0 fixes this corruption issue.

edit: and it looks like there is a PR for that already!
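One way to guard a deployment against slipping back to an affected Pillow is a small version check at startup. The sketch below is illustrative only: the thread confirms 8.2.0 as working and 6.2.0 and 7.0 as affected, so treating every release from 8.2.0 onwards as safe is an assumption, and `parse_version`, `MIN_PILLOW` and `pillow_is_new_enough` are hypothetical names.

```python
# Lowest Pillow release confirmed in this thread to produce clean output.
MIN_PILLOW = (8, 2, 0)


def parse_version(text):
    """Turn '8.2.0' into (8, 2, 0), padding short versions with zeros.

    Parsing stops at the first non-numeric component.
    """
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def pillow_is_new_enough(version_text, minimum=MIN_PILLOW):
    """Return True if `version_text` is at or above `minimum`."""
    return parse_version(version_text) >= minimum
```

In a running service this could be called as `pillow_is_new_enough(PIL.__version__)` after `import PIL`, logging a warning or refusing to start when it returns False.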
