Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rudimentary ICC color profile support to colorhash #110

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

david-russo
Copy link

@david-russo david-russo commented Jun 9, 2020

Color profiles can significantly alter the appearance of an image's underlying color values on modern display equipment. This change modifies the colorhash algorithm to take that into account by normalizing (and converting) any embedded color profiles into the standard sRGB color space prior to hashing. This allows colorhash to detect color differences between images which may only differ by the inclusion, absence, or change of a color profile.

It also provides an option to ignore embedded color profiles and instead only analyze an image's raw color data (i.e. the default behaviour before this change), allowing images to be compared with or without their profiles.

Color profiles can significantly alter the appearance of an image's
color. This change modifies the colorhash algorithm to take that into
account by normalizing any embedded color profiles into the standard
sRGB color space before analysis. It also provides an option to ignore
embedded color profiles and instead analyze the image's raw color data
(i.e. the default behaviour before this change), allowing images to be
compared without their profiles.
@JohannesBuchner
Copy link
Owner

This seems like a good idea, but it is currently failing in the CI (see the logs there).

Do these color profiles also alter the luminescence? If yes, they could be relevant for the other hashes as well.

@JohannesBuchner
Copy link
Owner

Actually, if I read correctly, these ICC profiles are meant to adjust to the particular viewing device. Wouldn't it be more objective to ignore them? Probably I am misunderstanding this.

@david-russo
Copy link
Author

david-russo commented Jun 9, 2020

They can have an effect on luminosity, yes, since they can essentially map any internal colour value to any other colour. Which means you could use a color profile to completely invert an image's colors, if you so desired.

There are a few kinds of profile, not just display profiles. Display profiles do describe what colors a display device is capable of displaying, but usually aren't embedded in files because they're specific to a given piece of hardware.

There are also input profiles, however, which are usually the ones you'd find embedded (such as by a scanner), and which describe precisely how the color values stored in an image file map to a standard device-independent color space, such as L*a*b*. This is the profile we care about, because it tells us how to correctly interpret the stored values.

When coupled with a display profile, such as the one that comes with your OS or monitor, both profiles are used to accurately map colour from the input profile to the display profile using a device-independent color space as a connection space: Input --> L*a*b* --> Display

It's even more complicated than all that gibberish above, but all we need to care about is that it works, and it's built-in to most image rendering software, so if we compare two images that are identical except one has an accurate input profile embedded, and the other doesn't, they're going to look different perceptually though they share the same underlying RGB values.

I hope that made some sense.

@david-russo
Copy link
Author

The CI error appears to be due to a dependency missing from Pillow's conda package. Pillow needs to be built with LittleCMS2 to support its ImageCms module. I developed using pip, and PyPI's Pillow package seems to come with everything included. Comparing the two packages (conda, pip), I can see it missing from conda.

I'm not familiar with conda, but one of the possible solutions seems somewhat involved: https://stackoverflow.com/questions/52741538/anaconda3-python-3-6-pillow-cannot-import-imagecms

@david-russo
Copy link
Author

david-russo commented Jun 11, 2020

The relevant conda package issue is here.

@JohannesBuchner
Copy link
Owner

Two suggestions:

  1. change the code so that when ImageCms is not loadable, it does not use this functionality, either with hasattr or a try-except(ImportError) construct.
  2. try to modify the travis install to install pillow with imagecms, perhaps with pip instead of conda. (adding conda uninstall pillow and pip install pillow).

@coveralls
Copy link

Coverage Status

Coverage decreased (-7.7%) to 86.047% when pulling 036d9fc on bl-dpt:colorhash-icc-support into 691f210 on JohannesBuchner:master.

@david-russo
Copy link
Author

I went with not allowing ImageCms to be optional due to the impact color profiles can have on an image's appearance, and I believe you were right earlier in thinking that they could be relevant to the other hash functions, so the code should probably be refactored to share this feature.

Copy link
Owner

@JohannesBuchner JohannesBuchner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good, I am impressed you got travis working with a single commit. It usually takes me a few.

Zooming out a bit, if we want this to be a feature for all hash functions, perhaps it should be applied outside this functions. It appears the correction can be applied such that the image object is modified.

Then perhaps we should instead of including this in imagehash, document for users the optimal way to pass an PIL image to imagehash. This should include how to correct for the icc profile, but also applying:

from PIL import ImageOps
image = ImageOps.exif_transpose(image)

Do you think this would be the right approach?

@@ -38,12 +38,17 @@ install:
- conda install --quiet --file conda-requirements.txt
- pip install coveralls

# Work around conda's Pillow package lacking ImageCms
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in line 9 add:

env:
    - REINSTALL_PILLOW=no
    - REINSTALL_PILLOW=yes

@@ -38,12 +38,17 @@ install:
- conda install --quiet --file conda-requirements.txt
- pip install coveralls

# Work around conda's Pillow package lacking ImageCms
- conda uninstall pillow
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and then here prefix the two commands here with [[ "$REINSTALL_PILLOW" == "yes" ]] &&

@@ -320,7 +321,7 @@ def whash(image, hash_size = 8, image_scale = None, mode = 'haar', remove_max_ha



def colorhash(image, binbits=3):
def colorhash(image, binbits=3, ignore_icc=False):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a positive variable name here to avoid the double negative. For example: apply_icc_profile=True

@david-russo
Copy link
Author

david-russo commented Jun 12, 2020

Yeah, I agree. Any necessary profile conversion should happen on the Image object before it being passed to any of the hash functions. As for where that's done:

ICC profiles aren't particularly simple to handle, and there are a number of ways to screw up the conversion between profiles if one is not familiar with the concepts or specifics, such as the pros and cons between different rendering intents and color spaces.

For instance, I chose to normalize on the sRGB color space here because it's the only one supported by PIL well enough to allow conversion into HSV, as the current colorhash algorithm requires. It's also usually the default color space in which software renders images without profiles, so it should be generally analogous to what most users would see on their screens... but ideally both images with and without color profiles would first be converted to one of the larger, human vision-based color spaces, such as LAB, before hashing, to better account for colors that lie outside the sRGB space.

Since such decisions require a fair bit of knowledge, and are, currently at least, fairly tightly coupled to the hashing implementations, I think we should keep profile conversion internal to imagehash.

More generally, I think we should abstract away the need for users to import PIL at all, allowing them to pass in file-type objects or file paths instead. The use of PIL internally seems more like an implementation detail to me, which we needn't bother the user with.

@david-russo
Copy link
Author

For this PR, do I also need to update data_files in setup.py with the new test images?

@JohannesBuchner
Copy link
Owner

I agree with what you say. I think there is a benefit though of passing PIL objects however, as one can apply additional image modifications beforehand. If I understand you correctly, we would need one function that prepares an image for 'L' mode use, and one that prepare it for 'HSV' mode use. These can be used within the hash functions, just like the current color space transformations.

@JohannesBuchner
Copy link
Owner

For this PR, do I also need to update data_files in setup.py with the new test images?

If you use them only for testing, no, if you want them to be installed in the site-package folder, yes. I think no.

@david-russo
Copy link
Author

david-russo commented Jun 12, 2020

I agree with what you say. I think there is a benefit though of passing PIL objects however, as one can apply additional image modifications beforehand.

There's little harm in supporting all three, I suppose :)

If I understand you correctly, we would need one function that prepares an image for 'L' mode use, and one that prepare it for 'HSV' mode use. These can be used within the hash functions, just like the current color space transformations.

Since the current colorhash algorithm converts images into both "L" and "HSV", Pillow seems to support conversion from RGB to either of them just fine, so we should be able to use a single function for the current implementation. It would only get more complicated if we wanted to go from LAB to certain other modes.

@JohannesBuchner
Copy link
Owner

I would be happy to include this as a single preprocessing helper function. A section in the README should explain why and how to use it. This will make it easy for users to plug it in.

Then we can also go with dependencies being optional, as it is not a core functionality.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants