Use colorhash to find similarity in percentage #207

Open
EBakirdinov opened this issue Feb 28, 2024 · 5 comments
Comments

@EBakirdinov

EBakirdinov commented Feb 28, 2024

My question is: can I use colorhash to express the similarity of two images as a percentage?

Example:

from PIL import Image
import imagehash

test = imagehash.colorhash(Image.open(path1), binbits=64)
test_2 = imagehash.colorhash(Image.open(path2), binbits=64)

print(test - test_2)

Let's say I get 75. What is the maximum possible value for these two images? If it is 80, my images are not similar; if it is 800, they are quite similar.

@EBakirdinov EBakirdinov changed the title Use colorhash to find similariry in percentage Use colorhash to find similarity in percentage Feb 28, 2024
@JohannesBuchner
Owner

JohannesBuchner commented Feb 28, 2024

The code is here https://github.com/JohannesBuchner/imagehash/blob/master/imagehash/__init__.py#L435

It computes 14 numbers: one for black, one for gray, and 6 hue histogram bins each for faint and bright colors. Each number is between 0 and 2^binbits - 1. The bits of these numbers are then flattened into a single, large binary array.

The subtraction operation is here: https://github.com/JohannesBuchner/imagehash/blob/master/imagehash/__init__.py#L111
It counts the number of different bits.

So I guess the maximum possible difference is binbits*14?
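
If that guess holds, a difference can be turned into a rough percentage by dividing by the total number of bits. The following is only a minimal sketch under that assumption; `similarity_percent` and `NUM_VALUES` are hypothetical names for illustration, not part of imagehash.

```python
# Assumption: the hash holds 14 values of `binbits` bits each,
# so the largest possible difference is 14 * binbits.
NUM_VALUES = 14  # black, gray, 6 faint hue bins, 6 bright hue bins


def similarity_percent(distance, binbits, num_values=NUM_VALUES):
    """Map a bit-count difference to a 0-100 similarity score."""
    total_bits = num_values * binbits
    if not 0 <= distance <= total_bits:
        raise ValueError("distance must lie between 0 and num_values * binbits")
    return 100.0 * (1 - distance / total_bits)


# The numbers from the question: a difference of 75 with binbits=64
print(round(similarity_percent(75, 64), 1))  # 91.6
```

A score like this only normalizes the bit count. It does not make the bit count a better similarity measure for large binbits.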

@JohannesBuchner
Owner

Similar images should have a small difference.

This function is designed with small binbits (default=3) in mind. If two numbers are very different, all 3 bits are likely to differ, while if they are similar, likely only one or two (the least significant bits) differ. This does not always hold (in decimal digits, 9 vs 10 differs in 2 digits, even though the numbers are close together), so it is not ideal. But if you choose binbits=64, counting the number of differing bits is not a good approach, and it does not really group similar images together.

All that said, the colorhash is just one possible implementation, and there are probably better approaches.
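
The caveat about counting bits can be illustrated with a small sketch. It assumes a plain binary encoding purely for illustration (the library's exact bit layout may differ), and `bit_difference` is a hypothetical helper, not an imagehash function.

```python
def bit_difference(a, b, binbits):
    """Number of bit positions in which two binbits-wide integers differ."""
    mask = (1 << binbits) - 1
    return bin((a ^ b) & mask).count("1")


# Neighbouring values can differ in every bit ...
print(bit_difference(3, 4, 3))  # 011 vs 100
# ... while values that are far apart can differ in just one.
print(bit_difference(0, 4, 3))  # 000 vs 100
```

With 3 bits such mismatches stay small in absolute terms. With 64 bits per value, a tiny numeric change can flip many bits, so the count says little about how close the underlying fractions are.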

@EBakirdinov
Author

@JohannesBuchner I'm a little bit confused; I'm new to this. For example, I have one blank black image and one blank white image, with binbits=32. The result I get is 128. Shouldn't it be 448 (32 * 14)?

@EBakirdinov
Author

EBakirdinov commented Feb 28, 2024

> Similar images should have a small difference.
>
> This function is designed with small binbits (default=3) in mind. If two numbers are very different, all 3 bits are likely to differ, while if they are similar, likely only one or two (the least significant bits) differ. This does not always hold (in decimal digits, 9 vs 10 differs in 2 digits, even though the numbers are close together), so it is not ideal. But if you choose binbits=64, counting the number of differing bits is not a good approach, and it does not really group similar images together.
>
> All that said, the colorhash is just one possible implementation, and there are probably better approaches.

Ooh. So working with high binbits is not that efficient?

@JohannesBuchner
Owner

> @JohannesBuchner I'm a little bit confused; I'm new to this. For example, I have one blank black image and one blank white image, with binbits=32. The result I get is 128. Shouldn't it be 448 (32 * 14)?

Maybe copy the function code of colorhash and run it line by line for an example image, and look at the variables. frac_black and frac_gray are probably as you expect, but I am not sure about h_bright_counts and h_faint_counts.
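
One way to follow that suggestion without stepping through the whole function is to compare the two hashes value by value. The sketch below uses made-up bit rows, not real library output, and `per_value_bit_differences` is a hypothetical helper. It assumes each hash can be viewed as 14 rows of binbits booleans.

```python
def per_value_bit_differences(bits_a, bits_b):
    """Count differing bits row by row; each row holds the bits of one value."""
    if len(bits_a) != len(bits_b):
        raise ValueError("both hashes need the same number of values")
    return [
        sum(x != y for x, y in zip(row_a, row_b))
        for row_a, row_b in zip(bits_a, bits_b)
    ]


# Made-up example: 14 values of 4 bits each, where only the first two values differ.
binbits = 4
hash_a = [[True] * binbits] + [[False] * binbits for _ in range(13)]
hash_b = [[False] * binbits, [True] * binbits] + [[False] * binbits for _ in range(12)]

diffs = per_value_bit_differences(hash_a, hash_b)
print(diffs)
print(sum(diffs), "of", 14 * binbits, "bits differ")
```

The total can only reach 14 * binbits if all 14 values differ in every bit. Values that are zero in both images contribute nothing, which is one possible reason a result stays well below that bound. With imagehash, the rows could presumably be taken from the `.hash` array of each ImageHash object (for example via `.tolist()`), assuming it is stored with one row per value.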
