Automatically find and match observation photos using SSIM #268

Open
Tracked by #227
JWCook opened this issue Oct 30, 2022 · 2 comments
Labels
idea Feature ideas to be evaluated later

Comments

@JWCook
Member

JWCook commented Oct 30, 2022

This would be a feature to automatically match individual local photos to individual remote photos. If you already have a bunch of observations uploaded but not tagged, this would be a massive timesaver.

A basic implementation using an existing SSIM library would be manageable. Possible libraries:

  • scikit-image
  • SSIM-PIL
  • pyssim
  • tensorflow.image.ssim
  • PIQ

Some preliminary tests are here: https://github.com/pyinat/naturtag/blob/ssim/test_ssim.py
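For reference, the SSIM score itself is simple to compute. The sketch below is a simplified, single-window ("global") variant in pure Python, assuming both images have already been decoded to grayscale and resized to the same dimensions. The libraries listed above compute SSIM over local windows and average the results, so their scores will differ from this one; `global_ssim` is a hypothetical helper name, not part of any of them.

```python
def global_ssim(a, b, data_range=255.0):
    """Single-window SSIM between two equal-length sequences of grayscale pixels.

    Simplification: real SSIM implementations slide a local window over the
    image and average the per-window scores; this treats the whole image as
    one window.
    """
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and the same size")

    n = len(a)
    # Standard stabilizing constants (K1=0.01, K2=0.03) scaled by the pixel range
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2

    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n

    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator
```

Identical images score 1.0 (up to floating-point error), and the score drops as structure diverges, which is what makes a score threshold usable for rejecting non-matches.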

Basic steps might look something like:

  • Get all user's observations
  • Download medium-sized versions of all observation photos
  • Get all specified local photos
  • Narrow down matches by comparing EXIF date taken with observation date/time, ± some time window
    • iNat strips photo metadata, so we don't have date taken info for remote images
  • Generally assume that remote images are a subset of local images
  • For each observation photo:
    • Get SSIM score against each candidate local photo (within possible time window)
    • Pick the highest-ranked image, subject to some score threshold (in case no local match exists)
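The selection logic in the steps above might be sketched roughly like this. Everything here is hypothetical: `LocalPhoto`, `best_match`, the one-hour window, and the 0.8 threshold are placeholders for illustration, and `score` stands in for whichever SSIM implementation ends up being used. Fetching observations and downloading photos is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Optional, Sequence


@dataclass
class LocalPhoto:
    path: str
    taken_at: datetime  # parsed from the local file's EXIF "date taken"


def candidates_in_window(
    observed_at: datetime, local_photos: Sequence[LocalPhoto], window: timedelta
) -> List[LocalPhoto]:
    """Keep only local photos taken within +/- window of the observation time."""
    return [p for p in local_photos if abs(p.taken_at - observed_at) <= window]


def best_match(
    observed_at: datetime,
    local_photos: Sequence[LocalPhoto],
    score: Callable[[LocalPhoto], float],
    window: timedelta = timedelta(hours=1),  # assumed value, needs tuning
    threshold: float = 0.8,  # assumed value, needs tuning
) -> Optional[LocalPhoto]:
    """Return the highest-scoring candidate, or None if nothing reaches the threshold.

    score(photo) is expected to compare one remote observation photo against
    the given local photo and return a similarity where higher is better.
    """
    best, best_score = None, threshold
    for photo in candidates_in_window(observed_at, local_photos, window):
        s = score(photo)
        if s >= best_score:
            best, best_score = photo, s
    return best
```

Filtering by time window first keeps the number of image comparisons per observation photo small, which matters because the comparison is the expensive step. Note that this assumes the observation time and EXIF times are in the same timezone (or both naive).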

Notes:

  • This could be quite computationally intensive for a large photo collection/observation history, so performance optimization may end up being the most difficult part.
  • Some of the above libraries have fairly heavy dependencies, and I don't want to bloat the size of the naturtag installer too much
  • The best performance is going to be with GPU acceleration, but that requires platform-specific binaries that I don't think I could package with naturtag
  • Possibly include both a CPU-only version and a GPU-optimized version?
@JWCook JWCook added the idea Feature ideas to be evaluated later label Oct 30, 2022
@barnabywalters

I just found out about naturtag, and have been working on a much simpler script to do this specific task. I’ve been using datetime-based photo-to-observation matching when possible (sadly iNat makes this unnecessarily difficult), falling back to RMSE image comparison in ambiguous cases. Happy to share what I end up with and maybe send a PR once I’ve got something working.
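As an illustration of the RMSE fallback mentioned above (not the actual script, which isn't shown here), a minimal pure-Python sketch could look like the following. It assumes images are already decoded to grayscale pixel sequences of the same size; `rmse` and `pick_by_rmse` are hypothetical names.

```python
import math


def rmse(a, b):
    """Root-mean-square error between two equal-length pixel sequences (0 = identical)."""
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and the same size")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def pick_by_rmse(remote_pixels, candidates):
    """Resolve an ambiguous match by picking the candidate with the lowest RMSE.

    candidates maps a local file name to its pixel sequence.
    """
    return min(candidates, key=lambda name: rmse(remote_pixels, candidates[name]))
```

Unlike SSIM, lower is better here, so any cutoff for "no match" would be an upper bound rather than a lower one.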

@JWCook
Member Author

JWCook commented Oct 10, 2024

Cool, thanks for reaching out! Have you had success with matching some of your own observation photos this way? I believe I previously tried matching image timestamps (like EXIF DateTimeOriginal) and iNat observation metadata (observed_on), but that didn't work well for observations with multiple photos. There is some additional per-photo metadata available on the photo info pages on inaturalist.org. It's not available through the API, but I have an example of scraping it here (and some related links here). The "Date time original" field is likely the best candidate, but I'm not sure if that reliably matches the source EXIF or XMP metadata.
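If the per-photo "Date time original" value does turn out to be reliable, matching on it could be sketched as below. This assumes the scraped remote value has already been parsed into a `datetime` (its exact format on the photo info page isn't specified here), and that local values are raw EXIF `DateTimeOriginal` strings in the standard `YYYY:MM:DD HH:MM:SS` form. `match_by_timestamp` and the one-second tolerance are hypothetical.

```python
from datetime import datetime, timedelta

# EXIF stores DateTimeOriginal as "YYYY:MM:DD HH:MM:SS" with no timezone
EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"


def parse_exif_datetime(value):
    """Parse a raw EXIF DateTimeOriginal string into a naive datetime."""
    return datetime.strptime(value.strip(), EXIF_FORMAT)


def match_by_timestamp(local, remote, tolerance=timedelta(seconds=1)):
    """Match remote photos to local files by per-photo timestamp.

    local:  {file path: raw EXIF DateTimeOriginal string}
    remote: {photo id: datetime parsed from the photo's metadata}
    Returns {photo id: file path} for unambiguous matches only; photos with
    zero or multiple hits are left out so they can fall back to image comparison.
    """
    local_times = {path: parse_exif_datetime(raw) for path, raw in local.items()}
    matches = {}
    for photo_id, remote_dt in remote.items():
        hits = [p for p, dt in local_times.items() if abs(dt - remote_dt) <= tolerance]
        if len(hits) == 1:
            matches[photo_id] = hits[0]
    return matches
```

Working per photo rather than per observation avoids the multiple-photo problem described above, since each photo in an observation carries its own timestamp. Burst shots taken within the same second would still be ambiguous and are skipped here.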

This is an interesting (and IMO very useful) problem to solve, so I'd appreciate any other notes or ideas you have on the topic!
