This repository has been archived by the owner on Jan 31, 2024. It is now read-only.

duplicates #46

Answered by Avnsx
AHAXPOHOC asked this question in Q&A

Short:
The scraper uses content hashing to decide whether an image is already known or has never been downloaded before.
MD5 hashing is used for videos, and average_hash from https://github.com/JohannesBuchner/imagehash is used for photos.
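
As a rough illustration of the video side, here is a minimal sketch of exact-match detection with MD5, using only Python's standard `hashlib`. The helper names `md5_of_chunks` and `is_new_video` and the in-memory `seen_hashes` set are assumptions made for this example and are not taken from the scraper's code.

```python
import hashlib


def md5_of_chunks(chunks):
    """Return the hex MD5 digest of a stream of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        # Feeding the data piece by piece avoids holding a large video in memory.
        digest.update(chunk)
    return digest.hexdigest()


def is_new_video(chunks, seen_hashes):
    """Return True and remember the hash if this content was not seen before."""
    content_hash = md5_of_chunks(chunks)
    if content_hash in seen_hashes:
        return False  # byte-identical content was already recorded
    seen_hashes.add(content_hash)
    return True
```

Because MD5 is computed over the raw bytes, this only catches files that are exactly the same; a re-encoded copy of the same video would hash differently.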

Longer answer:
This is combined with the update_recent_download function, which performs additional hashing to enforce uniqueness: #16 (comment)

So a duplicate is any content that is either exactly the same as, or too similar to, content that is already known.
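
To show why "too similar" photos can count as duplicates, here is a simplified, dependency-free sketch of the average-hash idea. It is not the imagehash library's implementation (which resizes with Pillow); it assumes a grayscale image given as a list of rows whose width and height are multiples of `hash_size`. The `max_distance` tolerance is a hypothetical parameter: the answer above does not say whether the scraper requires equal hashes or allows a small difference.

```python
def average_hash(gray, hash_size=8):
    """Simplified average hash of a grayscale grid (list of rows of ints).

    Assumes both dimensions are multiples of hash_size.
    """
    block_h = len(gray) // hash_size
    block_w = len(gray[0]) // hash_size
    cells = []
    for r in range(hash_size):
        for c in range(hash_size):
            # Shrink the image by averaging each block of pixels into one cell.
            block = [gray[y][x]
                     for y in range(r * block_h, (r + 1) * block_h)
                     for x in range(c * block_w, (c + 1) * block_w)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        # One bit per cell: 1 if the cell is brighter than the overall mean.
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(hash_a, hash_b):
    """Count the bits in which two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def is_duplicate(hash_a, hash_b, max_distance=4):
    """Treat two images as duplicates if their hashes differ in few bits.

    max_distance is a hypothetical tolerance chosen for this sketch.
    """
    return hamming(hash_a, hash_b) <= max_distance
```

Since each bit only records whether a region is brighter than the image's mean, a uniformly brightened copy of a photo produces the same hash, while its MD5 would be completely different. That is the sense in which visually similar photos are treated as already downloaded.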


This discussion was converted from issue #44 on September 05, 2022 03:21.