Hi! I have some questions...
Replies: 1 comment
Short:
The scraper uses content hashing to decide whether an image is already known or has never been downloaded before. MD5 hashing is used for videos, and `average_hash` from https://github.com/JohannesBuchner/imagehash is used for photos.

Longer answer:
This is combined with the `update_recent_download` function (which initiates even more hashing for uniqueness): #16 (comment)

So a duplicate is any content that is either exactly the same as, or too similar to, content that is already known.
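
To make the two kinds of check a bit more concrete, here is a minimal sketch. It is not the scraper's actual code: the function names (`md5_digest`, `average_hash_bits`, `hamming_distance`, `is_duplicate_photo`) and the `max_distance=5` threshold are hypothetical and only for illustration. The real `average_hash` from imagehash takes a Pillow image and shrinks it to a small grayscale grid first; here a ready-made grayscale grid stands in for that step so the example needs nothing beyond the standard library.

```python
import hashlib


def md5_digest(data: bytes) -> str:
    """Exact-match fingerprint: changing a single byte changes the digest."""
    return hashlib.md5(data).hexdigest()


def average_hash_bits(pixels: list[list[int]]) -> int:
    """Simplified average hash over a small grayscale grid (values 0-255).

    Each pixel contributes one bit: 1 if it is brighter than the mean
    of the grid, 0 otherwise.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_duplicate_photo(new_hash: int, known_hashes: list[int],
                       max_distance: int = 5) -> bool:
    """Treat a photo as a duplicate if it is close to any known hash.

    max_distance is an assumed threshold, not a value from the scraper.
    """
    return any(hamming_distance(new_hash, known) <= max_distance
               for known in known_hashes)


# Videos: only byte-identical content produces the same MD5 digest.
same_video = md5_digest(b"video bytes") == md5_digest(b"video bytes")

# Photos: an 8x8 grid with a dark left half and a bright right half,
# plus a slightly brighter copy of it and a mirrored version.
original = [[10] * 4 + [200] * 4 for _ in range(8)]
brighter = [[value + 20 for value in row] for row in original]
mirrored = [list(reversed(row)) for row in original]

known = [average_hash_bits(original)]
brighter_is_dup = is_duplicate_photo(average_hash_bits(brighter), known)
mirrored_is_dup = is_duplicate_photo(average_hash_bits(mirrored), known)
```

The design point is the difference between the two checks: MD5 only catches files that are exactly the same, while a perceptual hash compared by bit distance also catches photos that are "too similar", such as a slightly brightened copy.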