This is a simple program that finds duplicate images, or images that are similar to each other, under a
given directory.
It uses the pHash (perceptual hash) library to identify duplicate and similar images.
-
Download the application from the release page.
-
Run the program from the command line with the following arguments:
./DupIm <path_to_directory> <threshold_count>
-
<path_to_directory> is the relative or absolute path to the directory that the program searches recursively for duplicate or similar images.
-
<threshold_count> sets the sensitivity of the search and must be a positive integer. The closer the number is to zero, the stricter the search: images must be very similar to each other in order to be detected. If it is not provided, a default value of 15 is used.
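As a rough illustration of the argument rules above (a required directory, an optional positive-integer threshold, default 15), here is a minimal parsing sketch. The `Options` struct and `parse_args` function are hypothetical names; this is not DupIm's actual code, only one way the stated rules could be enforced.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical parsed form of: ./DupIm <path_to_directory> <threshold_count>
struct Options {
    std::string directory;
    int threshold = 15;  // default used when <threshold_count> is omitted
};

// Returns std::nullopt if the directory is missing, there are too many
// arguments, or the threshold is not a positive integer.
std::optional<Options> parse_args(int argc, const char* const argv[]) {
    if (argc < 2 || argc > 3) return std::nullopt;
    Options opts;
    opts.directory = argv[1];
    if (argc == 3) {
        const std::string text = argv[2];
        std::size_t consumed = 0;
        int value = 0;
        try {
            value = std::stoi(text, &consumed);
        } catch (const std::exception&) {
            return std::nullopt;  // not a number, or outside the int range
        }
        // Reject trailing junk such as "15abc", and zero or negative values.
        if (consumed != text.size() || value <= 0) return std::nullopt;
        opts.threshold = value;
    }
    return opts;
}
```

For example, `./DupIm photos` would yield a threshold of 15, `./DupIm photos 5` a threshold of 5, and `./DupIm photos 0` would be rejected.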
-
On completion, two new files are created in the current directory:
-
'Dupim.output.txt' contains the result of the search. It lists every file that was found to be a duplicate of, or similar to, another file, along with their Hamming distance (the smaller the Hamming distance, the more similar the images).
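The Hamming distance between two hashes is the number of bit positions in which they differ. The sketch below assumes 64-bit hashes, the size of pHash's DCT image hash (pHash also ships its own `ph_hamming_distance` helper). The `is_similar` rule, which treats a pair as similar when the distance does not exceed the threshold, is an assumption about how the threshold is applied, not confirmed DupIm behavior.

```cpp
#include <bitset>
#include <cstdint>

// Counts the bits that differ between two 64-bit perceptual hashes.
int hamming_distance(std::uint64_t a, std::uint64_t b) {
    return static_cast<int>(std::bitset<64>(a ^ b).count());
}

// Assumed matching rule: similar when the distance is within the threshold.
bool is_similar(std::uint64_t a, std::uint64_t b, int threshold) {
    return hamming_distance(a, b) <= threshold;
}
```

With 64-bit hashes the distance ranges from 0 (identical hashes) to 64 (every bit differs), which is why a smaller threshold makes the search stricter.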
-
'Dupim.log.txt' is a log file containing the pHash of every image that was recorded. If an image path does not appear in this file, the image was found to be a duplicate of an already recorded file; check the output file to see which one.
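The relationship between the two files can be sketched as a single pass over the hashed images: an image is recorded (and would appear in the log) only if it does not match any previously recorded image; otherwise it is reported as a match. This is a hypothetical reconstruction from the description above, not DupIm's source. The names `HashedImage`, `Match`, `DedupResult` and `find_duplicates` are illustrative, and stopping at the first match is an assumption.

```cpp
#include <bitset>
#include <cstdint>
#include <string>
#include <vector>

struct HashedImage {
    std::string path;
    std::uint64_t hash;  // assumed 64-bit perceptual hash
};

struct Match {
    std::string duplicate;  // image that is NOT recorded in the log
    std::string original;   // previously recorded image it matched
    int distance;           // Hamming distance between the two hashes
};

struct DedupResult {
    std::vector<HashedImage> recorded;  // would go to the log file
    std::vector<Match> matches;         // would go to the output file
};

DedupResult find_duplicates(const std::vector<HashedImage>& images, int threshold) {
    DedupResult result;
    for (const auto& img : images) {
        bool matched = false;
        for (const auto& seen : result.recorded) {
            int d = static_cast<int>(std::bitset<64>(img.hash ^ seen.hash).count());
            if (d <= threshold) {
                result.matches.push_back({img.path, seen.path, d});
                matched = true;
                break;  // assumption: report only the first match
            }
        }
        if (!matched) result.recorded.push_back(img);
    }
    return result;
}
```

Comparing each new image only against recorded images keeps duplicates out of the comparison pool, so a chain of near-duplicates is reported against the first copy rather than against each other.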
-
This application depends on the following C++ libraries:
- opencv2
- pHash
Once the dependencies are installed, clone the repository with
git clone https://github.com/nishantHolla/DupIm.git
and then, from inside the cloned DupIm directory, run
cmake -S . -B build
cmake --build build
to configure and build the application.