Skip to content

A command-line app to identify duplicate files by comparing samples of files. Especially useful for large sets of media files.

License

Notifications You must be signed in to change notification settings

feromax/qwkdupefinder

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

qwkdupefinder

A command-line app to identify duplicate files by comparing samples of files. Especially useful for large sets of media files.

Building and Running

with your golang environment

Assuming you have a working Go v1.14+ setup,

cd $GOPATH/src
git clone https://github.com/feromax/qwkdupefinder
cd qwkdupefinder
go build -o qwkdupefinder cmd/qwkdupefinder.go
./qwkdupefinder -h

don't have golang? if you have docker...

These steps will use a prepacked golang v1.14 container image to build the binary. Assuming you'll want the binary in ~/bin,

  1. Download and start the golang container.
    $ cd
    $ mkdir -p bin
    $ docker run -it --rm -v `pwd`/bin:/build golang:1.14.7-alpine
    
  2. Once the container image has downloaded and started, clone the source code repositoroy and build.
    /go # cd src
    /go/src # apk add --no-cache git
    /go/src # git clone https://github.com/feromax/qwkdupefinder
    /go/src # go build -o /build/qwkdupefinder qwkdupefinder/cmd/qwkdupefinder.go
    /go/src # exit
    
  3. Magically, the built artifact (i.e., the binary) will be in ~/bin/. Run via
    ./bin/qwkdupfinder -h
    

Sample Session

# saves a copy of the output to /tmp/dupes.txt while showing it live in your terminal session
$ ~/bin/qwkdupfinder /mnt/media /mnt/restore /home/admin/go /home/admin/work_files/ | tee /tmp/dupes.txt

=================================================================================================================
DUPLICATES REPORT ON BASIS OF FILESIZE AND SAMPLING CONTENTS -- ✓ indicates verified duplicate (for files <100KB)
=================================================================================================================
MATCH:1  	SIZE:1153525978	"/mnt/media/home-movies/298329884.mov"
MATCH:1  	SIZE:1153525978	"/mnt/restore/0000000238.MOV"

MATCH:2         SIZE:2335727    "/mnt/media/pics/IMAG0107.jpg"
MATCH:2         SIZE:2335727    "/mnt/restore/PIC-00000019238.JPG"

MATCH:3 ✓       SIZE:4764       "/home/admin/work_files/2019/o/go/pkg/mod/github.com/go-sql-driver/[email protected]/rows.go"
MATCH:3 ✓       SIZE:4764       "/home/admin/go/src/github.com/go-sql-driver/mysql/rows.go"

To use the output to help you remove duplicates, run this

$ egrep "(^$|^MATCH:)" /tmp/dupes.txt | sed -e 's/^[^"]*//' > /tmp/dupes.raw

and edit /tmp/dupes.raw to REMOVE ALL FILES YOU WANT TO KEEP.

Once you're confident that the only files left are the ones you want to disappear, which may resemble my edits --

# file /tmp/dupes.raw

"/mnt/restore/0000000238.MOV"

"/mnt/restore/PIC-00000019238.JPG"

"/home/admin/work_files/2019/0/go/pkg/mod/github.com/go-sql-driver/[email protected]/rows.go"

-- run this and re-read your list of files that are about to go away forever:

$ sed -e 's/^"/rm "/' /tmp/dupes.raw
rm "/mnt/restore/0000000238.MOV"

rm "/mnt/restore/PIC-00000019238.JPG"

rm "/home/admin/work_files/2019/o/go/pkg/mod/github.com/go-sql-driver/[email protected]/rows.go"

If the above rm commands look good to you, complete the deduplication:

# sed -e 's/^"/rm "/' /tmp/dupes.raw | sh

(remove the comment character to actually perform the deletion.)

About

A command-line app to identify duplicate files by comparing samples of files. Especially useful for large sets of media files.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages