pfann

This is an unofficial reproduction of paper "Neural Audio Fingerprint for High-specific Audio Retrieval based on Contrasive Learning."

Now I have a thesis that is a "trivial" improvement to the above paper: "Improvement of Neural Network- and Landmark-based Audio Fingerprinting" (in Traditional Chinese). Link here

Note: I am now employed ~~and our company does not allow GitHub login during work~~. I have less time to work on my side project or maintain my thesis code, ~~and I do not have access to high performance GPU (currently), so I cannot solve compatibility issues or problems related to training.~~ Finally I bought a gaming computer in 2023, now I can help you solve training issues.

Install

conda install python=3.9 # python 3.10 doesn't work with faiss...
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia # I forget which version of PyTorch I used, but latest PyTorch seems to work
conda install -c pytorch faiss-gpu # can also be faiss-cpu if you don't test gpu-accelerated search
pip install tqdm
pip install tensorboardX
pip install torch_optimizer
pip install scipy
pip install julius
pip install matplotlib # for visualization purpose, not needed for server
pip install seaborn # for visualization purpose, not needed for server
pip install scikit-learn

Prepare dataset

FMA dataset

Download fma_medium from https://github.com/mdeff/fma and unzip to ../pfann_dataset/fma_medium .

python tools/listaudio.py --folder ../pfann_dataset/fma_medium --out lists/fma_medium.csv
python tools/filterduration.py --csv lists/fma_medium.csv --min-len 29.9 --out lists/fma_medium_30s.csv
python tools/traintestsplit.py --csv lists/fma_medium_30s.csv --train lists/fma_medium_train.csv --train-size 10000 --test lists/fma_medium_valtest.csv --test-size 1000
python tools/traintestsplit.py --csv lists/fma_medium_valtest.csv --train lists/fma_medium_val.csv --train-size 500 --test lists/fma_medium_test.csv --test-size 500
python tools/traintestsplit.py --csv lists/fma_medium_train.csv --train-size 2000 --train lists/fma_inside_test.csv
rm test.csv
python tools/listaudio.py --folder ../pfann_dataset/fma_large --out lists/fma_large.csv

AudioSet

Download 3 csv files unbalanced_train_segments.csv, balanced_train_segments.csv, eval_segments.csv, and ontology.json from https://research.google.com/audioset/download.html . Then run these to list all the videos needed:

python tools/audioset.py /path/to/unbalanced_train_segments.csv lists/audioset1.csv --ontology /path/to/ontology.json
python tools/audioset.py /path/to/balanced_train_segments.csv lists/audioset2.csv --ontology /path/to/ontology.json
python tools/audioset.py /path/to/eval_segments.csv lists/audioset3.csv --ontology /path/to/ontology.json

Use these commands to crawl videos from youtube and convert to wav:

python tools/audioset2.py lists/audioset1.csv ../pfann_dataset/audioset
python tools/audioset2.py lists/audioset2.csv ../pfann_dataset/audioset
python tools/audioset2.py lists/audioset3.csv ../pfann_dataset/audioset

After downloading, run this command to list all successfully downloaded files:

python tools/listaudio.py --folder ../pfann_dataset/audioset --out lists/noise.csv

This command will show errors because some videos are unavailable.

Finally run the command:

python tools/filterduration.py --csv lists/noise.csv --min-len 9.9 --out lists/noise_10s.csv
python tools/traintestsplit.py --csv lists/noise_10s.csv --train lists/noise_train.csv --train-size 8 --test lists/noise_val.csv --test-size 2 -p

Microphone impulse response dataset

Go to http://micirp.blogspot.com/ , and download files to ../pfann_dataset/micirp. Then run the commands:

python tools/listaudio.py --folder ../pfann_dataset/micirp --out lists/micirp.csv
python tools/traintestsplit.py --csv lists/micirp.csv --train lists/micirp_train.csv --train-size 8 --test lists/micirp_val.csv --test-size 2 -p

Aachen Impulse Response Database

Download zip from https://www.iks.rwth-aachen.de/en/research/tools-downloads/databases/aachen-impulse-response-database/ and unzip to ../pfann_dataset/AIR_1_4.

python -m datautil.ir ../pfann_dataset/AIR_1_4 lists/air.csv
python tools/traintestsplit.py --csv lists/air.csv --train lists/air_train.csv --train-size 8 --test lists/air_val.csv --test-size 2 -p

Train

python train.py --param configs/default.json -w4

Generate query

Inside test (not used in my thesis anymore):

python genquery.py --params configs/gentest.json --len 10 --num 2000 --mode train --out out/queries/inside

Assume that you have installed all the datasets, then just run this to generate all queries:

./genall.sh

Will output to folders out/queries/out2_snr$snr, where $snr is one of -6, -4, -2, 0, 2, 4, 6, 8. The query list (used by matcher.py) is out/queries/out2_snr$snr/list.txt, and the ground truth is out/queries/out2_snr$snr/expected.csv.

Build a fingerprint database

Inside test (not used in my thesis anymore):

python tools/csv2txt.py --dir ../pfann_dataset/fma_medium lists/fma_medium_train.csv --out lists/fma_medium_train.txt
python builder.py lists/fma_medium_train.txt /path/to/db configs/default.json

Usage of builder.py:

python builder.py <music list file> <output database location> <model config>

Music list file is a file containing list of music file paths. File must be UTF-8 without BOM. For example:

/path/to/fma_medium/000/000002.mp3
/path/to/fma_medium/000/000005.mp3
/path/to/your/music/aaa.wav
/path/to/your/music/bbb.wav

Model config is a JSON file like in configs/ folder. It is used to load a trained model. If omitted, the model config is configs/default.json by default.

This program supports both MP3 and WAV audio format. Relative paths are supported but not recommended.

Recognize music

Usage of matcher.py:

python matcher.py <query list> <database location> <output result file>

Query list is a file containing list of query file paths. For example:

/path/to/queries/out2_snr2/000002.wav
/path/to/queries/out2_snr2/000005.wav
/path/to/song_recorded_on_street1.wav
/path/to/song_recorded_on_street2.wav

Database location is the place where builder.py saves database.

The result file will be a TSV file with 2 fields: query file path, and matched music path, but without header. It may look like this:

/path/to/queries/out2_snr2/000002.wav	/path/to/fma_medium/000/000002.mp3
/path/to/queries/out2_snr2/000005.wav	/path/to/fma_medium/000/000005.mp3
/path/to/song_recorded_on_street1.wav	/path/to/your/music/aaa.wav
/path/to/song_recorded_on_street2.wav	/path/to/your/music/aaa.wav

Matcher will also generate a _detail.csv file and a .bin file. CSV file contains more information about the matches. It has 5 columns: query, answer, score, time, and part_scores.

query: Query file path
answer: Matched music path
score: Matching score, used in my thesis
time: The time when the query clip starts in the matched music, in seconds
part_scores: Mainly used for debugging, currently empty

BIN file contains matching scores of every database music for each query. It is used in my ensemble experiments. The file format is a flattened 2D array of following structure, without header:

struct match_t {
  float score; // Matching score
  float offset; // The time when the query clip starts in the matched music, in seconds
};

The matching score of j-th database music in i-th query is at index i * database size + j.

Evaluation

python tools/accuracy.py /path/to/query6s/expected.csv /path/to/result_detail.csv

Ensemble experiment

python ensemble/svmheatmap2.py out/lm_ out/shift_4_ out/svm lin_acc.csv

More info TODO

Name		Name	Last commit message	Last commit date
Latest commit History 240 Commits
configs		configs
cpp		cpp
datautil		datautil
denoise		denoise
ensemble		ensemble
lists		lists
tools		tools
.gitignore		.gitignore
builder.py		builder.py
cppmatcher.py		cppmatcher.py
database.py		database.py
extractemb.py		extractemb.py
genall.sh		genall.sh
genquery.py		genquery.py
genquery_naf.py		genquery_naf.py
matchemb.py		matchemb.py
matcher.py		matcher.py
matchfromgt.py		matchfromgt.py
model.py		model.py
preview.py		preview.py
readme.md		readme.md
rebuild.py		rebuild.py
simpleutils.py		simpleutils.py
testall.sh		testall.sh
thesis.pdf		thesis.pdf
train.py		train.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

pfann

Install

Prepare dataset

FMA dataset

AudioSet

Microphone impulse response dataset

Aachen Impulse Response Database

Train

Generate query

Build a fingerprint database

Recognize music

Evaluation

Ensemble experiment

About

Uh oh!

Releases

Packages

Languages

stdio2016/pfann

Folders and files

Latest commit

History

Repository files navigation

pfann

Install

Prepare dataset

FMA dataset

AudioSet

Microphone impulse response dataset

Aachen Impulse Response Database

Train

Generate query

Build a fingerprint database

Recognize music

Evaluation

Ensemble experiment

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages