Term Project, CSC602: Machine Learning, Spring 2024, IIIT Kalyani
Names: Nikhil Laxminarayana, Harsh Singh Rawat
Reg. Nos.: ECE/21136/796, ECE/21127/787
We present SPACK, a pipeline for training classifiers that identify instrument samples from a diverse dataset, using frequency-domain feature extraction provided by the KAPRE (Keras Audio PREprocessors) library. This is the term project for CSC602: Machine Learning, offered in Spring 2024 at IIIT Kalyani.
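To illustrate how KAPRE supplies the frequency-domain features, below is a minimal sketch of a Keras model whose first layer converts raw waveforms into log-mel spectrograms on the GPU. It assumes kapre >= 0.3 and TensorFlow 2; the sample rate, clip length, class count, and the small Conv2D head are placeholder choices for the example, not the exact architecture used by SPACK.

```python
from tensorflow.keras import layers, models
from kapre.composed import get_melspectrogram_layer

SR = 16000       # assumed sample rate of the cleaned wav files
DT = 1.0         # assumed clip duration in seconds
N_CLASSES = 10   # assumed number of instrument classes

# KAPRE computes the log-mel spectrogram inside the model itself,
# so raw waveforms go in and frequency-domain features come out.
mel_layer = get_melspectrogram_layer(
    input_shape=(int(SR * DT), 1),   # (samples, channels), channels_last
    sample_rate=SR,
    n_fft=512,
    hop_length=160,
    n_mels=128,
    return_decibel=True,             # log-scaled magnitudes
    input_data_format='channels_last',
    output_data_format='channels_last',
)

model = models.Sequential([
    mel_layer,                                    # output: (time, n_mels, 1)
    layers.Conv2D(16, 3, activation='relu'),
    layers.MaxPooling2D(2),
    layers.Conv2D(32, 3, activation='relu'),
    layers.GlobalAveragePooling2D(),
    layers.Dense(N_CLASSES, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```

Because the spectrogram is a layer of the network, the same preprocessing is applied consistently at training and prediction time, which is the main appeal of KAPRE over offline feature extraction.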
- Clone this repository and create a virtual environment using `venv` or `conda`. Activate the environment, then install the dependencies by running `pip install -r requirements.txt` in the terminal from the cloned repository.
- Run `clean.py` to generate a `clean` directory; it contains the processed wav files used for classification. Then run `train.py --model-type conv1d` to train a Conv1D net over the `clean` dataset.
- Change the model type to `conv2d` and then to `lstm` to store the parameters of each model in the `models` directory.
- The models are now ready to be used for prediction. To predict values over the training data, run `predict.py`; the output is logged to `y_pred.npy` (see the sketch after this list for reading it back).
- The notebooks in the `notebooks/` directory contain the code snippets used to generate the confusion matrices and ROC characteristics for the models (a sketch of these computations follows the results below).
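For reference, here is a minimal sketch of reading the predictions back. It assumes `y_pred.npy` stores one row of softmax scores per sample; the class names below are placeholders for the actual instrument labels derived from the `clean` directory.

```python
import numpy as np

# Hypothetical class names; the real labels come from the dataset's class folders.
CLASSES = ['acoustic_guitar', 'bass_drum', 'cello']

y_pred = np.load('y_pred.npy')          # assumed shape: (n_samples, n_classes)
pred_idx = np.argmax(y_pred, axis=1)    # most probable class per sample
print([CLASSES[i] for i in pred_idx[:10]])
```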
We achieve good accuracy on the test data, which consists of diverse .wav audio samples.
Figures (see repository): confusion matrix for the LSTM model, ROC curves for the LSTM model, and training accuracy vs. epoch for the LSTM model.
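The metrics behind these figures can be reproduced along the following lines with scikit-learn. This is only a sketch: `y_true.npy` is a hypothetical file of integer ground-truth labels used to keep the example self-contained, and the actual notebooks may organise the computation differently.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_curve, auc
from sklearn.preprocessing import label_binarize

y_true = np.load('y_true.npy')   # hypothetical ground-truth labels (integers)
y_pred = np.load('y_pred.npy')   # softmax scores saved by predict.py

# Confusion matrix over the hard predictions
cm = confusion_matrix(y_true, np.argmax(y_pred, axis=1))
print(cm)

# One-vs-rest ROC curve and AUC per class (assumes more than two classes)
n_classes = y_pred.shape[1]
y_true_bin = label_binarize(y_true, classes=range(n_classes))
for c in range(n_classes):
    fpr, tpr, _ = roc_curve(y_true_bin[:, c], y_pred[:, c])
    print(f'class {c}: AUC = {auc(fpr, tpr):.3f}')
```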
- Choi, Keunwoo, Deokjin Joo, and Juho Kim (2017). Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras.