music-classification-cnn 🎸🎹🎺🎷

A deep learning project leveraging Convolutional Neural Networks to classify audio clips into musical genres.

Overview

This project classifies audio clips into musical genres using Convolutional Neural Networks (CNNs). It uses the GTZAN dataset, which consists of 1000 audio clips of 30 seconds each, equally distributed across 10 musical genres. The dataset is organized into 10 folders, one per genre, each containing 100 clips. The clips are in .wav format with a sampling rate of 22050 Hz.
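
The folder-per-genre layout described above can be indexed with a few lines of standard-library Python. This is a minimal sketch rather than the notebook's actual loading code; the function names and the `genres` root path in the usage comment are assumptions made for illustration.

```python
from collections import Counter
from pathlib import Path


def index_gtzan(root):
    """Return a sorted list of (wav_path, genre) pairs, one per clip.

    Assumes the layout root/<genre>/<clip>.wav (hypothetical root path).
    """
    root = Path(root)
    pairs = []
    for genre_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for wav in sorted(genre_dir.glob("*.wav")):
            # The folder name doubles as the class label.
            pairs.append((wav, genre_dir.name))
    return pairs


def class_counts(pairs):
    """Count clips per genre, e.g. to confirm the classes are balanced."""
    return Counter(genre for _, genre in pairs)


# Usage (path is an assumption):
# pairs = index_gtzan("genres")
# print(class_counts(pairs))
```

On an intact copy of the dataset, `class_counts` should report 100 clips for each of the 10 genres.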

The project began as a class project for the course "Neural Networks and Deep Learning" at the University of Pula and was later extended with additional features and improvements. The initial models reached near-perfect accuracy on the training set but only around 50% on the test set, which indicates that they were overfitting the training data. To address this, the project was extended with data augmentation and regularization techniques, as well as fine-tuning and transfer learning using pre-trained models such as VGG16 and MobileNet.

Final results showed that the best model achieved 80% accuracy on the test set for top-2 genre classification, meaning the true genre was among the model's two highest-ranked predictions. The model was trained on a dataset augmented using pitch shifting, time stretching, frequency masking, and time masking.
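
To make the top-2 metric concrete, here is a small dependency-free sketch of top-k accuracy. It is illustrative only: the function name and the toy scores are made up and are not taken from the project's notebooks.

```python
def top_k_accuracy(scores, labels, k=2):
    """Fraction of samples whose true label is among the k highest scores.

    scores: list of per-class score lists, one row per clip.
    labels: list of integer class indices, one per clip.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    hits = 0
    for row, true_label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the top k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += true_label in top_k
    return hits / len(labels)


# Toy example: 3 genres, 4 clips (made-up scores).
scores = [
    [0.6, 0.3, 0.1],  # true label 0: correct at top-1
    [0.5, 0.4, 0.1],  # true label 1: correct only at top-2
    [0.7, 0.2, 0.1],  # true label 2: missed at top-2
    [0.1, 0.2, 0.7],  # true label 2: correct at top-1
]
labels = [0, 1, 2, 2]
top1 = top_k_accuracy(scores, labels, k=1)  # 0.5
top2 = top_k_accuracy(scores, labels, k=2)  # 0.75
```

Top-2 accuracy is never lower than top-1 accuracy, so the two figures should not be compared directly.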

Mel Spectrogram

A mel spectrogram represents the spectrum of a sound signal as a function of time. It is obtained by applying the Short-Time Fourier Transform (STFT) to the audio signal and then mapping the resulting frequency axis onto the mel scale, a perceptual scale of pitch that approximates how the human ear responds to different frequencies. Mel spectrograms are commonly used in audio processing tasks such as speech recognition and music classification.
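
As an illustration of the mel mapping, the sketch below uses one common (HTK-style) conversion formula, m = 2595 * log10(1 + f / 700). This is an assumption made for the example: librosa's default conversion uses a slightly different (Slaney-style) variant unless `htk=True` is passed, and the helper names here are hypothetical.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mels, f_min, f_max):
    """Edges in Hz of n_mels overlapping bands, equally spaced on the mel scale.

    Returns n_mels + 2 values: each band spans from one edge to the edge
    two positions later, peaking at the edge in between.
    """
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_mels + 1)
    return [mel_to_hz(m_min + i * step) for i in range(n_mels + 2)]


# Bands up to the Nyquist frequency of 22050 Hz audio.
edges = mel_band_edges(n_mels=128, f_min=0.0, f_max=11025.0)
```

Because the edges are equally spaced in mels, the bands are narrow at low frequencies and widen toward high frequencies, which mirrors the ear's coarser resolution there.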

For this project, I generated my own mel spectrograms using the librosa library in Python. The mel spectrograms were used as input to the CNN models for genre classification.
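
A minimal sketch of how a mel spectrogram can be generated with librosa is shown below. The parameter values (`n_fft=2048`, `hop_length=512`, `n_mels=128`) are librosa's defaults and are assumptions here, not necessarily the settings used in the notebooks; both function names are hypothetical.

```python
def expected_frames(n_samples, hop_length=512):
    """Number of STFT frames for a centre-padded signal (librosa's default)."""
    return 1 + n_samples // hop_length


def mel_spectrogram_db(path, sr=22050, n_fft=2048, hop_length=512, n_mels=128):
    """Load an audio file and return its mel spectrogram in decibels.

    The result has shape (n_mels, n_frames).
    """
    # Imported lazily so the shape helper above works without the audio stack.
    import librosa

    y, sr = librosa.load(path, sr=sr, mono=True)
    power = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
    )
    # Convert power to dB relative to the loudest bin, so the maximum is 0 dB.
    return librosa.power_to_db(power, ref=power.max())


# A clip of exactly 30 s at 22050 Hz yields 1292 frames with these settings,
# i.e. a (128, 1292) array per clip.
frames_30s = expected_frames(30 * 22050)
```

The resulting array can be saved as an image or fed to a CNN directly; which of the two the project does is not specified beyond the spectrograms being used as model input.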

Mel Spectrogram of an audio clip from the GTZAN dataset

Data Augmentation Techniques applied on spectrogram images
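
Of the augmentations listed in the overview, frequency masking and time masking operate directly on the spectrogram. The NumPy sketch below shows one way to apply a single mask of each kind; the function name, the mask widths, and the choice of fill value are assumptions for illustration and may differ from what the notebooks do.

```python
import numpy as np


def mask_spectrogram(spec, max_freq_width=16, max_time_width=32, rng=None):
    """Return a copy of spec with one frequency band and one time span masked.

    spec: 2-D array of shape (n_mels, n_frames).
    Mask widths are drawn uniformly from 0 up to the given maximum.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()
    n_mels, n_frames = out.shape
    fill = out.min()  # mask with the quietest value already present

    # Frequency mask: blank out f consecutive mel bands starting at f0.
    f = int(rng.integers(0, min(max_freq_width, n_mels) + 1))
    f0 = int(rng.integers(0, n_mels - f + 1))
    out[f0:f0 + f, :] = fill

    # Time mask: blank out t consecutive frames starting at t0.
    t = int(rng.integers(0, min(max_time_width, n_frames) + 1))
    t0 = int(rng.integers(0, n_frames - t + 1))
    out[:, t0:t0 + t] = fill
    return out


# Example on a random stand-in for a (128, 256) spectrogram.
demo_rng = np.random.default_rng(0)
demo_spec = demo_rng.random((128, 256))
demo_masked = mask_spectrogram(demo_spec, rng=demo_rng)
```

Applying a fresh random mask each time a clip is drawn means the network rarely sees the same input twice, which is the property that helps against the overfitting described in the overview.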

lukablaskovic/music-classification-cnn
