Urban sound audio classification, Multi-label image classification, Content-Based Image Retrieval

veronicamorelli/Digital-Signal-Image-Management

Digital-Signal-Image-Management

Overview

This project develops an application for recognizing one-dimensional signals (audio) and two-dimensional signals (images). Specifically, we developed three tasks:

  • Processing-1D: Recognize the class to which an urban sound belongs using ML and DL models. To solve this task, we tried different models and different feature configurations (energy, zero crossing rate, MFCC, spectrogram, etc.).
  • Processing-2D: Classify both the health state (healthy/unhealthy) and the crop species of each image with DL models, which makes this a multi-label, multi-class classification task. We built CNNs from scratch and also tried several pretrained architectures with weights learned on a general task (ImageNet).
  • Retrieval: Find the ten images most similar to an input image, using both a pretrained neural network and an autoencoder as feature extractors.
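
The retrieval step can be illustrated with a minimal sketch. It assumes the feature vectors have already been produced by some extractor (a pretrained network or an autoencoder) and ranks gallery images by cosine similarity. The similarity metric, the function name `top_k_similar`, and the random stand-in features are assumptions made for illustration; the notebooks may use a different distance or library.

```python
import numpy as np

def top_k_similar(query, gallery, k=10):
    """Return indices and scores of the k gallery rows closest to `query`.

    Similarity is cosine similarity (an assumption, not necessarily the
    metric used in the notebooks).
    """
    # L2-normalize so that the dot product equals cosine similarity.
    q = query / (np.linalg.norm(query) + 1e-12)
    g = gallery / (np.linalg.norm(gallery, axis=1, keepdims=True) + 1e-12)
    scores = g @ q
    # Sort by descending similarity and keep the first k indices.
    order = np.argsort(-scores, kind="stable")[:k]
    return order, scores[order]

# Stand-in features: 100 "images" with 64-dimensional embeddings.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(100, 64))
# A query that is a slightly perturbed copy of gallery image 7.
query = gallery[7] + 0.01 * rng.normal(size=64)

idx, sims = top_k_similar(query, gallery, k=10)
```

In this toy setup the perturbed copy should retrieve its source image first; with real embeddings, `gallery` would hold one row per dataset image.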

Data

All the data used for this project were collected directly in the following ways:

How to run code

Unless otherwise specified in a notebook section, all code can be run on the Google Colaboratory platform. All notebooks are already set up to import the necessary packages, and running them there also makes it easy to use a GPU.

Results Table

Comparison of models on a test set created by subsampling the original dataset:

  • Processing-1D:

SVM Classification

| Features | Accuracy |
|---|---|
| Energy | 0.20 |
| Duration | 0.18 |
| Zero Crossing Rate | 0.20 |
| Spectrogram | 0.11 |
| Mel Spectrogram | 0.26 |
| MFCC | 0.17 |
| Energy + Duration | 0.20 |
| Energy + Duration + Zero Crossing Rate | 0.42 |
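
As an illustration of the simplest features in the table above, here is a minimal sketch of how energy, duration, and zero crossing rate could be computed from a raw waveform and stacked into one feature vector. The definitions used (sum of squared samples, length divided by sample rate, fraction of consecutive sample pairs that change sign) are common conventions and are assumptions here; the notebooks may compute them differently, for example per frame or with an audio library.

```python
import numpy as np

def energy(signal):
    # Total energy: sum of squared sample values.
    return float(np.sum(np.asarray(signal, dtype=np.float64) ** 2))

def duration(signal, sample_rate):
    # Clip length in seconds.
    return len(signal) / sample_rate

def zero_crossing_rate(signal):
    # Fraction of consecutive sample pairs whose sign differs.
    signs = np.signbit(signal)
    return float(np.mean(signs[1:] != signs[:-1]))

# Synthetic one-second 440 Hz tone standing in for an urban sound clip.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)

# Combined "Energy + Duration + Zero Crossing Rate" feature vector.
features = np.array([
    energy(tone),
    duration(tone, sr),
    zero_crossing_rate(tone),
])
```

A vector like `features`, computed for every clip, is the kind of input a classifier such as an SVM would be trained on. Because the three values live on very different scales, standardizing them before training is usually advisable.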

CNN on Mel Spectrogram

| Architecture | Training Accuracy | Validation Accuracy |
|---|---|---|
| 3 convolutional layers, ReLU, Max Pooling, Dropout, 1 dense layer | 0.98 | 0.89 |
| 3 convolutional layers, ReLU, Max Pooling, Batch Normalization, Dropout, 1 dense layer | 0.96 | 0.86 |

  • Processing-2D:

The results obtained with the best CNN from scratch and with the pretrained architectures are shown below:

| Architecture | Training Accuracy | Validation Accuracy | Training Loss | Validation Loss |
|---|---|---|---|---|
| CNN from scratch | 0.79 | 0.85 | 0.21 | 0.19 |
| MobileNet-V2 | 0.53 | 0.57 | 0.50 | 0.49 |
| ResNet-50 | 0.56 | 0.55 | 0.49 | 0.47 |
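
For a task with two outputs per image (health state and crop species), accuracy can be measured per output or as an exact match over both. The README does not state which variant the table reports, so the sketch below only shows how the two differ. The function name and the toy labels are hypothetical.

```python
import numpy as np

def multi_output_accuracy(health_true, health_pred, species_true, species_pred):
    """Per-output accuracy and exact-match accuracy for a two-output task."""
    health_ok = np.asarray(health_true) == np.asarray(health_pred)
    species_ok = np.asarray(species_true) == np.asarray(species_pred)
    return {
        "health": float(health_ok.mean()),
        "species": float(species_ok.mean()),
        # A sample counts as correct only if both outputs are right.
        "exact_match": float((health_ok & species_ok).mean()),
    }

# Toy labels: health is binary (1 = healthy), species is a class index.
scores = multi_output_accuracy(
    health_true=[1, 0, 1, 1], health_pred=[1, 0, 0, 1],
    species_true=[2, 0, 1, 3], species_pred=[2, 1, 1, 3],
)
```

Exact-match accuracy is never higher than either per-output accuracy, which is worth keeping in mind when comparing numbers across models.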

About us

Aurora Cerabolini - Data Science Student @ University of Milano-Bicocca

Veronica Morelli - Data Science Student @ University of Milano-Bicocca
