Speaker-Verification-MFCC

The MFCC method is perhaps the most popular and dominant feature-extraction technique for speech signals. MFCCs are based on the mel scale, a perceptual scale modelled on the human ear, so they extract features from speech much as our hearing does. One might therefore expect this to be the most accurate of all feature-extraction methods, but that is not always the case: because MFCCs are sensitive to recording conditions, conflicting results can also be expected. Certain parameters govern the accuracy of MFCCs, most prominently the number of filter banks and the number of mel coefficients.

We used data from 10 different speakers to train and test our speaker recognition model. We extracted 12 MFCCs per frame, as these lower-order coefficients carry the most information. Accuracy was strongly affected by the filter bank, the set of band-pass filters spaced along the mel scale whose outputs feed the cepstral coefficients. Triangular filter banks were used here; changing their number improves or degrades accuracy, and the optimal number in our tests was 26. Accuracy can also be improved with Gaussian filters, whose frequency responses follow a normal distribution instead of a triangle. Another variant, inverted MFCCs, can further improve the accuracy of the system.
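For reference, the mel warping that underlies the filter banks is straightforward to compute. Below is a minimal sketch using the common 2595·log10(1 + f/700) formula; the exact constants used in this repository are an assumption.

```python
import numpy as np

def hz_to_mel(f_hz):
    # Common mel-scale formula; reflects the ear's roughly
    # logarithmic pitch perception above ~1 kHz.
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    # Inverse mapping, used to place triangular filter edges back on the Hz axis.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```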
We tested MFCCs using two approaches. The first was our own implementation of the conventional flow: pre-emphasis, framing, windowing, FFT, mel filter bank, log compression, and DCT. This process was coded in Python.
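The repository's own code is not reproduced here; the following is a minimal sketch of that conventional pipeline, assuming common parameter choices (25 ms frames, 10 ms step, 512-point FFT) and keeping coefficients 1-12, with c0 dropped as is typical.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc_manual(signal, sr, n_fft=512, frame_len=0.025, frame_step=0.01,
                n_filt=26, n_ceps=12):
    # 1. Pre-emphasis: boost high frequencies.
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])

    # 2. Framing: split into overlapping 25 ms frames with a 10 ms step.
    flen, fstep = int(round(frame_len * sr)), int(round(frame_step * sr))
    n_frames = 1 + max(0, (len(sig) - flen) // fstep)
    frames = np.stack([sig[i * fstep: i * fstep + flen] for i in range(n_frames)])

    # 3. Windowing: apply a Hamming window to each frame.
    frames *= np.hamming(flen)

    # 4. Power spectrum of each frame.
    pow_spec = (np.abs(np.fft.rfft(frames, n_fft)) ** 2) / n_fft

    # 5. Triangular mel filter bank: n_filt filters with edges spaced
    #    uniformly on the mel scale between 0 Hz and Nyquist.
    mel_pts = np.linspace(0, 2595 * np.log10(1 + (sr / 2) / 700), n_filt + 2)
    hz_pts = 700 * (10 ** (mel_pts / 2595) - 1)
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fbank = np.zeros((n_filt, n_fft // 2 + 1))
    for m in range(1, n_filt + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising slope
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling slope

    # 6. Log filter-bank energies, then DCT; keep coefficients 1..n_ceps.
    feats = np.log(np.maximum(pow_spec @ fbank.T, np.finfo(float).eps))
    return dct(feats, type=2, axis=1, norm='ortho')[:, 1:n_ceps + 1]
```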

The second approach used a built-in Python library for feature extraction. We saw improved accuracy with the library version, most likely owing to its better window size, number of MFCCs, and type of filter banks. This two-way approach helped us learn the algorithm thoroughly and see the differences between the approaches.
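The README does not name the library; python_speech_features is one commonly used package whose mfcc function exposes the same parameters, so the usage below is an illustrative assumption, not the repository's actual call.

```python
# Illustrative only: the library and file name are assumptions,
# not taken from this repository.
import scipy.io.wavfile as wav
from python_speech_features import mfcc

sr, signal = wav.read("speaker01.wav")    # hypothetical recording
feats = mfcc(signal, samplerate=sr,
             winlen=0.025, winstep=0.01,  # 25 ms window, 10 ms step
             numcep=12,                   # 12 cepstral coefficients
             nfilt=26)                    # 26 triangular mel filters
print(feats.shape)                        # (num_frames, 12)
```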
For speaker classification, Vector Quantization (VQ) is the simplest and easiest method. It clusters the feature vectors into K non-overlapping clusters, and the feature vectors from each speaker are condensed into a codebook. A test speaker is then classified with a nearest-neighbour rule, using Euclidean distance between the test vectors and each codebook. Other classifiers can enhance the accuracy of the model, mainly Support Vector Machines (SVM), Gaussian Mixture Models (GMM), and Artificial Neural Networks (ANN), but these are more complex than VQ. The combination of MFCCs and vector quantization is perhaps the easiest and simplest pairing. Our main focus here was a speaker recognition system that is simple and easy to implement; furthermore, the method can be generalized to any pattern recognition system.
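A minimal sketch of this classification step, assuming one K-means codebook per enrolled speaker and assignment by lowest average Euclidean distortion (a common VQ decision rule; the README's exact nearest-neighbour variant and the codebook size are assumptions, and all names are illustrative):

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

def train_codebooks(features_by_speaker, k=16):
    # One codebook (k centroids) per speaker, learned from that
    # speaker's MFCC frames; features_by_speaker maps name -> (frames, 12) array.
    return {spk: kmeans(feats.astype(float), k)[0]
            for spk, feats in features_by_speaker.items()}

def classify(test_feats, codebooks):
    # Assign the test utterance to the speaker whose codebook
    # quantizes its frames with the lowest mean Euclidean distortion.
    def distortion(cb):
        _, dists = vq(test_feats.astype(float), cb)
        return dists.mean()
    return min(codebooks, key=lambda spk: distortion(codebooks[spk]))
```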

Conclusion: We have proposed and evaluated the MFCC method for feature extraction and Vector Quantization for speaker classification. Our experiments show that accuracy improved considerably when MFCCs were computed with the built-in library: up to 80% with 12 filter banks, versus 50% with 12 filter banks for our own implementation. Furthermore, MFCCs and vector quantization were very simple and easy to implement, and these methods are universal enough to be generalized to any pattern recognition system.
