This project addresses the challenge of accent diversity in English speech recognition systems, aiming to enhance their accuracy through the implementation of a supervised machine learning model. The model utilizes Sequential Mel-frequency cepstral coefficients (MFCC) features to classify speakers as either having an Indian or American English accent.
- Proposed Method: Sequential MFCC features are extracted from audio signals to provide a unique perspective for accent identification.
- Dataset: The dataset comprises 3-5 second audio clips from VCTK-corpus, categorized into training (80%) and testing (20%).
- Preprocessing: To address data imbalances, oversampling is applied to the training set. Feature extraction involves calculating 20 MFCC coefficients for each audio file using the Librosa library.
- Feature Extraction: Mel-frequency cepstral coefficients (MFCC) are calculated for each audio frame, and sequential concatenation of these features provides a robust input set for accent classification.
- Supervised Learning: Various classifiers, including K-Nearest Neighbors, Support Vector Machines, Gaussian Mixture Models, Neural Networks, and Logistic Regression, are employed for accent classification.
- Evaluation Metrics: Precision, recall, reject rate, and overall accuracy are used to evaluate classifier performance.
- Top Performers: Neural Networks, K-Nearest Neighbors, and Logistic Regression demonstrate superior performance, with Neural Networks emerging as the top classifier.
- Promising Results: Accent classification through Sequential MFCC features showcases promising results, with up to 95% accuracy.
- Future Work: Incorporating this classification into speech recognition systems, extending the model to diverse accents and dialects, and exploring Hidden Markov Models for classification are areas for future development.
- This project was developed during the IIIT Hyderabad Hackathon as part of the Qualcomm problem statement on Accent Detection.
- The evaluation criteria include accuracy scores, novelty in methodology or ideation, and the clarity of presentation.