Speech Accent Detection
Summary
The Speech Accent Detection project classifies the accents of English speakers from audio recordings. Using machine learning models such as a feed-forward neural network (FFNN), a convolutional neural network (CNN), and a long short-term memory network (LSTM), the system identifies a speaker's accent. This capability can help English language learners understand their accents and improve their pronunciation.
Objectives
- Classify the speaker’s accent based on audio files (WAV format).
- Implement machine learning models to accurately identify accents.
- Assist English language learners in accent reduction and pronunciation improvement.
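As a rough sketch of the first step in such a pipeline, decoding a WAV file and deriving a simple feature vector might look like the following. The helper names and the toy features (RMS energy, zero-crossing rate) are illustrative assumptions, not the project's actual code, which would typically use richer features such as MFCCs:

```python
import io
import math
import struct
import wave

def read_wav_samples(wav_bytes: bytes) -> list[float]:
    """Decode a 16-bit mono WAV file into floats in [-1, 1].

    Illustrative helper, not the project's actual loader.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        assert wf.getsampwidth() == 2 and wf.getnchannels() == 1
        raw = wf.readframes(wf.getnframes())
    ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return [s / 32768.0 for s in ints]

def simple_features(samples: list[float]) -> list[float]:
    """A toy feature vector: RMS energy and zero-crossing rate."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    zcr = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    ) / (len(samples) - 1)
    return [rms, zcr]

# Build a one-second 440 Hz sine tone in memory to demonstrate.
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(16000)
    frames = b"".join(
        struct.pack("<h", int(20000 * math.sin(2 * math.pi * 440 * t / 16000)))
        for t in range(16000)
    )
    wf.writeframes(frames)

features = simple_features(read_wav_samples(buf.getvalue()))
```

In a real system, each recording would be mapped to a fixed-length feature representation like this before being fed to a classifier.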
Dataset
The project uses the George Mason University Speech Accent Archive, which contains around 3,500 audio files from speakers in over 100 countries. The CSTR VCTK Corpus and Mozilla Voice data are also used to enlarge the dataset.
Models
- FFNN (Feed-Forward Neural Network): achieved 90% accuracy.
- CNN (Convolutional Neural Network): achieved 90% accuracy.
- LSTM (Long Short-Term Memory): achieved 87% accuracy.
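To make the FFNN concrete, its forward pass over a feature vector can be sketched in plain Python. The layer sizes and random weights below are placeholders for illustration, not the project's trained model:

```python
import math
import random

def dense(x, weights, biases):
    """One fully connected layer: y = W.x + b."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, biases)
    ]

def relu(x):
    return [max(0.0, v) for v in x]

def softmax(x):
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def ffnn_forward(features, params):
    """Feature vector -> hidden layer -> accent class probabilities."""
    h = relu(dense(features, params["w1"], params["b1"]))
    return softmax(dense(h, params["w2"], params["b2"]))

# Toy parameters: 4 input features, 8 hidden units, 3 accent classes.
random.seed(0)
params = {
    "w1": [[random.uniform(-1, 1) for _ in range(4)] for _ in range(8)],
    "b1": [0.0] * 8,
    "w2": [[random.uniform(-1, 1) for _ in range(8)] for _ in range(3)],
    "b2": [0.0] * 3,
}
probs = ffnn_forward([0.2, -0.5, 0.1, 0.9], params)
```

The softmax output is a probability distribution over accent classes; in training, the weights would be fit by minimizing cross-entropy loss against labeled recordings.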
Future Work
- Explore multi-accent detection beyond native and non-native accents.
- Train models with more datasets to classify a wider range of accents accurately.
Links
For more details, please visit the GitHub repository.