Speech Accent Detection
Summary
The Speech Accent Detection project classifies the accents of English speakers from audio recordings. Using machine learning models such as a feed-forward neural network (FFNN), a convolutional neural network (CNN), and a long short-term memory network (LSTM), the system identifies a speaker's accent. This capability can help English language learners understand their accents and improve their pronunciation.
Objectives
- Classify the speaker’s accent based on audio files (WAV format).
- Implement machine learning models to accurately identify accents.
- Assist English language learners in accent reduction and pronunciation improvement.
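As a rough sketch of the first step in such a pipeline, decoding a WAV file and deriving a simple feature vector might look like the following. The helper names and the toy features (RMS energy, zero-crossing rate) are illustrative assumptions, not the project's actual code, which would typically use richer features such as MFCCs:

```python
import io
import math
import struct
import wave

def read_wav_samples(wav_bytes: bytes) -> list[float]:
    """Decode a 16-bit mono WAV file into floats in [-1, 1].

    Illustrative helper, not the project's actual loader.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        assert wf.getsampwidth() == 2 and wf.getnchannels() == 1
        raw = wf.readframes(wf.getnframes())
    ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return [s / 32768.0 for s in ints]

def simple_features(samples: list[float]) -> list[float]:
    """A toy feature vector: RMS energy and zero-crossing rate."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    zcr = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    ) / (len(samples) - 1)
    return [rms, zcr]

# Build a one-second 440 Hz sine tone in memory to demonstrate.
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(16000)
    frames = b"".join(
        struct.pack("<h", int(20000 * math.sin(2 * math.pi * 440 * t / 16000)))
        for t in range(16000)
    )
    wf.writeframes(frames)

features = simple_features(read_wav_samples(buf.getvalue()))
```

In a real system, each recording would be mapped to a fixed-length feature representation like this before being fed to a classifier.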
Dataset
The project uses the George Mason University Speech Accent Archive, which contains around 3,500 audio files from speakers in over 100 countries. The CSTR VCTK Corpus and Mozilla Voice data are also used to enlarge the dataset.
Models
- FFNN (Feed-Forward Neural Network): achieved 90% accuracy.
- CNN (Convolutional Neural Network): achieved 90% accuracy.
- LSTM (Long Short-Term Memory): achieved 87% accuracy.
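To make the FFNN concrete, its forward pass over a feature vector can be sketched in plain Python. The layer sizes and random weights below are placeholders for illustration, not the project's trained model:

```python
import math
import random

def dense(x, weights, biases):
    """One fully connected layer: y = W.x + b."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, biases)
    ]

def relu(x):
    return [max(0.0, v) for v in x]

def softmax(x):
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def ffnn_forward(features, params):
    """Feature vector -> hidden layer -> accent class probabilities."""
    h = relu(dense(features, params["w1"], params["b1"]))
    return softmax(dense(h, params["w2"], params["b2"]))

# Toy parameters: 4 input features, 8 hidden units, 3 accent classes.
random.seed(0)
params = {
    "w1": [[random.uniform(-1, 1) for _ in range(4)] for _ in range(8)],
    "b1": [0.0] * 8,
    "w2": [[random.uniform(-1, 1) for _ in range(8)] for _ in range(3)],
    "b2": [0.0] * 3,
}
probs = ffnn_forward([0.2, -0.5, 0.1, 0.9], params)
```

The softmax output is a probability distribution over accent classes; in training, the weights would be fit by minimizing cross-entropy loss against labeled recordings.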
Future Work
- Explore multi-accent detection beyond native and non-native accents.
- Train models with more datasets to classify a wider range of accents accurately.
Links
For more details, please visit the GitHub repository.