Pan, S.-T.; Wu, H.-J. Performance Improvement of Speech Emotion Recognition Systems by Combining 1D CNN and LSTM with Data Augmentation. Electronics2023, 12, 2436.
Pan, S.-T.; Wu, H.-J. Performance Improvement of Speech Emotion Recognition Systems by Combining 1D CNN and LSTM with Data Augmentation. Electronics 2023, 12, 2436.
Pan, S.-T.; Wu, H.-J. Performance Improvement of Speech Emotion Recognition Systems by Combining 1D CNN and LSTM with Data Augmentation. Electronics2023, 12, 2436.
Pan, S.-T.; Wu, H.-J. Performance Improvement of Speech Emotion Recognition Systems by Combining 1D CNN and LSTM with Data Augmentation. Electronics 2023, 12, 2436.
Abstract
In recent years, with the popularity of smart mobile devices, the interaction between devices and users, especially in the form of voice interaction, has become increasingly important. If smart devices can understand more users' emotional states through voice data, more customized services can be provided for users. This paper proposes a novel machine learning model for speech emotion recognition, which combines convolutional neural networks (CNN), long short-term memory neural networks (LSTM), and deep neural networks (DNN), called CLDNN. To make the designed system can recognize the audio signal closer to the human auditory system does, this article uses the Mel frequency cepstral coefficients (MFCCs) of audio data as the input of the machine learning model. First, the MFCCs of the voice signal is extracted as the input of the model, and the feature values of the data are calculated using several local feature learning blocks (LFLB) composed of one-dimensional CNN. Because the audio signals are time-series data, the feature values obtained from LFLBs then input into LSTM layer to enhance the learning on time-series level. Finally, fully connected layers are used for classification and prediction. Three databases RAVDESS, EMO-DB and IEMOCAP are used for the experiments in this paper. The experimental results show that the proposed method can improve the accuracy compared to other related researches in speech emotion recognition.
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.