Abstract
Speech is an important channel for emotional expression, and the emotional information carried by speech is not entirely the same in the natural state and the scripted (acted) state. To explore how speech emotion recognition differs between these two states, a deep learning approach was applied to the public IEMOCAP dataset, using natural and scripted speech from four emotion classes: neutral, anger, happiness, and sadness. First, acoustic features were extracted from the speech data, comparing the emobase2010 and eGeMAPS feature sets. Then, a convolutional neural network (CNN) was used to recognize speech emotion separately in the natural and scripted states, and the recognition rates of the two states were compared. Finally, confusion matrices were used to analyze the misclassification rates and similarities between emotions in each state. The results show that the emotion recognition rate in the natural state is markedly higher than in the scripted state, and that the misclassification rates for anger and sadness differ clearly between the two states. These findings may help in understanding the mechanisms of emotional expression.
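The confusion-matrix analysis mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the label order, and the toy predictions below are assumptions made only for illustration, and the abstract does not specify how the misclassification rate was normalized (row-wise normalization by true class is assumed here).

```python
import numpy as np

# Assumed label order for the four emotion classes named in the abstract.
LABELS = ["neutral", "anger", "happy", "sad"]


def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix: rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def recognition_accuracy(cm):
    """Overall recognition rate: correctly classified samples / all samples."""
    return np.trace(cm) / cm.sum()


def misclassification_rates(cm):
    """Row-normalized off-diagonal rates.

    Entry (i, j) is the fraction of true-class-i samples predicted as j;
    the diagonal (correct predictions) is zeroed out.
    """
    row_sums = cm.sum(axis=1, keepdims=True)
    rates = np.divide(
        cm, row_sums,
        out=np.zeros(cm.shape, dtype=float),
        where=row_sums > 0,  # leave rows with no samples at zero
    )
    np.fill_diagonal(rates, 0.0)
    return rates


if __name__ == "__main__":
    # Toy labels (hypothetical, not taken from IEMOCAP).
    y_true = [0, 0, 1, 1, 1, 2, 2, 3, 3, 3]
    y_pred = [0, 1, 1, 1, 3, 2, 0, 3, 3, 1]
    cm = confusion_matrix(y_true, y_pred, len(LABELS))
    print("accuracy:", recognition_accuracy(cm))
    print("anger predicted as sad:", misclassification_rates(cm)[1, 3])
```

Computing one such matrix per state (natural and scripted) and comparing the anger/sad entries would mirror the kind of comparison the abstract describes.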
Authors
Wang Wei (王蔚)
Hu Tingting (胡婷婷)
Feng Yaqin (冯亚琴)
MLC Lab, Department of Educational Technology, School of Educational Science, Nanjing Normal University, Nanjing 210097, China
Source
Journal of Nanjing University (Natural Science)
CAS
CSCD
Peking University Core Journals (北大核心)
2019, No. 4, pp. 660-666 (7 pages)
Funding
National Philosophy and Social Science Foundation of China (BCA150054)
Keywords
emotion categorization
speech emotion recognition
deep learning
deceptive speech