Speech Emotion Recognition Model TSTNet Based on Spatial-Temporal Features (cited by: 5)
Abstract: Social speech varies in tone, pitch, and speaking rate, and padding audio to a fixed length can lose or duplicate information. To address these problems, a speech emotion recognition method based on spatial-temporal features is proposed. The method uses a convolutional neural network (CNN) and a bidirectional recurrent neural network (BiGRU) and comprises three modules: spatial feature extraction, temporal feature extraction, and feature fusion. Because audio clips differ in length, the audio data is first preprocessed with three zero-padding methods to obtain spectrograms at different scales. The spatial feature extraction module captures local features of the audio, and the temporal feature extraction module captures temporal features and the contextual semantic relationships in the audio data, yielding three spatial-temporal feature vectors. These spatial-temporal feature vectors are then fused and passed through a fully connected layer to classify the speech emotion. In numerical experiments on the IFLYTEK speech emotion dataset, the method achieved better results than traditional speech emotion recognition models on four metrics: accuracy, precision, recall, and F1 score.
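The preprocessing step in the abstract (three zero-padding methods producing three views of each clip) can be illustrated with a minimal sketch. The abstract does not say which three methods are used, so the "post", "pre", and "both" variants below are assumptions for illustration only, as are the function names `zero_pad` and `three_views` and the list-of-frames representation of a spectrogram.

```python
from typing import Dict, List

Frame = List[float]  # one spectrogram column (hypothetical representation)


def zero_pad(frames: List[Frame], target_len: int, mode: str) -> List[Frame]:
    """Pad (or truncate) a frame sequence to target_len using zero frames.

    mode is "post" (zeros after), "pre" (zeros before) or "both" (zeros split
    on either side). These three variants are an assumption, not the paper's
    documented choice.
    """
    if not frames:
        raise ValueError("need at least one frame")
    if mode not in ("post", "pre", "both"):
        raise ValueError(f"unknown mode: {mode}")

    n_bins = len(frames[0])
    frames = frames[:target_len]          # truncate clips longer than target
    missing = target_len - len(frames)    # number of zero frames to add

    if mode == "post":
        left = 0
    elif mode == "pre":
        left = missing
    else:  # "both": any odd leftover frame goes to the right side
        left = missing // 2
    right = missing - left

    def zeros(count: int) -> List[Frame]:
        return [[0.0] * n_bins for _ in range(count)]

    return zeros(left) + frames + zeros(right)


def three_views(frames: List[Frame], target_len: int) -> Dict[str, List[Frame]]:
    """Return one padded copy of the clip per padding mode."""
    return {m: zero_pad(frames, target_len, m) for m in ("post", "pre", "both")}
```

In a pipeline like the one the abstract describes, each of the three padded views would be passed through the spatial and temporal feature extractors separately, and the three resulting vectors would then be fused before classification.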
Authors: XUE Junxiao (薛均晓); HUANG Shibo (黄世博); WANG Yabo (王亚博); ZHANG Chaoyang (张朝阳); SHI Lei (石磊) (School of Software, Zhengzhou University, Zhengzhou 450002, China; School of Cyberspace Security, Zhengzhou University, Zhengzhou 450002, China; School of Information Engineering, Zhengzhou University, Zhengzhou 450002, China)
Source: Journal of Zhengzhou University (Engineering Science) (《郑州大学学报(工学版)》), indexed in CAS and the Peking University Core Journals list, 2021, No. 6, pp. 28-33 (6 pages)
Funding: Training Program for Young Backbone Teachers in Colleges and Universities of Henan Province (22020GGJS014).
Keywords: speech emotion recognition; spectrogram; spatial-temporal features
