Abstract
To address the problem that traditional Mel-Frequency Cepstral Coefficient (MFCC) features cannot fully represent the dynamic characteristics of speech, this paper introduces first-order and second-order differences on top of the static MFCC features to extract dynamic MFCC features, and constructs a hybrid model, TWM, that combines a multi-head attention mechanism and an improved WGAN-GP (Wasserstein Generative Adversarial Network with Gradient Penalty) on the basis of the TIM-NET (Temporal-aware Bi-directional Multi-scale Network) architecture. The multi-head attention mechanism not only helps prevent vanishing gradients, but also allows deeper networks to be built that capture long-range dependencies and learn from information at different time steps, improving the model's accuracy; WGAN-GP alleviates the problem of insufficient sample size by improving the quality of the generated speech samples. Experimental results show that this method significantly improves the accuracy and robustness of speech emotion recognition on the RAVDESS and EMO-DB datasets.
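The dynamic feature extraction described above can be illustrated with a minimal sketch. It uses the standard regression-based delta formula for difference features; the window width `N = 2` and the toy 13-coefficient MFCC matrix are assumptions for illustration, not parameters stated in the paper.

```python
import numpy as np

def delta(feat, N=2):
    """Regression-based delta (difference) features.

    feat: (num_frames, num_coeffs) matrix of static features.
    Returns a matrix of the same shape; applying it twice yields
    second-order (delta-delta) dynamic features.
    """
    T = feat.shape[0]
    denom = 2 * sum(n * n for n in range(1, N + 1))
    # Repeat edge frames so the regression window is defined at the boundaries.
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")
    d = np.zeros_like(feat, dtype=float)
    for t in range(T):
        d[t] = sum(n * (padded[t + N + n] - padded[t + N - n])
                   for n in range(1, N + 1)) / denom
    return d

# Hypothetical static MFCCs: 100 frames x 13 coefficients.
mfcc = np.random.default_rng(0).normal(size=(100, 13))
d1 = delta(mfcc)   # first-order dynamic features
d2 = delta(d1)     # second-order dynamic features
# Concatenating static + dynamic features gives 39 coefficients per frame.
features = np.concatenate([mfcc, d1, d2], axis=1)
```

Stacking the static coefficients with their first- and second-order deltas is the conventional way to expose temporal dynamics to a downstream classifier.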