Improved MFCC Features and TWM Model for Speech Emotion Recognition

Abstract: Traditional Mel Frequency Cepstral Coefficient (MFCC) features cannot fully represent the dynamic characteristics of speech. To address this, the paper applies first-order and second-order differences to the static MFCC features to extract dynamic MFCC features, and builds a hybrid model, TWM, on top of the TIM-NET network. TWM combines TIM-NET (Temporal-aware Bi-directional Multi-scale Network), an improved WGAN-GP (Wasserstein Generative Adversarial Network with Gradient Penalty), and a multi-head attention mechanism. According to the authors, the multi-head attention mechanism effectively prevents vanishing gradients and allows deeper networks to be built that capture long-range dependencies and learn from information at different time steps, which improves the accuracy of the model. WGAN-GP addresses the problem of insufficient sample size by improving the quality of generated speech samples. Experimental results show that the method significantly improves the accuracy and robustness of speech emotion recognition on the RAVDESS and EMO-DB datasets.
Source: Journal of Harbin Institute of Technology (New Series) (哈尔滨工业大学学报(英文版)), 2025, No. 6, pp. 38-46 (9 pages)
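
The abstract says dynamic MFCC features are obtained by taking first-order and second-order differences of the static MFCCs, but it does not give the exact difference formula. The sketch below uses the common regression-based delta, d_t = sum_{n=1..N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2), which may differ from the paper's definition. The function names `delta` and `dynamic_mfcc`, the window half-width `width=2`, and the edge-padding choice are all assumptions made for illustration.

```python
import numpy as np


def delta(features: np.ndarray, width: int = 2) -> np.ndarray:
    """Regression-based difference along the time axis.

    features: array of shape (n_frames, n_coeffs), e.g. static MFCCs.
    width: frames used on each side (N in the formula); assumed default.
    """
    n_frames = features.shape[0]
    # Repeat the edge frames so every frame has `width` neighbours per side.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + n_frames]    # c_{t+n}
        behind = padded[width - n : width - n + n_frames]   # c_{t-n}
        out += n * (ahead - behind)
    return out / denom


def dynamic_mfcc(static: np.ndarray, width: int = 2) -> np.ndarray:
    """Stack static, first-order, and second-order features per frame."""
    d1 = delta(static, width)   # first-order difference
    d2 = delta(d1, width)       # second-order difference (delta of delta)
    return np.concatenate([static, d1, d2], axis=1)
```

With 13 static coefficients per frame, `dynamic_mfcc` would return 39 values per frame. Frames near the start and end of an utterance are affected by the edge padding, so their difference values are less reliable than those in the interior.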