Speech emotion recognition with unsupervised graph contrastive learning

Abstract: A speech emotion recognition network based on unsupervised graph contrastive learning (SERUGCL) was proposed to address the sparsity of labeled data and the difficulty of modeling high-dimensional speech features in most speech datasets. The method was trained on unlabeled data. First, an original view of the speech features was constructed based on feature similarity, and the graph structure was used to model the dependencies between speech frames, thereby alleviating the computational pressure of modeling high-dimensional features directly. Then, two augmented views were generated using the fast gradient sign method (FGSM) and a combination of subgraph sampling and edge perturbation. All views were processed by a differentiated encoder, and a weighted pooling mechanism was adopted to obtain the global embedding. Finally, a support vector machine (SVM) was used for emotion classification. The SERUGCL model achieved an unweighted accuracy (UA) of 69.96% and a weighted accuracy (WA) of 70.24% on the IEMOCAP dataset, and a UA of 91.04% and a WA of 90.29% on the EMO-DB dataset. Compared with DSTCNet, SERUGCL improved UA and WA by 8.18 and 8.44 percentage points, respectively, on IEMOCAP, and by 4.49 and 1.50 percentage points, respectively, on EMO-DB. The results of comparative and ablation experiments also verified the effectiveness of the model.
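The abstract reports results as UA and WA without defining them. The sketch below uses the definitions that are conventional in speech emotion recognition: WA is the overall fraction of utterances classified correctly, and UA is the mean of the per-class recalls, so that every emotion class counts equally regardless of its size. That the paper follows these conventions is an assumption, and the function names and toy labels are invented for illustration only.

```python
import numpy as np


def weighted_accuracy(y_true, y_pred):
    """WA: fraction of all utterances that are classified correctly."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


def unweighted_accuracy(y_true, y_pred):
    """UA: mean of per-class recalls, so each class counts equally."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Recall of class c = share of true-c utterances predicted as c.
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


# Toy labels, invented for illustration: 0 = neutral, 1 = angry.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]

wa = weighted_accuracy(y_true, y_pred)    # 4 of 6 utterances correct
ua = unweighted_accuracy(y_true, y_pred)  # mean of recalls 3/4 and 1/2
print(f"WA = {wa:.4f}, UA = {ua:.4f}")
```

On imbalanced data the two metrics diverge: in the toy example the minority class is recognized less reliably, which pulls UA below WA.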
Authors: ZHANG Xuemei (张雪梅); SUN Ying (孙颖); ZHANG Xueying (张雪英) (College of Electronic Information Engineering, Taiyuan University of Technology, Taiyuan 030024, China)
Source: Journal of Zhejiang University (Engineering Science) (《浙江大学学报(工学版)》), Peking University Core Journal, 2026, No. 4, pp. 782-790 (9 pages)
Keywords: speech emotion recognition; unsupervised learning; graph contrastive learning; feature augmentation; weighted pooling