Cross-Modal Video Emotion Analysis Method Based on Multi-Task Learning (cited by: 5)
Abstract: Existing cross-modal video emotion analysis models suffer from insufficient modal fusion, high space complexity, and little consideration of how the speaker's own attributes affect emotion. To address these issues, this paper proposes a cross-modal video emotion analysis model that combines multi-head attention with multi-task learning. First, the video is preprocessed to obtain feature representations for three modalities: video, audio, and text. Second, each feature representation is fed into a GRU network to extract temporal features. Next, the proposed max-pooling multi-head attention mechanism performs pairwise fusion of text with video and of text with audio. Finally, the fused features are fed into a multi-task network for emotion classification and gender classification, which outputs the speaker's emotion polarity and gender. Experimental results show that the proposed model makes better use of the differences between modalities and of the speaker's gender attribute, improving emotion recognition accuracy while reducing the model's space complexity.
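The fusion and multi-task steps summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say where the max pooling is applied, so the sketch assumes it pools the attention output over the time axis. The random input sequences stand in for GRU outputs, and the feature size, head count, two-class outputs, weight initialization, and the `gender_weight` loss coefficient are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_seq, context_seq, n_heads, params):
    """Multi-head attention where text queries attend over another modality."""
    w_q, w_k, w_v = params
    t_q, d = query_seq.shape
    d_head = d // n_heads

    def split(x):
        # (time, d) -> (heads, time, d_head)
        return x.reshape(x.shape[0], n_heads, d_head).transpose(1, 0, 2)

    q = split(query_seq @ w_q)
    k = split(context_seq @ w_k)
    v = split(context_seq @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, t_q, t_c)
    attended = softmax(scores) @ v                       # (heads, t_q, d_head)
    return attended.transpose(1, 0, 2).reshape(t_q, d)   # (t_q, d)

def fuse(text, video, audio, n_heads, p_tv, p_ta):
    # Assumption: max pooling over time after each pairwise attention.
    tv = cross_attention(text, video, n_heads, p_tv).max(axis=0)
    ta = cross_attention(text, audio, n_heads, p_ta).max(axis=0)
    return np.concatenate([tv, ta])                      # (2 * d,)

def multitask_loss(emotion_probs, gender_probs, emotion_label, gender_label,
                   gender_weight=0.5):
    # Weighted sum of two cross-entropy terms (the weight is hypothetical).
    return (-np.log(emotion_probs[emotion_label])
            - gender_weight * np.log(gender_probs[gender_label]))

d, n_heads = 8, 2

def init_params():
    return [rng.normal(scale=0.1, size=(d, d)) for _ in range(3)]

# Stand-ins for GRU outputs; the three modalities have different lengths.
text = rng.normal(size=(5, d))
video = rng.normal(size=(7, d))
audio = rng.normal(size=(6, d))

fused = fuse(text, video, audio, n_heads, init_params(), init_params())

# Two task heads share the fused representation.
w_emotion = rng.normal(scale=0.1, size=(2 * d, 2))
w_gender = rng.normal(scale=0.1, size=(2 * d, 2))
emotion_probs = softmax(fused @ w_emotion)
gender_probs = softmax(fused @ w_gender)
loss = multitask_loss(emotion_probs, gender_probs, emotion_label=1, gender_label=0)
```

Pooling over time yields a fixed-size vector regardless of sequence length, which is one plausible way such a mechanism could keep the fused representation small. The paper itself may define the pooling differently.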
Authors: MIAO Yuqing (缪裕青); DONG Han (董晗); ZHANG Wanzhen (张万桢); ZHOU Ming (周明); CAI Guoyong (蔡国永); DU Huawei (杜华巍) (School of Computer Science & Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China; Guangxi Key Laboratory of Image & Graphics Intelligent Processing, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China; Engineering Comprehensive Training Center, Guilin University of Aerospace Technology, Guilin, Guangxi 541004, China; College of Information Science and Technology, Zhongkai University of Agriculture and Engineering, Guangzhou 510225, China; Guilin Hivision Technology Company, Guilin, Guangxi 541004, China; Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China)
Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2023, No. 12, pp. 141-147 (7 pages)
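Funding: National Natural Science Foundation of China (61866007); Key Project of the Guangxi Natural Science Foundation (2018GXNSFDA138006); Research Project of the Guangxi Colleges and Universities Key Laboratory of Image and Graphics Intelligent Processing (GIIP201706); Guangxi Natural Science Foundation (2020GXNSFAA159094); Project for Improving the Basic Research Ability of Young and Middle-Aged Teachers in Guangxi Universities (2021KY0799); Innovation Project of Graduate Education of Guilin University of Electronic Technology (2022YCXS066).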
Keywords: video emotion analysis; modal fusion; multi-head attention; multi-task learning; model complexity