Abstract
Multimodal sentiment analysis (MSA) is one of the most promising technologies in the field of affective computing. Visual, acoustic, and textual modalities encode most human emotional features, and integrating the three yields a finer, multidimensional representation of subjective affect; however, achieving accurate and robust sentiment analysis still faces significant challenges. When the sentiment feature subsets extracted from the three modalities differ in element quantity or temporal alignment, an effective strategy for selecting representative emotional features in each modality is key to preventing distinctive features from being overlooked or over-extracted and to ensuring that subsequent fusion analysis yields trustworthy results. When representative features from the three modalities are fused directly, the transmission and complementarity mechanisms of inter-modal sentiment information are under-exploited, so the analysis result may depend on only one modality's semantic representation, causing the model to overfit and misclassify. Furthermore, human emotional expression exhibits modality heterogeneity and inconsistency, often resulting in uneven feature distributions and polarity ambiguity. A model must therefore capture cross-modal complementary information and fine-grained correlations while suppressing redundant features that interfere with sentiment discrimination; otherwise, a "semantic gap" in the fusion process limits the stability of the results. Building on multi-scale temporal representation and qubit-based polymorphic representation, this paper proposes a hybrid quantum-graph neural network for multimodal sentiment analysis. First, a topological graph network over the representative sequences is constructed to capture dynamic structural relationships among feature nodes, and a multi-head graph attention mechanism adaptively adjusts node and edge weights, ensuring reliable selection of distinctive sentiment features. Then, a quantum sentiment-feature computation network maps the multimodal features into a high-dimensional Hilbert space via quantum encoding; quantum superposition and entanglement deepen inter-modal coupling and dependency modeling, and quantum measurement collapses the superposed state into specific eigenstates, establishing a correspondence between quantum states and sentiment features and yielding more discriminative multimodal fusion representations. Finally, unimodal and multimodal predictions are formulated as subtasks in a multitask collaborative optimization framework: pseudo-label generation and shared representations improve each task's performance, while a joint multitask loss mitigates inconsistencies among modality representations and enhances the model's generalization. Experimental results on the CMU-MOSI, CH-SIMS, and CMU-MOSEI benchmark datasets show that, compared with common baselines, the proposed method improves binary classification accuracy by 1.5%~8.7%, five-class accuracy by 3.3%~10.7%, and seven-class accuracy by 1.5%~14.5%; the F1 score increases by up to 8.5 points, the Pearson correlation coefficient improves by up to 0.146, and the mean absolute error decreases by up to 0.304.
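The multi-head graph attention step described in the abstract can be illustrated with a minimal single-head sketch in plain NumPy. This is an illustrative toy under stated assumptions, not the paper's implementation: the function name `graph_attention`, the LeakyReLU slope of 0.2, and the random fully connected toy graph are all choices made here for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def graph_attention(H, A, W, a, slope=0.2):
    """One attention head over a feature graph (illustrative).
    H: (N, F) node features; A: (N, N) adjacency (nonzero = edge);
    W: (F, F') projection; a: (2*F',) attention vector.
    Returns aggregated node features and the normalized edge weights."""
    Z = H @ W                                     # project node features
    N = Z.shape[0]
    logits = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            e = a @ np.concatenate([Z[i], Z[j]])  # score pair (i, j)
            logits[i, j] = e if e > 0 else slope * e   # LeakyReLU
    logits = np.where(A > 0, logits, -1e9)        # mask non-edges
    alpha = softmax(logits, axis=1)               # adaptive edge weights
    return alpha @ Z, alpha                       # neighbor aggregation

# toy fully connected graph of 4 feature nodes
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))
A = np.ones((4, 4))
W = rng.normal(size=(3, 2))
a = rng.normal(size=(4,))
out, alpha = graph_attention(H, A, W, a)
```

Stacking several such heads and concatenating their outputs gives the multi-head variant; the weights `alpha` play the role of the adaptively adjusted node/edge weights described above.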
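The quantum encoding and measurement stage can likewise be sketched as a tiny state-vector simulation. This assumes simple angle (RY) encoding with one feature per qubit and a CNOT chain for entanglement; the paper's actual circuit is not specified here, so every gate choice and function name below is an illustrative assumption.

```python
import numpy as np

def ry(theta):
    """Single-qubit RY rotation matrix."""
    c, s = np.cos(theta / 2.0), np.sin(theta / 2.0)
    return np.array([[c, -s], [s, c]])

def apply_cnot(state, n, ctrl, targ):
    """Apply CNOT(ctrl -> targ) to an n-qubit state vector
    (qubit 0 is the most significant bit of the index)."""
    new = state.copy()
    for idx in range(2 ** n):
        if (idx >> (n - 1 - ctrl)) & 1:           # control bit set
            new[idx] = state[idx ^ (1 << (n - 1 - targ))]
    return new

def encode_features(x):
    """Angle-encode one feature per qubit (RY applied to |0>), then
    entangle neighbouring qubits with a CNOT chain."""
    state = np.array([1.0])
    for theta in x:
        state = np.kron(state, ry(theta) @ np.array([1.0, 0.0]))
    for ctrl in range(len(x) - 1):
        state = apply_cnot(state, len(x), ctrl, ctrl + 1)
    return state

# three modality features -> 3-qubit state -> measurement distribution
features = [0.3, 1.1, 2.0]
state = encode_features(features)
probs = np.abs(state) ** 2                        # Born-rule probabilities
```

Measuring the entangled state samples a basis state with probability `probs[i]`; in the paper's framing, this collapse maps the superposed multimodal representation onto specific eigenstates associated with sentiment features.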
Authors
LI Xing-guang, CAI Yu-jian, CUI Wei, LI Jin-song, ZHANG Ying-yu (School of Electronic Information and Engineering, Changchun University of Science and Technology, Changchun, Jilin 130022, China)
Source
Acta Electronica Sinica (《电子学报》), Peking University Core Journal; 2025, No. 11, pp. 3983-3995 (13 pages)
Funding
Jilin Provincial Department of Science and Technology Project (No. 20250102225JC).
Keywords
multimodal sentiment analysis
graph neural network
quantum machine learning
cross-modal information fusion
multitask optimization