Abstract
Multimodal sentiment analysis (MSA) is one of the most promising technologies in the field of affective computing. Visual, acoustic, and textual modalities encode most human emotional features, and integrating the three yields a finer, multidimensional representation of subjective affect; however, achieving accurate and robust sentiment analysis still faces significant challenges. When the sentiment feature subsets extracted from the three modalities differ in element quantity or temporal alignment, an effective strategy for selecting representative emotional features in each modality is key to preventing distinctive features from being overlooked or over-extracted and to ensuring that subsequent fusion analysis yields trustworthy results. When representative features from the three modalities are fused directly, the transmission and complementarity mechanisms of inter-modal sentiment information are under-exploited, so the analysis result may depend on only one modality's semantic representation, causing the model to overfit and misclassify. Furthermore, human emotional expression exhibits modality heterogeneity and inconsistency, often resulting in uneven feature distributions and polarity ambiguity. A model must therefore capture cross-modal complementary information and fine-grained correlations while suppressing redundant features that interfere with sentiment discrimination; otherwise, a "semantic gap" in the fusion process limits the stability of the results. Building on multi-scale temporal representation and qubit-based polymorphic representation, this paper proposes a hybrid quantum-graph neural network for multimodal sentiment analysis. First, a topological graph network over the representative sequences is constructed to capture dynamic structural relationships among feature nodes, and a multi-head graph attention mechanism adaptively adjusts node and edge weights, ensuring reliable selection of distinctive sentiment features. Then, a quantum sentiment-feature computation network maps the multimodal features into a high-dimensional Hilbert space via quantum encoding; quantum superposition and entanglement deepen inter-modal coupling and dependency modeling, and quantum measurement collapses the superposed state into specific eigenstates, establishing a correspondence between quantum states and sentiment features and yielding more discriminative multimodal fusion representations. Finally, unimodal and multimodal predictions are formulated as subtasks in a multitask collaborative optimization framework: pseudo-label generation and shared representations improve each task's performance, while a joint multitask loss mitigates inconsistencies among modality representations and enhances the model's generalization. Experimental results on the CMU-MOSI, CH-SIMS, and CMU-MOSEI benchmark datasets show that, compared with common baselines, the proposed method improves binary classification accuracy by 1.5%~8.7%, five-class accuracy by 3.3%~10.7%, and seven-class accuracy by 1.5%~14.5%; the F1 score increases by up to 8.5 points, the Pearson correlation coefficient improves by up to 0.146, and the mean absolute error decreases by up to 0.304.
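The multi-head graph attention step described in the abstract can be illustrated with a minimal single-head sketch in plain NumPy. This is an illustrative toy under stated assumptions, not the paper's implementation: the function name `graph_attention`, the LeakyReLU slope of 0.2, and the random fully connected toy graph are all choices made here for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def graph_attention(H, A, W, a, slope=0.2):
    """One attention head over a feature graph (illustrative).
    H: (N, F) node features; A: (N, N) adjacency (nonzero = edge);
    W: (F, F') projection; a: (2*F',) attention vector.
    Returns aggregated node features and the normalized edge weights."""
    Z = H @ W                                     # project node features
    N = Z.shape[0]
    logits = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            e = a @ np.concatenate([Z[i], Z[j]])  # score pair (i, j)
            logits[i, j] = e if e > 0 else slope * e   # LeakyReLU
    logits = np.where(A > 0, logits, -1e9)        # mask non-edges
    alpha = softmax(logits, axis=1)               # adaptive edge weights
    return alpha @ Z, alpha                       # neighbor aggregation

# toy fully connected graph of 4 feature nodes
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))
A = np.ones((4, 4))
W = rng.normal(size=(3, 2))
a = rng.normal(size=(4,))
out, alpha = graph_attention(H, A, W, a)
```

Stacking several such heads and concatenating their outputs gives the multi-head variant; the weights `alpha` play the role of the adaptively adjusted node/edge weights described above.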
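The quantum encoding and measurement stage can likewise be sketched as a tiny state-vector simulation. This assumes simple angle (RY) encoding with one feature per qubit and a CNOT chain for entanglement; the paper's actual circuit is not specified here, so every gate choice and function name below is an illustrative assumption.

```python
import numpy as np

def ry(theta):
    """Single-qubit RY rotation matrix."""
    c, s = np.cos(theta / 2.0), np.sin(theta / 2.0)
    return np.array([[c, -s], [s, c]])

def apply_cnot(state, n, ctrl, targ):
    """Apply CNOT(ctrl -> targ) to an n-qubit state vector
    (qubit 0 is the most significant bit of the index)."""
    new = state.copy()
    for idx in range(2 ** n):
        if (idx >> (n - 1 - ctrl)) & 1:           # control bit set
            new[idx] = state[idx ^ (1 << (n - 1 - targ))]
    return new

def encode_features(x):
    """Angle-encode one feature per qubit (RY applied to |0>), then
    entangle neighbouring qubits with a CNOT chain."""
    state = np.array([1.0])
    for theta in x:
        state = np.kron(state, ry(theta) @ np.array([1.0, 0.0]))
    for ctrl in range(len(x) - 1):
        state = apply_cnot(state, len(x), ctrl, ctrl + 1)
    return state

# three modality features -> 3-qubit state -> measurement distribution
features = [0.3, 1.1, 2.0]
state = encode_features(features)
probs = np.abs(state) ** 2                        # Born-rule probabilities
```

Measuring the entangled state samples a basis state with probability `probs[i]`; in the paper's framing, this collapse maps the superposed multimodal representation onto specific eigenstates associated with sentiment features.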
Authors
LI Xing-guang, CAI Yu-jian, CUI Wei, LI Jin-song, ZHANG Ying-yu (School of Electronic Information and Engineering, Changchun University of Science and Technology, Changchun, Jilin 130022, China)
Source
Acta Electronica Sinica (《电子学报》), Peking University Core Journal; 2025, No. 11, pp. 3983-3995 (13 pages)
Funding
Jilin Provincial Department of Science and Technology Project (No. 20250102225JC).
Keywords
multimodal sentiment analysis
graph neural network
quantum machine learning
cross-modal information fusion
multitask optimization