Abstract
Most current sentiment analysis models concentrate on the text modality and process the audio and video modalities comparatively simply, so the potential of those modalities to enrich emotional information is not fully exploited. Cross-modal feature fusion also suffers from information redundancy. To address these problems, this paper proposes a sentiment analysis model oriented toward multi-source information clustering and private feature learning. Drawing on the idea of latent clustering, the model uses a cross-modal attention mechanism to improve the complementarity between audio-visual and text features, and divides the features of the different modalities into several clusters to reduce the interference of irrelevant information during fusion. A feature consistency enhancement mechanism then uses the Mahalanobis distance to enhance and filter audio and video features, raising the density of emotional information. In parallel, an adaptive weight adjustment mechanism sets the fusion weight ratio of the audio and video modalities according to the semantic consistency of the clusters, and combines the result with the text modality to resolve semantic ambiguity between modalities. The model also adopts a self-supervised learning strategy that strengthens unimodal sentiment prediction and helps the model learn the characteristics specific to each modality. Experiments on the CMU-MOSEI and CMU-MOSI datasets show that the model significantly improves sentiment classification performance, validating its effectiveness in multimodal information fusion and redundancy suppression.
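The abstract states only that the Mahalanobis distance is used to enhance and filter audio and video features; it does not give the procedure. As a minimal sketch of the general idea, and not the authors' mechanism, the following Python code drops feature vectors whose Mahalanobis distance from the sample mean exceeds a threshold. The function names, the ridge term, and the threshold are all hypothetical choices made for illustration, and only the standard library is used so that the example stays dependency-free.

```python
from math import sqrt

def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance(rows, mean, ridge=1e-6):
    """Sample covariance matrix (needs at least 2 rows).

    The small ridge on the diagonal is an assumption added here to keep
    the matrix invertible; it is not taken from the paper.
    """
    n, d = len(rows), len(mean)
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [x - m for x, m in zip(r, mean)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / (n - 1)
    for i in range(d):
        cov[i][i] += ridge
    return cov

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    d = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, d):
            f = m[r][c] / m[c][c]
            for k in range(c, d + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * d
    for i in range(d - 1, -1, -1):  # back substitution
        s = sum(m[i][k] * x[k] for k in range(i + 1, d))
        x[i] = (m[i][d] - s) / m[i][i]
    return x

def mahalanobis(x, mean, cov):
    """sqrt((x - mean)^T cov^{-1} (x - mean)), without forming cov^{-1}."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    z = solve(cov, diff)
    return sqrt(sum(di * zi for di, zi in zip(diff, z)))

def filter_features(rows, threshold):
    """Keep only the vectors within `threshold` of the sample mean."""
    mean = mean_vector(rows)
    cov = covariance(rows, mean)
    return [r for r in rows if mahalanobis(r, mean, cov) <= threshold]
```

With an identity covariance the distance reduces to the Euclidean distance, which gives a quick sanity check. In a real model the statistics would presumably be computed per cluster or per modality, but that detail is not specified in the abstract.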
Authors
钟婷
冯广
林健忠
杨燕茹
周垣桦
郑润庭
刘天翔
ZHONG Ting; FENG Guang; LIN Jianzhong; YANG Yanru; ZHOU Yuanhua; ZHENG Runting; LIU Tianxiang (School of Automation, Guangdong University of Technology, Guangzhou 510006, China; School of Computer Science, Guangdong University of Technology, Guangzhou 510006, China)
Source
《计算机工程与应用》
PKU Core Journal (Peking University core journals list)
2025, Issue 24, pp. 176-186 (11 pages)
Computer Engineering and Applications
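Funding
Key Program of the National Natural Science Foundation of China (62237001)
Guangdong Province Philosophy and Social Sciences Youth Project (GD23YJY08)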
Keywords
multimodal sentiment analysis
attention mechanism
latent clustering
Mahalanobis distance
self-supervised learning