Journal Articles
11 articles found
1. Enhanced Multimodal Sentiment Analysis via Integrated Spatial Position Encoding and Fusion Embedding
Authors: Chenquan Gan, Xu Liu, Yu Tang, Xianrong Yu, Qingyi Zhu, Deepak Kumar Jain. Computers, Materials & Continua, 2025, Issue 12, pp. 5399-5421 (23 pages)
Multimodal sentiment analysis aims to understand emotions from text, speech, and video data. However, current methods often overlook the dominant role of text and suffer from feature loss during integration. Given the varying importance of each modality across different contexts, a central and pressing challenge in multimodal sentiment analysis lies in maximizing the use of rich intra-modal features while minimizing information loss during the fusion process. In response to these critical limitations, we propose a novel framework that integrates spatial position encoding and fusion embedding modules to address these issues. In our model, text is treated as the core modality, while speech and video features are selectively incorporated through a unique position-aware fusion process. The spatial position encoding strategy preserves the internal structural information of speech and visual modalities, enabling the model to capture localized intra-modal dependencies that are often overlooked. This design enhances the richness and discriminative power of the fused representation, enabling more accurate and context-aware sentiment prediction. Finally, we conduct comprehensive evaluations on two widely recognized standard datasets in the field, CMU-MOSI and CMU-MOSEI, to validate the performance of the proposed model. The experimental results demonstrate that our model exhibits good performance and effectiveness for sentiment analysis tasks.
Keywords: multimodal sentiment analysis; spatial position encoding; fusion embedding; feature loss reduction
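The abstract does not give the exact form of the spatial position encoding. One common realization, sketched here under that assumption, is the standard sinusoidal scheme added to a sequence of visual (or speech) frame features before fusion; all shapes are illustrative, not taken from the paper:

```python
import numpy as np

def sinusoidal_position_encoding(seq_len: int, dim: int) -> np.ndarray:
    """Standard sinusoidal encoding (Vaswani et al.); one common way to
    inject positional structure into speech/visual feature sequences."""
    positions = np.arange(seq_len)[:, None]                 # (seq_len, 1)
    div = np.exp(-np.log(10000.0) * np.arange(0, dim, 2) / dim)
    pe = np.zeros((seq_len, dim))
    pe[:, 0::2] = np.sin(positions * div)                   # even dims: sine
    pe[:, 1::2] = np.cos(positions * div)                   # odd dims: cosine
    return pe

# Add the encoding to a sequence of hypothetical visual frame features
# before fusion, so local ordering information survives integration.
visual_feats = np.random.randn(50, 64)                      # 50 frames, 64-dim
encoded = visual_feats + sinusoidal_position_encoding(50, 64)
```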
2. Multimodal sentiment analysis for social media contents during public emergencies (Cited: 1)
Authors: Tao Fan, Hao Wang, Peng Wu, Chen Ling, Milad Taleby Ahvanooey. Journal of Data and Information Science (CSCD), 2023, Issue 3, pp. 61-87 (27 pages)
Purpose: Nowadays, public opinions during public emergencies involve not only textual contents but also contain images. However, the existing works mainly focus on textual contents and do not provide a satisfactory accuracy of sentiment analysis, lacking the combination of multimodal contents. In this paper, we propose to combine texts and images generated in the social media to perform sentiment analysis. Design/methodology/approach: We propose a Deep Multimodal Fusion Model (DMFM), which combines textual and visual sentiment analysis. We first train a word2vec model on a large-scale public emergency corpus to obtain semantic-rich word vectors as the input of textual sentiment analysis. BiLSTM is employed to generate encoded textual embeddings. To fully excavate visual information from images, a modified pretrained VGG16-based sentiment analysis network is used with the best-performed fine-tuning strategy. A multimodal fusion method is implemented to fuse textual and visual embeddings completely, producing predicted labels. Findings: We performed extensive experiments on Weibo and Twitter public emergency datasets to evaluate the performance of our proposed model. Experimental results demonstrate that the DMFM provides higher accuracy compared with baseline models. The introduction of images can boost the performance of sentiment analysis during public emergencies. Research limitations: In the future, we will test our model on a wider dataset. We will also consider a better way to learn the multimodal fusion information. Practical implications: We build an efficient multimodal sentiment analysis model for social media contents during public emergencies. Originality/value: We consider the images posted by online users during public emergencies on social platforms. The proposed method can present a novel scope for sentiment analysis during public emergencies and provide decision support for the government when formulating policies in public emergencies.
Keywords: public emergency; multimodal sentiment analysis; social platform; textual sentiment analysis; visual sentiment analysis
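The DMFM fusion layer itself is not specified in the abstract. As a hedged illustration of the general pattern it describes, concatenating a text encoding (e.g., from a BiLSTM) with a visual encoding (e.g., VGG16 features) and classifying the result, here is a minimal late-fusion sketch; all dimensions and weights are illustrative placeholders, not the paper's learned parameters:

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()            # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def late_fusion_predict(text_emb, img_emb, W, b):
    """Concatenate modality embeddings and apply a linear classifier.
    A generic late-fusion baseline, not the exact DMFM fusion layer."""
    fused = np.concatenate([text_emb, img_emb])
    return softmax(W @ fused + b)

rng = np.random.default_rng(0)
text_emb = rng.normal(size=128)     # stand-in for a BiLSTM sentence encoding
img_emb = rng.normal(size=512)      # stand-in for VGG16 penultimate features
W, b = rng.normal(size=(3, 640)) * 0.01, np.zeros(3)
probs = late_fusion_predict(text_emb, img_emb, W, b)   # 3 sentiment classes
```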
3. A Robust Framework for Multimodal Sentiment Analysis with Noisy Labels Generated from Distributed Data Annotation (Cited: 1)
Authors: Kai Jiang, Bin Cao, Jing Fan. Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, Issue 6, pp. 2965-2984 (20 pages)
Multimodal sentiment analysis utilizes multimodal data such as text, facial expressions and voice to detect people's attitudes. With the advent of distributed data collection and annotation, we can easily obtain and share such multimodal data. However, due to professional discrepancies among annotators and lax quality control, noisy labels might be introduced. Recent research suggests that deep neural networks (DNNs) will overfit noisy labels, leading to the poor performance of the DNNs. To address this challenging problem, we present a Multimodal Robust Meta Learning framework (MRML) for multimodal sentiment analysis to resist noisy labels and correlate distinct modalities simultaneously. Specifically, we propose a two-layer fusion net to deeply fuse different modalities and improve the quality of the multimodal data features for label correction and network training. Besides, a multiple meta-learner (label corrector) strategy is proposed to enhance the label correction approach and prevent models from overfitting to noisy labels. We conducted experiments on three popular multimodal datasets to verify the superiority of our method by comparing it with four baselines.
Keywords: distributed data collection; multimodal sentiment analysis; meta learning; learning with noisy labels
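MRML's meta-learning label corrector is more involved than the abstract can convey. A much simpler heuristic from the same noisy-label literature, small-loss sample selection (used, e.g., in Co-teaching), gives the flavor of separating probably-clean from probably-noisy labels; it is a stand-in, not the paper's method:

```python
import numpy as np

def small_loss_selection(losses: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Return indices of the `keep_ratio` fraction of samples with the
    smallest per-sample loss. DNNs tend to fit clean labels before noisy
    ones, so small-loss samples are more likely correctly labeled."""
    k = max(1, int(len(losses) * keep_ratio))
    return np.argsort(losses)[:k]

# Toy per-sample cross-entropy losses; large values suggest noisy labels.
losses = np.array([0.1, 2.5, 0.3, 1.8, 0.2, 3.0])
clean_idx = small_loss_selection(losses, keep_ratio=0.5)   # keep 3 smallest
```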
4. Leveraging Vision-Language Pre-Trained Model and Contrastive Learning for Enhanced Multimodal Sentiment Analysis
Authors: Jieyu An, Wan Mohd Nazmee Wan Zainon, Binfen Ding. Intelligent Automation & Soft Computing (SCIE), 2023, Issue 8, pp. 1673-1689 (17 pages)
Multimodal sentiment analysis is an essential area of research in artificial intelligence that combines multiple modes, such as text and image, to accurately assess sentiment. However, conventional approaches that rely on unimodal pre-trained models for feature extraction from each modality often overlook the intrinsic connections of semantic information between modalities. This limitation is attributed to their training on unimodal data, and necessitates the use of complex fusion mechanisms for sentiment analysis. In this study, we present a novel approach that combines a vision-language pre-trained model with a proposed multimodal contrastive learning method. Our approach harnesses the power of transfer learning by utilizing a vision-language pre-trained model to extract both visual and textual representations in a unified framework. We employ a Transformer architecture to integrate these representations, thereby enabling the capture of rich semantic information in image-text pairs. To further enhance the representation learning of these pairs, we introduce our proposed multimodal contrastive learning method, which leads to improved performance in sentiment analysis tasks. Our approach is evaluated through extensive experiments on two publicly accessible datasets, where we demonstrate its effectiveness. We achieve a significant improvement in sentiment analysis accuracy, indicating the superiority of our approach over existing techniques. These results highlight the potential of multimodal sentiment analysis and underscore the importance of considering the intrinsic semantic connections between modalities for accurate sentiment assessment.
Keywords: multimodal sentiment analysis; vision-language pre-trained model; contrastive learning; sentiment classification
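The paper's multimodal contrastive objective is its own contribution; the standard formulation such methods build on, a symmetric InfoNCE loss over a batch of matched image-text pairs, can be sketched as follows (batch size, embedding dimension, and temperature are illustrative):

```python
import numpy as np

def info_nce(img: np.ndarray, txt: np.ndarray, temperature: float = 0.07) -> float:
    """Symmetric InfoNCE over L2-normalized image/text embeddings, where
    row i of each matrix is a matched pair. Matched pairs are pulled
    together, mismatched pairs pushed apart. A standard formulation of
    multimodal contrastive learning, not the paper's exact objective."""
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = img @ txt.T / temperature           # (batch, batch) similarities
    labels = np.arange(len(img))                 # positives on the diagonal

    def ce(l: np.ndarray) -> float:              # row-wise cross-entropy
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    return (ce(logits) + ce(logits.T)) / 2       # image→text and text→image
```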
5. A Multimodal Sentiment Analysis Method Based on Multi-Granularity Guided Fusion
Authors: Zilin Zhang, Yan Liu, Jia Liu, Senbao Hou, Yuping Zhang, Chenyuan Wang. Computers, Materials & Continua, 2026, Issue 2, pp. 1228-1241 (14 pages)
With the growing demand for more comprehensive and nuanced sentiment understanding, Multimodal Sentiment Analysis (MSA) has gained significant traction in recent years and continues to attract widespread attention in the academic community. Despite notable advances, existing approaches still face critical challenges in both information modeling and modality fusion. On one hand, many current methods rely heavily on encoders to extract global features from each modality, which limits their ability to capture latent fine-grained emotional cues within modalities. On the other hand, prevailing fusion strategies often lack mechanisms to model semantic discrepancies across modalities and to adaptively regulate modality interactions. To address these limitations, we propose a novel framework for MSA, termed Multi-Granularity Guided Fusion (MGGF). The proposed framework consists of three core components: (i) a Multi-Granularity Feature Extraction Module, which simultaneously captures both global and local emotional features within each modality and integrates them to construct richer intra-modal representations; (ii) a Cross-Modal Guidance Learning Module (CMGL), which introduces a cross-modal scoring mechanism to quantify the divergence and complementarity between modalities; these scores are then used as guiding signals to enable the fusion strategy to adaptively respond to scenarios of modality agreement or conflict; (iii) a Cross-Modal Fusion Module (CMF), which learns the semantic dependencies among modalities and facilitates deep-level emotional feature interaction, thereby enhancing sentiment prediction with complementary information. We evaluate MGGF on two benchmark datasets: MVSA-Single and MVSA-Multiple. Experimental results demonstrate that MGGF outperforms the current state-of-the-art model CLMLF on MVSA-Single by achieving a 2.32% improvement in F1 score. On MVSA-Multiple, it surpasses MGNNS with a 0.26% increase in accuracy. These results substantiate the effectiveness of MGGF in addressing two major limitations of existing methods: insufficient intra-modal fine-grained sentiment modeling and inadequate cross-modal semantic fusion.
Keywords: multimodal sentiment analysis; cross-modal fusion; cross-modal guided learning
6. Text-Image Feature Fine-Grained Learning for Joint Multimodal Aspect-Based Sentiment Analysis
Authors: Tianzhi Zhang, Gang Zhou, Shuang Zhang, Shunhang Li, Yepeng Sun, Qiankun Pi, Shuo Liu. Computers, Materials & Continua (SCIE, EI), 2025, Issue 1, pp. 279-305 (27 pages)
Joint Multimodal Aspect-based Sentiment Analysis (JMASA) is a significant task in the research of multimodal fine-grained sentiment analysis, which combines two subtasks: Multimodal Aspect Term Extraction (MATE) and Multimodal Aspect-oriented Sentiment Classification (MASC). Currently, most existing models for JMASA only perform text and image feature encoding at a basic level, but often neglect the in-depth analysis of unimodal intrinsic features, which may lead to low accuracy of aspect term extraction and poor sentiment prediction due to insufficient learning of intra-modal features. Given this problem, we propose a Text-Image Feature Fine-grained Learning (TIFFL) model for JMASA. First, we construct an enhanced adjacency matrix of word dependencies and adopt a graph convolutional network to learn the syntactic structure features of text, which addresses the context interference problem of identifying different aspect terms. Then, the adjective-noun pairs extracted from images are introduced to make the semantic representation of visual features more intuitive, which addresses the ambiguous semantic extraction problem during image feature learning. Thereby, the model performance of aspect term extraction and sentiment polarity prediction can be further optimized and enhanced. Experiments on two Twitter benchmark datasets demonstrate that TIFFL achieves competitive results for JMASA, MATE and MASC, thus validating the effectiveness of our proposed methods.
Keywords: multimodal sentiment analysis; aspect-based sentiment analysis; feature fine-grained learning; graph convolutional network; adjective-noun pairs
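The abstract names a graph convolutional network over an enhanced word-dependency adjacency matrix; the specific enhancement is not described, but a textbook single GCN layer over a dependency graph (shapes and weights illustrative) looks like:

```python
import numpy as np

def gcn_layer(A: np.ndarray, X: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One graph-convolution layer: add self-loops, symmetrically normalize
    the adjacency, aggregate neighbor features, project, apply ReLU.
    Sketched from the abstract's description, not the paper's exact layer."""
    A_hat = A + np.eye(len(A))                        # self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))            # D^{-1/2}
    return np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W)

# Toy example: 4 words with dependency edges (0-1), (1-2), (2-3)
A = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1
rng = np.random.default_rng(0)
H = gcn_layer(A, rng.normal(size=(4, 8)), rng.normal(size=(8, 8)))
```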
7. Multimodal sentiment analysis based on contrastive learning and cross-modal guided fusion
Authors: Liu Huanqi, Sun Jingtao, Zhang Fengling, Hou Wenyan. The Journal of China Universities of Posts and Telecommunications, 2025, Issue 4, pp. 18-33 (16 pages)
Multimodal sentiment analysis, which integrates text, speech, and image modalities, has emerged as a prominent research direction in artificial intelligence for precise emotion assessment. However, current techniques experience difficulties in efficiently managing redundancy and inconsistency across features from different modalities, compromising sentiment analysis accuracy. Additionally, while the analysis of intraclass emotional features has garnered substantial attention, studies of interclass relationships have been neglected. To address these challenges, a multimodal sentiment analysis method based on contrastive learning and cross-modal guided fusion (CLCGF) is proposed. This method encodes text and images to derive latent representations and employs a cross-modal guided module with sparse attention mechanisms to effectively integrate textual and visual features, thereby mitigating redundancy issues within each modality's features. In addition to the sentiment classification task, a supervised contrastive learning task is incorporated to aid the model in learning effective features from multimodal data related to emotions. To assess the efficacy of the CLCGF method, experiments were conducted on three public datasets: MVSA-Single, MVSA-Multiple and HFM. The experimental results indicate that CLCGF significantly improves sentiment analysis accuracy compared with traditional methods.
Keywords: multimodal sentiment analysis; sparse attention; contrastive learning
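The exact sparsification used by CLCGF's sparse attention is not described in the abstract. One simple variant, top-k sparse scaled dot-product attention, illustrates the idea of letting each query attend only to its strongest keys (sizes and k are illustrative):

```python
import numpy as np

def sparse_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray, k: int):
    """Scaled dot-product attention keeping only the top-k scores per query;
    everything else is masked to -inf before the softmax. A generic sparse
    attention sketch, not necessarily the paper's scheme."""
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    kth = np.sort(scores, axis=1)[:, -k][:, None]     # k-th largest per row
    masked = np.where(scores >= kth, scores, -np.inf)
    masked = masked - masked.max(axis=1, keepdims=True)
    w = np.exp(masked)                                # exp(-inf) == 0
    w = w / w.sum(axis=1, keepdims=True)
    return w @ V, w

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(5, 16)) for _ in range(3))
out, weights = sparse_attention(Q, K, V, k=2)
```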
8. Modal Interactive Feature Encoder for Multimodal Sentiment Analysis
Authors: Xiaowei Zhao, Jie Zhou, Xiujuan Xu. 国际计算机前沿大会会议论文集 (EI), 2023, Issue 2, pp. 285-303 (19 pages)
Multimodal sentiment analysis refers to analyzing emotions in information carriers containing multiple modalities. To better analyze the features within and between modalities and solve the problem of incomplete multimodal feature fusion, this paper proposes a multimodal sentiment analysis model, MIF (Modal Interactive Feature Encoder for Multimodal Sentiment Analysis). First, the global features of three modalities are obtained through unimodal feature extraction networks. Second, the inter-modal interactive feature encoder and the intra-modal interactive feature encoder extract similarity features between modalities and intra-modal special features separately. Finally, unimodal special features and the interaction information between modalities are decoded to get the fusion features and predict sentiment polarity results. We conduct extensive experiments on three public multimodal datasets, including one in Chinese and two in English. The results show that the performance of our approach is significantly improved compared with benchmark models.
Keywords: multimodal sentiment analysis; modal interaction; feature encoder
9. A comprehensive survey on multimodal sentiment analysis: Techniques, models, and applications
Authors: Heming Zhang. Advances in Engineering Innovation, 2024, Issue 7, pp. 47-52 (6 pages)
Multimodal sentiment analysis (MSA) is an evolving field that integrates information from multiple modalities such as text, audio, and visual data to analyze and interpret human emotions and sentiments. This review provides an extensive survey of the current state of multimodal sentiment analysis, highlighting fundamental concepts, popular datasets, techniques, models, challenges, applications, and future trends. By examining existing research and methodologies, this paper aims to present a cohesive understanding of MSA. MSA integrates data from text, audio, and visual sources, each contributing unique insights that enhance the overall understanding of sentiment. Textual data provides explicit content and context, audio data captures the emotional tone through speech characteristics, and visual data offers cues from facial expressions and body language. Despite these strengths, MSA faces limitations such as data integration challenges, computational complexity, and the scarcity of annotated multimodal datasets. Future directions include the development of advanced fusion techniques, real-time processing capabilities, and explainable AI models. These advancements will enable more accurate and robust sentiment analysis, improve user experiences, and enhance applications in human-computer interaction, healthcare, and social media analysis. By addressing these challenges and leveraging diverse data sources, MSA has the potential to revolutionize sentiment analysis and drive positive outcomes across various domains.
Keywords: multimodal sentiment analysis; natural language processing; emotion recognition; data fusion techniques; deep learning models
10. GLAMSNet: A Gated-Linear Aspect-Aware Multimodal Sentiment Network with Alignment Supervision and External Knowledge Guidance
Authors: Dan Wang, Zhoubin Li, Yuze Xia, Zhenhua Yu. Computers, Materials & Continua, 2025, Issue 12, pp. 5823-5845 (23 pages)
Multimodal Aspect-Based Sentiment Analysis (MABSA) aims to detect sentiment polarity toward specific aspects by leveraging both textual and visual inputs. However, existing models suffer from weak aspect-image alignment, modality imbalance dominated by textual signals, and limited reasoning for implicit or ambiguous sentiments requiring external knowledge. To address these issues, we propose a unified framework named Gated-Linear Aspect-Aware Multimodal Sentiment Network (GLAMSNet). First of all, an input encoding module is employed to construct modality-specific and aspect-aware representations. Subsequently, we introduce an image-aspect correlation matching module to provide hierarchical supervision for visual-textual alignment. Building upon these components, we further design a Gated-Linear Aspect-Aware Fusion (GLAF) module to enhance aspect-aware representation learning by adaptively filtering irrelevant textual information and refining semantic alignment under aspect guidance. Additionally, an External Language Model Knowledge-Guided mechanism is integrated to incorporate sentiment-aware prior knowledge from GPT-4o, enabling robust semantic reasoning especially under noisy or ambiguous inputs. Experimental studies conducted on the Twitter-15 and Twitter-17 datasets demonstrate that the proposed model outperforms most state-of-the-art methods, achieving 79.36% accuracy and 74.72% F1-score, and 74.31% accuracy and 72.01% F1-score, respectively.
Keywords: sentiment analysis; multimodal aspect-based sentiment analysis; cross-modal alignment; multimodal sentiment classification; large language model
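The GLAF module's gated-linear fusion is described only at a high level. A generic gated-fusion cell, in which a sigmoid gate computed from both modalities decides per dimension how much of each modality enters the fused vector, captures the basic mechanism; the weights here are random placeholders, not learned parameters from the paper:

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(text_vec, img_vec, Wg, bg):
    """Per-dimension convex mix of two modality vectors, controlled by a
    gate in (0, 1) computed from both. A generic gated-fusion sketch."""
    g = sigmoid(Wg @ np.concatenate([text_vec, img_vec]) + bg)
    return g * text_vec + (1.0 - g) * img_vec

rng = np.random.default_rng(0)
t, v = rng.normal(size=64), rng.normal(size=64)        # modality encodings
Wg, bg = rng.normal(size=(64, 128)) * 0.1, np.zeros(64)
fused = gated_fusion(t, v, Wg, bg)
```

Because the gate lies strictly in (0, 1), each fused dimension stays between the corresponding text and image values, so neither modality can be entirely discarded; a learned gate would push toward 0 or 1 where one modality is irrelevant.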
11. Cross-Modal Complementary Network with Hierarchical Fusion for Multimodal Sentiment Classification (Cited: 7)
Authors: Cheng Peng, Chunxia Zhang, Xiaojun Xue, Jiameng Gao, Hongjian Liang, Zhengdong Niu. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2022, Issue 4, pp. 664-679 (16 pages)
Multimodal Sentiment Classification (MSC) uses multimodal data, such as images and texts, to identify the users' sentiment polarities from the information posted by users on the Internet. MSC has attracted considerable attention because of its wide applications in social computing and opinion mining. However, improper correlation strategies can cause erroneous fusion as the texts and the images that are unrelated to each other may integrate. Moreover, simply concatenating them modal by modal, even with true correlation, cannot fully capture the features within and between modals. To solve these problems, this paper proposes a Cross-Modal Complementary Network (CMCN) with hierarchical fusion for MSC. The CMCN is designed as a hierarchical structure with three key modules, namely, the feature extraction module to extract features from texts and images, the feature attention module to learn both text and image attention features generated by an image-text correlation generator, and the cross-modal hierarchical fusion module to fuse features within and between modals. Such a CMCN provides a hierarchical fusion framework that can fully integrate different modal features and helps reduce the risk of integrating unrelated modal features. Extensive experimental results on three public datasets show that the proposed approach significantly outperforms the state-of-the-art methods.
Keywords: multimodal sentiment analysis; multimodal fusion; Cross-Modal Complementary Network (CMCN); hierarchical fusion; joint optimization