Funding: Project supported by the Key Project of the National Natural Science Foundation of China (No. U1836220), the National Natural Science Foundation of China (No. 61672267), the Qing Lan Talent Program of Jiangsu Province, China, the Jiangsu Key Laboratory of Security Technology for Industrial Cyberspace, China, the Finnish Cultural Foundation, the Jiangsu Specially-Appointed Professor Program, China (No. 3051107219003), the Jiangsu Joint Research Project of Sino-Foreign Cooperative Education Platform, China, and the Talent Startup Project of Nanjing Institute of Technology, China (No. YKJ201982).
Abstract: Large-scale datasets are driving the rapid development of deep convolutional neural networks for visual sentiment analysis. However, the annotation of large-scale datasets is expensive and time-consuming. Instead, it is easy to obtain weakly labeled web images from the Internet. However, noisy labels still lead to severely degraded performance when such web images are used directly to train networks. To address this drawback, we propose an end-to-end weakly supervised learning network that is robust to mislabeled web images. Specifically, the proposed attention module automatically eliminates the distraction of samples with incorrect labels by reducing their attention scores during training. In addition, the special-class activation map module is designed to stimulate the network by focusing on the significant regions of correctly labeled samples in a weakly supervised manner. Beyond feature learning, regularization is applied to the classifier to minimize the distance between samples within the same class and maximize the distance between different class centroids. Quantitative and qualitative evaluations on well- and mislabeled web image datasets demonstrate that the proposed algorithm outperforms related methods.
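To make the abstract's two loss-side ideas concrete, here is a minimal PyTorch sketch (not the authors' released code) of (1) a per-sample attention score that down-weights likely-mislabeled images in the classification loss, and (2) a center-loss-style regularizer that pulls features toward learnable class centroids. All module names and hyper-parameters (feat_dim, lam) are illustrative assumptions.

```python
# Sketch of sample re-weighting plus intra-class regularization, assuming
# PyTorch. This illustrates the idea described in the abstract, not the
# paper's exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SampleAttention(nn.Module):
    """Scores each sample in [0, 1]; low scores suppress noisy labels."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.scorer = nn.Linear(feat_dim, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.scorer(feats)).squeeze(-1)  # (batch,)

class CenterRegularizer(nn.Module):
    """Pulls each feature toward a learnable centroid for its class."""
    def __init__(self, num_classes: int, feat_dim: int):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        return F.mse_loss(feats, self.centers[labels])

def weighted_loss(logits, feats, labels, attention, center_reg, lam=0.01):
    # Per-sample cross-entropy, re-weighted by the attention score so that
    # samples the network deems mislabeled contribute less to the gradient.
    scores = attention(feats)                               # (batch,)
    ce = F.cross_entropy(logits, labels, reduction="none")  # (batch,)
    return (scores * ce).mean() + lam * center_reg(feats, labels)
```

The abstract's second regularization goal, maximizing the distance between different class centroids, could be added as a pairwise-distance penalty on `self.centers`; it is omitted here to keep the sketch short.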
Funding: This paper is supported by the National Natural Science Foundation of China under contract Nos. 71774084 and 72274096, and the National Social Science Fund of China under contract Nos. 16ZDA224 and 17ZDA291.
Abstract:
Purpose: Nowadays, public opinion during public emergencies involves not only textual content but also images. However, existing works mainly focus on textual content, do not achieve satisfactory sentiment-analysis accuracy, and lack the combination of multimodal content. In this paper, we propose to combine texts and images generated on social media to perform sentiment analysis.
Design/methodology/approach: We propose a Deep Multimodal Fusion Model (DMFM), which combines textual and visual sentiment analysis. We first train a word2vec model on a large-scale public emergency corpus to obtain semantically rich word vectors as the input of textual sentiment analysis. A BiLSTM is employed to generate encoded textual embeddings. To fully exploit visual information from images, a modified pretrained VGG16-based sentiment analysis network is used with the best-performing fine-tuning strategy. A multimodal fusion method is implemented to fuse textual and visual embeddings completely, producing predicted labels. A sketch of this pipeline follows below.
Findings: We performed extensive experiments on Weibo and Twitter public emergency datasets to evaluate the performance of our proposed model. Experimental results demonstrate that the DMFM provides higher accuracy than baseline models. The introduction of images can boost the performance of sentiment analysis during public emergencies.
Research limitations: In the future, we will test our model on a wider range of datasets. We will also consider better ways to learn the multimodal fusion information.
Practical implications: We build an efficient multimodal sentiment analysis model for social media content during public emergencies.
Originality/value: We consider the images posted by online users during public emergencies on social platforms. The proposed method presents a novel scope for sentiment analysis during public emergencies and provides decision support for the government when formulating policies in public emergencies.
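The following is a minimal PyTorch/torchvision sketch of the fusion architecture the abstract describes: a BiLSTM over word2vec-sized embeddings fused with VGG16 image features. The concatenation-based fusion, dimensions, and classifier head are illustrative assumptions; the paper's exact fusion method may differ.

```python
# Sketch of a BiLSTM + VGG16 multimodal fusion classifier, assuming PyTorch
# and torchvision. Illustrates the described pipeline, not the authors' code.
import torch
import torch.nn as nn
from torchvision import models

class DMFMSketch(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden=128, num_classes=2):
        super().__init__()
        # Text branch: word2vec-sized embeddings -> BiLSTM encoder.
        self.embed = nn.Embedding(vocab_size, embed_dim)  # load word2vec here
        self.bilstm = nn.LSTM(embed_dim, hidden, batch_first=True,
                              bidirectional=True)
        # Visual branch: pretrained VGG16 trunk kept up to the 4096-d
        # penultimate layer, to be fine-tuned for sentiment.
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
        self.visual = nn.Sequential(vgg.features, vgg.avgpool, nn.Flatten(),
                                    *list(vgg.classifier.children())[:-1])
        # Fusion: concatenate both embeddings, then classify.
        self.head = nn.Linear(2 * hidden + 4096, num_classes)

    def forward(self, token_ids, images):
        _, (h, _) = self.bilstm(self.embed(token_ids))
        text_feat = torch.cat([h[-2], h[-1]], dim=1)  # fwd + bwd final states
        img_feat = self.visual(images)                # (batch, 4096)
        return self.head(torch.cat([text_feat, img_feat], dim=1))
```

Loading pretrained word2vec weights via `nn.Embedding.from_pretrained` would match the described text branch, and the fine-tuning strategy would amount to choosing which VGG16 layers to freeze or update.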
Abstract: Recent years have witnessed a rapid spread of multi-modality microblogs like Twitter and Sina Weibo, composed of images, text, and emoticons. Visual sentiment prediction on such microblog-based social media has recently attracted ever-increasing research attention with broad application prospects. In this paper, we give a systematic review of the recent advances and cutting-edge techniques for visual sentiment analysis. To this end, we review the most recent works on this topic, providing detailed comparisons as well as experimental evaluations of the cutting-edge methods. We further reveal and discuss future trends and potential directions for visual sentiment prediction.