Abstract
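In web text processing, a segmentation granularity that is either too coarse or too fine breaks up sentiment-bearing keywords. The inherent structural breakpoints within a text then cannot be located precisely, keywords are not fully captured, and both sentiment recognition accuracy and feature contribution rate decline. To address this, a web text sentiment analysis method based on Bagging_BiLSTM is proposed. First, the words in each web text are converted into word vectors with word2vec, and the importance of each sentence is computed from these vectors. The most important sentences serve as the granularity baseline for a text summary that more accurately reflects the sentiment and structure of the text. Second, the text summaries are clustered with a K-means algorithm initialized by the maximum distance method, so that similar summaries fall into the same cluster and keyword breakage caused by structural breaks is avoided. Finally, an arbitrary text is selected from each cluster, and the word vectors of its summary are fed into the Bagging_BiLSTM model, which identifies the specific sentiment type features of that cluster's texts and completes the sentiment analysis. Experimental results show that the method clusters texts well and achieves high sentiment recognition accuracy and feature contribution rate.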
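The summary-construction step can be sketched as follows. The abstract does not give the importance formula, so this minimal sketch assumes that a sentence vector is the mean of its word2vec word vectors and that a sentence's importance is the cosine similarity between its vector and the document vector (the mean of all sentence vectors). The toy `word_vecs` dictionary stands in for a trained word2vec model, and all function names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sentence_vectors(sentences, word_vecs):
    """Map each tokenized sentence to the mean vector of its in-vocabulary words.

    Sentences with no known words are skipped; original indices are kept.
    """
    result = []
    for idx, tokens in enumerate(sentences):
        known = [word_vecs[t] for t in tokens if t in word_vecs]
        if known:
            result.append((idx, mean_vector(known)))
    return result


def build_summary(sentences, word_vecs, k):
    """Return the indices of the k most important sentences, in original order.

    Importance (an assumption, not the paper's stated formula) is the cosine
    similarity between a sentence vector and the mean of all sentence vectors.
    """
    vecs = sentence_vectors(sentences, word_vecs)
    if not vecs:
        return []
    doc_vec = mean_vector([v for _, v in vecs])
    scored = sorted(((cosine(v, doc_vec), idx) for idx, v in vecs), reverse=True)
    return sorted(idx for _, idx in scored[:k])


# Toy 2-D vectors standing in for a trained word2vec model (hypothetical data).
word_vecs = {"good": [1.0, 0.2], "great": [0.9, 0.3],
             "service": [0.5, 0.5], "weather": [0.0, 1.0]}
sentences = [["good", "service"], ["weather"], ["great", "good"]]
print(build_summary(sentences, word_vecs, k=2))  # [0, 2]
```

Returning the selected indices in their original order keeps the extracted summary readable as running text; the off-topic sentence ("weather") is the one dropped.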
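The clustering step can be sketched in the same spirit. This sketch assumes one common variant of maximum-distance seeding (take the first summary vector as the first center, then repeatedly add the vector farthest from its nearest existing center), followed by standard Lloyd iterations with Euclidean distance. The abstract does not state the exact seeding rule or distance measure, and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def max_distance_init(points, k):
    """Maximum-distance seeding (assumed variant): start from points[0], then
    repeatedly add the point whose distance to its nearest chosen center is largest."""
    centers = [points[0]]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(euclidean(p, c) for c in centers))
        centers.append(farthest)
    return centers


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means seeded by max_distance_init; returns (labels, centers)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = max_distance_init(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: euclidean(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:  # converged: centers no longer move
            break
        centers = new_centers
    return labels, centers


# Four toy summary vectors forming two obvious groups (hypothetical data).
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, centers = kmeans(points, k=2)
print(labels)   # [0, 0, 1, 1]
print(centers)  # [[0.0, 0.5], [10.0, 10.5]]
```

Compared with random seeding, this deterministic initialization spreads the initial centers apart, which is the usual motivation for maximum-distance methods; the trade-off is sensitivity to outliers, since an isolated vector tends to be chosen as a center.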
Authors
LIU Jie
GE Hao-wei
(College of Oceanography and Space Informatics, China University of Petroleum (East China), Qingdao, Shandong 266580, China; Goertek Company, Weifang, Shandong 264041, China)
Source
Computer Simulation
2025, No. 9, pp. 274-278 (5 pages)
Funding
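2024 Ministry of Education Special Task Project for Humanities and Social Sciences Research (Research on College Counselors) (24JDSZ3087)
2023 Shandong Provincial Social Science Planning Special Research Project on School Ideological and Political Education (Fostering Virtue Through Education in All Environments) (23CSZJ39)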
Keywords
Sentiment analysis
Web text classification
Word vector
word2vec
Bagging_BiLSTM