Abstract
Microblog sentiment analysis is the automatic classification of the sentiment expressed in microblog text. Classifying large volumes of short Chinese microblog posts suffers from long processing time and poor consistency. To address these problems, this paper adopts a semi-supervised sentiment classification method based on self-training with a multi-classifier ensemble. Starting from a small set of sentiment-labeled samples, several classifiers jointly predict the class of each sample. A sentiment contribution weight assigned to each sub-classifier is used to compute the confidence of the combined prediction. High-confidence samples are then selected to enlarge the training set and the model is retrained, which improves both the efficiency and the accuracy of sentiment classification. In comparison with traditional semi-supervised sentiment analysis methods, experiments show that the proposed algorithm achieves higher efficiency and accuracy.
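The self-training loop outlined in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not name the sub-classifiers, the weights, or the confidence threshold, so the toy naive Bayes class `TokenNB`, the helpers `ensemble_proba` and `self_train`, the weight values, and the `threshold` and `max_rounds` parameters are all assumptions made for this example. Text is assumed to be pre-segmented into whitespace-separated tokens.

```python
import math
from collections import Counter, defaultdict


class TokenNB:
    """Toy multinomial naive Bayes over whitespace tokens (hypothetical sub-classifier)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.counts[label].update(text.split())
        self.vocab = {tok for c in self.counts.values() for tok in c}
        return self

    def predict_proba(self, text):
        scores = {}
        for label in self.labels:
            total = sum(self.counts[label].values()) + self.alpha * len(self.vocab)
            score = math.log(self.prior[label])
            for tok in text.split():
                if tok in self.vocab:  # unseen tokens are ignored
                    score += math.log((self.counts[label][tok] + self.alpha) / total)
            scores[label] = score
        top = max(scores.values())
        exp = {k: math.exp(v - top) for k, v in scores.items()}
        norm = sum(exp.values())
        return {k: v / norm for k, v in exp.items()}


def ensemble_proba(classifiers, weights, text):
    """Weighted average of sub-classifier probabilities (weights are assumed inputs)."""
    combined = Counter()
    for clf, w in zip(classifiers, weights):
        for label, p in clf.predict_proba(text).items():
            combined[label] += w * p
    total = sum(weights)
    return {label: p / total for label, p in combined.items()}


def self_train(classifiers, weights, texts, labels, unlabeled,
               threshold=0.8, max_rounds=5):
    """Grow the training set with high-confidence ensemble predictions."""
    texts, labels, pool = list(texts), list(labels), list(unlabeled)
    for _ in range(max_rounds):
        for clf in classifiers:
            clf.fit(texts, labels)
        remaining, added = [], 0
        for text in pool:
            proba = ensemble_proba(classifiers, weights, text)
            label, conf = max(proba.items(), key=lambda kv: kv[1])
            if conf >= threshold:
                texts.append(text)   # pseudo-labeled sample joins the training set
                labels.append(label)
                added += 1
            else:
                remaining.append(text)
        pool = remaining
        if added == 0 or not pool:
            break
    for clf in classifiers:  # final refit on the enlarged training set
        clf.fit(texts, labels)
    return classifiers, texts, labels, pool
```

In this sketch the confidence of a sample is the largest weighted-average class probability, and samples below the threshold stay in the unlabeled pool for the next round. Other ways of combining sub-classifier outputs or of setting the weights are equally compatible with the description in the abstract.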
Authors
陈珂
黎树俊
谢博
CHEN Ke; LI Shujun; XIE Bo (Department of Computer Science and Technology, Guangdong University of Petrochemical Technology, Maoming 52500)
Source
《计算机与数字工程》
2018, No. 9, pp. 1850-1855 (6 pages)
Computer & Digital Engineering
Funding
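Supported by the Natural Science Foundation of Guangdong Province (No. 2016A030307049) and the Undergraduate Innovation and Entrepreneurship Training Program (Nos. 201611656002, 201611656029, 2016py A033).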
Keywords
sentiment analysis
semi-supervised learning
classifier integration