Abstract
Sentiment orientation analysis of Chinese microblogs usually refers to automatically classifying the sentiment of each sentence in a microblog as positive, negative, or neutral. Microblog text is fragmented and its sentiment classes are imbalanced. To address these characteristics, we built on the semi-supervised reserved self-training framework we presented previously: we extracted text features suited to microblog sentiment classification, and proposed a training degree threshold setup method that optimizes the iteration termination condition of reserved self-training. The approach keeps the advantage of reserved self-training in effectively handling the imbalanced sentiment distribution of microblog corpora while also preventing overtraining. Results in the COAE 2014 microblog sentiment orientation evaluation demonstrate the effectiveness of these methods.
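The abstract does not spell out how reserved self-training selects samples or how the training degree is computed. The sketch below is a generic self-training loop under stated assumptions, not the authors' method. The nearest-centroid classifier, the per-class selection quota (a hypothetical stand-in for handling class imbalance), and the definition of "training degree" as the fraction of the unlabeled pool absorbed into the training set are all illustrative assumptions. The function names `centroid_fit`, `centroid_predict`, and `self_train` are hypothetical.

```python
from collections import Counter


def centroid_fit(X, y):
    """Fit a nearest-centroid model: one mean feature vector per class."""
    counts = Counter(y)
    sums = {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def centroid_predict(model, x):
    """Return (label, confidence); confidence = 1 / (1 + squared distance)."""
    best_label, best_d = None, float("inf")
    for label, centre in model.items():
        d = sum((a - b) ** 2 for a, b in zip(x, centre))
        if d < best_d:
            best_label, best_d = label, d
    return best_label, 1.0 / (1.0 + best_d)


def self_train(X_lab, y_lab, X_unlab, per_class=1, threshold=0.5):
    """Generic self-training with a hypothetical 'training degree' stop rule.

    Each round pseudo-labels up to `per_class` of the most confident pool
    samples for every class (a simple guard against one class swamping the
    others). Training stops once the fraction of the original pool that has
    been absorbed reaches `threshold` (assumed meaning of training degree).
    """
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    n_total, added = len(pool), 0
    while pool and added < threshold * n_total:
        model = centroid_fit(X_lab, y_lab)
        preds = [centroid_predict(model, x) for x in pool]
        chosen = {}  # pool index -> pseudo-label
        for c in model:
            ranked = sorted(
                ((conf, i) for i, (lab, conf) in enumerate(preds) if lab == c),
                reverse=True,
            )
            for _, i in ranked[:per_class]:
                chosen[i] = c
        if not chosen:
            break
        for i, c in chosen.items():
            X_lab.append(pool[i])
            y_lab.append(c)
        pool = [x for i, x in enumerate(pool) if i not in chosen]
        added += len(chosen)
    return centroid_fit(X_lab, y_lab), added
```

In this sketch, `threshold` is the knob that trades coverage against noise. A lower value stops the loop before many low-confidence pseudo-labels enter the training set, which is the kind of overtraining the abstract says the threshold is meant to prevent.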
Source
Journal of Shandong University (Natural Science)
CAS
CSCD
Peking University Core Journal
2014, No. 11, pp. 37-42 (6 pages)
Keywords
sentiment analysis
reserved self-training
training degree threshold