
Cross-domain Sentiment Classification Based on Optimizing Classification Model Progressively (cited by: 3)
Abstract: Cross-domain text sentiment classification has become a research hotspot in natural language processing. Traditional active learning cannot exploit information shared between domains, and the bag-of-words model cannot filter out words unrelated to sentiment classification. To address both problems, this paper proposes a cross-domain text sentiment classification method based on progressively optimizing the classification model. First, the sentiment words common to the source and target domains are selected as features, a classification model is trained on the labeled source domain, and the model is used to assign initial category labels to the target-domain texts; the texts labeled with high confidence are chosen as the initial seed samples of the classification model. Second, to speed up optimization of the target-domain classification model, at each iteration the low-confidence texts are selected for expert annotation, and the annotated results are added to the training set together with the high-confidence texts. Finally, the feature set is dynamically re-extracted from the training set according to a sentiment lexicon, extraction rules for evaluation-word collocations, and auxiliary feature words. Experimental results show that the method not only effectively improves cross-domain sentiment classification but also reduces the cost of manual sample annotation to some extent.
Authors: Zhang Jun (张军), Wang Suge (王素格)
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2016, No. 7, pp. 234-239 (6 pages)
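Funding: National Natural Science Foundation of China (61175067, 61272095, 60875040); National High Technology Research and Development Program of China ("863" Program) (2015AA015407); Shanxi Province Science and Technology Research Project (20110321027-02); Shanxi Province Research Project for Returned Overseas Scholars (2013-014); Shanxi Province Science and Technology Basic Conditions Platform Construction Project (2015091001-0102)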
Keywords: Sentiment classification, Cross domain, Classification model, Feature extraction, Confidence
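
The procedure outlined in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the classifier, so a small naive Bayes model is swapped in here; the function names (`train_nb`, `predict_proba`, `progressive_adapt`), the `oracle` callback standing in for the expert annotator, and the parameters `rounds`, `k_high`, and `k_low` are all hypothetical. Feature re-extraction uses only the sentiment lexicon; the collocation rules and auxiliary feature words mentioned in the abstract are omitted.

```python
import math
from collections import Counter

def train_nb(docs, labels, vocab):
    """Multinomial naive Bayes (add-one smoothing) over the words in `vocab`."""
    counts = {0: Counter(), 1: Counter()}
    n_docs = Counter(labels)
    for tokens, y in zip(docs, labels):
        counts[y].update(t for t in tokens if t in vocab)
    model = {}
    for y in (0, 1):
        total = sum(counts[y].values()) + len(vocab)
        model[y] = {
            "prior": math.log((n_docs[y] + 1) / (len(labels) + 2)),
            "loglik": {w: math.log((counts[y][w] + 1) / total) for w in vocab},
        }
    return model

def predict_proba(model, tokens):
    """Return P(positive | tokens); words outside the feature set are ignored."""
    scores = {y: m["prior"] + sum(m["loglik"][t] for t in tokens if t in m["loglik"])
              for y, m in model.items()}
    top = max(scores.values())
    exp = {y: math.exp(s - top) for y, s in scores.items()}
    return exp[1] / (exp[0] + exp[1])

def progressive_adapt(src_docs, src_labels, tgt_docs, lexicon, oracle,
                      rounds=3, k_high=2, k_low=1):
    """Assumed loop: pseudo-label confident texts, query the oracle on uncertain ones."""
    src_words = {t for d in src_docs for t in d}
    tgt_words = {t for d in tgt_docs for t in d}
    # Initial features: sentiment words shared by both domains.
    model = train_nb(src_docs, src_labels, lexicon & src_words & tgt_words)
    labeled = {}  # target index -> label (pseudo-label or expert label)
    for _ in range(rounds):
        pool = [i for i in range(len(tgt_docs)) if i not in labeled]
        if not pool:
            break
        conf = {i: predict_proba(model, tgt_docs[i]) for i in pool}
        by_conf = sorted(pool, key=lambda i: abs(conf[i] - 0.5))  # least confident first
        low = by_conf[:k_low]
        high = [i for i in reversed(by_conf) if i not in low][:k_high]
        for i in high:
            labeled[i] = int(conf[i] >= 0.5)  # high confidence: keep the model's label
        for i in low:
            labeled[i] = oracle(i)            # low confidence: ask the expert
        # Re-extract features from the current training set (lexicon words only).
        docs = [tgt_docs[i] for i in labeled]
        vocab = lexicon & {t for d in docs for t in d}
        model = train_nb(docs, list(labeled.values()), vocab)
    return model, labeled

# Toy data, invented purely for illustration.
src_docs = [["good", "phone"], ["bad", "battery"], ["great", "screen"], ["poor", "sound"]]
src_labels = [1, 0, 1, 0]
tgt_docs = [["good", "hotel"], ["bad", "room"], ["great", "service"],
            ["poor", "food"], ["good", "great", "stay"], ["bad", "poor", "noisy"]]
truth = [1, 0, 1, 0, 1, 0]
lexicon = {"good", "bad", "great", "poor", "noisy"}
queried = []

def oracle(i):
    queried.append(i)  # record which texts needed an expert label
    return truth[i]

model, labeled = progressive_adapt(src_docs, src_labels, tgt_docs, lexicon, oracle, rounds=2)
```

In this sketch the model is retrained on target-domain texts only after the first round, and the feature set can grow to include target-specific sentiment words (such as "noisy" in the toy data) that the source domain never contained. Whether the original method also keeps the source-domain samples in later rounds is not stated in the abstract.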
