Research on Sentiment Classification of Chinese Reviews Based on Supervised Machine Learning Techniques

Citations: 139
Abstract: Sentiment classification is a text classification technique of considerable practical value. It can partly reduce the clutter of online reviews and help users locate the information they need. Research on sentiment classification has so far concentrated on English reviews, and relatively little work addresses Chinese. In particular, two questions remain open: how well the various supervised learning methods perform, and how the text feature representation and the feature selection mechanism affect classification performance. This paper runs Chinese sentiment classification experiments that vary five factors: the feature representation (n-grams, or nouns, verbs, adjectives, and adverbs), the feature selection method (mutual information, information gain, the CHI statistic, and document frequency), the classifier (the centroid vector method, KNN, Winnow, Naïve Bayes, and SVM), the number of features, and the size of the training set. A comparison of the results shows that sentiment classification performs well when bigram features, information-gain feature selection, and an SVM classifier are combined with a sufficiently large training set and an appropriate number of features.
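The configuration the study found most effective begins with bigram features and information-gain (IG) feature selection. Below is a minimal Python sketch of those two steps under stated assumptions; it is not the authors' implementation. It assumes character bigrams, because the abstract does not say whether bigrams are built over characters or over segmented words. It also assumes binary document presence and the usual definition IG(t) = H(C) - P(t)H(C|t) - P(not t)H(C|not t). The function names `char_bigrams`, `information_gain`, and `select_top_k` are hypothetical. The top-k bigrams it returns would then define the feature space for a classifier such as an SVM.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Return the set of adjacent character pairs in `text` (assumed: character bigrams, presence only)."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def entropy(counts):
    """Shannon entropy in bits of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels):
    """Score every feature by information gain over document presence/absence.

    docs   -- list of feature sets, one per document
    labels -- list of class labels, aligned with docs
    IG(t) = H(C) - P(t) * H(C | t present) - P(not t) * H(C | t absent)
    """
    classes = sorted(set(labels))
    n = len(docs)
    class_totals = Counter(labels)
    # with_term[t][c] = number of documents of class c that contain feature t
    with_term = {}
    for features, label in zip(docs, labels):
        for t in features:
            with_term.setdefault(t, Counter())[label] += 1
    h_c = entropy([class_totals[c] for c in classes])
    scores = {}
    for t, counts in with_term.items():
        present = [counts[c] for c in classes]
        absent = [class_totals[c] - counts[c] for c in classes]
        p_t = sum(present) / n
        scores[t] = h_c - p_t * entropy(present) - (1 - p_t) * entropy(absent)
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring features (ties broken by string order)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]


if __name__ == "__main__":
    # Toy reviews, invented for illustration only.
    reviews = ["质量很好", "非常满意", "质量很差", "非常失望"]
    labels = ["pos", "pos", "neg", "neg"]
    docs = [char_bigrams(r) for r in reviews]
    scores = information_gain(docs, labels)
    print(select_top_k(scores, 6))
```

On this toy data, bigrams that occur in both classes, such as "质量", score 0 bits. Bigrams confined to one class, such as "很好", score about 0.311 bits, so they are the ones kept. Swapping in mutual information, the CHI statistic, or document frequency would only change the scoring function, not the surrounding pipeline.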
Source: Journal of Chinese Information Processing (indexed in CSCD and the Peking University core journal list), 2007, No. 6, pp. 88-94 and 108 (8 pages)
Funding: National Key Basic Research and Development Program of China ("973" Program), grant no. 2004CB318109
Keywords: computer application; Chinese information processing; sentiment classification; text categorization; language model