
Text Sentiment Classification Algorithm Research Based on Improved Mixed Feature Model Clustering
Abstract: Massive volumes of text lead to low accuracy and poor real-time performance in text sentiment classification. To address this problem, a vector space model (VSM) clustering algorithm based on mixed feature selection is proposed. First, information gain (IG) and mutual information (MI) are combined with different part-of-speech features of the documents to generate mixed feature vectors. Next, the degree of difference between document VSMs is computed and the VSMs are clustered according to it, yielding cluster centre vectors, which are then used to reconstruct the VSM of the document set. Finally, a support vector machine (SVM) determines the sentiment of each document. Simulation experiments show that the proposed algorithm effectively reduces both the dimensionality and the number of document sample features, which speeds up SVM training. The experiments also show that the combination of part-of-speech features and extraction algorithm has a considerable effect on classification accuracy.
Source: Journal of North University of China (Natural Science Edition), CAS, Peking University Core Journal, 2014, No. 1, pp. 41-45 (5 pages)
Funding: Gansu Provincial Department of Education Fund Project (1113-01); Gansu Lianhe University High-Level Research Achievement Project (2011GSP01)
Keywords: text sentiment classification; vector space model; K-means clustering; support vector machine; information gain; mutual information
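The pipeline summarized in the abstract (IG/MI scoring restricted by part of speech, then clustering of document vectors so that cluster centres replace the original samples) can be illustrated with a minimal sketch. This record does not give the paper's formulas, so everything below is an assumption made for illustration: the textbook definitions of IG and pointwise MI, binary term-presence vectors, squared Euclidean distance as the "degree of difference", clustering within each class, and all function names and the toy corpus. The final SVM step is omitted.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """IG(t) = H(C) - P(t) H(C | t) - P(not t) H(C | not t)."""
    present = Counter(y for d, y in zip(docs, labels) if term in d)
    absent = Counter(y for d, y in zip(docs, labels) if term not in d)
    p_t = sum(present.values()) / len(docs)
    return (entropy(list(Counter(labels).values()))
            - p_t * entropy(list(present.values()))
            - (1 - p_t) * entropy(list(absent.values())))


def mutual_information(docs, labels, term, cls):
    """Pointwise MI(t, c) = log2(P(t, c) / (P(t) P(c)))."""
    n_t = sum(term in d for d in docs)
    n_c = sum(y == cls for y in labels)
    n_tc = sum(term in d and y == cls for d, y in zip(docs, labels))
    if n_tc == 0:
        return float("-inf")  # term never occurs in this class
    return math.log2(n_tc * len(docs) / (n_t * n_c))


def select_features(docs, labels, pos_tags, keep_pos, k):
    """Top-k terms by IG among terms whose POS tag is in keep_pos (ties: alphabetical)."""
    vocab = {t for d in docs for t in d if pos_tags.get(t) in keep_pos}
    return sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))[:k]


def vectorize(doc, features):
    """Binary term-presence vector (an assumed weighting, not the paper's)."""
    return [1.0 if f in doc else 0.0 for f in features]


def kmeans(vectors, k, iters=20):
    """Plain K-means; squared Euclidean distance; the first k vectors seed the centres."""
    centres = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            j = min(range(k), key=lambda i: sum((a - b) ** 2 for a, b in zip(v, centres[i])))
            groups[j].append(v)
        new = [[sum(col) / len(g) for col in zip(*g)] if g else centres[i]
               for i, g in enumerate(groups)]
        if new == centres:
            break
        centres = new
    return centres


def reduce_samples(vectors, labels, k):
    """Replace each class's vectors by at most k cluster centres (assumed per-class clustering)."""
    reduced = []
    for cls in sorted(set(labels)):
        members = [v for v, y in zip(vectors, labels) if y == cls]
        reduced += [(c, cls) for c in kmeans(members, min(k, len(members)))]
    return reduced


# Hypothetical toy corpus: each document is a set of tokens.
docs = [{"good", "film"}, {"good", "plot"}, {"bad", "film"}, {"bad", "plot"}]
labels = ["pos", "pos", "neg", "neg"]
pos_tags = {"good": "ADJ", "bad": "ADJ", "film": "NOUN", "plot": "NOUN"}

features = select_features(docs, labels, pos_tags, {"ADJ"}, k=2)
vectors = [vectorize(d, features) for d in docs]
training_set = reduce_samples(vectors, labels, k=1)
```

On this toy corpus the four documents collapse to one centre per class, so a downstream classifier would train on two samples instead of four. Whether clustering is done per class, and which distance and term weighting are used, would need to be checked against the full paper.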