
A Study on the Application of the Gini Index to Text Feature Selection (cited by: 5)

Using Gini-Index for feature selection in text categorization
Abstract: This paper applies an improved Gini index to text feature selection and constructs a Gini-index-based evaluation function suited to that task. Using two classifiers, fkNN and SVM, on two different document corpora, the function is compared experimentally with four other well-known text feature selection measures. The results show that its classification performance is comparable to that of existing feature selection methods, while it performs well in terms of algorithmic time complexity.
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2007, No. 10, pp. 2584-2586, 2590 (4 pages)
Keywords: text categorization; feature selection; Gini index; feature evaluation function
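The abstract does not give the evaluation function itself, so the following is only a minimal sketch of the general idea of scoring terms with a Gini-style measure. It assumes the classical purity form, the sum over classes of P(C_i | W)^2 estimated from document counts, which is not necessarily the improved function constructed in the paper. The names `gini_purity_scores` and `select_top_k` and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict


def gini_purity_scores(docs, labels):
    """Score each term W by sum_i P(C_i | W)^2 (assumed form, not the paper's).

    P(C_i | W) is estimated as the share of documents containing W
    that belong to class C_i. A score of 1.0 means W occurs in one class only.
    """
    term_class_df = defaultdict(Counter)  # term -> class -> number of docs containing term
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # count each term once per document
            term_class_df[term][label] += 1

    scores = {}
    for term, per_class in term_class_df.items():
        total = sum(per_class.values())
        scores[term] = sum((n / total) ** 2 for n in per_class.values())
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["goal", "match", "team"],
    ["team", "coach", "goal"],
    ["stock", "market", "team"],
    ["market", "bank", "stock"],
]
labels = ["sports", "sports", "finance", "finance"]

scores = gini_purity_scores(docs, labels)
top_terms = select_top_k(scores, 3)
```

In this sketch a term that appears in both classes, such as `team`, scores lower than a term confined to one class. Purity alone also gives the maximum score to any term seen in a single document, which is one reason a practical evaluation function would likely need to account for term frequency as well.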


