
An Improved Feature Selection Method (一种改进的特征选择方法) Cited by: 1

An improved feature selection
Abstract: Selecting feature weights is a basic step in text categorization, and TFIDF is one of the commonly used methods for representing document feature weights. However, its overly simple term frequency and inverse document frequency expressions overlook features that occur frequently within a single class, so the predictive power of features is mutually weakened. This paper proposes an improved feature selection algorithm (I-TFIDF) that better reflects the weights of feature terms and thereby improves classification accuracy. Experimental results show that I-TFIDF performs better than the traditional TFIDF algorithm.
Author: Song Zhihui (宋志辉)
Affiliation: 贵州师范学院
Source: Journal of Guizhou Educational College (Social Science Edition) (《贵州教育学院学报》), 2009, No. 6, pp. 54-56 (3 pages)
Keywords: text categorization, feature selection, TFIDF
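
The abstract refers to the classic TFIDF weighting without stating its formula. For reference, here is a minimal sketch of one common textbook form, tf(t, d) * log(N / df(t)). It is not the paper's I-TFIDF, whose formula does not appear on this page. The function name `tfidf_weights` and the toy corpus are assumptions made for illustration only.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses tf(t, d) * log(N / df(t)), a common textbook form of TF-IDF
    (an assumption; the source does not state which variant it uses).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical toy corpus: the first two documents belong to one class,
# the last two to another.
docs = [
    ["ball", "goal", "ball"],
    ["ball", "team"],
    ["stock", "market"],
    ["stock", "bank"],
]
weights = tfidf_weights(docs)
```

In this toy corpus, `ball` occurs in both documents of the first class, yet in the second document it receives a lower weight than `team`, which occurs only once in the whole corpus. This appears to be the kind of within-class effect that the abstract says plain TFIDF overlooks.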
