
Research on an improved CHI feature selection method for text classification (cited by: 40)

Study on improved CHI for feature selection in Chinese text categorization
Abstract: This paper analyzes the factors that limit the classification accuracy of the traditional CHI (chi-square) statistic and removes the cases in which a feature term is negatively correlated with a category. The improved method is also applied to adjusting feature-term weights, which markedly improves classification performance. In addition, distribution, concentration, and frequency information are introduced into the improved method, which raises classification accuracy on corpora with unevenly distributed categories. Experimental results verify the effectiveness and feasibility of the improved CHI approach.
Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list; 2011, No. 4, pp. 128-130, 194 (4 pages)
Funding: Aeronautical Science Foundation project (No. 2006ZC31001)
Keywords: text classification; feature selection; CHI statistic; weight adjustment; distribution information; concentration information; frequency information
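To make the abstract's central idea concrete, here is a minimal sketch of the standard chi-square (CHI) score for a term/category pair, together with one common way to discard negatively correlated terms. The record does not give the paper's formulas, so the function names, the 2x2 contingency-table parameters (`a`, `b`, `c`, `d`), and the rule "score 0 when `a*d - b*c <= 0`" are all assumptions made for illustration. The paper's distribution, concentration, and frequency factors are not modeled.

```python
def chi_square(a, b, c, d):
    """Standard chi-square statistic for a term/category 2x2 table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (an empty row or column): no evidence either way.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def positive_chi_square(a, b, c, d):
    """Chi-square score that ignores negatively correlated terms.

    Assumption for illustration: a term counts as negatively correlated
    with the category when a*d - b*c <= 0, and then scores 0.
    """
    if a * d - b * c <= 0:
        return 0.0
    return chi_square(a, b, c, d)


# A term concentrated inside the category keeps its score.
print(positive_chi_square(40, 10, 10, 40))  # 36.0
# The mirrored table has the same plain chi-square value but is
# negatively correlated, so the filtered score drops to 0.
print(chi_square(10, 40, 40, 10))           # 36.0
print(positive_chi_square(10, 40, 40, 10))  # 0.0
```

The plain statistic squares `a*d - b*c`, so it cannot tell a term that signals membership from one that signals absence. Checking the sign first is what separates the two cases.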
