Fast KNN Algorithm for Text Classification Based on Rough Set

Cited by: 22
Abstract: A clear drawback of traditional K Nearest Neighbor (KNN) classification is the heavy cost of computing sample similarities. In text classification, where samples are numerous and high-dimensional, this complexity makes the method impractical. To address this, rough set theory is introduced into text classification: the concepts of upper and lower approximation are used to describe the distribution of each class of training samples, and the range of each class's upper and lower approximations is computed during training. During classification, based on where the vector of the document to be classified lies in the sample space, the improved algorithm can assign some documents to a class directly and narrow the KNN search range for the others. Experiments show that the algorithm markedly improves classification efficiency while keeping classification performance essentially the same as that of traditional KNN.
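The abstract does not say how the approximation ranges are computed, so the following is only a minimal sketch of the general idea under loudly stated assumptions: each class is summarized by its centroid, an upper-approximation radius (the farthest sample of that class), and a lower-approximation radius (capped by the nearest sample of any other class). The names `train`, `classify`, and `dist`, the Euclidean distance, and the centroid-and-radius construction are all hypothetical choices for illustration, not the paper's method.

```python
import math
from collections import Counter, defaultdict

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def train(samples):
    """samples: list of (vector, label).
    Returns {label: (center, lower_radius, upper_radius)} (assumed model)."""
    by_class = defaultdict(list)
    for vec, label in samples:
        by_class[label].append(vec)
    regions = {}
    for label, vecs in by_class.items():
        dim = len(vecs[0])
        center = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
        # Upper approximation: ball that covers every sample of this class.
        upper = max(dist(center, v) for v in vecs)
        # Lower approximation: ball that contains no sample of another class.
        others = [dist(center, v) for v, l in samples if l != label]
        lower = min([upper] + others)
        regions[label] = (center, lower, upper)
    return regions

def classify(x, samples, regions, k=3):
    """Returns (label, number of sample distances computed)."""
    d = {label: dist(x, r[0]) for label, r in regions.items()}
    # Inside exactly one lower approximation: label directly, no KNN search.
    certain = [l for l, (_, lower, _) in regions.items() if d[l] < lower]
    if len(certain) == 1:
        return certain[0], 0
    # Otherwise search only classes whose upper approximation contains x.
    candidates = {l for l, (_, _, upper) in regions.items() if d[l] <= upper}
    if not candidates:
        candidates = set(regions)  # fall back to plain KNN over all classes
    pool = sorted(((dist(x, v), l) for v, l in samples if l in candidates),
                  key=lambda t: t[0])
    votes = Counter(l for _, l in pool[:k])
    return votes.most_common(1)[0][0], len(pool)
```

In this sketch, a query that falls inside a single lower approximation costs only one distance per class rather than one per training sample, which is the kind of saving the abstract describes. How well it preserves KNN accuracy depends on how the regions are defined, which this sketch does not take from the paper.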
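Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2010, Issue 24, pp. 175-177 (3 pages)
Funding: National Natural Science Foundation of China (60775036, 60475019); Specialized Research Fund for the Doctoral Program of Higher Education (20060247039)
Keywords: text classification; K Nearest Neighbor (KNN); rough set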