期刊文献+

基于聚类算法的KNN文本分类算法研究 被引量:30

Improved KNN using clustering algorithm
在线阅读 下载PDF
导出
摘要 KNN算法是一种在人工智能领域如专家系统、数据挖掘、模式识别等方面广泛应用的算法。该算法简单有效,易于实现。但是KNN算法在决定测试样本的类别时,是把所求的该测试样本的K个最近邻是等同看待的,即不考虑这K个最近邻能表达所属类别的程度。由于训练样本的分布是不均匀的,每个样本对分类的贡献也就不一样,因此有必要有区别的对待训练样本集合中的每个样本。利用聚类算法,求出训练样本集合中每个训练样本的隶属度,利用隶属度来区别对待测试样本的K个最近邻。通过实验证明,改进后的KNN算法较好的精确性。 KNN is of the best text categorization algorithm and is used widely.The uneven distribution in training set will affect categorization result negatively.This paper prsents an improved KNN method and verifies its effectiveness by the experiments.The classification performance is promoted.
出处 《计算机工程与应用》 CSCD 北大核心 2009年第7期153-155,158,共4页 Computer Engineering and Applications
关键词 K近邻 隶属度 文本分类 K-Nearest Neighbour(KNN) membership degree text classification
  • 相关文献

参考文献4

二级参考文献19

  • 1HanJ KamberM.数据挖掘概念与技术[M].北京:机械工业出版社,2002..
  • 2[1]D D Lewis. Naive (Bayes) at forty: The independence assumption in information retrieval. In: The 10th European Conf on Machine Learning(ECML98), New York: Springer-Verlag, 1998. 4~15
  • 3[2]Y Yang, X Lin. A re-examination of text categorization methods. In: The 22nd Annual Int'l ACM SIGIR Conf on Research and Development in Information Retrieval, New York: ACM Press, 1999
  • 4[3]Y Yang, C G Chute. An example-based mapping method for text categorization and retrieval. ACM Trans on Information Systems, 1994, 12(3): 252~277
  • 5[4]E Wiener. A neural network approach to topic spotting. The 4th Annual Symp on Document Analysis and Information Retrieval (SDAIR 95), Las Vegas, NV, 1995
  • 6[5]R E Schapire, Y Singer. Improved boosting algorithms using confidence-rated predications. In: Proc of the 11th Annual Conf on Computational Learning Theory. Madison: ACM Press, 1998. 80~91
  • 7[6]T Joachims. Text categorization with support vector machines: Learning with many relevant features. In: The 10th European Conf on Machine Learning (ECML-98). Berlin: Springer, 1998. 137~142
  • 8[7]S O Belkasim, M Shridhar, M Ahmadi. Pattern classification using an efficient KNNR. Pattern Recognition Letter, 1992, 25(10): 1269~1273
  • 9[8]V E Ruiz. An algorithm for finding nearest neighbors in (approximately) constant average time. Pattern Recognition Letter, 1986, 4(3): 145~147
  • 10[9]P E Hart. The condensed nearest neighbor rule. IEEE Trans on Information Theory, 1968, IT-14(3): 515~516

共引文献173

同被引文献259

引证文献30

二级引证文献225

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部