
A Fast Text Categorization Approach Based on k-Nearest Neighbor (times cited: 15)
Abstract: k-Nearest Neighbor (k-NN) is one of the simplest and most effective algorithms for text categorization. However, k-NN search requires intensive similarity computation, and for a large training set an exhaustive search of the whole set is practically infeasible. Speeding up the search for the k nearest neighbors is therefore the key to making k-NN categorization useful in practice. This paper presents a fast k-NN-based text categorization approach that classifies documents quickly and effectively even when the search runs over a very large training set. Experiments show that the new approach performs significantly better than the traditional method.
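The abstract does not spell out the traditional k-NN procedure it improves on. As a point of reference, the following is a minimal sketch of the exhaustive (brute-force) baseline, assuming raw term-frequency vectors, naive whitespace tokenization, cosine similarity and similarity-weighted voting. The names `tf_vector`, `cosine` and `knn_classify` are hypothetical and not taken from the paper. The point to notice is that every query costs one similarity computation per training document, which is the cost the abstract calls unacceptable for very large training sets.

```python
import heapq
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (assumed: lowercase whitespace tokenization, no IDF)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_classify(query, training, k=3):
    """Classify `query` against `training`, a list of (text, label) pairs.

    Exhaustive scan: one similarity computation per training document.
    """
    q = tf_vector(query)
    scored = [(cosine(q, tf_vector(text)), label) for text, label in training]
    # Keep the k most similar training documents.
    top = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    # Similarity-weighted vote among the k nearest neighbors.
    votes = Counter()
    for sim, label in top:
        votes[label] += sim
    return votes.most_common(1)[0][0]
```

For example, with a toy training set such as `[("stock market shares rise", "finance"), ("football match goal", "sports"), ...]`, a query like `"football goal scored in the match"` is assigned to `"sports"` because its nearest neighbors are the sports documents.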
Source: Journal of the Graduate School of the Chinese Academy of Sciences (indexed in CAS and CSCD), 2005, No. 5, pp. 554-559 (6 pages)
Keywords: text categorization; k-Nearest Neighbor (k-NN); multidimensional index; similarity retrieval
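The keywords point to a multidimensional index as the way to avoid the exhaustive scan, but the abstract does not say which index structure the paper actually uses. Purely as an illustration of the general idea, the sketch below swaps in a k-d tree, a classic multidimensional index, to answer exact k-NN queries while pruning subtrees that cannot contain a closer neighbor. It assumes documents have already been mapped to low-dimensional dense vectors (for example after some dimensionality reduction), since k-d trees lose most of their advantage in the very high-dimensional spaces of raw term vectors. The names `build`, `knn_search` and `classify` are hypothetical.

```python
import heapq
import math
from collections import Counter


class Node:
    __slots__ = ("point", "label", "axis", "left", "right")

    def __init__(self, point, label, axis, left, right):
        self.point, self.label, self.axis = point, label, axis
        self.left, self.right = left, right


def build(items, depth=0):
    """Build a k-d tree from (point, label) pairs; points are equal-length float tuples."""
    if not items:
        return None
    axis = depth % len(items[0][0])  # cycle through the dimensions
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2  # split at the median along this axis
    return Node(items[mid][0], items[mid][1], axis,
                build(items[:mid], depth + 1), build(items[mid + 1:], depth + 1))


def knn_search(root, query, k):
    """Return up to k (distance, label) pairs nearest to `query`, closest first."""
    heap = []  # max-heap of the current best k, stored as (-distance, tiebreak, label)
    counter = 0

    def visit(node):
        nonlocal counter
        if node is None:
            return
        d = math.dist(query, node.point)
        if len(heap) < k:
            heapq.heappush(heap, (-d, counter, node.label))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, counter, node.label))  # evict the farthest
        counter += 1
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Prune: search the far side only if the splitting plane is closer
        # than the current k-th best distance (or fewer than k found so far).
        if len(heap) < k or abs(diff) < -heap[0][0]:
            visit(far)

    visit(root)
    return sorted((-nd, label) for nd, _, label in heap)


def classify(root, query, k=3):
    """Majority vote among the k nearest neighbors found through the index."""
    neighbors = knn_search(root, query, k)
    return Counter(label for _, label in neighbors).most_common(1)[0][0]
```

The search returns the same neighbors as a full scan; the saving comes from skipping subtrees whose bounding hyperplane is already farther away than the current k-th best candidate.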

