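Abstract
This paper identifies two shortcomings of the traditional KNN (k-nearest neighbor) algorithm. First, its computational cost is high, so classification is inefficient. Second, it treats every feature item and every neighbor sample equally when measuring similarity and deciding the category, which lowers classification accuracy. For the first shortcoming, three improvement strategies are presented: improvements based on feature dimension reduction, on the training set, and on the nearest-neighbor search method. For the second shortcoming, two improvement strategies are presented: improvements based on feature weighting and on the category decision strategy. Representative methods of each strategy are introduced and reviewed.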
The paper points out that the traditional k-nearest neighbor (KNN) algorithm has two shortcomings. One is its high computational complexity. The other is that it gives equal importance to every feature item and every neighbor sample during similarity measurement and category judgment. For the first shortcoming, three kinds of improvement strategies are put forward: feature reduction, optimization of the training set, and improvement of the neighbor searching method. For the second shortcoming, two kinds of improvement strategies are put forward: feature weighting and sample weighting. The representative methods of each strategy are also introduced and objectively commented on.
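The two shortcomings can be seen in a plain KNN classifier. It compares the query against every training sample, which costs O(n·d) work per query for n samples with d features. This brute-force scan is what feature reduction, training-set optimization and faster neighbor search try to shrink. It also lets every feature and every neighbor count equally. The sketch below is a minimal illustration under our own assumptions, not a method taken from the surveyed literature. It assumes Euclidean distance with optional per-feature weights, and inverse-distance voting as one possible way to weight neighbors. The function name `weighted_knn_predict` and its parameters are hypothetical.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train_x, train_y, query, k=3,
                         feature_weights=None, distance_weighted=False):
    """Classify `query` from its k nearest training samples (hypothetical helper).

    feature_weights: optional per-feature weights w_j used in the distance
        sqrt(sum_j w_j * (x_j - q_j)^2); None means all features count equally,
        as in traditional KNN.
    distance_weighted: if True, each neighbor votes with weight 1 / (d + eps)
        (an assumed inverse-distance scheme); if False, each neighbor casts one
        equal vote, as in traditional KNN.
    """
    w = feature_weights if feature_weights is not None else [1.0] * len(query)

    # Brute-force search: one distance per training sample, O(n * d) per query.
    scored = []
    for x, label in zip(train_x, train_y):
        dist = math.sqrt(sum(wj * (xj - qj) ** 2
                             for wj, xj, qj in zip(w, x, query)))
        scored.append((dist, label))
    scored.sort(key=lambda t: t[0])

    # Category decision: accumulate (possibly weighted) votes of the k nearest.
    votes = defaultdict(float)
    for dist, label in scored[:k]:
        votes[label] += 1.0 / (dist + 1e-9) if distance_weighted else 1.0
    return max(votes, key=votes.get)


if __name__ == "__main__":
    xs = [(0.0, 0.0), (3.0, 0.0), (3.1, 0.0)]
    ys = ["a", "b", "b"]
    # Equal votes: two farther "b" neighbors outvote one close "a".
    print(weighted_knn_predict(xs, ys, (0.5, 0.0), k=3))                          # b
    # Inverse-distance votes: the close "a" neighbor dominates.
    print(weighted_knn_predict(xs, ys, (0.5, 0.0), k=3, distance_weighted=True))  # a
```

With the defaults the function behaves like traditional KNN. The two flags show, in the simplest form, how weighting features or neighbors can change the decision that equal treatment would produce.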
Source
《图书情报工作》
CSSCI
Peking University Core Journal (北大核心)
2012, No. 21, pp. 97-100, 118 (5 pages)
Library and Information Service
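Funding
One of the research outcomes of the National Social Science Fund of China project "Research on Automatic Text Classification Technology" (Project No. 08CTQ003)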
Keywords
KNN classification
Feature dimension reduction
Feature weighting
Training set optimization
Fast algorithm
KNN categorization, dimension reduction, feature weighting, training set optimizing, fast KNN