Abstract
K-nearest neighbor (KNN) is among the best-performing text classification methods, but at classification time it must compute the similarity between the document to be classified and every training sample, which makes it computationally expensive. This paper proposes an automatic text categorization method based on case-based reasoning (CBR): a clustering method first converts the training sample library into a case library, and documents are then classified against that library using the KNN idea. In experiments, the method achieved an average recall of 87.07% and an average precision of 89.17%. An analysis of time complexity further shows that the method classifies considerably faster than standard KNN, which gives it good practical value.
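The two-stage idea in the abstract (compress the training samples into a smaller case library, then run KNN against the cases only) can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, the similarity measure, or the voting rule, so everything specific here is an assumption: single-pass "leader" clustering within each class, cosine similarity, similarity-weighted voting, and the hypothetical names `build_case_base`, `classify`, and `threshold`. With N training samples compressed into M cases, each classification needs M similarity computations instead of N, which is where a speedup of the kind reported would come from.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_case_base(samples, threshold=0.8):
    """Compress (vector, label) training samples into (centroid, label) cases.

    Assumption: single-pass "leader" clustering inside each class. The
    clustering method actually used in the paper is not given in the abstract.
    """
    clusters = []  # each entry: [sum_vector, member_count, label]
    for vec, label in samples:
        best, best_sim = None, threshold
        for cluster in clusters:
            if cluster[2] != label:
                continue  # never merge samples from different classes
            centroid = [s / cluster[1] for s in cluster[0]]
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([list(vec), 1, label])  # start a new case
        else:
            best[0] = [s + x for s, x in zip(best[0], vec)]
            best[1] += 1
    return [([s / n for s in total], label) for total, n, label in clusters]


def classify(doc, case_base, k=3):
    """KNN over the case base with similarity-weighted voting (an assumption)."""
    scored = sorted(
        ((cosine(doc, centroid), label) for centroid, label in case_base),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = defaultdict(float)
    for sim, label in scored[:k]:
        votes[label] += sim
    return max(votes, key=votes.get)


# Toy data: four training vectors collapse into two cases, one per class.
samples = [
    ([1.0, 0.1, 0.0], "sports"),
    ([0.9, 0.2, 0.0], "sports"),
    ([0.0, 0.1, 1.0], "finance"),
    ([0.1, 0.0, 0.9], "finance"),
]
case_base = build_case_base(samples)
predicted = classify([0.8, 0.3, 0.1], case_base, k=1)
```

The `threshold` parameter controls the trade-off in this sketch: a higher value keeps more cases (closer to plain KNN in both cost and behavior), while a lower value yields fewer, coarser cases and faster classification.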
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journals (北大核心)
2005, No. 9, pp. 2028-2030 (3 pages)
Funding
National Natural Science Foundation of China (70171052)
Wantai (皖泰) Development Project (143-150401)
Keywords
case-based reasoning (CBR)
automatic text categorization
K-nearest neighbor (KNN)
clustering