Study of automatic text categorization based on CBR (cited by: 2)
Abstract: K-nearest neighbor (KNN) is one of the best-performing text classification methods, but to classify a document it must compute the similarity between that document and every training sample, so its time complexity is high. This paper proposes an automatic text categorization method based on case-based reasoning (CBR): a clustering method first converts the training sample library into a case library, and documents are then classified using the KNN approach. In experiments, the method achieved an average recall of 87.07% and an average precision of 89.17%. An analysis of the algorithm's time complexity also shows that it classifies considerably faster than the KNN method, which makes it of practical value.
Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal (北大核心), 2005, No. 9, pp. 2028-2030 (3 pages)
Funding: National Natural Science Foundation of China (70171052); Wantai (皖泰) Development Project (143-150401)
Keywords: case-based reasoning (CBR); automatic text categorization; K-nearest neighbor (KNN); clustering
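
The two-stage idea in the abstract (cluster the training samples into a smaller case library, then run a KNN-style vote over the cases) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not say which clustering algorithm, similarity measure, or voting rule the paper uses. The sketch assumes a small per-class k-means under cosine similarity, a similarity-weighted vote, and toy three-dimensional term vectors. All function names, parameters, and data below are hypothetical.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cluster(vectors, k, iters=10):
    """Tiny k-means under cosine similarity (assumed, not from the paper).
    The first k vectors seed the clusters, so the result is deterministic."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for v in vectors:
            best = max(range(len(centroids)), key=lambda i: cosine(v, centroids[i]))
            groups[best].append(v)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids

def build_case_library(samples, cases_per_class=2):
    """Turn (vector, label) training samples into (centroid, label) cases."""
    by_label = defaultdict(list)
    for vec, label in samples:
        by_label[label].append(vec)
    library = []
    for label, vecs in by_label.items():
        for centroid in cluster(vecs, min(cases_per_class, len(vecs))):
            library.append((centroid, label))
    return library

def classify(doc, library, k=3):
    """Vote among the k most similar cases, weighting each vote by similarity."""
    nearest = sorted(library, key=lambda case: cosine(doc, case[0]), reverse=True)[:k]
    votes = defaultdict(float)
    for centroid, label in nearest:
        votes[label] += cosine(doc, centroid)
    return max(votes, key=votes.get)

# Toy term-frequency vectors (hypothetical data, two classes).
train = [
    ([3, 0, 1], "sports"), ([4, 1, 0], "sports"),
    ([5, 0, 0], "sports"), ([3, 1, 1], "sports"),
    ([0, 3, 1], "finance"), ([1, 4, 0], "finance"),
    ([0, 5, 1], "finance"), ([0, 4, 0], "finance"),
]
library = build_case_library(train, cases_per_class=2)
print(len(train), "training samples ->", len(library), "cases")
print(classify([4, 0, 1], library))
print(classify([0, 4, 1], library))
```

In this sketch each classification computes one similarity per case rather than one per training sample (4 instead of 8 in the toy data), which is the kind of saving the abstract attributes to the case library. The actual speedup depends on how many cases the clustering step keeps.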

