
Application of an Improved KNN Algorithm in Spam E-mail Filtering (cited by: 14)
Abstract: This paper proposes an improved K-Nearest Neighbor (KNN) algorithm and applies it to spam e-mail filtering. Experiments show that the improved algorithm is less sensitive to the value of K and to the distribution of the training texts, reduces both false alarms and misses in spam detection, and achieves good filtering performance.
Authors: Zhang Junli (张俊丽), Zhang Fan (张帆)
Source: New Technology of Library and Information Service (《现代图书情报技术》), CSSCI, PKU Core Journal, 2007, No. 4, pp. 75-78 (4 pages)
Funding: One of the research outputs of the 2006 National Social Science Fund of China project "Research on Network Information Filtering" (project No. 06BTQ024)
Keywords: KNN; spam e-mail filtering; text classification
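
For orientation, here is a minimal sketch of a baseline KNN spam filter of the kind the abstract builds on. It is not the improved algorithm from the paper, whose details this record does not give. The sketch assumes whitespace tokenization, term-frequency vectors, cosine similarity, and similarity-weighted voting among the k nearest training messages. The function names (`vectorize`, `cosine`, `knn_classify`) and the toy training set `TRAIN` are hypothetical and exist only for illustration.

```python
import math
from collections import Counter

# Toy, made-up training data (illustrative only, not from the paper).
TRAIN = [
    ("win free money now", "spam"),
    ("cheap pills free offer", "spam"),
    ("claim your free prize now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached", "ham"),
    ("lunch on monday with the team", "ham"),
]


def vectorize(text):
    """Term-frequency vector of a whitespace-tokenized, lowercased message."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(message, training, k=3):
    """Label a message by similarity-weighted vote of its k nearest neighbors.

    Assumes `training` is a non-empty list of (text, label) pairs and k >= 1.
    """
    query = vectorize(message)
    scored = sorted(
        ((cosine(query, vectorize(text)), label) for text, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter()
    for similarity, label in scored[:k]:
        votes[label] += similarity
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_classify("free money offer", TRAIN, k=3))
    print(knn_classify("monday project meeting", TRAIN, k=3))
```

In this baseline, the result depends directly on the choice of `k` and on how many training messages each class contributes. These are the two sensitivities the abstract says the improved algorithm reduces.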


