
A Survey of Content-based Anti-spam Email Filtering (cited by: 129)
Abstract: The volume of junk email on the Internet has grown tremendously in the past few years and is causing serious problems, drawing wide attention from researchers. Content-based filtering is one of the mainstream technologies for addressing it, and current content-based approaches fall mainly into rule-based methods and probabilistic or statistical methods. This paper surveys the state of the art in this field, covering the benchmark corpora and evaluation methods used in spam filtering research, the filtering approaches in current use, and the comparative experiments among them. The approaches discussed include Ripper, Decision Trees, Rough Sets, Rocchio, Boosting, Bayes, kNN, SVM, and Winnow. The experimental results show that Boosting, Flexible Bayes, SVM, and Winnow are currently among the better filtering methods and achieve very good results on research corpora. However, much more work remains before these methods are ready for practical use.
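Authors: Wang Bin (王斌), Pan Wenfeng (潘文锋)
Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2005, No. 5, pp. 1-10 (10 pages)
Funding: Supported by the National 973 Program of China (2004CB318109)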
Keywords: computer application; Chinese information processing; overview; junk email; anti-spam; information filtering; text classification
