基于多任务学习的邮件过滤系统的研究 (cited by: 4)

Research of Spam Filter System Based on Multitask Learning

Abstract: With the widespread use of e-mail, effectively avoiding and guarding against spam has become an urgent problem. Inspired by the research and application of machine learning in e-mail filtering, this paper proposes a spam filter based on multitask learning. Deciding whether a given user's e-mail is spam or legitimate is treated as one task, and, under the task-relatedness assumption of multitask learning, the system classifies e-mails using a task relevance coefficient. Experiments show that the system is reliable and effective on both English and Chinese e-mail corpora, especially for the mail of users on the same mailing list.
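
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of treating each user as one task. It swaps in a common multitask formulation (logistic regression whose word weights are the sum of a shared part and a per-user part) and is not the paper's task relevance coefficient method. The class name `MultitaskSpamFilter`, the parameters `lr` and `user_penalty`, and the toy messages in `EXAMPLES` are all hypothetical.

```python
import math
from collections import defaultdict


def tokenize(text):
    """Split a message into lowercase whitespace-separated tokens."""
    return text.lower().split()


class MultitaskSpamFilter:
    """Illustrative only: shared word weights plus per-user (per-task) offsets.

    This is an assumed stand-in for a multitask learner, not the method
    described in the paper.
    """

    def __init__(self, lr=0.5, user_penalty=0.01):
        self.lr = lr                      # learning rate for SGD updates
        self.user_penalty = user_penalty  # L2 decay on user-specific weights
        self.shared = defaultdict(float)  # weights shared by all users
        self.per_user = defaultdict(lambda: defaultdict(float))

    def score(self, user, text):
        """Return the estimated probability that `text` is spam for `user`."""
        specific = self.per_user[user]
        z = sum(self.shared[w] + specific[w] for w in set(tokenize(text)))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, user, text, is_spam):
        """One stochastic gradient step on the logistic loss."""
        error = (1.0 if is_spam else 0.0) - self.score(user, text)
        specific = self.per_user[user]
        for w in set(tokenize(text)):
            self.shared[w] += self.lr * error
            specific[w] += self.lr * (error - self.user_penalty * specific[w])

    def fit(self, examples, epochs=20):
        """Train on (user, text, is_spam) triples for a fixed number of passes."""
        for _ in range(epochs):
            for user, text, is_spam in examples:
                self.update(user, text, is_spam)
        return self


# Hypothetical toy data: two users agree on most mail but disagree on one message.
EXAMPLES = [
    ("alice", "cheap pills online", True),
    ("alice", "meeting agenda attached", False),
    ("bob", "cheap watches online", True),
    ("bob", "project meeting notes", False),
    ("bob", "conference discount offer", False),
    ("alice", "conference discount offer", True),
]

if __name__ == "__main__":
    model = MultitaskSpamFilter().fit(EXAMPLES)
    # A user with no training data still benefits from the shared weights.
    print(model.score("carol", "cheap pills online") > 0.5)
    # The same message is scored differently for users who labeled it differently.
    print(model.score("alice", "conference discount offer") > 0.5)
    print(model.score("bob", "conference discount offer") < 0.5)
```

In this sketch the shared weights carry what the users have in common, which is one way to read the observation that users on the same mailing list benefit most, while the per-user weights absorb individual preferences.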
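Source: Computer Technology and Development (《计算机技术与发展》), 2010, No. 10, pp. 137-140 (4 pages)
Funding: National Natural Science Foundation of China (60805022); National High Technology Research and Development Program of China (863 Program) (2007AA01Z178); Qing Lan Project of Nanjing University of Posts and Telecommunications (NY206034)
Keywords: multitask learning; task relevance; spam filter; classification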