Research of Spam Filter System Based on Multitask Learning

(Original Chinese title: 基于多任务学习的邮件过滤系统的研究. Times cited: 4)

Abstract: With the widespread use of e-mail, effectively avoiding and preventing spam has become an urgent problem. Inspired by research on and applications of machine learning in spam filtering, this paper proposes a mail filtering system based on multitask learning, in which deciding whether one user's e-mails are spam or legitimate is treated as one task. Building on the task-relatedness assumption of multitask learning, the system classifies e-mails using task relevance coefficients. Experiments show that the system is reliable and effective on both Chinese and English mail corpora, especially for the mails of users on the same mailing list.
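The abstract does not say which base classifier is used or how the task relevance coefficient is computed, so the following is only a minimal, hypothetical sketch of the idea under stated assumptions: each user's mailbox is one task; each task gets a naive-Bayes-style table of smoothed token log-odds; the relevance coefficient between two tasks is assumed to be the Pearson correlation of their token weights over the shared vocabulary; and a mail is classified by a relevance-weighted average of all tasks' scores, with negatively correlated tasks ignored. The names `train_task`, `relevance` and `classify`, and the toy mailboxes, are illustrative and do not come from the paper.

```python
import math
from collections import Counter


def train_task(mails, alpha=1.0):
    """Train one task (one user's mailbox).

    mails: list of (tokens, is_spam) pairs. Returns a dict mapping each token
    to a Laplace-smoothed log-odds weight: log P(token|spam) - log P(token|ham).
    """
    spam, ham = Counter(), Counter()
    for tokens, is_spam in mails:
        # Count each token at most once per mail (document frequency).
        (spam if is_spam else ham).update(set(tokens))
    n_spam = sum(1 for _, is_spam in mails if is_spam)
    n_ham = len(mails) - n_spam
    return {
        w: math.log((spam[w] + alpha) / (n_spam + 2 * alpha))
        - math.log((ham[w] + alpha) / (n_ham + 2 * alpha))
        for w in set(spam) | set(ham)
    }


def relevance(model_a, model_b):
    """Assumed task relevance coefficient: Pearson correlation of two tasks'
    token weights over their shared vocabulary (0.0 if undefined)."""
    shared = sorted(set(model_a) & set(model_b))
    if len(shared) < 2:
        return 0.0
    xs = [model_a[w] for w in shared]
    ys = [model_b[w] for w in shared]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def classify(user, tokens, models, threshold=0.0):
    """Return True if the mail looks like spam to `user`.

    Each task scores the mail by summing the weights of the tokens it knows;
    scores are averaged with weight max(0, relevance), the user's own task
    weighing 1.0, so negatively correlated tasks are ignored.
    """
    words = set(tokens)
    total = weight = 0.0
    for other, model in models.items():
        r = 1.0 if other == user else max(0.0, relevance(models[user], model))
        if r == 0.0:
            continue
        total += r * sum(model.get(w, 0.0) for w in words)
        weight += r
    return total / weight > threshold


# Toy mailboxes (hypothetical): one task per user.
mailboxes = {
    "alice": [("win cash now".split(), True), ("free prize win".split(), True),
              ("meeting at noon".split(), False),
              ("project report due".split(), False)],
    "bob": [("win free cash".split(), True),
            ("lunch meeting today".split(), False)],
    "carol": [("win cash now".split(), False),
              ("meeting at noon".split(), True)],
}
models = {user: train_task(mails) for user, mails in mailboxes.items()}

print(classify("bob", "win a prize".split(), models))     # True
print(classify("bob", "meeting report".split(), models))  # False
```

In this toy data, bob's labels agree with alice's, so alice's larger mailbox contributes to bob's decisions (bob has never seen "prize"), while carol, whose labels run the other way, gets a negative coefficient and is excluded. This loosely mirrors the abstract's remark that the approach works best for users on the same mailing list, presumably because such users receive similar mail.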

Authors: Xu Dihua (许棣华), Wang Zhijian (王志坚)

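Affiliations: College of Computer and Information Engineering, Hohai University; College of Computer, Nanjing University of Posts and Telecommunications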

Source: Computer Technology and Development, 2010, No. 10, pp. 137-140 (4 pages)

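Funding: National Natural Science Foundation of China (60805022); National High Technology Research and Development Program of China (863 Program) (2007AA01Z178); Qinglan Program of Nanjing University of Posts and Telecommunications (NY206034)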

Keywords: multitask learning; task relevance; spam filter; classification

CLC number: TP393 [Automation and Computer Technology: Computer Application Technology]
