The Application of Latent Semantic Indexing in the Task of Spam Filtering

Original title: 垃圾邮件过滤中潜在语义索引的应用
Abstract: This paper studies the classification performance of latent semantic indexing (LSI) applied to spam filtering and compares it with the simple vector space model (VSM) and with the SpamAssassin system, an extremely widespread de facto standard for spam filtering. It also compares two feature sets: purely textual features of e-mail messages, based on standard word- and token-extraction techniques, and application-specific "meta features" of e-mail messages as extracted by the SpamAssassin system. The experimental results show that both VSM and LSI achieve significantly better classification results than SpamAssassin.
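Author: Wang Pengming (王鹏鸣)
Source: Journal of Zhengzhou University (Natural Science Edition) (《郑州大学学报(理学版)》), indexed in CAS and the Peking University Core Journals list, 2010, No. 2, pp. 78-82 (5 pages)
Funding: Ministry of Education Humanities and Social Sciences Research Planning Project (No. 09YJA630036); Ministry of Education Humanities and Social Sciences Research Youth Fund Project (No. 09YJC740027)
Keywords: spam filtering; latent semantic indexing; vector space model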

References (6)

  • [1] Androutsopoulos I, Koutsias J, Chandrinos K, et al. An experimental comparison of naive Bayesian and keyword-based anti-spam filtering with personal E-mail messages[C]//Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM Press, 2000: 160-167.
  • [2] Men Changqian, Wang Wenjian. A semi-supervised SVM learning method based on labeling by multiple learners[J]. Journal of Guangxi Normal University (Natural Science Edition), 2008, 26(1): 186-189. (in Chinese) Cited by: 9
  • [3] Gee K R. Using latent semantic indexing to filter spam[C]//Proceedings of the ACM Symposium on Applied Computing. New York: ACM Press, 2003: 460-464.
  • [4] Berry M W, Drmač Z, Jessup E R. Matrices, vector spaces, and information retrieval[J]. SIAM Review, 1999, 41(2): 335-362.
  • [5] Pei Hongxing, Li Jiyun, Wang Xuewu, Xu Wenkai, Sun Yue. Numerical simulation of ground surface settlement caused by foundation pit dewatering[J]. Journal of Zhengzhou University (Natural Science Edition), 2009, 41(3): 89-92. (in Chinese) Cited by: 10
  • [6] Gansterer W N, Janecek A G K, Lechner P. A reliable component-based architecture for E-mail filtering[C]//Proceedings of the 2nd International Conference on Availability, Reliability and Security. Washington D C: IEEE Computer Society, 2007: 43-50.

