
The Improved SVM Algorithm Based on MapReduce and Its Implementation on Spam Filtering
Cited by: 1
Abstract: Mining and filtering massive volumes of text e-mail demands more storage space and greater computing power. To address this, a spam filtering method based on the Hadoop cloud computing platform is proposed. The approach has three parts: relatively isolated data sets are merged into large files that the cloud platform can process easily; text vectors are built according to an evaluation function, converting each e-mail into a structured description; and the SVM algorithm is improved on the basis of the MapReduce distributed programming model, so that the computing power of the whole cluster is used to solve for the optimal separating hyperplane. Experiments show that the method lets a cluster of inexpensive computers replace an expensive high-performance machine for mining and filtering massive e-mail data, and that classification efficiency improves rapidly as the cluster grows.
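The abstract does not spell out how the SVM is improved, so the following is only a minimal sketch of the general idea of training a linear SVM in map and reduce steps. It assumes plain batch subgradient descent on the hinge loss as a stand-in technique, runs in a single process rather than on Hadoop, and uses hypothetical names (`map_partial_gradient`, `reduce_gradients`, `train_svm`) and toy two-feature vectors that are not taken from the paper.

```python
from functools import reduce

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def map_partial_gradient(w, b, partition):
    """Map step: hinge-loss subgradient contributed by one data partition."""
    gw, gb = [0.0] * len(w), 0.0
    for x, y in partition:
        if y * (dot(w, x) + b) < 1.0:  # only margin violators contribute
            for i, xi in enumerate(x):
                gw[i] -= y * xi
            gb -= y
    return gw, gb, len(partition)

def reduce_gradients(a, c):
    """Reduce step: sum partial gradients and example counts."""
    return [p + q for p, q in zip(a[0], c[0])], a[1] + c[1], a[2] + c[2]

def train_svm(partitions, dim, lam=0.01, lr=0.1, epochs=200):
    """Assumed stand-in trainer: one map/reduce round per gradient step."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        # On a real cluster each partition would be handled by its own mapper.
        partials = [map_partial_gradient(w, b, p) for p in partitions]
        gw, gb, n = reduce(reduce_gradients, partials)
        w = [wi - lr * (lam * wi + gi / n) for wi, gi in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1

# Toy e-mail vectors: [spam-word weight, ham-word weight]; label 1 = spam.
partitions = [
    [([3.0, 0.0], 1), ([2.0, 1.0], 1), ([0.0, 3.0], -1)],
    [([0.0, 2.0], -1), ([1.0, 3.0], -1), ([4.0, 1.0], 1)],
]
w, b = train_svm(partitions, dim=2)
print(predict(w, b, [5.0, 0.0]), predict(w, b, [0.0, 5.0]))
```

Each mapper only needs the current weights and its own partition, which is why the work spreads across a cluster. The cost is one map/reduce round per gradient step.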
Source: Wireless Communication Technology (《无线通信技术》), 2013, No. 2, pp. 52-56, 62 (6 pages)
Funding: National Natural Science Foundation of China (Grant No. 61202110)
Keywords: e-mail filtering; MapReduce; SVM algorithm; Hadoop; text classification
