Design and Implementation of Parallel Bayes Classification Algorithm Using MapReduce (基于MapReduce的并行贝叶斯分类算法的设计与实现)
Cited by: 5
Abstract: Training and testing a classifier for a large-scale document collection takes a long time on a single machine. To reduce this cost, this paper designs and implements a parallel Bayes text classification algorithm based on the MapReduce architecture. Experiments on a Hadoop cluster built from ordinary PCs report both timing and accuracy results. They indicate that the MapReduce-based algorithm can handle large document collections and achieves a near-linear speedup while preserving classification quality.
Source: 《微计算机信息》 (Control & Automation), 2010, No. 9, pp. 190-191, 176 (3 pages)
Keywords: MapReduce; text classification; Hadoop; Bayes
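
The abstract does not spell out the algorithm's design, so the following is only a minimal sketch of one common way to express naive Bayes training as map and reduce steps, not the authors' implementation. It assumes a multinomial model with add-one smoothing and whitespace tokenization, and it runs in a single Python process rather than on Hadoop. The function names `map_doc`, `reduce_counts`, `train`, and `classify` are hypothetical.

```python
import math
from collections import defaultdict
from itertools import chain


def map_doc(label, text):
    """Map step (assumed design): emit count pairs for one labelled document."""
    yield ("doc", label), 1                  # one document seen for this class
    for token in text.lower().split():
        yield ("term", label, token), 1      # one occurrence of token in this class


def reduce_counts(pairs):
    """Shuffle and reduce step: sum all values that share a key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def train(docs):
    """Apply the map step to every (label, text) pair, then reduce."""
    return reduce_counts(chain.from_iterable(map_doc(l, t) for l, t in docs))


def classify(counts, text):
    """Pick the class with the highest log posterior (add-one smoothing)."""
    labels = sorted(k[1] for k in counts if k[0] == "doc")
    vocab = {k[2] for k in counts if k[0] == "term"}
    n_docs = sum(counts[("doc", label)] for label in labels)
    best_label, best_score = None, -math.inf
    for label in labels:
        class_tokens = sum(
            v for k, v in counts.items() if k[0] == "term" and k[1] == label
        )
        score = math.log(counts[("doc", label)] / n_docs)
        for token in text.lower().split():
            if token not in vocab:
                continue                     # ignore words never seen in training
            numerator = counts.get(("term", label, token), 0) + 1
            score += math.log(numerator / (class_tokens + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Toy training data, invented for illustration only.
    docs = [
        ("sports", "ball game team win"),
        ("sports", "team score goal"),
        ("tech", "cluster node hadoop"),
        ("tech", "hadoop mapreduce job cluster"),
    ]
    model = train(docs)
    print(classify(model, "hadoop cluster job"))
```

Because the reduce step only sums counts, partial results computed on separate document splits can be merged in any order. That property is what allows the counting work to be spread across cluster nodes.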