
Load Balancing Strategy on Periodical MapReduce Job (cited by: 15)
Abstract: Load balancing of MapReduce tasks is achieved mainly through the partition function, and Hadoop's default partition function does not reliably balance the load across reducers. For periodic business workloads, this paper proposes a load balancing strategy based on weight computation, exploiting the fact that the data distribution of a periodic job is similar to that of its historical data. The strategy derives weight information from the profile of historical runs (here a weight represents the processing complexity of each record), then samples the current batch during the Map phase to analyze its distribution and predict the approximate weighted distribution of the data to be processed. That prediction guides Reduce partitioning so that reducer load stays balanced. A simple example simulates the whole strategy, and its differences from the TeraSort approach are compared. Finally, an analysis of user video-access logs shows that the proposed strategy nearly doubles performance relative to the default strategy.
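Authors: FU Jie (傅杰), DU Zhihui (都志辉)
Source: Computer Science (《计算机科学》), CSCD, PKU Core; 2013, No. 3, pp. 38-40 (3 pages)
Funding: Supported by the National Natural Science Foundation of China (61272087, 61073008, 60773148, 60503039) and the Beijing Municipal Foundation (4082016, 4122039)
Keywords: MapReduce, TeraSort, load balancing, periodic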
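
The abstract gives only an outline of the strategy, so the following is a minimal sketch of the general idea rather than the authors' algorithm. It assumes keys are sortable, that historical runs yield one weight per key (its per-record processing cost), and that a sample of the current batch is available from the Map phase. The names `weighted_split_points`, `partition`, `history_weight`, and `default_weight` are hypothetical. Estimated cost per key is taken as sampled frequency times historical weight, and range boundaries are placed so each reducer receives a roughly equal share of that cost.

```python
from bisect import bisect_left
from collections import Counter


def weighted_split_points(sample_keys, history_weight, num_reducers,
                          default_weight=1.0):
    """Pick key boundaries that give each reducer a similar estimated cost.

    sample_keys: keys sampled from the current batch (assumed sortable).
    history_weight: hypothetical per-key cost learned from earlier periods.
    """
    counts = Counter(sample_keys)
    # Estimated cost = sampled frequency * historical per-record weight.
    cost = {k: c * history_weight.get(k, default_weight)
            for k, c in counts.items()}
    target = sum(cost.values()) / num_reducers

    splits, running = [], 0.0
    for key in sorted(cost):
        running += cost[key]
        # Close the current range once the cumulative cost reaches its share.
        if (len(splits) < num_reducers - 1
                and running >= target * (len(splits) + 1)):
            splits.append(key)
    return splits


def partition(key, splits):
    """Reducer index for a key: keys <= splits[i] go to reducer i."""
    return bisect_left(splits, key)


# Key "a" is rare but expensive, so it gets a reducer to itself.
sample = ["a", "b", "b", "c", "c", "c", "c"]
splits = weighted_split_points(sample, {"a": 6.0}, num_reducers=2)
print(splits, [partition(k, splits) for k in ("a", "b", "c")])
```

In this toy input a count-only split would see one record for "a" against six for "b" and "c", while the weighted estimate treats both sides as equally costly. In Hadoop the analogous hook would be a custom `Partitioner`; how the paper computes weights and boundaries in detail is not stated in the abstract.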


