Research Advance on MapReduce Based Big Data Processing Platforms and Algorithms (cited by: 97)
Abstract: This paper surveys recent research on big data processing platforms and algorithms based on the MapReduce programming model. First, twelve typical MapReduce-based big data processing platforms are described, their implementation principles and application scenarios are compared, and their commonalities are abstracted. Second, MapReduce-based big data analysis algorithms are reviewed, including search algorithms, data cleansing/transformation algorithms, aggregation algorithms, join algorithms, sorting algorithms, preference queries, optimization algorithms, graph algorithms, and data mining algorithms. These algorithms are classified by how they are implemented in MapReduce, and the factors that affect their performance are analyzed. Finally, big data processing algorithms are abstracted as out-of-core algorithms, whose characteristics are summarized, and research ideas and open problems for general-purpose performance optimization of out-of-core algorithms are proposed for researchers' reference. Specifically, these include optimizing the disk I/O of out-of-core algorithms, optimizing their locality, and designing incremental iterative algorithms. Existing research on big data processing platforms and algorithms concentrates mostly on dynamic platform performance optimization based on resource allocation and task scheduling, on parallelizing specific algorithms, and on optimizing the performance of specific algorithms. The out-of-core algorithm optimizations proposed here are static optimization methods; they complement existing work and open up new areas for researchers.
Source: Journal of Software (《软件学报》), 2017, No. 3, pp. 514-543 (30 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (61672143, 61433008, 61402090, 61502090)
Keywords: big data; MapReduce; out-of-core algorithm; big data processing; performance optimization of algorithms