Dache: A Data Aware Caching for Big-Data Applications Using the MapReduce Framework (Cited by: 4)

Abstract: The buzzword "big-data" refers to large-scale distributed data processing applications that operate on exceptionally large amounts of data. Google's MapReduce and Apache's Hadoop, its open-source implementation, are the de facto software systems for big-data applications. One observation about the MapReduce framework is that it generates a large amount of intermediate data. This abundant information is thrown away after the tasks finish, because MapReduce is unable to utilize it. In this paper, we propose Dache, a data-aware cache framework for big-data applications. In Dache, tasks submit their intermediate results to the cache manager. A task queries the cache manager before executing the actual computing work. A novel cache description scheme and a cache request and reply protocol are designed. We implement Dache by extending Hadoop. Testbed experiment results demonstrate that Dache significantly improves the completion time of MapReduce jobs.
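The query-before-compute and submit-after-compute flow described in the abstract can be sketched as follows. This is a minimal, single-process illustration and not the authors' implementation: the abstract does not specify the cache description scheme or the request and reply protocol, so the `CacheManager` class, the `(input_id, operation)` cache key, and the `run_task` and `word_count` helpers are all hypothetical names chosen for this example.

```python
from typing import Callable, Dict, List, Tuple

# ASSUMPTION: a cache entry is described by which input was processed by
# which operation. The paper's actual cache description scheme may differ.
CacheKey = Tuple[str, str]


class CacheManager:
    """Toy in-memory stand-in for a cache manager (illustrative only)."""

    def __init__(self) -> None:
        self._store: Dict[CacheKey, object] = {}
        self.hits = 0
        self.misses = 0

    def query(self, key: CacheKey) -> Tuple[bool, object]:
        """Return (True, result) on a cache hit, or (False, None) on a miss."""
        if key in self._store:
            self.hits += 1
            return True, self._store[key]
        self.misses += 1
        return False, None

    def submit(self, key: CacheKey, result: object) -> None:
        """Record an intermediate result so later tasks can reuse it."""
        self._store[key] = result


def run_task(
    manager: CacheManager,
    input_id: str,
    operation: str,
    compute: Callable[[List[str]], object],
    data: List[str],
) -> object:
    """Query the cache first; compute and submit only on a miss."""
    key = (input_id, operation)
    found, cached = manager.query(key)
    if found:
        return cached
    result = compute(data)
    manager.submit(key, result)
    return result


def word_count(lines: List[str]) -> Dict[str, int]:
    """Example map-style computation: count whitespace-separated words."""
    counts: Dict[str, int] = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts
```

Calling `run_task` twice with the same `input_id` and `operation` performs the computation once; the second call is answered from the cache. A real deployment would additionally need a networked protocol and a policy for invalidating stale entries, neither of which is modeled here.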

Authors: Yaxiong Zhao, Jie Wu, Cong Liu

Affiliations: Google Inc.; Temple University; Sun Yat-Sen University

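Source: Tsinghua Science and Technology (indexed in SCIE, EI, CAS), 2014, Issue 1, pp. 39-50 (12 pages). Chinese journal title: Journal of Tsinghua University (Natural Science Edition) (English Edition)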

Funding: Supported in part by the Natural Science Foundation of USA (Nos. ECCS 1128209, CNS 1138963, CNS 1065444, and CCF 1028167)

Keywords: big-data; MapReduce; Hadoop; caching

Classification code: TP311.52 [Automation and Computer Technology: Computer Software and Theory]
