Dache: A Data Aware Caching for Big-Data Applications Using the MapReduce Framework (times cited: 4)

Abstract: The buzzword big-data refers to large-scale distributed data processing applications that operate on exceptionally large amounts of data. Google's MapReduce and its open-source implementation, Apache Hadoop, are the de facto software systems for big-data applications. One observation about the MapReduce framework is that it generates a large amount of intermediate data. This abundant information is thrown away after the tasks finish, because MapReduce is unable to utilize it. In this paper, we propose Dache, a data-aware cache framework for big-data applications. In Dache, tasks submit their intermediate results to the cache manager, and a task queries the cache manager before executing the actual computing work. A novel cache description scheme and a cache request-and-reply protocol are designed. We implement Dache by extending Hadoop. Testbed experiment results demonstrate that Dache significantly improves the completion time of MapReduce jobs.
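The query-before-compute flow described in the abstract (a task asks the cache manager first, does the actual work only on a miss, then submits its result) can be illustrated with a small in-memory sketch. This is not Dache's implementation, which extends Hadoop in Java. `CacheDescription`, `CacheManager`, `run_task`, `word_count`, and the input name `input/part-00000` are hypothetical, and identifying a cached item by its origin plus the ordered list of operations applied to it is an assumption made for illustration. Eviction, distribution across nodes, and staleness checks are deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class CacheDescription:
    """Hypothetical description of one cached intermediate result.

    Assumption: an item is identified by its origin (e.g. the name of an
    input split) plus the ordered operations applied to that origin.
    """
    origin: str
    operations: Tuple[str, ...]


class CacheManager:
    """Toy, single-process stand-in for a cache manager (hypothetical API)."""

    def __init__(self) -> None:
        self._store: Dict[CacheDescription, object] = {}
        self.hits = 0
        self.misses = 0

    def submit(self, desc: CacheDescription, result: object) -> None:
        # A finished task hands its intermediate result to the manager.
        self._store[desc] = result

    def query(self, desc: CacheDescription) -> Tuple[bool, object]:
        # Request/reply: the reply says whether a matching item exists.
        if desc in self._store:
            self.hits += 1
            return True, self._store[desc]
        self.misses += 1
        return False, None


def run_task(manager: CacheManager, origin: str, operations: Sequence[str],
             data: List[str], compute: Callable[[List[str]], object]) -> object:
    """Query the cache first; compute and submit only on a miss."""
    desc = CacheDescription(origin, tuple(operations))
    found, cached = manager.query(desc)
    if found:
        return cached
    result = compute(data)
    manager.submit(desc, result)
    return result


def word_count(lines: List[str]) -> Dict[str, int]:
    # Example map-side computation whose output is worth caching.
    counts: Dict[str, int] = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


if __name__ == "__main__":
    manager = CacheManager()
    split = ["a b a", "b c"]  # hypothetical contents of input/part-00000
    first = run_task(manager, "input/part-00000", ["word_count"], split, word_count)   # miss: computes
    second = run_task(manager, "input/part-00000", ["word_count"], split, word_count)  # hit: reused
    print(first == second, manager.hits, manager.misses)  # prints: True 1 1
```

Looking items up by their description rather than by the data itself keeps the query cheap. The trade-off, which this toy ignores, is that a stale result would be returned if the contents behind an origin name changed while the name stayed the same.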
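Source: Tsinghua Science and Technology (the English edition of the Journal of Tsinghua University (Natural Science Edition)), indexed in SCIE, EI, and CAS; 2014, no. 1, pp. 39-50 (12 pages).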
Funding: Supported in part by the National Science Foundation of the USA (Nos. ECCS 1128209, CNS 1138963, CNS 1065444, and CCF 1028167).
Keywords: big-data; MapReduce; Hadoop; caching