
HDFS optimization program based on GE coding (cited by: 7)
Abstract: To address data disaster recovery efficiency and the small-files problem in the Hadoop Distributed File System (HDFS), this paper proposes a solution based on erasure coding. The scheme adds the encoding and decoding modules of a new erasure code (the GE code) to HDFS: files are encoded into a large number of slices, which are distributed randomly and evenly across the cluster, replacing the multi-replica disaster recovery strategy of the original HDFS. The method introduces the slice as a new concept: slices are classified and merged for storage in blocks, and a secondary index over the slices is built to solve the small-files problem. The three-replica mechanism is dropped; when nodes in the cluster fail, the original data is recovered by collecting any roughly 70% of the slices belonging to the affected file and decoding them. The method also introduces a dynamic replication strategy that creates and deletes replicas dynamically to keep the whole cluster well load-balanced and to relieve hotspots. Cluster experiments show that the method considerably improves HDFS in disaster recovery efficiency, small-file handling, storage cost, and security.
Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2013, No. 3, pp. 730-733 (4 pages)
Funding: National 863 Program of China (2008AA01Z402)
Keywords: Hadoop Distributed File System (HDFS); erasure code; data disaster recovery; secondary index
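
The two-level slice index and the "any roughly 70% of the slices" recovery rule from the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the paper's implementation: the names `SliceLocation`, `SliceIndex`, `add_slice`, `locate` and `is_recoverable` are hypothetical, the 70% figure is treated as an exact threshold, and no GE encoding or decoding is performed.

```python
from dataclasses import dataclass, field

# "About 70%" in the abstract; treated here as an exact threshold (assumption).
RECOVERY_PERCENT = 70


@dataclass(frozen=True)
class SliceLocation:
    """Where one slice lives inside a merged block (hypothetical layout)."""
    block_id: str
    offset: int
    length: int


@dataclass
class SliceIndex:
    """Two-level index: file name -> slice ids -> location inside a block."""
    file_to_slices: dict = field(default_factory=dict)     # level 1
    slice_to_location: dict = field(default_factory=dict)  # level 2

    def add_slice(self, file_name, slice_id, location):
        self.file_to_slices.setdefault(file_name, []).append(slice_id)
        self.slice_to_location[slice_id] = location

    def locate(self, file_name):
        """Resolve a file to the locations of all its slices."""
        slice_ids = self.file_to_slices.get(file_name, [])
        return [self.slice_to_location[s] for s in slice_ids]

    def is_recoverable(self, file_name, failed_blocks):
        """True if enough slices survive outside the failed blocks."""
        locations = self.locate(file_name)
        if not locations:
            return False
        alive = sum(1 for loc in locations if loc.block_id not in failed_blocks)
        # Integer comparison avoids floating-point rounding at the threshold.
        return alive * 100 >= RECOVERY_PERCENT * len(locations)


# Usage: one small file encoded into 10 slices, each placed in a different block.
index = SliceIndex()
for i in range(10):
    index.add_slice("small.txt", f"small.txt#{i}", SliceLocation(f"blk_{i}", 0, 64))

three_lost = index.is_recoverable("small.txt", {"blk_0", "blk_1", "blk_2"})
four_lost = index.is_recoverable("small.txt", {"blk_0", "blk_1", "blk_2", "blk_3"})
```

With ten slices, losing three blocks leaves exactly 70% of the slices and the file counts as recoverable, while losing four does not. Keeping the file-to-slice map separate from the slice-to-block map lets many small files share one block without the lookup needing one block per file.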