Abstract
To address the shortcomings of the centralized execution mode of traditional extraction-transformation-loading (ETL) tools, this paper proposes MDETL (MapReduce Distributed ETL), a distributed ETL architecture based on MapReduce. The architecture adopts MapReduce, a parallel programming model for concurrent processing of massive data, and combines it with the cluster computing approach of distributed ETL, so that ETL flows execute in a distributed manner across a cluster. This improves the flexibility and throughput of the whole ETL system, provides good scalability and load balancing, and raises execution efficiency.
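As a rough illustration of how an ETL flow can be phrased in MapReduce terms, the sketch below runs the map phase (extract plus transform) over independent data partitions in parallel and then groups and aggregates the results in a reduce phase (load). It is not the MDETL implementation, which the abstract does not detail. The `region,amount` record layout, the function names, and the use of a local thread pool in place of a real cluster are all assumptions made for this example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def extract(partition):
    """Extract: parse raw 'region,amount' lines, skipping malformed ones."""
    records = []
    for line in partition:
        parts = line.split(",")
        if len(parts) != 2:
            continue  # wrong number of fields
        region, amount = parts[0].strip(), parts[1].strip()
        try:
            records.append((region, float(amount)))
        except ValueError:
            continue  # amount is not numeric
    return records


def map_transform(partition):
    """Map: extract one partition and emit normalized (key, value) pairs."""
    return [(region.upper(), amount) for region, amount in extract(partition)]


def shuffle(mapped_partitions):
    """Shuffle: group the values from all partitions by key."""
    groups = defaultdict(list)
    for pairs in mapped_partitions:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_load(groups):
    """Reduce/load: aggregate each key into the target table (a dict here)."""
    return {key: sum(values) for key, values in groups.items()}


def run_etl(partitions, workers=2):
    """Run the map phase in parallel; the thread pool stands in for a cluster."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_transform, partitions))
    return reduce_load(shuffle(mapped))


if __name__ == "__main__":
    # Hypothetical input: two partitions of raw lines, some of them invalid.
    partitions = [
        ["east, 10.5", "west, 4", "bad line"],
        ["East, 1.5", "west, x", "north, 2"],
    ]
    print(run_etl(partitions))
```

Because each partition is mapped independently, adding workers (or, in a real deployment, cluster nodes) increases map-phase parallelism without changing the transform logic, which is the property the abstract attributes to distributing ETL over MapReduce.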
Source
《计算机科学》
CSCD
PKU Core Journal (北大核心)
2013, No. 6, pp. 152-154 (3 pages)
Computer Science
Funding
Supported by the National Natural Science Foundation of China (Grant No. 70971137)