Massive Relational Data Deduplication Processing Technology Research and Optimization (cited by: 2)
Abstract: To address the limitations of traditional deduplication techniques for massive relational data, this paper takes online/offline log data, a typical kind of relational data, as its object of study. It adopts a MapReduce-based deduplication technique to process massive relational data in parallel, and discusses the implementation of the platform. Data sharing is achieved through the intermediate results produced by merging in the Map phase, and the heap out-of-memory problem that arises during large-scale deduplication is solved by rewriting the partition in the Reduce phase. Finally, experiments compare the efficiency of different approaches to processing massive relational data and confirm that MapReduce handles such data efficiently.
Authors: HUANG Qipeng; LU Shan (Wuhan Research Institute of Posts and Telecommunications, Wuhan 430074; Nanjing Fiberhome Software Technology Co., Ltd., Nanjing 210019)
Source: Computer & Digital Engineering (《计算机与数字工程》), 2018, No. 10, pp. 2061-2065 (5 pages)
Keywords: MapReduce; data processing; duplicated data deletion; relational data
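
The abstract names two mechanisms, merging intermediate results in the Map phase and a rewritten partition step, but this record does not include the paper's code. The following is a minimal single-process Python sketch of the general pattern, not the authors' Hadoop implementation. The function names (`map_phase`, `combine`, `partition`, `reduce_phase`, `dedup`), the CRC32-based partitioning rule, and the sample log lines are all assumptions made for illustration. The intent is to show how a map-side merge shrinks the data sent to the shuffle, and how spreading keys across reducers keeps each reducer's in-memory set to a fraction of the distinct records.

```python
import zlib
from collections import defaultdict


def map_phase(records):
    """Emit (record, None) pairs; the whole record serves as the dedup key."""
    for rec in records:
        yield rec, None


def combine(pairs):
    """Map-side merge: drop duplicates inside one input split before the shuffle."""
    seen = set()
    for key, _ in pairs:
        if key not in seen:
            seen.add(key)
            yield key, None


def partition(key, num_reducers):
    """Hypothetical custom partitioner: a stable hash of the key modulo the
    number of reducers, so identical records always reach the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reduce_phase(keys):
    """Each reducer only holds the distinct keys of its own partition."""
    return sorted(set(keys))


def dedup(splits, num_reducers=4):
    """Run map -> combine -> partition -> reduce over a list of input splits."""
    buckets = defaultdict(list)
    for split in splits:
        for key, _ in combine(map_phase(split)):
            buckets[partition(key, num_reducers)].append(key)
    result = []
    for r in range(num_reducers):
        result.extend(reduce_phase(buckets[r]))
    return result


if __name__ == "__main__":
    # Made-up online/offline log lines, split across two hypothetical mappers.
    splits = [
        ["u1,08:00,online", "u1,08:00,online", "u2,08:05,online"],
        ["u2,08:05,online", "u1,09:30,offline"],
    ]
    for line in dedup(splits, num_reducers=2):
        print(line)
```

Because identical records always hash to the same partition, deduplicating within each partition is enough to deduplicate the whole data set. In a real Hadoop job the same roles would be played by a combiner class and a custom `Partitioner`; how the paper's rewritten partition differs from the default is not stated in the abstract.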