
Inconsistency repair algorithm based on distributed computing framework
Abstract To address data inconsistency in big data environments, an inconsistent data detection and repair algorithm based on MapReduce is proposed and implemented. The approach uses conditional functional dependencies (CFDs), which add semantic constraints to traditional functional dependencies. First, CFDs are divided by form of expression into constant CFDs and variable CFDs. Then, the CFD set is checked for consistency to ensure that its rules do not conflict with one another. Next, violations of CFDs are resolved by modifying the target value of the equivalence class. Finally, following the running characteristics of the different MapReduce stages, data violating constant CFDs are repaired on the map side and data violating variable CFDs are repaired on the reduce side. Experimental results show that, at the same error rate, the CFD-based algorithm achieves higher accuracy and better scalability than the traditional algorithm.
Authors YU Xiangxiang; ZHONG Yong; LI Zhendong; HAN Xiao (Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu, Sichuan 610041, China; University of Chinese Academy of Sciences, Beijing 100049, China)
Source Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2019, Issue S02, pp. 164-168 (5 pages)
Funding Sichuan Province Science and Technology Support Program (2014GZ0013)
Keywords big data; data quality; inconsistency; conditional functional dependency (CFD); MapReduce
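The division of work described in the abstract (constant CFDs repaired on the map side, variable CFDs repaired on the reduce side) can be illustrated with a small single-process sketch. This is not the authors' implementation: the two rules, the attribute names, the in-memory `run_job` driver that stands in for the MapReduce shuffle, and the choice of the most frequent value as the equivalence class's target value are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical rules, invented for this sketch (not taken from the paper).
# Constant CFD: if country == "44" and area == "131", then city must be "EDI".
CONSTANT_CFD = {"lhs": {"country": "44", "area": "131"}, "rhs": ("city", "EDI")}
# Variable CFD: among records with country == "44", zip determines street.
VARIABLE_CFD = {"condition": {"country": "44"}, "lhs": ("zip",), "rhs": "street"}


def matches(record, pattern):
    """Return True if the record carries every constant in the pattern."""
    return all(record.get(attr) == value for attr, value in pattern.items())


def map_phase(record):
    """Repair constant-CFD violations on a single record and emit (key, record).

    A constant CFD can be checked one record at a time, so no grouping is
    needed. The key groups records for the variable CFD; None means the
    variable CFD does not apply to this record.
    """
    record = dict(record)  # work on a copy so the input is not mutated
    attr, const = CONSTANT_CFD["rhs"]
    if matches(record, CONSTANT_CFD["lhs"]) and record.get(attr) != const:
        record[attr] = const
    if matches(record, VARIABLE_CFD["condition"]):
        key = tuple(record.get(a) for a in VARIABLE_CFD["lhs"])
    else:
        key = None
    return key, record


def reduce_phase(key, records):
    """Repair variable-CFD violations inside one equivalence class.

    All records sharing the same left-hand-side values must agree on the
    right-hand side. Assumption: the target value is the most frequent one.
    """
    if key is None:
        return list(records)
    rhs = VARIABLE_CFD["rhs"]
    target = Counter(r.get(rhs) for r in records).most_common(1)[0][0]
    return [{**r, rhs: target} for r in records]


def run_job(records):
    """Simulate map, shuffle (group by key) and reduce in one process."""
    groups = defaultdict(list)
    for rec in records:
        key, repaired = map_phase(rec)
        groups[key].append(repaired)
    output = []
    for key, group in groups.items():
        output.extend(reduce_phase(key, group))
    return output


SAMPLE = [
    {"country": "44", "area": "131", "city": "NYC", "zip": "EH4", "street": "Mayfield"},
    {"country": "44", "area": "131", "city": "EDI", "zip": "EH4", "street": "Mayfield"},
    {"country": "44", "area": "131", "city": "EDI", "zip": "EH4", "street": "Crichton"},
    {"country": "01", "area": "908", "city": "MH", "zip": "07974", "street": "Mtn Ave"},
]

if __name__ == "__main__":
    for row in run_job(SAMPLE):
        print(row)
```

In this sketch the first sample record violates the constant rule and is corrected during the map step, while the third record disagrees with its equivalence class on `street` and is corrected during the reduce step. A real deployment would also need the consistency check on the CFD set that the abstract mentions, which is omitted here.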
