Abstract
To address data inconsistency in big data environments, an inconsistency detection and repair algorithm based on MapReduce was proposed and implemented. Conditional functional dependencies (CFDs), which add semantic constraints to traditional functional dependencies, are used as the data quality rules. Firstly, the CFDs were divided into constant CFDs and variable CFDs according to their form of expression. Then, the consistency of the CFD set was checked to ensure that the CFDs in the set do not conflict with one another. Next, CFD violations were resolved by modifying the target value of the equivalence class. Finally, based on the running characteristics of the different stages of MapReduce, data violating constant CFDs were repaired on the map side and data violating variable CFDs were repaired on the reduce side. The experimental results show that, at the same error rate, the CFD-based algorithm achieves higher accuracy and better scalability than the traditional algorithm.
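The map-side and reduce-side split described in the abstract can be illustrated with a small single-process sketch. This is not the authors' implementation: the rules, the attribute names, the sample records, and the majority-vote choice of target value are all assumptions made for illustration, and the sketch handles only one variable CFD and simulates the shuffle step with a dictionary instead of running on a MapReduce cluster.

```python
from collections import Counter, defaultdict

# Hypothetical constant CFD: country == "UK" and area_code == "131" implies city == "EDI".
CONSTANT_CFDS = [({"country": "UK", "area_code": "131"}, ("city", "EDI"))]
# Hypothetical variable CFD: among records with country == "UK", zip determines street.
VARIABLE_CFD = ({"country": "UK"}, ("zip",), "street")

# Illustrative records only; not data from the paper.
SAMPLE = [
    {"id": 1, "country": "UK", "area_code": "131", "city": "LDN", "zip": "EH4", "street": "Mayfield"},
    {"id": 2, "country": "UK", "area_code": "131", "city": "EDI", "zip": "EH4", "street": "Mayfield"},
    {"id": 3, "country": "UK", "area_code": "131", "city": "EDI", "zip": "EH4", "street": "Crichton"},
    {"id": 4, "country": "US", "area_code": "212", "city": "NYC", "zip": "10001", "street": "Broadway"},
]


def map_phase(record):
    """Repair constant-CFD violations on a single record, then emit (key, record)."""
    fixed = dict(record)
    for pattern, (attr, value) in CONSTANT_CFDS:
        # A constant CFD can be checked per tuple, so it needs no grouping.
        if all(fixed.get(k) == v for k, v in pattern.items()):
            fixed[attr] = value
    condition, lhs, _rhs = VARIABLE_CFD
    if all(fixed.get(k) == v for k, v in condition.items()):
        # Records sharing the same LHS values form one equivalence class.
        return tuple(fixed[a] for a in lhs), fixed
    return None, fixed  # The variable CFD does not apply to this record.


def reduce_phase(key, records):
    """Repair variable-CFD violations inside one equivalence class."""
    if key is None:
        return records
    rhs = VARIABLE_CFD[2]
    # Assumption: the target value is the most frequent RHS value in the class.
    target, _count = Counter(r[rhs] for r in records).most_common(1)[0]
    return [{**r, rhs: target} for r in records]


def run(records):
    """Simulate map, shuffle (group by key), and reduce in one process."""
    groups = defaultdict(list)
    for record in records:
        key, fixed = map_phase(record)
        groups[key].append(fixed)
    repaired = []
    for key, group in groups.items():
        repaired.extend(reduce_phase(key, group))
    return repaired
```

Calling `run(SAMPLE)` repairs the city of record 1 in the map step, because a constant CFD can be evaluated on one tuple at a time, and repairs the street of record 3 in the reduce step, because a variable CFD can only be evaluated once all tuples with the same left-hand-side values have been grouped together.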
Authors
YU Xiangxiang (于祥祥)
ZHONG Yong (钟勇)
LI Zhendong (李振东)
HAN Xiao (韩啸)
Affiliations: Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu, Sichuan 610041, China; University of Chinese Academy of Sciences, Beijing 100049, China
Source
Journal of Computer Applications (《计算机应用》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2019, Issue S02, pp. 164-168 (5 pages)
Funding
Sichuan Province Science and Technology Support Program (2014GZ0013)