Abstract
To address shortcomings in existing research on conflict resolution for distributed big data integration in relational databases, a new integration conflict resolution algorithm is proposed. Conflicts are classified according to the integration process of distributed big data in relational databases into semantic conflicts, schema conflicts, and instance conflicts. Semantic conflicts are resolved through syntactic fusion, logic-tree fusion, and frequency fusion. Attribute directed graphs describe the attributes of schema data and instance data in the relational database. Based on how each attribute relation participates in integration conflicts, the importance of an attribute relation is quantified by a relation weight, and the importance of each attribute directed graph is described by the sum of the weights of all relations in that graph. A cost function is defined by jointly considering the number of conflicts and the weights, and on this basis the detailed conflict resolution process for distributed big data integration in relational databases is given. Experimental results show that the proposed algorithm performs well in both conflict identification and conflict resolution.
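The abstract names the building blocks but gives no formulas, so the following is only a minimal Python sketch of how they might fit together. Everything concrete in it is an assumption made for illustration: the names `frequency_fusion`, `AttributeGraph`, and `resolution_cost`, the coefficients `alpha` and `beta`, and in particular the linear form of the cost function. Only the general ideas (frequency-based fusion, weighted attribute relations, graph importance as the sum of relation weights, a cost combining conflict count and weight) come from the abstract.

```python
from collections import Counter
from dataclasses import dataclass, field


def frequency_fusion(values):
    """Resolve a value conflict by keeping the most frequent candidate.

    Assumption: 'frequency fusion' is read here as a simple majority vote.
    """
    return Counter(values).most_common(1)[0][0]


@dataclass
class AttributeGraph:
    """Directed graph over attributes; each edge is a weighted relation."""

    # Maps (source_attribute, target_attribute) -> relation weight.
    edges: dict = field(default_factory=dict)

    def add_relation(self, src, dst, weight):
        self.edges[(src, dst)] = weight

    def importance(self):
        # Importance of the graph: sum of the weights of all its relations.
        return sum(self.edges.values())


def resolution_cost(graph, conflicting, alpha=1.0, beta=1.0):
    """Hypothetical cost: alpha * (number of conflicting relations)
    + beta * (total weight of those relations).

    The linear form and both coefficients are assumptions, not the
    function defined in the paper.
    """
    conflict_weight = sum(graph.edges[rel] for rel in conflicting)
    return alpha * len(conflicting) + beta * conflict_weight


graph = AttributeGraph()
graph.add_relation("id", "name", 0.5)
graph.add_relation("id", "email", 0.25)

fused_city = frequency_fusion(["Beijing", "Peking", "Beijing"])
cost = resolution_cost(graph, [("id", "name")])
```

Under these assumptions, a resolution strategy would compare candidate resolutions by `resolution_cost` and prefer the cheapest one, so that conflicts touching heavily weighted relations are treated as more expensive than conflicts on minor ones.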
Source
Science Technology and Engineering (《科学技术与工程》)
Peking University Core Journal (北大核心)
2018, No. 3, pp. 63-67 (5 pages)
Keywords
relational database
distributed
big data
integration
conflict resolution