Abstract
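In heterogeneous environments, current data provenance research mainly relies on the OPM model to represent the provenance of data in ETL, and it suffers from inconsistent provenance concepts, confused vocabulary usage, and the lack of standardized access. Based on the W3C PROV model, a unified representation mechanism for ETL provenance information is proposed. The mechanism first gives a unified description of the provenance concepts of the ETL process and their relationships. Then, to address the particular semantic expression requirements of the ETL process, a multi-granularity ETL provenance vocabulary is built. Finally, a standardized query mechanism built on RDF improves the accessibility of the provenance information.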
In heterogeneous environments, data provenance information in ETL is currently represented mainly on the basis of OPM. However, there is still no consensus on the conceptual representation of ETL provenance information, on the usage of provenance vocabulary, or on a consolidated access mode. A unified provenance representation mechanism for ETL, based on PROV, is proposed. First, it presents a concept representation mechanism for ETL that describes the primary provenance concepts and their relationships. Second, it constructs a multi-granularity vocabulary to meet the requirement of expressing provenance information at different abstraction levels. Finally, a standard access mode is proposed in which provenance information is organized into two levels: the lower level is described with RDF, and the upper level is formed by querying the lower one.
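The two-level access mode described above can be illustrated with a minimal sketch. It is not the paper's vocabulary or implementation: the `EX` namespace, the entity and activity names (`orders_raw`, `orders_clean`, `clean_step`), and the helper functions `match` and `sources_of` are hypothetical, and a plain Python set of tuples stands in for an RDF store. Only the PROV terms (`prov:Entity`, `prov:Activity`, `prov:used`, `prov:wasGeneratedBy`) come from the W3C PROV ontology.

```python
# Lower level: provenance of one ETL step as RDF-style (subject, predicate, object) triples.
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/etl#"  # assumed namespace, for illustration only

triples = {
    (EX + "orders_raw", RDF_TYPE, PROV + "Entity"),
    (EX + "orders_clean", RDF_TYPE, PROV + "Entity"),
    (EX + "clean_step", RDF_TYPE, PROV + "Activity"),
    # The cleaning activity read the raw table and produced the clean one.
    (EX + "clean_step", PROV + "used", EX + "orders_raw"),
    (EX + "orders_clean", PROV + "wasGeneratedBy", EX + "clean_step"),
}


def match(store, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def sources_of(store, entity):
    """Upper level: a view derived by querying the lower level.

    Returns the entities used by the activities that generated `entity`.
    """
    result = set()
    for _, _, activity in match(store, s=entity, p=PROV + "wasGeneratedBy"):
        for _, _, source in match(store, s=activity, p=PROV + "used"):
            result.add(source)
    return sorted(result)


if __name__ == "__main__":
    print(sources_of(triples, EX + "orders_clean"))
```

In a real deployment the triples would more likely live in an RDF store and the upper-level view would be expressed as a SPARQL query; the tuple set is used here only to keep the sketch self-contained.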
Source
Journal of Sichuan University (Engineering Science Edition) (《四川大学学报(工程科学版)》)
EI
CAS
CSCD
Peking University Core Journal (北大核心)
2015, Issue 5, pp. 123-129 (7 pages)
Funding
Supported by the General Program of the National Natural Science Foundation of China (61170306)