Abstract
An ontology is an explicit specification of a conceptualization and can describe a concept system and domain knowledge precisely. To identify the data in heterogeneous data sources and integrate them according to their semantics, this paper proposes an ontology-based approach to integrating heterogeneous data sources. First, the data in each source are described as XML documents. Next, the document type definition (DTD) of each XML document is converted into a data model called DIM. Finally, the XML documents are integrated semantically on the basis of the ontology through steps such as semantic clustering and global schema generation. The approach uses as its ontology base an electronic English dictionary grounded in cognitive linguistics and designed jointly by psychologists, linguists, and computer engineers at Princeton University. It can effectively recognize data in heterogeneous sources that have equivalent or similar semantics, and can therefore integrate those data more accurately.
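The pipeline outlined in the abstract (DTD extraction, semantic clustering, global schema generation) can be illustrated with a minimal sketch. It is not the paper's implementation. The abstract gives no details of the DIM model, so the sketch skips it and works directly on element names. The dictionary-based ontology is replaced by a small hand-written synonym table (`SYNSETS`). All function names, the sample DTDs, and the rule of naming each cluster by its alphabetically smallest synonym are assumptions made for illustration.

```python
import re

# Hypothetical stand-in for the dictionary-based ontology: each set groups
# element names that are treated as semantically equivalent or close.
SYNSETS = [
    {"author", "writer"},
    {"title", "name"},
    {"price", "cost"},
]

# Matches the element name in declarations such as <!ELEMENT book (...)>.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([\w.-]+)\s")

# Two toy sources that describe the same kind of data with different names.
SAMPLE_DTDS = {
    "store_a": (
        "<!ELEMENT book (title, author, price)>\n"
        "<!ELEMENT title (#PCDATA)>\n"
        "<!ELEMENT author (#PCDATA)>\n"
        "<!ELEMENT price (#PCDATA)>\n"
    ),
    "store_b": (
        "<!ELEMENT book (name, writer, cost)>\n"
        "<!ELEMENT name (#PCDATA)>\n"
        "<!ELEMENT writer (#PCDATA)>\n"
        "<!ELEMENT cost (#PCDATA)>\n"
    ),
}


def element_names(dtd_text):
    """Return the element names declared in a DTD, in declaration order."""
    return ELEMENT_DECL.findall(dtd_text)


def concept_of(name):
    """Map an element name to a concept label.

    Names found in a synonym set share that set's alphabetically smallest
    member as their label; any other name is its own (lowercased) label.
    """
    lowered = name.lower()
    for synset in SYNSETS:
        if lowered in synset:
            return min(synset)
    return lowered


def cluster_elements(dtds):
    """Group (source, element name) pairs from all DTDs by concept label."""
    clusters = {}
    for source, dtd_text in dtds.items():
        for name in element_names(dtd_text):
            clusters.setdefault(concept_of(name), []).append((source, name))
    return clusters


def global_schema(clusters):
    """Return one global element name per cluster, sorted alphabetically."""
    return sorted(clusters)
```

Calling `cluster_elements(SAMPLE_DTDS)` places `author` from `store_a` and `writer` from `store_b` in the same cluster, and `global_schema` then yields one element per cluster. A real system would replace the synonym table with lookups in the dictionary and would also have to reconcile the nesting structure of the DTDs, which this sketch ignores.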
Source
《计算机技术与发展》
2008, Issue 2, pp. 34-37 (4 pages)
Computer Technology and Development