We propose a three-step technique to achieve this goal. First, we use a collection of XML namespaces organized into a hierarchical structure as a medium for expressing data semantics. Second, we define the format of the resource descriptor for the information source discovery scheme, so that Web data sources can be registered and deregistered dynamically. Third, we employ an inverted-index mechanism to identify the subset of information sources that are relevant to a particular user query. We describe the design, architecture, and implementation of our approach, IWDS, and illustrate its use through case examples.
Key words: integration; heterogeneity; Web data source; XML namespace
CLC number: TP311.13
Foundation item: Supported by the National Key Technologies R&D Program of China (2002BA103A04)
Biography: WU Wei (1975-), male, Ph.D. candidate, research direction: information integration, distributed computing
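The three steps can be illustrated with a small source registry. This is a minimal sketch under stated assumptions, not the IWDS implementation:

- The descriptor format (a `<source id=...>` element listing `<concept ns=...>` children) is hypothetical.
- The class and method names (`SourceRegistry`, `register`, `deregister`, `relevant_sources`) are hypothetical.
- The namespace hierarchy is encoded as `/`-separated path segments in `urn:` URIs. This is also an assumption.

Each concept is indexed under itself and all of its ancestors, so a query on a parent namespace also finds sources registered for its children. Deregistration simply removes the source's postings from the inverted index.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


class SourceRegistry:
    """Hypothetical registry: inverted index from namespace concept to source ids."""

    def __init__(self) -> None:
        self.index: dict[str, set[str]] = defaultdict(set)  # concept -> source ids
        self.keys_by_source: dict[str, set[str]] = {}       # source id -> indexed concepts

    @staticmethod
    def with_ancestors(ns: str) -> list[str]:
        # Assumed hierarchy encoding: "urn:demo:product/book"
        # -> ["urn:demo:product", "urn:demo:product/book"]
        parts = ns.split("/")
        return ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]

    def register(self, descriptor_xml: str) -> str:
        # Parse a hypothetical <source id=...><concept ns=.../></source> descriptor.
        root = ET.fromstring(descriptor_xml)
        sid = root.get("id")
        self.deregister(sid)  # re-registering a source replaces its old entry
        keys = set()
        for concept in root.iter("concept"):
            keys.update(self.with_ancestors(concept.get("ns")))
        for key in keys:
            self.index[key].add(sid)
        self.keys_by_source[sid] = keys
        return sid

    def deregister(self, sid: str) -> None:
        # Remove the source from every posting list; drop lists that become empty.
        for key in self.keys_by_source.pop(sid, set()):
            self.index[key].discard(sid)
            if not self.index[key]:
                del self.index[key]

    def relevant_sources(self, query_ns: list[str]) -> set[str]:
        # A source is relevant only if it covers every concept in the query.
        postings = [self.index.get(ns, set()) for ns in query_ns]
        return set.intersection(*postings) if postings else set()


reg = SourceRegistry()
reg.register('<source id="a"><concept ns="urn:demo:product/book/price"/></source>')
reg.register('<source id="b"><concept ns="urn:demo:product/cd"/></source>')
print(sorted(reg.relevant_sources(["urn:demo:product"])))       # ['a', 'b']
print(sorted(reg.relevant_sources(["urn:demo:product/book"])))  # ['a']
reg.deregister("a")
print(sorted(reg.relevant_sources(["urn:demo:product/book"])))  # []
```

Indexing ancestors at registration time keeps each lookup to one dictionary access per query concept, at the cost of a larger index. Resolving the hierarchy at query time is the opposite trade-off.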
Data warehouses provide storage and management for large volumes of data, but data schemas evolve over time. When schema elements are changed, added, or deleted, the data in the warehouse must conform to the modified schema. The data warehouse must therefore be reorganized or reconstructed, a process that is laborious and wasteful. To cope with these problems, this paper develops an approach to modeling data cubes with XML, which has emerged as a universal format for data exchange on the Web and which can make data warehouses flexible and scalable. This paper also extends the OLAP algebra to XML-based data cubes; the extended algebra is called X-OLAP.
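The idea of an XML-encoded cube with an algebra over it can be sketched as follows. This assumes a hypothetical encoding in which each fact is a `<cell>` element, its attributes are dimension values, and its text is the measure. It is not the paper's XML schema or the X-OLAP operator set, and `roll_up` is a hypothetical name. The sketch shows two things:

- A roll-up that returns another cube, so operators can be chained.
- A dimension added later (`channel` below) requires no reorganization of the cells already stored.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical encoding: one <cell> per fact, dimension values as attributes,
# the measure as element text. The last cell carries a dimension ("channel")
# added after the others were loaded; existing cells are left untouched.
CUBE = ET.fromstring("""
<cube measure="sales">
  <cell region="north" product="book" quarter="Q1">120</cell>
  <cell region="north" product="cd" quarter="Q1">80</cell>
  <cell region="south" product="book" quarter="Q1">200</cell>
  <cell region="south" product="book" quarter="Q2" channel="web">50</cell>
</cube>""")


def roll_up(cube: ET.Element, keep: list[str], missing: str = "unknown") -> ET.Element:
    """Sum the measure over every dimension not in `keep`; the result is again a cube."""
    totals: dict[tuple[str, ...], float] = defaultdict(float)
    for cell in cube.iter("cell"):
        # Cells that predate a dimension are grouped under `missing`.
        key = tuple(cell.get(dim, missing) for dim in keep)
        totals[key] += float(cell.text)
    out = ET.Element("cube", measure=cube.get("measure"))
    for key, value in sorted(totals.items()):
        ET.SubElement(out, "cell", dict(zip(keep, key))).text = str(value)
    return out


# Chained roll-ups: (region, product) first, then region only.
by_region = roll_up(roll_up(CUBE, ["region", "product"]), ["region"])
print(ET.tostring(by_region, encoding="unicode"))
```

Because every operator maps a cube to a cube, the output of one operator can feed directly into the next. Treating missing attributes as a placeholder value is one possible policy for cells that predate a new dimension; the paper may handle this differently.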
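To enable interoperability and global operations over distributed, heterogeneous data on the Web, the different data sources must be integrated. After analyzing the advantages and disadvantages of the existing integration approaches, we propose an XML-based virtual Web data integration method. The method uses XML as the common data format for the integrated data and achieves virtual integration by establishing mappings between the individual data sources and the XML document data model. This approach simplifies the implementation of Web data integration. Finally, a case study verifies the feasibility and effectiveness of the method.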
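Mapping-based virtual integration can be sketched as follows. The following are all hypothetical, not taken from the paper:

- the global element names;
- the per-source mappings;
- the function names `rows_from_db` and `virtual_view`;
- the two sample sources: an in-memory SQLite table and a list of dicts standing in for a Web feed.

The integrated XML view is rebuilt from the live sources each time it is requested, so no integrated copy of the data is stored.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Iterable, Mapping

# Hypothetical mappings: global XML element name -> field name in each source.
MAPPINGS = {
    "db": {"title": "book_name", "price": "cost"},
    "feed": {"title": "name", "price": "price_usd"},
}


def rows_from_db(conn: sqlite3.Connection) -> Iterable[Mapping]:
    conn.row_factory = sqlite3.Row  # rows become indexable by column name
    return conn.execute("SELECT book_name, cost FROM books")


def virtual_view(sources: Iterable[tuple[str, Iterable[Mapping]]]) -> ET.Element:
    """Build the integrated XML view on request; no integrated copy is stored."""
    root = ET.Element("books")
    for name, rows in sources:
        mapping = MAPPINGS[name]
        for row in rows:
            book = ET.SubElement(root, "book", source=name)
            # Translate each local field into its global XML element.
            for xml_tag, field in mapping.items():
                ET.SubElement(book, xml_tag).text = str(row[field])
    return root


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (book_name TEXT, cost REAL)")
conn.execute("INSERT INTO books VALUES ('XML Basics', 30.0)")
feed = [{"name": "Web Mining", "price_usd": 45.5}]

# Queries are written against the global schema only.
view = virtual_view([("db", rows_from_db(conn)), ("feed", feed)])
cheap = [b.findtext("title") for b in view.iter("book") if float(b.findtext("price")) < 40]
print(cheap)  # ['XML Basics']
```

Because the view is recomputed per request, a row inserted into `books` later appears in the next call without any reload step. The trade-off is that every query accesses the underlying sources again.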
This paper proposes a system framework for Web data mining based on XML. The framework integrates Information Retrieval with Information Extraction and applies traditional data mining methods to accomplish Web data mining through XML.