Abstract: To construct mediators for data integration systems that integrate structured and semi-structured data, and to facilitate the reformulation and decomposition of queries, the presented system uses the XML processing language (XPL) for the mediator. XPL makes it easy to construct XML-based mediators for data integration, and it can speed up the work done in the mediator.
Abstract: In e-commerce, multidimensional data analysis for OLAP (on-line analytical processing) based on web data requires integrating various data sources, such as XML (extensible markup language) data and relational data, at the conceptual level. A conceptual data description approach for the multidimensional data model is presented in order to conduct multidimensional OLAP analysis for multiple subjects. A UML (unified modeling language) galaxy diagram, which describes the multidimensional structure of the integrated data at the conceptual level, is constructed. The approach is illustrated with a case of a 2-roots UML galaxy diagram that considers one retailer and several suppliers of PC products.
Abstract: We propose a three-step technique to achieve this purpose. First, we use a collection of XML namespaces organized into a hierarchical structure as a medium for expressing data semantics. Second, we define the format of the resource descriptor for the information source discovery scheme so that Web data sources can be registered and deregistered dynamically. Third, we employ an inverted-index mechanism to identify the subset of information sources that are relevant to a particular user query. We describe the design, architecture, and implementation of our approach, IWDS, and illustrate its use through case examples. Key words: integration; heterogeneity; Web data source; XML namespace. CLC number: TP 311.13. Foundation item: Supported by the National Key Technologies R&D Program of China (2002BA103A04). Biography: WU Wei (1975-), male, Ph.D. candidate; research direction: information integration, distributed computing.
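The third step, an inverted index from semantic terms to data sources, can be sketched as follows. This is a minimal illustration and not the IWDS implementation: the descriptor layout, the namespace-style terms such as `pub:author`, and the function names are all assumptions made for the example.

```python
from collections import defaultdict

def build_index(descriptors):
    """Map each namespace term to the set of source ids that declare it."""
    index = defaultdict(set)
    for source_id, terms in descriptors.items():
        for term in terms:
            index[term].add(source_id)
    return index

def relevant_sources(index, query_terms):
    """Return every source that declares at least one term used by the query."""
    hits = set()
    for term in query_terms:
        # .get avoids creating empty entries for unknown terms
        hits |= index.get(term, set())
    return hits

# Hypothetical resource descriptors: source id -> terms it can answer for.
descriptors = {
    "src_books": {"pub:book", "pub:author"},
    "src_papers": {"pub:article", "pub:author"},
    "src_weather": {"geo:station"},
}
index = build_index(descriptors)
```

Registering or deregistering a source then amounts to adding or removing its id under each of its terms, which is what makes a lookup structure of this kind suitable for sources that come and go.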
Funding: Supported by the National Natural Science Foundation of China (Grant No. 11831008), the National Natural Science Foundation of China (Grant No. 12271272), the National Science Foundation of USA (Grant No. DMS-1914411), and the Fundamental Research Funds for the Central Universities.
Abstract: Because of advances in data collection and storage, statistical analysis in modern scientific research and practice now has opportunities to utilize external information such as summary statistics from similar studies. A likelihood approach based on a parametric model assumption has been developed in the literature to utilize external summary information when the populations for external and main internal data are assumed to be the same. In this article, we instead consider the generalized estimation equation (GEE) approach for statistical inference, which is semiparametric or nonparametric, and show how to utilize external summary information even when internal and external data populations are not the same. Our approach couples the internal data and external summary information to form additional estimation equations and then applies the generalized method of moments (GMM). We show that the proposed GMM estimator is asymptotically normal and, under some conditions, is more efficient than the GEE estimator that does not use external summary information. Estimators of the asymptotic covariance matrix of the GMM estimators are also proposed. Simulation results are obtained to confirm our theory and quantify the improvements from utilizing external data. An example is also included for illustration.
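The idea of stacking extra estimation equations and applying GMM can be written generically as below. This is the textbook GMM form under assumed notation, not the paper's own equations: $Z_i$ denotes an internal observation, $g$ the original GEE function, $h$ an additional function tying the internal data to an external summary statistic $\tilde\beta$, and $W_n$ a positive definite weight matrix.

```latex
\bar m_n(\theta)
  = \frac{1}{n}\sum_{i=1}^{n}
    \begin{pmatrix}
      g(Z_i;\theta) \\
      h(Z_i;\theta,\tilde\beta)
    \end{pmatrix},
\qquad
\hat\theta_{\mathrm{GMM}}
  = \arg\min_{\theta}\;
    \bar m_n(\theta)^{\top} W_n\, \bar m_n(\theta).
```

Because the stacked system has more equations than parameters, it generally cannot be solved exactly, which is why a quadratic form is minimized instead. How $h$ is built when the two populations differ, and how the sampling variability of $\tilde\beta$ enters the weight matrix, are specifics of the paper that this sketch does not attempt to reproduce.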
Funding: Supported by the National Natural Science Foundation of China [Grant Number 11831008] and the U.S. National Science Foundation [Grant Number DMS-1914411].
Abstract: Data analysis in modern scientific research and practice has shifted from analysing a single dataset to coupling several datasets. We propose and study a kernel regression method that can handle the challenge of heterogeneous populations. It greatly extends the constrained kernel regression [Dai, C.-S., & Shao, J. (2023). Kernel regression utilizing external information as constraints. Statistica Sinica, 33, in press], which requires that the different datasets share a homogeneous population. The asymptotic normality of the proposed estimators is established under some conditions, and simulation results are presented to confirm our theory and to quantify the improvements from datasets with heterogeneous populations.
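For readers unfamiliar with the baseline being extended, a plain kernel regression estimate can be sketched as follows. This is the standard unconstrained Nadaraya-Watson estimator with a Gaussian kernel, not the constrained or heterogeneous-population method of the paper; the function names and the bandwidth parameter are assumptions of the example.

```python
import math

def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; the constant cancels in the ratio below."""
    return math.exp(-0.5 * u * u)

def nw_estimate(x0, xs, ys, bandwidth):
    """Nadaraya-Watson estimate of E[Y | X = x0] from paired samples."""
    weights = [gaussian_kernel((x0 - x) / bandwidth) for x in xs]
    total = sum(weights)
    # Weighted average of the responses, with weights decaying in |x0 - x|.
    return sum(w * y for w, y in zip(weights, ys)) / total
```

The estimate is a locally weighted average, so external information can only enter through extra constraints or extra weighting, which is the direction the constrained variants take.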
Funding: This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences [grant number XDA23100100]; the National Natural Science Foundation of China [grant numbers 41771430, 41631177]; and the China Scholarship Council [grant number 201804910732].
Abstract: Effective integration and wide sharing of geospatial data is an important and basic premise for facilitating the research and applications of geographic information science. However, the semantic heterogeneity of geospatial data is a major problem that significantly hinders geospatial data integration and sharing. Ontologies are regarded as a promising way to solve semantic problems by providing a formalized representation of geographic entities and the relationships between them in a manner understandable to machines. Thus, many efforts have been made to explore ontology-based geospatial data integration and sharing. However, there is a lack of a specialized ontology that would provide a unified description for geospatial data. In this paper, with a focus on the characteristics of geospatial data, we propose a unified framework for geospatial data ontology, denoted GeoDataOnt, to establish a semantic foundation for geospatial data integration and sharing. First, we provide a characteristics hierarchy of geospatial data. Next, we analyze the semantic problems for each characteristic of geospatial data. Subsequently, we propose the general framework of GeoDataOnt, targeting these problems according to the characteristics of geospatial data. GeoDataOnt is then divided into multiple modules, and we show a detailed design and implementation for each module. Key limitations and challenges of GeoDataOnt are identified, and broad applications of GeoDataOnt are discussed.
Funding: Supported by the Beforehand Research for National Defense of China (94J3. 4. 2. JW0 5 15)
Abstract: Integration between file systems and multidatabase systems is a necessary approach to support data sharing from distributed and heterogeneous data sources. We first analyze the problems of data integration between file systems and multidatabase systems. Then, a common, XML-oriented data model named XIDM (XML-based Integrating Data Model) is presented. XIDM is based on a series of XML standards, especially XML Schema, and can describe semistructured data well. XIDM is therefore highly practical and multipurpose.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61304131 and 61402147), a grant of the China Scholarship Council (No. 201608130174), the Natural Science Foundation of Hebei Province (Nos. F2016402054 and F2014402075), the Scientific Research Plan Projects of Hebei Education Department (Nos. BJ2014019, ZD2015087 and QN2015046), and the Research Program of Talent Cultivation Project in Hebei Province (No. A2016002023).
Abstract: A heterogeneous wireless sensor network comprises a number of inexpensive, energy-constrained wireless sensor nodes that collect data from the sensing environment and transmit them toward the improved cluster head in a coordinated way. Employing clustering techniques in such networks can balance the energy consumption of member nodes and prolong the network lifetime. In classical clustering techniques, clustering and in-cluster data routing are usually treated as independent operations. Although considering these two issues separately simplifies the system design, it often yields a non-optimal lifetime expectancy for wireless sensor networks. This paper proposes an integral framework that treats these two correlated items as one interacting whole. To that end, we formulate the clustering problem using nonlinear programming. The evolution of the clustering process is shown in simulations. Results show that our joint-design proposal reaches a near-optimal match between member nodes and cluster heads.
Funding: Supported by Open Funds for Shaanxi Provincial Key Laboratory of Infection and Immune Diseases, No. 2023-KFMS-1.
Abstract: Gastrointestinal cancers, including esophageal, gastric, colorectal, liver, gallbladder, cholangiocarcinoma, and pancreatic cancers, pose a significant global health challenge due to their high mortality rates and poor prognosis, particularly when diagnosed at advanced stages. These malignancies, characterized by diverse clinical presentations and etiologies, require innovative approaches for improved management. Bayesian networks (BN) have emerged as a powerful tool in this field, offering the ability to manage uncertainty, integrate heterogeneous data sources, and support clinical decision-making. This review explores the application of BN in addressing critical challenges in gastrointestinal cancers, including the identification of risk factors, early detection, treatment optimization, and prognosis prediction. By integrating genetic predispositions, lifestyle factors, and clinical data, BN hold the potential to enhance survival rates and improve quality of life through personalized treatment strategies. Despite their promise, the widespread adoption of BN is hindered by challenges such as data quality limitations, computational complexities, and the need for greater clinical acceptance. The review concludes with future research directions, emphasizing the development of advanced BN algorithms, the integration of multi-omics data, and strategies to ensure clinical applicability, aiming to fully realize the potential of BN in personalized medicine for gastrointestinal cancers.
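How a Bayesian network combines a risk factor with a test result can be sketched with a three-node chain (risk factor, disease, test result) and exact inference by enumeration. The structure, the variable names, and every probability below are arbitrary placeholders chosen for the arithmetic; they carry no clinical meaning and are not taken from the review.

```python
# Illustrative conditional probability tables (made-up numbers).
P_RISK = {True: 0.3, False: 0.7}                 # P(risk factor present)
P_DISEASE_GIVEN_RISK = {True: 0.2, False: 0.05}  # P(disease | risk)
P_POS_GIVEN_DISEASE = {True: 0.9, False: 0.1}    # P(test positive | disease)

def joint(risk, disease, positive):
    """Joint probability of one full assignment, via the chain factorization."""
    p = P_RISK[risk]
    p_disease = P_DISEASE_GIVEN_RISK[risk]
    p *= p_disease if disease else 1 - p_disease
    p_pos = P_POS_GIVEN_DISEASE[disease]
    p *= p_pos if positive else 1 - p_pos
    return p

def posterior_disease(positive, risk=None):
    """P(disease | test result[, risk]); sums over risk when it is unobserved."""
    risks = [True, False] if risk is None else [risk]
    numerator = sum(joint(r, True, positive) for r in risks)
    evidence = sum(joint(r, d, positive) for r in risks for d in (True, False))
    return numerator / evidence
```

Calling `posterior_disease(True)` and `posterior_disease(True, risk=True)` shows the point made in the abstract about integrating sources: observing the risk factor in addition to the test result changes the posterior.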
Funding: Supported by Colfuturo and Ministerio de Tecnologías de la Información y las Comunicaciones de Colombia, CYTED program-520RT0010 [Red GeoLIBERO-Consolidación de una red de geomática libre aplicada a las necesidades de Iberoamérica], and SIP-IPN 20210677 [Generación de grafos de conocimiento sobre eventos meteorológicos urbanos].
Abstract: Multiple efforts have been made worldwide around diverse aspects of land administration. However, the notorious heterogeneity of land administration data and systems remains a longstanding obstacle to developing a harmonized vision. In this sense, adopting traditional Spatial Data Infrastructures is not enough to overcome this challenge, since the heterogeneity of data sources implies needs related to harmonization, interoperability, sharing, and integration in land administration development. This paper proposes a graph-based representation of knowledge for integrating multiple and heterogeneous data sources (tables, shapefiles, geodatabases, and WFS services) belonging to two Colombian agencies within a decentralized land administration scenario. These knowledge graphs are developed on an ontology-based knowledge representation using national and international standards for land administration. Our approach aims to prevent data isolation, enable cross-dataset integration, produce machine-processable data, and facilitate the reuse and exploitation of multi-jurisdictional datasets in a single approach. A real case study demonstrates the applicability of the deployed land administration data cycle.
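The core move of a graph-based integration, mapping records from differently shaped sources onto shared subject identifiers, can be sketched as follows. This is only an illustration with plain Python tuples: the field names, the `parcel:` identifier scheme, and the predicates `hasOwner`, `hasArea`, and `hasGeometry` are hypothetical and are not drawn from the paper's ontology or from any land administration standard.

```python
def table_row_to_triples(row):
    """Map one row of a tabular source to (subject, predicate, object) triples."""
    subject = f"parcel:{row['parcel_id']}"
    return {
        (subject, "hasOwner", row["owner"]),
        (subject, "hasArea", row["area_m2"]),
    }

def feature_to_triples(feature):
    """Map one spatial feature (e.g. from a shapefile or WFS) to triples."""
    subject = f"parcel:{feature['id']}"
    return {(subject, "hasGeometry", feature["wkt"])}

def describe(graph, subject):
    """Collect everything the merged graph states about one subject."""
    return {p: o for s, p, o in graph if s == subject}

# Two hypothetical agencies describe the same parcel in different formats.
graph = set()
graph |= table_row_to_triples(
    {"parcel_id": "001", "owner": "agency_a:person_17", "area_m2": 250.0}
)
graph |= feature_to_triples({"id": "001", "wkt": "POINT (0 0)"})
```

Because both mappings produce the same subject identifier, `describe(graph, "parcel:001")` returns attributes that originated in separate sources, which is the cross-dataset integration the abstract refers to.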
Abstract: It is widely recognized that the exchange, distribution, and integration of biological data are the keys to improving bioinformatics and genome biology in the post-genomic era. However, the problem of exchanging and integrating biological data has not been solved satisfactorily. The extensible Markup Language (XML) is rapidly spreading as an emerging standard for structuring documents to exchange and integrate data on the World Wide Web (WWW). Web service is the next generation of WWW and is founded upon the open standards of W3C (World Wide Web Consortium) and IETF (Internet Engineering Task Force). This paper presents XML and Web Services technologies and their use as an appropriate solution to the problem of bioinformatics data exchange and integration.