Accurately evaluating the lifespan of the Printed Circuit Board (PCB) in airborne equipment is an essential issue for aircraft design and operation in the marine atmospheric environment. This paper presents a novel evaluation method that fuses Accelerated Degradation Testing (ADT) data, degradation data, and life data from small samples based on the uncertain degradation process. An uncertain life model of the PCB in airborne equipment is constructed by employing an uncertain distribution that considers the acceleration factor of multiple environmental conditions such as temperature, humidity, and salinity. In addition, a degradation process model of the PCB in airborne equipment is constructed by employing an uncertain process fusing ADT data and field data, in which the performance characteristics of dynamic cumulative change are included. Based on minimizing the pth sample moments, an integrated method for parameter estimation of the PCB in airborne equipment is proposed by fusing the multi-source data of life, degradation, and ADT. An engineering case illustrates the effectiveness and advantages of the proposed method.
To solve the query processing correctness problem for semantic-based relational data integration, the semantics of SPARQL (SPARQL Protocol and RDF Query Language) queries is defined. In the course of query rewriting, all relevant tables are found and decomposed into minimal connectable units. Minimal connectable units are joined according to semantic queries to produce semantically correct query plans. Algorithms for query rewriting and transforming are presented, and their computational complexity is discussed. In the worst case, the query decomposing algorithm finishes in O(n²) time and the query rewriting algorithm requires O(nm) time. The performance of the algorithms is verified by experiments; the results show that when the length of a query is less than 8, the query processing algorithms provide satisfactory performance.
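The rewriting idea above can be sketched in a few lines: split each relational table into minimal units (key plus one attribute), then greedily join units that share a key until the query's attributes are covered. This is a loose illustrative reading of the algorithm; the table names, the one-attribute-per-unit decomposition, and the greedy cover are all assumptions, not the paper's actual SPARQL-to-SQL procedure.

```python
# Hypothetical sketch: decompose tables into "minimal connectable units"
# (MCUs) and join them to cover a semantic query's attributes.
# Table/attribute names are invented for illustration only.

def decompose(tables):
    """Split each table into MCUs: its key plus one non-key attribute."""
    units = []
    for name, (key, attrs) in tables.items():
        for attr in attrs:
            units.append((name, key, attr))  # one MCU per non-key attribute
    return units

def rewrite(query_attrs, units):
    """Greedily pick MCUs until the query attributes are covered;
    MCUs are 'connectable' when they share a join key."""
    plan, covered, keys = [], set(), set()
    for name, key, attr in units:
        if attr in query_attrs and attr not in covered:
            if not plan or key in keys:      # keep the plan connectable
                plan.append((name, key, attr))
                covered.add(attr)
                keys.add(key)
    return plan if covered == set(query_attrs) else None

tables = {
    "person":   ("pid", ["name", "age"]),
    "employee": ("pid", ["salary"]),
}
plan = rewrite({"name", "salary"}, decompose(tables))
print(plan)  # → [('person', 'pid', 'name'), ('employee', 'pid', 'salary')]
```

The greedy pass is linear in the number of units; the quadratic worst case in the abstract arises from the decomposition itself, which compares attributes pairwise.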
With the rapid development of the Web, more and more Web databases are available for users to access. At the same time, job seekers often have difficulty first finding the right sources and then querying over them, so an integrated job search system over Web databases has become a Web application in high demand. Based on this consideration, we build a deep Web data integration system that supports unified access to multiple job Web sites, acting as a job meta-search engine. In this paper, the architecture of the system is given first, and the key components of the system are then introduced.
In this paper we propose a service-oriented architecture for spatial data integration (SOA-SDI) in the context of a large number of available spatial data sources that physically sit at different places, and we develop Web-based GIS systems based on SOA-SDI, allowing client applications to pull in, analyze, and present spatial data from those sources. The proposed architecture logically includes four layers or components: a layer of multiple data provider services, a data integration layer, a layer of backend services, and a front-end graphical user interface (GUI) for spatial data presentation. On the basis of the four-layered SOA-SDI framework, WebGIS applications can be quickly deployed, which shows that SOA-SDI has the potential to reduce software development effort and shorten the development period.
Currently, ocean data portals are being developed around the world based on Geographic Information Systems (GIS) as a source of ocean data and information. However, given the relatively high temporal frequency and the intrinsically spatial nature of ocean data and information, no current GIS software deals effectively and efficiently with spatiotemporal data. Furthermore, while existing ocean data portals are generally designed to meet the basic needs of a broad range of users, they are sometimes very complicated for general audiences, especially those without training in GIS. In this paper, a new technical architecture for an ocean data integration and service system is put forward that consists of four layers: the operation layer; the extract, transform, and load (ETL) layer; the data warehouse layer; and the presentation layer. The integration technology for the data warehouse layer, based on XML, ontology, and a spatiotemporal data organization scheme, is then discussed. In addition, the ocean observing data service technology realized in the presentation layer is discussed in detail, including the development of the web portal and the ocean data sharing platform. An application to the Taiwan Strait shows that the technology studied in this paper can facilitate the sharing, access, and use of ocean observation data. The paper is based on an ongoing research project to develop an ocean observing information system for the Taiwan Strait that will facilitate the prevention of ocean disasters.
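The four-layer flow can be sketched as a minimal pipeline: raw observations (operation layer) pass through an ETL step into an indexed warehouse and are served to users by a presentation function. The station names, field names, and in-memory "warehouse" below are invented for illustration; the real system uses XML/ontology-based integration rather than this toy unit conversion.

```python
# Hypothetical sketch of the four-layer architecture:
# operation (raw sources) -> ETL -> data warehouse -> presentation.

RAW_SOURCES = [  # operation layer: heterogeneous ocean observations
    {"station": "A1", "time": "2010-06-01T00:00", "sst_f": 77.0},
    {"station": "A1", "time": "2010-06-01T01:00", "sst_f": 78.8},
]

def etl(records):
    """ETL layer: convert Fahrenheit to Celsius, normalize keys."""
    for r in records:
        yield {"station": r["station"], "time": r["time"],
               "sst_c": round((r["sst_f"] - 32) * 5 / 9, 2)}

def load(warehouse, records):
    """Warehouse layer: index observations by (station, time)."""
    for r in records:
        warehouse[(r["station"], r["time"])] = r["sst_c"]
    return warehouse

def present(warehouse, station):
    """Presentation layer: sorted time series for one station."""
    return sorted((t, v) for (s, t), v in warehouse.items() if s == station)

wh = load({}, etl(RAW_SOURCES))
print(present(wh, "A1"))  # → [('2010-06-01T00:00', 25.0), ('2010-06-01T01:00', 26.0)]
```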
In e-commerce, multidimensional data analysis for OLAP (on-line analytical processing) based on Web data needs to integrate various data sources, such as XML (extensible markup language) data and relational data, at the conceptual level. A conceptual data description approach for the multidimensional data model was presented in order to conduct multidimensional OLAP data analysis for multiple subjects. The UML (unified modeling language) galaxy diagram, describing the multidimensional structure of the conceptually integrated data, was constructed. The approach is illustrated using a case of a 2_roots UML galaxy diagram that takes one retailer and several suppliers of PC products into consideration.
In e-commerce, multidimensional data analysis based on Web data needs to integrate various data sources, such as XML data and relational data, at the conceptual level. A conceptual data description approach to the multidimensional data model, the UML galaxy diagram, is presented in order to conduct multidimensional data analysis for multiple subjects. The approach is illustrated using a case of a 2_roots UML galaxy diagram that considers marketing analysis of TV products involving one retailer and several suppliers.
Guyana's capacity to address the impacts of climate change on its coastal environment requires the ability to monitor, quantify, and understand coastal change over the short, medium, and long term. Understanding the drivers of change in the coastal and marine environment can be achieved through accurate measurement and critical analysis of morphologies, flows, processes, and responses. This manuscript presents a strategy developed to create a central resource, database, and web-based platform to integrate data and information on the drivers of and changes within Guyana's coastal and marine environment. The strategy involves four complementary work packages: data collection, development of a platform for data integration, application of the data to coastal change analyses, and consultation with stakeholders. The last aims to assess the role of the integrated data systems in supporting strategic governance and sustainable decision-making. It is hoped that the output of this strategy will support the country's climate-focused agencies, organisations, decision-makers, and researchers in their tasks and endeavours.
At present, with the sustainable development of society, the value of forestry resources has gradually attracted people's attention. The unified registration and management of forest property rights can make ownership clearer and fully stimulate the enthusiasm of employees. Taking the unified registration of real estate as its starting point, this paper first introduces the background of real estate registration for forest property rights, then analyzes the advantages and disadvantages of the registration methods, and points out that the key to carrying out all the work in an orderly manner is to combine actual measurement with illustration. Finally, it discusses how to integrate the data obtained from actual measurement and illustration, and summarizes the data integration process and matters needing attention based on experience accumulated in practice. It is hoped that this can help relevant personnel and provide a theoretical basis for future work such as forest right confirmation and registration.
Plant morphogenesis relies on precise gene expression programs at the proper time and position, orchestrated by transcription factors (TFs) in intricate regulatory networks in a cell-type-specific manner. Here we introduce a comprehensive single-cell transcriptomic atlas of Arabidopsis seedlings. This atlas is the result of meticulous integration of 63 previously published scRNA-seq datasets, addressing batch effects and conserving biological variance. The integration spans a broad spectrum of tissues, including both below- and above-ground parts. Using a rigorous approach for cell type annotation, we identified 47 distinct cell types or states, largely expanding our current view of plant cell compositions. We systematically constructed cell-type-specific gene regulatory networks and uncovered key regulators that act in a coordinated manner to control cell-type-specific gene expression. Taken together, our study not only offers an extensive plant cell atlas that serves as a valuable resource, but also provides molecular insights into gene-regulatory programs that vary across cell types.
The study of plant diversity is often hindered by the challenge of integrating data of different types from different sources. A standardized data system would facilitate detailed exploration of plant distribution patterns and dynamics for botanists, ecologists, conservation biologists, and biogeographers. This study proposes a gridded vector data integration method, combining grid-based techniques with vectorization to integrate diverse data types from multiple sources into grids of the same scale. We demonstrate the methodology by creating a comprehensive 1°×1° database of western China that includes plant distribution information and environmental factor data. This approach addresses the need for a standardized data system to facilitate exploration of plant distribution patterns and dynamic changes in the region.
Background: Medical informatics has accumulated vast amounts of data for clinical diagnosis and treatment. However, limited access to follow-up data and the difficulty of integrating data across diverse platforms continue to pose significant barriers to clinical research progress. In response, our research team has developed a specialized clinical research database for cardiology, establishing a comprehensive digital platform that facilitates both clinical decision-making and research endeavors. Methods: The database incorporates actual clinical data from patients treated at the Cardiovascular Medicine Department of the Chinese PLA General Hospital from 2012 to 2021. It includes comprehensive data on patients' basic information, medical history, non-invasive imaging studies, and laboratory test results, as well as peri-procedural information related to interventional surgeries, extracted from the Hospital Information System. Additionally, an innovative artificial intelligence (AI)-powered interactive follow-up system was developed, ensuring that nearly all myocardial infarction patients received at least one post-discharge follow-up, thereby achieving comprehensive data management throughout the entire care continuum for high-risk patients. Results: The database integrates extensive cross-sectional and longitudinal patient data, with a focus on higher-risk acute coronary syndrome patients. It integrates structured and unstructured clinical data while innovatively incorporating AI and automatic speech recognition technologies to enhance data integration and workflow efficiency. It creates a comprehensive patient view, thereby improving diagnostic and follow-up quality, and provides high-quality data to support clinical research. Despite limitations in unstructured data standardization and biological sample integrity, the database's development is accompanied by ongoing optimization efforts. Conclusion: The cardiovascular specialty clinical database is a comprehensive digital archive integrating clinical treatment and research, which facilitates the digital and intelligent transformation of clinical diagnosis and treatment processes. It supports clinical decision-making and offers data support and potential research directions for the specialized management of cardiovascular diseases.
Effective integration and wide sharing of geospatial data are an important and basic premise for facilitating research and applications in geographic information science. However, the semantic heterogeneity of geospatial data is a major problem that significantly hinders geospatial data integration and sharing. Ontologies are regarded as a promising way to solve semantic problems by providing a formalized representation of geographic entities and the relationships between them in a manner understandable to machines. Thus, many efforts have been made to explore ontology-based geospatial data integration and sharing. However, there is a lack of a specialized ontology that provides a unified description of geospatial data. In this paper, with a focus on the characteristics of geospatial data, we propose a unified framework for a geospatial data ontology, denoted GeoDataOnt, to establish a semantic foundation for geospatial data integration and sharing. First, we provide a hierarchy of the characteristics of geospatial data. Next, we analyze the semantic problems associated with each characteristic. Subsequently, we propose the general framework of GeoDataOnt, targeting these problems according to the characteristics of geospatial data. GeoDataOnt is then divided into multiple modules, and we show a detailed design and implementation for each module. Key limitations and challenges of GeoDataOnt are identified, and broad applications of GeoDataOnt are discussed.
New challenges, including how to share information among heterogeneous devices, appear in data-intensive pervasive computing environments. Data integration is a practical approach for these applications, and dealing with inconsistencies is one of its important problems. In this paper we motivate the problem of resolving data inconsistency for data integration in pervasive environments. We define data quality criteria and expense quality criteria for data sources to resolve data inconsistency. In our solution, data sources that require high expense to obtain data from are first discarded using the expense quality criteria and a utility function. Since it is difficult to obtain the actual quality of data sources in a pervasive computing environment, we introduce a fuzzy multi-attribute group decision making approach to select the appropriate data sources. The experimental results show that our solution is effective.
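The two-stage selection described above can be sketched as: an expense-based utility filter, followed by ranking the surviving sources on fuzzy quality ratings aggregated across decision makers. The source names, the linear utility function, the triangular fuzzy numbers, and the centroid defuzzification below are all assumptions made for illustration, not the paper's exact formulation.

```python
# Hypothetical sketch of two-stage data source selection:
# stage 1 discards high-expense sources via a utility function;
# stage 2 aggregates fuzzy quality ratings from several experts.

def utility(expense, max_expense=10.0):
    return 1.0 - expense / max_expense   # cheaper source -> higher utility

def defuzzify(tfn):
    """Centroid of a triangular fuzzy number (l, m, u)."""
    l, m, u = tfn
    return (l + m + u) / 3.0

def select(sources, ratings, min_utility=0.4):
    # Stage 1: expense filter
    kept = [s for s in sources if utility(sources[s]) >= min_utility]
    # Stage 2: defuzzify each expert's rating, average, pick the best
    scores = {}
    for s in kept:
        crisp = [defuzzify(r) for r in ratings[s]]
        scores[s] = sum(crisp) / len(crisp)
    return max(scores, key=scores.get)

sources = {"sensor_db": 3.0, "phone_cache": 8.0, "cloud_mirror": 5.0}
ratings = {  # two decision makers, triangular fuzzy quality ratings
    "sensor_db":    [(0.6, 0.7, 0.9), (0.5, 0.7, 0.8)],
    "phone_cache":  [(0.7, 0.9, 1.0), (0.8, 0.9, 1.0)],
    "cloud_mirror": [(0.3, 0.5, 0.6), (0.4, 0.5, 0.7)],
}
print(select(sources, ratings))  # → sensor_db
```

Note that phone_cache is dropped in stage 1 despite its high quality ratings: the expense filter runs before quality is ever considered, which is the point of the two-stage design.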
Background: More and more high-throughput datasets are available from multiple levels of measurement of gene regulation. The reverse engineering of gene regulatory networks from these data offers a valuable research paradigm for deciphering regulatory mechanisms. So far, numerous methods have been developed for reconstructing gene regulatory networks. Results: In this paper, we provide a review of bioinformatics methods for inferring gene regulatory networks from omics data. To achieve precise reconstruction of gene regulatory networks, an intuitive alternative is to integrate the available resources in a rational framework. We also provide computational perspectives on the endeavor of inferring gene regulatory networks from heterogeneous data. We highlight the importance of integrating multi-omics data with prior knowledge in gene regulatory network inference. Conclusions: We provide computational perspectives on inferring gene regulatory networks from multiple omics data and present theoretical analyses of existing challenges and possible solutions. We emphasize prior knowledge and data integration in network inference owing to their ability to identify regulatory causality.
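One of the simplest families of methods covered by such reviews scores candidate TF-target edges by expression correlation and keeps those above a threshold. The sketch below shows this baseline only; the gene names and data are invented, and real pipelines layer multi-omics integration and prior knowledge on top of (or instead of) plain correlation.

```python
# Minimal correlation-based network inference sketch (illustrative).

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def infer_edges(expr, tfs, threshold=0.9):
    """Return (tf, gene) edges whose |correlation| exceeds the threshold."""
    edges = []
    for tf in tfs:
        for gene in expr:
            if gene != tf and abs(pearson(expr[tf], expr[gene])) >= threshold:
                edges.append((tf, gene))
    return edges

expr = {  # expression of 3 genes across 4 samples (toy data)
    "TF1":   [1.0, 2.0, 3.0, 4.0],
    "geneA": [2.1, 3.9, 6.2, 8.0],   # tracks TF1
    "geneB": [5.0, 1.0, 4.0, 2.0],   # unrelated
}
print(infer_edges(expr, ["TF1"]))  # → [('TF1', 'geneA')]
```

Correlation alone cannot distinguish direct regulation from co-regulation or reversed causality, which is exactly why the review stresses prior knowledge and data integration.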
Land cover is recognized as one of the fundamental terrestrial datasets required in land system change and other ecosystem-related research across the globe. The regional differentiation and spatiotemporal variation of land cover have significant impacts on the regional natural environment and sustainable socio-economic development. In this context, we reconstructed historical land cover data in Siberia to provide a dataset comparable to land cover datasets in China and abroad. In this paper, the European Space Agency (ESA) Global Land Cover Map (GlobCover), Landsat Thematic Mapper (TM), Enhanced Thematic Mapper (ETM), and Multispectral Scanner (MSS) images, Google Earth images, and other additional data were used to produce land cover datasets for 1975 and 2010 in Siberia. Data evaluation shows that the overall user's accuracy of the 2010 land cover data was 86.96%, higher than that of the ESA GlobCover data in Siberia. The analysis of land cover changes found no large changes in Siberia from 1975 to 2010, with only a few conversions between different natural forest types. The main changes were conversions from deciduous needleleaf forest to deciduous broadleaf forest, deciduous needleleaf forest to mixed forest, and savannas to deciduous needleleaf forest, indicating that the dominant driving factor of land cover changes in Siberia was natural rather than human activity, which is very different from China. However, our purpose was not just to produce the land cover datasets for two time periods or to explore the driving factors of land cover changes in Siberia; we also paid attention to the significance and application of the datasets in various fields such as global climate change, geopolitics, and cross-border cooperation.
To construct mediators for data integration systems that integrate structured and semi-structured data, and to facilitate the reformulation and decomposition of queries, the presented system uses the XML processing language (XPL) for the mediator. With XPL, it is easy to construct mediators for XML-based data integration, and it can accelerate the work of the mediator.
Geophysical techniques can help to bridge the inherent gap in spatial resolution and coverage of classical hydrological methods. This has led to the emergence of a new and rapidly growing research domain generally referred to as hydrogeophysics. Given the differing sensitivities of various geophysical techniques to hydrologically relevant parameters, their inherent trade-off between resolution and range, as well as the notoriously site-specific nature of petrophysical parameter relations, the fundamental usefulness of multi-method surveys for reducing uncertainties in data analysis and interpretation is widely accepted. A major challenge arising from such endeavors is the quantitative integration of the resulting vast and diverse database into a unified model of the probed subsurface region that is consistent with all available measurements. To this end, we present a novel approach to hydrogeophysical data integration based on a Monte-Carlo-type conditional stochastic simulation method that we consider particularly suitable for high-resolution local-scale studies. Monte Carlo techniques are flexible and versatile, allowing a wide variety of data and constraints of differing resolution and hardness to be accounted for, and thus have the potential to provide, in a geostatistical sense, realistic models of the pertinent target parameter distributions. Compared to more conventional approaches, such as co-kriging or cluster analysis, our approach provides significant advancements in the way that larger-scale structural information contained in the hydrogeophysical data can be accounted for. After outlining the methodological background of our algorithm, we present the results of its application to the integration of porosity log and tomographic crosshole georadar data to generate stochastic realizations of the detailed local-scale porosity structure. Our procedure is first tested on pertinent synthetic data and then applied to a field dataset collected at the Boise Hydrogeophysical Research Site. Finally, we compare the performance of our data integration approach to that of more conventional methods with regard to the prediction of flow and transport phenomena in highly heterogeneous media and discuss the implications.
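The core idea of conditional simulation, generating many random realizations that all honor "hard" measurements at known locations, can be illustrated with a deliberately naive rejection sampler. The grid size, porosity values, and tolerance below are invented, and real geostatistical simulators (e.g., sequential simulation) condition far more efficiently than rejection; this is a toy for the concept only.

```python
# Toy Monte-Carlo conditional simulation: draw random porosity fields
# and keep only realizations that honor porosity-log values at the
# conditioning cells within a tolerance.
import random

def simulate(n_cells, conditioning, tol=0.02, n_draws=10000, seed=1):
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        field = [rng.uniform(0.1, 0.4) for _ in range(n_cells)]
        if all(abs(field[i] - v) <= tol for i, v in conditioning.items()):
            accepted.append(field)
    return accepted

# Porosity measured by a log at cells 0 and 3 (hypothetical values)
realizations = simulate(n_cells=5, conditioning={0: 0.25, 3: 0.30})
print(len(realizations), "accepted realizations")
```

Every accepted field matches the log where it was measured while varying freely elsewhere, so the ensemble spread away from the boreholes is a direct picture of the remaining uncertainty.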
Genome-wide association mapping studies (GWAS) based on Big Data are a potential approach to improving marker-assisted selection in plant breeding. The number of available phenotypic and genomic data sets in which medium-sized populations of several hundred individuals have been studied is rapidly increasing. Combining these data and using them in GWAS could increase both the power of QTL discovery and the accuracy of estimation of the underlying genetic effects, but this is hindered by data heterogeneity and lack of interoperability. In this study, we used genomic and phenotypic data sets, focusing on Central European winter wheat populations evaluated for heading date. We explored strategies for integrating these data and the resulting potential for GWAS. Establishing interoperability between data sets was greatly aided by some overlapping genotypes and a linear relationship between the different phenotyping protocols, resulting in high-quality integrated phenotypic data. In this context, genomic prediction proved to be a suitable tool to study the relevance of interactions between genotypes and experimental series, which was low in our case. Contrary to expectations, fewer associations between markers and traits were found in the larger combined data than in the individual experimental series. However, the predictive power based on the marker-trait associations of the integrated data set was higher across data sets. Therefore, the results show that the integration of medium-sized data sets into Big Data is an approach to increase the power to detect QTL in GWAS. The results encourage further efforts to standardize and share data in the plant breeding community.
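The interoperability step described above, exploiting overlapping genotypes and a linear relationship between phenotyping protocols, amounts to fitting a line on the genotypes scored under both protocols and mapping one series onto the other's scale. The genotype names and heading-date values below are invented for illustration; the study's actual calibration used its own wheat data.

```python
# Hedged sketch: harmonize two phenotyping series via ordinary least
# squares fitted on the genotypes present in both.

def fit_line(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (c - my) for a, c in zip(x, y)) / \
        sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx          # slope, intercept

series1 = {"G1": 150.0, "G2": 154.0, "G3": 158.0}           # protocol A, days
series2 = {"G1": 50.0, "G2": 52.0, "G3": 54.0, "G4": 56.0}  # protocol B, score

overlap = sorted(set(series1) & set(series2))               # shared genotypes
slope, intercept = fit_line([series2[g] for g in overlap],
                            [series1[g] for g in overlap])
harmonized = {g: slope * v + intercept for g, v in series2.items()}
print(harmonized["G4"])   # G4 mapped onto protocol A's scale → 162.0
```

Once both series live on one scale, they can be pooled for a joint GWAS, which is exactly the integration whose payoff the abstract evaluates.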
Background: Various blood metabolites are known to be useful indicators of health status in dairy cattle, but their routine assessment is time-consuming, expensive, and stressful for the cows at the herd level. Thus, we evaluated the effectiveness of combining in-line near-infrared (NIR) milk spectra with on-farm information (days in milk [DIM] and parity) and genetic markers for predicting blood metabolites in Holstein cattle. Data were obtained from 388 Holstein cows from a farm with an AfiLab system. NIR spectra, on-farm information, and single nucleotide polymorphism (SNP) markers were blended to develop calibration equations for blood metabolites using the elastic net (ENet) approach, considering 3 models: (1) Model 1 (M1), including only NIR information; (2) Model 2 (M2), with both NIR and on-farm information; and (3) Model 3 (M3), combining NIR, on-farm, and genomic information. Dimension reduction was considered for M3 by preselecting SNP markers from genome-wide association study (GWAS) results. Results: M2 improved the predictive ability by an average of 19% for energy-related metabolites (glucose, cholesterol, NEFA, BHB, urea, and creatinine), 20% for liver function/hepatic damage, 7% for inflammation/innate immunity, 24% for oxidative stress metabolites, and 23% for minerals compared to M1. Meanwhile, M3 further enhanced the predictive ability by 34% for energy-related metabolites, 32% for liver function/hepatic damage, 22% for inflammation/innate immunity, 42.1% for oxidative stress metabolites, and 41% for minerals compared to M1. We found improved predictive ability of M3 using SNP markers selected from GWAS results with a threshold of -log10(P-value) > 2.0: by 5% for energy-related metabolites, 9% for liver function/hepatic damage, 8% for inflammation/innate immunity, 22% for oxidative stress metabolites, and 9% for minerals. Slight reductions were observed for phosphorus (2%), ferric-reducing antioxidant power (1%), and glucose (3%). Furthermore, prediction accuracies were influenced by more restrictive thresholds (-log10(P-value) > 2.5 and 3.0), which yielded a smaller increase in predictive ability. Conclusion: Our results highlight that combining several sources of information, such as genetic markers, on-farm information, and in-line NIR data, improves the predictive ability of blood metabolites in dairy cattle, representing an effective strategy for large-scale in-line health monitoring in commercial herds.
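The three model designs differ only in which feature blocks enter the calibration, and the SNP block is trimmed by a GWAS threshold before entering M3. The sketch below assembles these design rows; all spectra, SNP names, and GWAS scores are invented, and the study fits the resulting designs with elastic net rather than merely listing them as done here.

```python
# Illustrative sketch of the three calibration designs:
# M1 = NIR only, M2 = NIR + on-farm, M3 = NIR + on-farm + preselected SNPs.

def preselect_snps(gwas_logp, threshold=2.0):
    """Keep SNPs whose -log10(P-value) exceeds the threshold."""
    return [snp for snp, logp in gwas_logp.items() if logp > threshold]

def build_design(nir, farm=None, snp_row=None):
    row = list(nir)
    if farm:
        row += farm          # days in milk, parity
    if snp_row:
        row += snp_row       # 0/1/2 allele counts
    return row

nir = [0.12, 0.33, 0.41]     # in-line NIR spectrum (toy, 3 wavelengths)
farm = [120, 2]              # DIM, parity
gwas = {"snp1": 3.1, "snp2": 1.2, "snp3": 2.4}
genotypes = {"snp1": 2, "snp2": 0, "snp3": 1}

keep = preselect_snps(gwas)  # snp2 falls below the threshold
m1 = build_design(nir)
m2 = build_design(nir, farm)
m3 = build_design(nir, farm, [genotypes[s] for s in keep])
print(len(m1), len(m2), len(m3))  # → 3 5 7
```

Raising the threshold shrinks the SNP block of M3, which mirrors the abstract's finding that stricter thresholds leave fewer markers and a smaller gain in predictive ability.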
Funding: Supported by the National Natural Science Foundation of China (No. 62073009).
Funding: Weaponry Equipment Pre-Research Foundation of the PLA Equipment Ministry (No. 9140A06050409JB8102); Pre-Research Foundation of the PLA University of Science and Technology (No. 2009JSJ11).
Funding: Supported by the Natural Science Foundation of China (60573091, 60273018), the National Basic Research and Development Program of China (2003CB317000), and the Key Project of the Ministry of Education of China (03044).
Funding: Supported by the Research Fund of the Key GIS Lab of the Education Ministry (No. 200610)
Abstract: In this paper we propose a service-oriented architecture for spatial data integration (SOA-SDI) in the context of a large number of available spatial data sources that physically sit at different places, and we develop web-based GIS systems based on SOA-SDI, allowing client applications to pull in, analyze, and present spatial data from those available sources. The proposed architecture logically comprises four layers or components: a layer of multiple data provider services, a data integration layer, a layer of backend services, and a front-end graphical user interface (GUI) for spatial data presentation. On the basis of the four-layered SOA-SDI framework, WebGIS applications can be quickly deployed, which shows that SOA-SDI has the potential to reduce software development effort and shorten the development period.
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (Nos. 2009AA12Z225 and 2009AA12Z208) and the National Natural Science Foundation of China (No. 61074132)
Abstract: Currently, ocean data portals are being developed around the world based on Geographic Information Systems (GIS) as a source of ocean data and information. However, given the relatively high temporal frequency and the intrinsically spatial nature of ocean data and information, no current GIS software deals effectively and efficiently with spatiotemporal data. Furthermore, while existing ocean data portals are generally designed to meet the basic needs of a broad range of users, they are sometimes very complicated for general audiences, especially those without training in GIS. In this paper, a new technical architecture for an ocean data integration and service system is put forward that consists of four layers: the operation layer; the extract, transform, and load (ETL) layer; the data warehouse layer; and the presentation layer. The integration technology based on XML, ontology, and a spatiotemporal data organization scheme for the data warehouse layer is then discussed. In addition, the ocean observing data service technology realized in the presentation layer is discussed in detail, including the development of the web portal and the ocean data sharing platform. An application in the Taiwan Strait shows that the technology studied in this paper can facilitate the sharing, access, and use of ocean observation data. The paper is based on an ongoing research project to develop an ocean observing information system for the Taiwan Strait that will facilitate the prevention of ocean disasters.
Abstract: In e-commerce, multidimensional data analysis for OLAP (on-line analytical processing) based on Web data requires integrating various data sources, such as XML (extensible markup language) data and relational data, at the conceptual level. A conceptual data description approach to the multidimensional data model is presented in order to conduct multidimensional OLAP data analysis for multiple subjects. The UML (unified modeling language) galaxy diagram, describing the multidimensional structure of the conceptually integrated data, is constructed. The approach is illustrated using a case of a 2_roots UML galaxy diagram that takes one retailer and several suppliers of PC products into consideration.
Funding: This project was supported by the China Postdoctoral Science Foundation (2005037506) and the National Natural Science Foundation of China (70472029)
Abstract: In e-commerce, multidimensional data analysis based on Web data requires integrating various data sources, such as XML data and relational data, at the conceptual level. A conceptual data description approach to the multidimensional data model, the UML galaxy diagram, is presented in order to conduct multidimensional data analysis for multiple subjects. The approach is illustrated using a case of a 2_roots UML galaxy diagram that takes into consideration the marketing analysis of TV products involving one retailer and several suppliers.
Funding: We appreciate the United Nations Development Programme-Indonesia and the Archipelagic & Island States (AIS) Forum for the 2021 Archipelagic & Island States Innovation Challenges Award given to this idea under the Joint Research Programme in Climate Change Mitigation and Adaptation.
Abstract: Guyana's capacity to address the impacts of climate change on its coastal environment requires the ability to monitor, quantify, and understand coastal change over the short, medium, and long term. Understanding the drivers of change in the coastal and marine environment can be achieved through the accurate measurement and critical analysis of morphologies, flows, processes, and responses. This manuscript presents a strategy developed to create a central resource, database, and web-based platform to integrate data and information on the drivers of and changes within Guyana's coastal and marine environment. The strategy involves four complementary work packages: data collection, development of a platform for data integration, application of the data to coastal change analyses, and consultation with stakeholders. The last package aims to assess the role of the integrated data systems in supporting strategic governance and sustainable decision-making. It is hoped that the output of this strategy will support the country's climate-focused agencies, organisations, decision-makers, and researchers in their tasks and endeavours.
Abstract: At present, with the sustainable development of society, the value of forestry resources has gradually attracted people's attention. The unified registration and management of forest property rights can make their ownership clearer and fully stimulate the enthusiasm of employees. Taking the unified registration of real estate as the starting point, this paper first introduces the background of real estate registration of forest property rights, then analyzes the advantages and disadvantages of the registration methods, and points out that the key to carrying out all the work in an orderly manner is to combine actual measurement with illustration. Finally, it discusses how to integrate the data obtained from actual measurement and illustration, and summarizes the process of data integration and the matters needing attention based on experience accumulated in practice. It is hoped that this can help relevant personnel and provide a theoretical basis for future work such as forest right confirmation and registration.
Funding: Supported by the National Natural Science Foundation of China (No. 32070656), the Nanjing University Deng Feng Scholars Program, the Priority Academic Program Development (PAPD) of Jiangsu Higher Education Institutions, a China Postdoctoral Science Foundation funded project (No. 2022M711563), and the Jiangsu Funding Program for Excellent Postdoctoral Talent (No. 2022ZB50)
Abstract: Plant morphogenesis relies on precise gene expression programs at the proper time and position, orchestrated by transcription factors (TFs) in intricate regulatory networks in a cell-type-specific manner. Here we introduce a comprehensive single-cell transcriptomic atlas of Arabidopsis seedlings. This atlas is the result of meticulous integration of 63 previously published scRNA-seq datasets, addressing batch effects while conserving biological variance. The integration spans a broad spectrum of tissues, including both below- and above-ground parts. Utilizing a rigorous approach for cell type annotation, we identified 47 distinct cell types or states, largely expanding the current view of plant cell compositions. We systematically constructed cell-type-specific gene regulatory networks and uncovered key regulators that act in a coordinated manner to control cell-type-specific gene expression. Taken together, our study not only offers an extensive plant cell atlas that serves as a valuable resource, but also provides molecular insights into the gene-regulatory programs that vary across cell types.
Funding: Supported by the Second Tibetan Plateau Scientific Expedition and Research (STEP) Program (2019QZKK0502), the National Natural Science Foundation of China (32322006), the Major Program for Basic Research Project of Yunnan Province (202103AF140005 and 202101BC070002), and the Practice Innovation Fund for Professional Degree Graduates of Yunnan University (ZC-22222401).
Abstract: The study of plant diversity is often hindered by the challenge of integrating data from different sources and of different types. A standardized data system would facilitate detailed exploration of plant distribution patterns and dynamics for botanists, ecologists, conservation biologists, and biogeographers. This study proposes a gridded vector data integration method, combining grid-based techniques with vectorization to integrate diverse data types from multiple sources into grids of the same scale. Here we demonstrate the methodology by creating a comprehensive 1°×1° database of western China that includes plant distribution information and environmental factor data. This approach addresses the need for a standardized data system to facilitate exploration of plant distribution patterns and dynamic changes in the region.
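The grid-based integration step can be sketched as binning heterogeneous point records (occurrence records, environmental samples) into shared 1°×1° cells so they can be joined on a common cell identifier. The record layout below is an invented example, not the study's database schema:

```python
import math

def cell_id(lon, lat):
    """1x1 degree grid cell identifier for a WGS84 coordinate."""
    return (math.floor(lon), math.floor(lat))

def integrate(*sources):
    """Merge records from several sources into one dict keyed by grid cell."""
    grid = {}
    for source in sources:
        for rec in source:
            cell = cell_id(rec["lon"], rec["lat"])
            grid.setdefault(cell, []).append(rec)
    return grid

# Two hypothetical sources: a species occurrence and a climate sample that
# fall in the same 1-degree cell and thus become joinable after gridding.
plants = [{"lon": 102.7, "lat": 25.0, "species": "Rhododendron"}]
climate = [{"lon": 102.1, "lat": 25.9, "mat_c": 14.8}]
grid = integrate(plants, climate)
```

Once every source is keyed by the same cell id, distribution data and environmental factors can be queried together at a uniform scale.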
Funding: Noncommunicable Chronic Diseases - National Science and Technology Major Project (2023ZD0503906)
Abstract: Background Medical informatics has accumulated vast amounts of data for clinical diagnosis and treatment. However, limited access to follow-up data and the difficulty of integrating data across diverse platforms continue to pose significant barriers to clinical research progress. In response, our research team has developed a specialized clinical research database for cardiology, thereby establishing a comprehensive digital platform that facilitates both clinical decision-making and research endeavors. Methods The database incorporates actual clinical data from patients who received treatment at the Cardiovascular Medicine Department of the Chinese PLA General Hospital from 2012 to 2021. It includes comprehensive data on patients' basic information, medical history, non-invasive imaging studies, and laboratory test results, as well as peri-procedural information related to interventional surgeries, extracted from the Hospital Information System. Additionally, an innovative artificial intelligence (AI)-powered interactive follow-up system has been developed, ensuring that nearly all myocardial infarction patients receive at least one post-discharge follow-up, thereby achieving comprehensive data management throughout the entire care continuum for high-risk patients. Results The database integrates extensive cross-sectional and longitudinal patient data, with a focus on higher-risk acute coronary syndrome patients. It achieves the integration of structured and unstructured clinical data, while innovatively incorporating AI and automatic speech recognition technologies to enhance data integration and workflow efficiency. It creates a comprehensive patient view, thereby improving diagnostic and follow-up quality, and provides high-quality data to support clinical research. Despite limitations in unstructured data standardization and biological sample integrity, the database's development is accompanied by ongoing optimization efforts. Conclusion The cardiovascular specialty clinical database is a comprehensive digital archive integrating clinical treatment and research, which facilitates the digital and intelligent transformation of clinical diagnosis and treatment processes. It supports clinical decision-making and offers data support and potential research directions for the specialized management of cardiovascular diseases.
Funding: This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences [grant number XDA23100100], the National Natural Science Foundation of China [grant numbers 41771430 and 41631177], and the China Scholarship Council [grant number 201804910732].
Abstract: Effective integration and wide sharing of geospatial data is an important and basic premise for facilitating research and applications in geographic information science. However, the semantic heterogeneity of geospatial data is a major problem that significantly hinders geospatial data integration and sharing. Ontologies are regarded as a promising way to solve semantic problems by providing a formalized representation of geographic entities and the relationships between them in a manner understandable to machines. Thus, many efforts have been made to explore ontology-based geospatial data integration and sharing. However, there is a lack of a specialized ontology that provides a unified description of geospatial data. In this paper, with a focus on the characteristics of geospatial data, we propose a unified framework for a geospatial data ontology, denoted GeoDataOnt, to establish a semantic foundation for geospatial data integration and sharing. First, we provide a hierarchy of the characteristics of geospatial data. Next, we analyze the semantic problems associated with each characteristic. Subsequently, we propose the general framework of GeoDataOnt, targeting these problems according to the characteristics of geospatial data. GeoDataOnt is then divided into multiple modules, and we give a detailed design and implementation for each module. Key limitations and challenges of GeoDataOnt are identified, and its broad applications are discussed.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 60970010, the National Basic Research 973 Program of China under Grant No. 2009CB320705, and the Specialized Research Fund for the Doctoral Program of Higher Education of China under Grant No. 20090073110026
Abstract: New challenges, including how to share information across heterogeneous devices, appear in data-intensive pervasive computing environments. Data integration is a practical approach in these applications, and dealing with inconsistencies is one of its important problems. In this paper we motivate the problem of resolving data inconsistency for data integration in pervasive environments. We define data quality criteria and expense quality criteria for data sources to resolve data inconsistency. In our solution, data sources that require high expense to obtain data are first discarded using the expense quality criteria and a utility function. Since it is difficult to obtain the actual quality of data sources in a pervasive computing environment, we introduce a fuzzy multi-attribute group decision making approach to select the appropriate data sources. The experimental results show that our solution is effective.
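The two-stage selection can be sketched as follows: discard sources whose access expense is too high, then let several decision makers rate the rest with triangular fuzzy numbers and rank sources by the centroid of the averaged ratings. The expense threshold, the ratings, and centroid defuzzification are illustrative assumptions rather than the paper's exact criteria:

```python
EXPENSE_LIMIT = 0.8  # normalized expense above which a source is discarded

def centroid(tfn):
    """Centroid defuzzification of a triangular fuzzy number (low, mid, high)."""
    low, mid, high = tfn
    return (low + mid + high) / 3.0

def select_sources(sources):
    """sources: {name: {"expense": float, "ratings": [tfn, ...]}}.

    Returns source names ranked best-first after the expense filter.
    """
    scored = []
    for name, info in sources.items():
        if info["expense"] > EXPENSE_LIMIT:
            continue  # expense criterion: too costly to query
        # group decision: average the decision makers' fuzzy ratings
        k = len(info["ratings"])
        avg = tuple(sum(t[i] for t in info["ratings"]) / k for i in range(3))
        scored.append((centroid(avg), name))
    return [name for _, name in sorted(scored, reverse=True)]

ranked = select_sources({
    "sensor_a": {"expense": 0.3, "ratings": [(0.6, 0.7, 0.9), (0.5, 0.7, 0.8)]},
    "sensor_b": {"expense": 0.9, "ratings": [(0.8, 0.9, 1.0)]},   # too costly
    "sensor_c": {"expense": 0.4, "ratings": [(0.2, 0.4, 0.5), (0.3, 0.4, 0.6)]},
})
```

Inconsistent values would then be resolved in favor of the highest-ranked source still in the list.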
Funding: Thanks are due to the three anonymous reviewers for their constructive comments. This work was partially supported by the National Natural Science Foundation of China (Nos. 61572287 and 61533011), the Shandong Provincial Key Research and Development Program (2018GSF118043), the Natural Science Foundation of Shandong Province, China (ZR2015FQ001), the Fundamental Research Funds of Shandong University (Nos. 2015QY001 and 2016JC007), and the Scientific Research Foundation for the Returned Overseas Chinese Scholars, Ministry of Education of China.
Abstract: Background: More and more high-throughput datasets are available from multiple levels of measurement of gene regulation. The reverse engineering of gene regulatory networks from these data offers a valuable research paradigm for deciphering regulatory mechanisms. So far, numerous methods have been developed for reconstructing gene regulatory networks. Results: In this paper, we provide a review of bioinformatics methods for inferring gene regulatory networks from omics data. To achieve precise reconstruction of gene regulatory networks, an intuitive approach is to integrate the available resources in a rational framework. We also provide computational perspectives on the endeavor of inferring gene regulatory networks from heterogeneous data, and highlight the importance of integrating multi-omics data with prior knowledge in gene regulatory network inference. Conclusions: We provide computational perspectives on inferring gene regulatory networks from multiple omics data and present theoretical analyses of existing challenges and possible solutions. We emphasize prior knowledge and data integration in network inference owing to their ability to identify regulatory causality.
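One common integration pattern that such reviews survey, correlation-based edge scoring combined with prior knowledge, can be sketched in a few lines. The expression values, the threshold, and the prior-boost factor below are invented for demonstration and stand in for far more sophisticated inference methods:

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def infer_edges(expr, priors, threshold=0.8, prior_boost=1.25):
    """expr: {gene: [expression per sample]}; priors: set of (reg, tgt) pairs.

    Scores candidate edges by |correlation|, boosts edges supported by
    prior knowledge (e.g., known TF binding), and keeps those above threshold.
    """
    edges = {}
    genes = list(expr)
    for reg in genes:
        for tgt in genes:
            if reg == tgt:
                continue
            score = abs(pearson(expr[reg], expr[tgt]))
            if (reg, tgt) in priors:
                score = min(1.0, score * prior_boost)
            if score >= threshold:
                edges[(reg, tgt)] = round(score, 3)
    return edges

expr = {
    "tf1":   [1.0, 2.0, 3.0, 4.0],
    "geneA": [1.1, 2.1, 2.9, 4.2],   # tracks tf1
    "geneB": [3.0, 1.0, 4.0, 2.0],   # unrelated
}
edges = infer_edges(expr, priors={("tf1", "geneA")})
```

Note the limitation the review stresses: correlation alone cannot orient edges (both directions between tf1 and geneA survive here), which is precisely why prior knowledge and multi-omics evidence are needed to recover regulatory causality.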
Funding: Under the auspices of the National Natural Science Foundation of China (No. 41271416) and the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDA05090310)
Abstract: Land cover is recognized as one of the fundamental terrestrial datasets required in land system change and other ecosystem-related research across the globe. The regional differentiation and spatial-temporal variation of land cover have a significant impact on the regional natural environment and socio-economic sustainable development. In this context, we reconstructed historical land cover data in Siberia to provide a dataset comparable to land cover datasets in China and abroad. In this paper, the European Space Agency (ESA) Global Land Cover Map (GlobCover), Landsat Thematic Mapper (TM), Enhanced Thematic Mapper (ETM), and Multispectral Scanner (MSS) images, Google Earth images, and other additional data were used to produce land cover datasets for 1975 and 2010 in Siberia. Data evaluation shows that the overall user's accuracy of the 2010 land cover data was 86.96%, higher than that of the ESA GlobCover data in Siberia. Analysis of the land cover changes found no major changes in Siberia from 1975 to 2010, with only a few conversions between different natural forest types. The main changes were conversions from deciduous needleleaf forest to deciduous broadleaf forest, from deciduous needleleaf forest to mixed forest, and from savannas to deciduous needleleaf forest, indicating that the dominant driving factor of land cover change in Siberia was natural rather than human activity, which is very different from China. However, our purpose was not just to produce land cover datasets for two periods or to explore the driving factors of land cover change in Siberia; we also paid attention to the significance and application of the datasets in various fields such as global climate change, geopolitics, and cross-border cooperation.
Abstract: To construct mediators for data integration systems that integrate structured and semi-structured data, and to facilitate the reformulation and decomposition of queries, the presented system uses the XML processing language (XPL) for the mediator. With XPL, it is easy to construct mediators for XML-based data integration, and it can accelerate the work in the mediator.
Funding: Supported by the Swiss National Science Foundation
Abstract: Geophysical techniques can help to bridge the inherent gap that exists with regard to spatial resolution and coverage for classical hydrological methods. This has led to the emergence of a new and rapidly growing research domain generally referred to as hydrogeophysics. Given the differing sensitivities of various geophysical techniques to hydrologically relevant parameters, their inherent trade-off between resolution and range, as well as the notoriously site-specific nature of petrophysical parameter relations, the fundamental usefulness of multi-method surveys for reducing uncertainties in data analysis and interpretation is widely accepted. A major challenge arising from such endeavors is the quantitative integration of the resulting vast and diverse database into a unified model of the probed subsurface region that is consistent with all available measurements. To this end, we present a novel approach to hydrogeophysical data integration based on a Monte-Carlo-type conditional stochastic simulation method that we consider particularly suitable for high-resolution local-scale studies. Monte Carlo techniques are flexible and versatile, allowing a wide variety of data and constraints of differing resolution and hardness to be accounted for, and thus have the potential of providing, in a geostatistical sense, realistic models of the pertinent target parameter distributions. Compared to more conventional approaches, such as co-kriging or cluster analysis, our approach provides significant advancements in the way that larger-scale structural information contained in the hydrogeophysical data can be accounted for. After outlining the methodological background of our algorithm, we present the results of its application to the integration of porosity log and tomographic crosshole georadar data to generate stochastic realizations of the detailed local-scale porosity structure. Our procedure is first tested on pertinent synthetic data and then applied to a field dataset collected at the Boise Hydrogeophysical Research Site. Finally, we compare the performance of our data integration approach to that of more conventional methods with regard to the prediction of flow and transport phenomena in highly heterogeneous media and discuss the implications arising.
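The conditional-simulation idea can be caricatured in a few lines: a realization honors hard data (e.g., porosity-log values) exactly at conditioning nodes and draws the remaining nodes stochastically. The nearest-neighbor Gaussian draw below is a drastic simplification of the authors' geostatistical algorithm, which additionally conditions on larger-scale structure from the crosshole georadar data:

```python
import random

def simulate(n_nodes, hard_data, sigma=0.02, seed=0):
    """One 1-D realization; hard_data: {node_index: porosity}.

    Conditioning nodes keep their measured value exactly; every other node
    is drawn from a Gaussian centred on the nearest conditioning value.
    """
    rng = random.Random(seed)
    real = [None] * n_nodes
    for idx, value in hard_data.items():
        real[idx] = value                      # hard data honored exactly
    for i in range(n_nodes):
        if real[i] is None:
            nearest = min(hard_data, key=lambda j: abs(j - i))
            real[i] = rng.gauss(hard_data[nearest], sigma)
    return real

# Two hypothetical porosity-log values condition a 10-node profile; running
# simulate() with many seeds would yield an ensemble of realizations.
hard = {0: 0.25, 9: 0.35}
realization = simulate(10, hard, seed=42)
```

Repeating the draw over many seeds gives the Monte Carlo ensemble from which parameter uncertainty can be characterized in a geostatistical sense.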
Funding: Funding within the Wheat BigData Project (German Federal Ministry of Food and Agriculture, FKZ2818408B18)
Abstract: Genome-wide association mapping studies (GWAS) based on Big Data are a potential approach to improve marker-assisted selection in plant breeding. The number of available phenotypic and genomic data sets in which medium-sized populations of several hundred individuals have been studied is rapidly increasing. Combining these data and using them in GWAS could increase both the power of QTL discovery and the accuracy of estimation of the underlying genetic effects, but this is hindered by data heterogeneity and lack of interoperability. In this study, we used genomic and phenotypic data sets, focusing on Central European winter wheat populations evaluated for heading date. We explored strategies for integrating these data and, subsequently, the resulting potential for GWAS. Establishing interoperability between data sets was greatly aided by some overlapping genotypes and a linear relationship between the different phenotyping protocols, resulting in high-quality integrated phenotypic data. In this context, genomic prediction proved to be a suitable tool to study the relevance of interactions between genotypes and experimental series, which was low in our case. Contrary to expectations, fewer associations between markers and traits were found in the larger combined data set than in the individual experimental series. However, the predictive power based on the marker-trait associations of the integrated data set was higher across data sets. The results therefore show that integrating medium-sized data sets into Big Data is an approach to increase the power to detect QTL in GWAS, and they encourage further efforts to standardize and share data in the plant breeding community.
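The single-marker association scan underlying such a GWAS can be sketched as a per-marker regression of the trait on allele dosage. Real analyses additionally correct for kinship and experimental-series effects; the dosages and heading dates below are invented toy data:

```python
def marker_scan(genotypes, phenotype):
    """genotypes: {marker: [0/1/2 allele dosages]}; phenotype: trait values.

    For each marker, fits y = a + b*x by least squares and reports the
    slope b (additive effect) and r^2 (variance explained).
    """
    n = len(phenotype)
    my = sum(phenotype) / n
    results = {}
    for marker, x in genotypes.items():
        mx = sum(x) / n
        sxx = sum((a - mx) ** 2 for a in x)
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, phenotype))
        syy = sum((b - my) ** 2 for b in phenotype)
        slope = sxy / sxx
        r2 = (sxy * sxy) / (sxx * syy)
        results[marker] = (round(slope, 3), round(r2, 3))
    return results

# Hypothetical heading dates (days of year) for six integrated genotypes.
heading = [148.0, 150.0, 152.0, 149.0, 151.0, 153.0]
res = marker_scan(
    {"qtl_marker": [0, 0, 1, 0, 1, 2],      # dosage tracks the trait
     "null_marker": [1, 0, 1, 0, 1, 0]},    # unrelated marker
    heading,
)
```

Integrating series mainly enters through n: pooling several medium-sized populations increases the sample size behind each per-marker test, which is the source of the hoped-for gain in QTL detection power.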
Funding: Funding provided by Università degli Studi di Padova. Part of the PROH-DAIRY project (Development of precision livestock breeding tools toward One Health in Italian and Israeli dairy chains) funded by the Ministry of Foreign Affairs and International Cooperation (MAECI) within the Italy-Israel R&D Cooperation Program (Roma, Italy), and of the Agritech National Research Center, which received funding from the European Union NextGenerationEU (PIANO NAZIONALE DI RIPRESA E RESILIENZA (PNRR) - MISSIONE 4 COMPONENTE 2, INVESTIMENTO 1.4 - D.D. 1032 17/06/2022, CN00000022).
Abstract: Background Various blood metabolites are known to be useful indicators of health status in dairy cattle, but their routine assessment at the herd level is time-consuming, expensive, and stressful for the cows. Thus, we evaluated the effectiveness of combining in-line near-infrared (NIR) milk spectra with on-farm information (days in milk [DIM] and parity) and genetic markers for predicting blood metabolites in Holstein cattle. Data were obtained from 388 Holstein cows from a farm with an AfiLab system. NIR spectra, on-farm information, and single nucleotide polymorphism (SNP) markers were blended to develop calibration equations for blood metabolites using the elastic net (ENet) approach, considering 3 models: (1) Model 1 (M1), including only NIR information; (2) Model 2 (M2), with both NIR and on-farm information; and (3) Model 3 (M3), combining NIR, on-farm, and genomic information. Dimension reduction was considered for M3 by preselecting SNP markers from genome-wide association study (GWAS) results. Results Compared to M1, M2 improved the predictive ability by an average of 19% for energy-related metabolites (glucose, cholesterol, NEFA, BHB, urea, and creatinine), 20% for liver function/hepatic damage, 7% for inflammation/innate immunity, 24% for oxidative stress metabolites, and 23% for minerals. Meanwhile, M3 further enhanced the predictive ability by 34% for energy-related metabolites, 32% for liver function/hepatic damage, 22% for inflammation/innate immunity, 42.1% for oxidative stress metabolites, and 41% for minerals compared to M1. We found that the predictive ability of M3 improved when using SNP markers selected from GWAS results at a threshold of -log10(P-value) > 2.0: by 5% for energy-related metabolites, 9% for liver function/hepatic damage, 8% for inflammation/innate immunity, 22% for oxidative stress metabolites, and 9% for minerals. Slight reductions were observed for phosphorus (2%), ferric-reducing antioxidant power (1%), and glucose (3%). Furthermore, prediction accuracies were influenced by using more restrictive thresholds (-log10(P-value) > 2.5 and 3.0), with a smaller increase in predictive ability. Conclusion Our results highlight that combining several sources of information, such as genetic markers, on-farm information, and in-line NIR data, improves the predictive ability of blood metabolites in dairy cattle, representing an effective strategy for large-scale in-line health monitoring in commercial herds.
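The ENet fitting step can be sketched as coordinate descent with soft-thresholding over a blended feature matrix (NIR bands, on-farm covariates, SNP dosages stacked column-wise). This is a bare-bones illustration on invented toy data, not the authors' calibration pipeline:

```python
def soft_threshold(z, gamma):
    """L1 proximal operator: shrink z toward zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Minimize 1/(2n)*||y - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2)
    by cyclic coordinate descent; alpha blends the lasso and ridge penalties."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding feature j's current contribution
            resid = [y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                     for i in range(n)]
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n
            denom = sum(X[i][j] ** 2 for i in range(n)) / n + lam * (1 - alpha)
            beta[j] = soft_threshold(rho, lam * alpha) / denom
    return beta

# Toy data: y depends on feature 0 (an "NIR band") only; feature 1 (a "SNP")
# is noise, so the L1 part should shrink its coefficient to zero.
X = [[1.0, 0.2], [2.0, -0.1], [3.0, 0.3], [4.0, -0.2], [5.0, 0.1]]
y = [1.1, 2.0, 3.1, 3.9, 5.0]
beta = elastic_net(X, y)
```

The GWAS preselection described above acts before this fit: dropping SNP columns whose -log10(P-value) falls below the chosen threshold shrinks p, which is the dimension-reduction step of M3.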