With the rapid growth of biomedical data, particularly multi-omics data including genomics, transcriptomics, proteomics, metabolomics, and epigenomics, medical research and clinical decision-making confront both new opportunities and obstacles. The huge and diversified nature of these datasets cannot always be managed using traditional data analysis methods. As a consequence, deep learning has emerged as a powerful tool for analysing multi-omics data due to its ability to handle complex, non-linear relationships. This paper explores the fundamental concepts of deep learning and how they are applied in multi-omics medical data mining. We demonstrate how autoencoders, variational autoencoders, multimodal models, attention mechanisms, transformers, and graph neural networks enable pattern analysis and recognition across omics data. Deep learning has proved effective in disease classification, biomarker identification, gene network learning, and therapeutic efficacy prediction. We also consider critical problems such as data quality, model explainability, reproducibility of findings, and computational power requirements. We then consider future directions, including combining omics with clinical and imaging data, explainable AI, federated learning, and real-time diagnostics. Overall, this study emphasises the need for cross-disciplinary collaboration to advance deep learning-based multi-omics research for precision medicine and the understanding of complex disorders.
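The dimensionality-reduction role that autoencoders play in multi-omics mining can be illustrated with a minimal sketch: a linear autoencoder trained by gradient descent to compress a toy "expression matrix" into a two-dimensional bottleneck. The data, network sizes, and learning rate below are illustrative inventions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "omics" matrix: 100 samples x 6 features that actually live on a
# 2-dimensional latent manifold, so a 2-unit bottleneck can reconstruct it.
latent = rng.normal(size=(100, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.01 * rng.normal(size=(100, 6))

# Linear autoencoder: encoder W1 (6 -> 2), decoder W2 (2 -> 6).
W1 = 0.1 * rng.normal(size=(6, 2))
W2 = 0.1 * rng.normal(size=(2, 6))

def loss(W1, W2):
    residual = X @ W1 @ W2 - X
    return float((residual ** 2).mean())

initial_loss = loss(W1, W2)
lr = 0.02
for _ in range(2000):
    Z = X @ W1               # latent codes
    R = Z @ W2 - X           # reconstruction residual
    gW2 = 2 * Z.T @ R / len(X)
    gW1 = 2 * X.T @ (R @ W2.T) / len(X)
    W1 -= lr * gW1
    W2 -= lr * gW2

final_loss = loss(W1, W2)
assert final_loss < 0.1 * initial_loss  # the bottleneck captured the structure
```

A nonlinear autoencoder replaces the matrix products with activation functions between layers; the principle of learning a compressed representation of high-dimensional omics features is the same.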
Bioinformatic analysis of large and complex omics datasets has become increasingly useful in modern-day biology by providing a great depth of information, with its application to neuroscience termed neuroinformatics. Data mining of omics datasets has enabled the generation of new hypotheses based on differentially regulated biological molecules associated with disease mechanisms, which can be tested experimentally for improved diagnostic and therapeutic targeting of neurodegenerative diseases. Importantly, integrating multi-omics data using a systems bioinformatics approach will advance the understanding of the layered and interactive network of biological regulation that exchanges systemic knowledge to facilitate the development of a comprehensive human brain profile. In this review, we first summarize data mining studies utilizing datasets from individual types of omics analysis, including epigenetics/epigenomics, transcriptomics, proteomics, metabolomics, lipidomics, and spatial omics, pertaining to Alzheimer's disease, Parkinson's disease, and multiple sclerosis. We then discuss multi-omics integration approaches, including independent biological integration and unsupervised integration methods, for more intuitive and informative interpretation of the biological data obtained across different omics layers. We further assess studies that integrate multi-omics in data mining, which provide convoluted biological insights and offer proof-of-concept propositions towards systems bioinformatics in the reconstruction of brain networks. Finally, we recommend a combination of high-dimensional bioinformatics analysis with experimental validation to achieve translational neuroscience applications, including biomarker discovery, therapeutic development, and elucidation of disease mechanisms. We conclude by providing future perspectives and opportunities in applying integrative multi-omics and systems bioinformatics to achieve precision phenotyping of neurodegenerative diseases and towards personalized medicine.
To solve the query processing correctness problem for semantic-based relational data integration, the semantics of SPARQL (SPARQL Protocol and RDF Query Language) queries is defined. In the course of query rewriting, all relevant tables are found and decomposed into minimal connectable units. Minimal connectable units are joined according to semantic queries to produce semantically correct query plans. Algorithms for query rewriting and transforming are presented, and their computational complexity is discussed. In the worst case, the query decomposing algorithm finishes in O(n²) time and the query rewriting algorithm requires O(nm) time. The performance of the algorithms is verified by experiments; the results show that when the length of a query is less than 8, the query processing algorithms provide satisfactory performance.
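The rewriting idea (find the tables relevant to a query, then join units that together cover the requested attributes) can be sketched as a toy greedy cover. The table names, attributes, and greedy strategy here are illustrative inventions, not the paper's minimal-connectable-unit algorithm.

```python
# Toy catalog: table name -> attributes it provides, joinable on "id".
TABLES = {
    "person":  {"id", "name", "age"},
    "employ":  {"id", "company"},
    "address": {"id", "city"},
}

def relevant_tables(query_attrs):
    """Tables that contribute at least one requested attribute."""
    return {t: a for t, a in TABLES.items() if a & query_attrs}

def join_plan(query_attrs):
    """Greedily pick the table covering the most still-missing attributes."""
    missing = set(query_attrs)
    candidates = relevant_tables(query_attrs)
    plan = []
    while missing:
        best = max(candidates, key=lambda t: len(candidates[t] & missing))
        gained = candidates[best] & missing
        if not gained:
            raise ValueError(f"attributes {missing} not available")
        plan.append(best)
        missing -= gained
    return plan

print(join_plan({"name", "city"}))  # → ['person', 'address']
```

A real rewriter would also track join keys and produce SQL, but the cover-then-join structure mirrors the decomposition-and-join step described above.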
Industrial big data integration and sharing (IBDIS) is of great significance in managing and providing data for big data analysis in manufacturing systems. A novel fog-computing-based IBDIS approach called Fog-IBDIS is proposed in order to integrate and share industrial big data with high raw data security and low network traffic loads by moving the integration task from the cloud to the edge of networks. First, a task flow graph (TFG) is designed to model the data analysis process. The TFG is composed of several tasks, which are executed by the data owners through the Fog-IBDIS platform in order to protect raw data privacy. Second, the function of Fog-IBDIS to enable data integration and sharing is presented in five modules: TFG management, compilation and running control, the data integration model, the basic algorithm library, and the management component. Finally, a case study is presented to illustrate the implementation of Fog-IBDIS, which ensures raw data security by deploying the analysis tasks executed by the data generators, and eases the network traffic load by greatly reducing the volume of transmitted data.
With the rapid development of the Web, more and more Web databases are available for users to access. At the same time, job searchers often have difficulty first finding the right sources and then querying over them, so an integrated job search system over Web databases has become a Web application in high demand. Based on this consideration, we build a deep Web data integration system that supports unified access for users to multiple job Web sites as a job meta-search engine. In this paper, the architecture of the system is given first, and the key components of the system are then introduced.
This paper analyzes the status of existing resources through extensive research and international cooperation on the basis of four typical global monthly surface temperature datasets: the climate research dataset of the University of East Anglia (CRUTEM3), the dataset of the U.S. National Climatic Data Center (GHCN-V3), the dataset of the U.S. National Aeronautics and Space Administration (GISTEMP), and the Berkeley Earth surface temperature dataset (Berkeley). China's first global monthly temperature dataset over land was developed by integrating the four aforementioned global temperature datasets and several regional datasets from major countries or regions. This dataset contains information from 9,519 stations worldwide with records of at least 20 years for monthly mean temperature, 7,073 for maximum temperature, and 6,587 for minimum temperature. Compared with CRUTEM3 and GHCN-V3, the station density is much higher, particularly for South America, Africa, and Asia. Moreover, data from significantly more stations were available after the year 1990, which dramatically reduced the uncertainty of the estimated global temperature trend during 1990–2011. The integrated dataset can serve as a reliable data source for global climate change research.
Background: Presently, multi-omics data (e.g., genomics, transcriptomics, proteomics, and metabolomics) are available to improve genomic predictors. Omics data not only offer new data layers for genomic prediction but also provide a bridge between organismal phenotypes and genome variation that cannot be readily captured at the genome sequence level. Therefore, using multi-omics data to select feature markers is a feasible strategy to improve the accuracy of genomic prediction. In this study, simultaneously using whole-genome sequencing (WGS) and gene expression level data, four strategies for single-nucleotide polymorphism (SNP) preselection were investigated for genomic predictions in the Drosophila Genetic Reference Panel. Results: Using genomic best linear unbiased prediction (GBLUP) with complete WGS data, the prediction accuracies were 0.208±0.020 (0.181±0.022) for the startle response and 0.272±0.017 (0.307±0.015) for starvation resistance in the female (male) lines. Compared with GBLUP using complete WGS data, neither GBLUP nor the genomic feature BLUP (GFBLUP) improved the prediction accuracy using SNPs preselected from complete WGS data based on the results of genome-wide association studies (GWASs) or transcriptome-wide association studies (TWASs). Furthermore, using SNPs preselected from the WGS data based on the results of expression quantitative trait locus (eQTL) mapping of all genes, only the startle response achieved greater accuracy than GBLUP with the complete WGS data; the best accuracy values in the female and male lines were 0.243±0.020 and 0.220±0.022, respectively. Importantly, using SNPs preselected based on the results of the eQTL mapping of significant genes from TWAS, both GBLUP and GFBLUP achieved high accuracy and small bias of genomic prediction. Compared with GBLUP using complete WGS data, the best accuracy values represented increases of 60.66% and 39.09% for starvation resistance and 27.40% and 35.36% for the startle response in the female and male lines, respectively. Conclusions: Overall, multi-omics data can assist genomic feature preselection and improve the performance of genomic prediction. The new knowledge gained from this study will enrich the use of multi-omics in genomic prediction.
With the deepening informationization of Resources & Environment Remote Sensing geological surveys, several potential problems and deficiencies have emerged: (1) the lack of a unified, planned running environment; (2) inconsistent methods of data integration; and (3) the disadvantages of differing ways of performing data integration. This paper solves the above problems through overall planning and design, and constructs a unified running environment, consistent methods of data integration, and a system structure in order to advance informationization.
An 8×10 GHz receiver optical sub-assembly (ROSA) consisting of an 8-channel arrayed waveguide grating (AWG) and an 8-channel PIN photodetector (PD) array is designed and fabricated based on silica hybrid integration technology. Multimode output waveguides in the silica AWG with 2% refractive index difference are used to obtain flat-top spectra. The output waveguide facet is polished to a 45° bevel to change the light propagation direction into the mesa-type PIN PD, which simplifies the packaging process. The experimental results show that the single-channel 1 dB bandwidth of the AWG ranges from 2.12 nm to 3.06 nm, the ROSA responsivity ranges from 0.097 A/W to 0.158 A/W, and the 3 dB bandwidth is up to 11 GHz. The device is promising for application in eight-lane WDM transmission systems for data center interconnection.
This paper presents some key techniques for a multi-sensor integration system, which is applied in the intelligent transportation system industry and the surveying and mapping industry, e.g. road surface condition detection and digital map making. The techniques are synchronization control of multiple sensors, a space-time benchmark for sensor data, and multi-sensor data fusion and mining. Firstly, synchronization control of multiple sensors is achieved through a synchronization control system composed of a time synchronization controller and several synchronization sub-controllers. The time synchronization controller can receive GPS time information from GPS satellites and relative distance information from a distance measuring instrument, and send space-time information to the synchronization sub-controllers. The latter can work in three synchronization modes, i.e. active synchronization, passive synchronization, and time service synchronization. Secondly, the space-time benchmark can be established based on GPS time and a global reference coordinate system, and can be obtained through the position and azimuth determining system (PADS) and the synchronization control system. Thirdly, there are many types of data fusion and mining, e.g. GPS/Gyro/DMI data fusion, data fusion between stereophotogrammetry and PADS, data fusion between laser scanner and PADS, and data fusion between CCD camera and laser scanner. Finally, all the solutions presented in this paper have been applied in two areas, i.e. a land-borne intelligent road detection and measurement system and a 3D measurement system based on an unmanned helicopter. The former has been supplied to several highway engineering companies and has been successfully put into use; the latter is ongoing research.
In this paper we propose a service-oriented architecture for spatial data integration (SOA-SDI) in the context of a large number of available spatial data sources that physically sit at different places, and develop Web-based GIS systems based on SOA-SDI, allowing client applications to pull in, analyze, and present spatial data from the available spatial data sources. The proposed architecture logically includes four layers or components: a layer of multiple data provider services, a data integration layer, a layer of backend services, and a front-end graphical user interface (GUI) for spatial data presentation. On the basis of the four-layered SOA-SDI framework, WebGIS applications can be quickly deployed, which shows that SOA-SDI has the potential to reduce the cost of software development and shorten the development period.
Accurately evaluating the lifespan of the printed circuit board (PCB) in airborne equipment is an essential issue for aircraft design and operation in the marine atmospheric environment. This paper presents a novel evaluation method that fuses accelerated degradation testing (ADT) data, degradation data, and life data of small samples based on the uncertainty degradation process. An uncertain life model of the PCB in airborne equipment is constructed by employing an uncertain distribution that considers the acceleration factor of multiple environmental conditions such as temperature, humidity, and salinity. In addition, a degradation process model of the PCB in airborne equipment is constructed by employing an uncertain process fusing ADT data and field data, in which the performance characteristics of dynamic cumulative change are included. Based on minimizing the pth sample moments, an integrated method for parameter estimation of the PCB in airborne equipment is proposed by fusing the multi-source data of life, degradation, and ADT. An engineering case illustrates the effectiveness and advantages of the proposed method.
With the advent of the era of big data, traditional financial management can no longer meet the needs of modern enterprise business. Enterprises expect financial management to improve the accuracy of corporate financial data, assist corporate management in making decisions that are more in line with the actual development of the company, and optimize corporate management systems, thereby comprehensively improving the overall level of the company and ensuring that, with the assistance of business-finance integration, the company can better improve and develop itself. Based on an investigation of enterprises and universities, this article analyzes the problem of accounting talent training from both the demand and supply ends, puts forward suggestions for the teaching reform of accounting integration with big data in financial colleges and universities, and strives to promote the integration of business and finance. The optimal allocation of enterprise resources will gradually enhance the market competitiveness of enterprises, and the article explores application strategies of big data technology in the integration of enterprise business and finance.
Data compression plays a vital role in data management and information theory by reducing redundancy. However, it lacks built-in security features such as secret keys or password-based access control, leaving sensitive data vulnerable to unauthorized access and misuse. With the exponential growth of digital data, robust security measures are essential. Data encryption, a widely used approach, ensures data confidentiality by making data unreadable and unalterable through secret key control. Despite their individual benefits, both compression and encryption require significant computational resources, and performing them separately on the same data increases complexity and processing time. Recognizing the need for integrated approaches that balance compression ratio and security level, this research proposes an integrated data compression and encryption algorithm, named IDCE, for enhanced security and efficiency. The algorithm operates on 128-bit blocks with a 256-bit secret key. It combines Huffman coding for compression with a tent map for encryption, and an iterative Arnold cat map further enhances cryptographic confusion properties. Experimental analysis validates the effectiveness of the proposed algorithm, showcasing competitive performance in terms of compression ratio, security, and overall efficiency compared with prior algorithms in the field.
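The chaotic-keystream side of such a scheme can be sketched with a tent map seeded from a secret value and XORed with the (already compressed) payload. The parameter choice and key handling below are illustrative; IDCE's actual construction, including the Huffman stage and the iterative Arnold cat map, is not reproduced here.

```python
def tent_keystream(seed: float, n: int, mu: float = 1.99) -> bytes:
    """Derive n pseudo-random bytes from tent-map iterates.

    seed must lie in (0, 1); it plays the role of the secret key.
    """
    x = seed
    out = bytearray()
    for _ in range(n):
        # Tent map: linear stretch-and-fold on the unit interval.
        x = mu * x if x < 0.5 else mu * (1.0 - x)
        out.append(int(x * 256) % 256)
    return bytes(out)

def xor_crypt(data: bytes, seed: float) -> bytes:
    """XOR with the keystream; applying it twice recovers the input."""
    ks = tent_keystream(seed, len(data))
    return bytes(b ^ k for b, k in zip(data, ks))

msg = b"compressed payload"
ct = xor_crypt(msg, seed=0.3141592653)
assert ct != msg
assert xor_crypt(ct, seed=0.3141592653) == msg  # XOR is its own inverse
```

A keystream XOR alone gives confidentiality but no diffusion across byte positions, which is why schemes of this kind add a permutation stage such as the Arnold cat map.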
The study of plant diversity is often hindered by the challenge of integrating data from different sources and of different types. A standardized data system would facilitate detailed exploration of plant distribution patterns and dynamics for botanists, ecologists, conservation biologists, and biogeographers. This study proposes a gridded vector data integration method, combining grid-based techniques with vectorization to integrate diverse data types from multiple sources into grids of the same scale. We demonstrate the methodology by creating a comprehensive 1°×1° database of western China that includes plant distribution information and environmental factor data. This approach addresses the need for a standardized data system to facilitate exploration of plant distribution patterns and dynamic changes in the region.
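The core gridding step, snapping vector point records to 1°×1° cells so that occurrence data and per-cell environmental variables share one spatial key, might look like the following sketch; the records and the cell-key convention are illustrative.

```python
import math
from collections import defaultdict

# Toy point occurrence records: (species, longitude, latitude).
records = [
    ("Picea likiangensis", 99.7, 27.3),
    ("Picea likiangensis", 99.2, 27.8),
    ("Rhododendron sp.",   91.1, 29.6),
]

def cell_key(lon: float, lat: float) -> tuple:
    """1x1 degree cell identified by its south-west corner."""
    return (math.floor(lon), math.floor(lat))

# Aggregate species occurrences per grid cell.
grid = defaultdict(set)
for species, lon, lat in records:
    grid[cell_key(lon, lat)].add(species)

print(sorted(grid.items()))
# Both Picea records fall in cell (99, 27); the Rhododendron in (91, 29).
```

Environmental rasters resampled to the same cell keys can then be joined to this table directly, giving the shared-scale grid the abstract describes.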
Digital twin is a novel technology that has achieved significant progress in industrial manufacturing systems in recent years. In the digital twin environment, entities in the virtual space collect data from devices in the physical space to analyze their states. However, since many devices exist in the physical space, the digital twin system needs to aggregate data from multiple devices at the edge gateway. Homomorphic integrity and confidentiality protections are two important requirements for this data aggregation process. Unfortunately, existing homomorphic encryption algorithms do not support integrity protection, and existing homomorphic signing algorithms require all signers to use the same signing key, which is not feasible in the digital twin environment. Moreover, for both integrity and confidentiality protections, the homomorphic signing algorithm must be compatible with the aggregation manner of the homomorphic encryption algorithm. To address these issues, this paper designs a novel homomorphic aggregation scheme, which allows multiple devices in the physical space to sign different data using different keys and supports integrity and confidentiality protections. Finally, the security of the newly designed scheme is analyzed, and its efficiency is evaluated. Experimental results show that our scheme is feasible for real-world applications.
To construct mediators for data integration systems that integrate structured and semi-structured data, and to facilitate the reformulation and decomposition of queries, the presented system uses the XML processing language (XPL) for the mediator. With XPL, it is easy to construct mediators for XML-based data integration, and the work performed in the mediator can be accelerated.
Integrating and sharing data from different data sources is one of the trends towards making better use of data. However, data integration hampers data confidentiality, as each data source has its own access control policy. This paper discusses the issue of access control across multiple data sources when they are combined in the scenario of searching over these data. A method based on multilevel security for data integration is proposed. The proposed method allows the merging of policies and also tackles the issue of policy conflicts between different data sources.
The distillation process is an important chemical process, and the application of data-driven modelling approaches has the potential to reduce model complexity compared to mechanistic modelling, thus improving the efficiency of process optimization or monitoring studies. However, the distillation process is highly nonlinear and has multiple uncertainty perturbation intervals, which poses challenges for accurate data-driven modelling of distillation processes. This paper proposes a systematic data-driven modelling framework to solve these problems. Firstly, data segment variance is introduced into the K-means algorithm to form K-means data interval (KMDI) clustering, in order to cluster the data into perturbed and steady-state intervals for steady-state data extraction. Secondly, the maximal information coefficient (MIC) is employed to calculate the nonlinear correlation between variables for removing redundant features. Finally, extreme gradient boosting (XGBoost) is integrated as the base learner into adaptive boosting (AdaBoost) with an error threshold (ET) set to improve the weight update strategy, constructing a new integrated learning algorithm, XGBoost-AdaBoost-ET. The superiority of the proposed framework is verified by applying it to a real industrial propylene distillation process.
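The steady-state extraction step can be illustrated with a simplified sketch that tags fixed windows of a process signal by their variance and keeps only the low-variance ones. The window size and threshold here are invented; the paper's KMDI method instead folds segment variance into K-means clustering rather than using a fixed cutoff.

```python
import statistics

def steady_windows(signal, width=5, var_limit=0.05):
    """Return (start, end) index pairs of low-variance (steady) windows."""
    out = []
    for i in range(0, len(signal) - width + 1, width):
        window = signal[i:i + width]
        if statistics.pvariance(window) <= var_limit:
            out.append((i, i + width))
    return out

# Flat segment, a disturbance, then a flat segment at a new operating point.
signal = [1.0] * 5 + [1.0, 3.0, 0.5, 2.5, 1.0] + [2.0] * 5
print(steady_windows(signal))  # → [(0, 5), (10, 15)]
```

Only the two steady windows would be passed on for model training; the perturbed middle window is excluded, mirroring the perturbed/steady split the framework performs.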
We propose a three-step technique to achieve this purpose. First, we utilize a collection of XML namespaces organized into a hierarchical structure as a medium for expressing data semantics. Second, we define the format of the resource descriptor for the information source discovery scheme so that we can dynamically register and/or deregister Web data sources on the fly. Third, we employ an inverted-index mechanism to identify the subset of information sources that are relevant to a particular user query. We describe the design, architecture, and implementation of our approach, IWDS, and illustrate its use through case examples. Key words: integration; heterogeneity; Web data source; XML namespace. CLC number: TP 311.13. Foundation item: Supported by the National Key Technologies R&D Program of China (2002BA103A04). Biography: WU Wei (1975-), male, Ph.D. candidate; research direction: information integration, distributed computing.
Funding: Supported by a Lee Kong Chian School of Medicine Dean's Postdoctoral Fellowship (021207-00001) from Nanyang Technological University (NTU), Singapore, and a Mistletoe Research Fellowship (022522-00001) from the Momental Foundation, USA. Jialiu Zeng is supported by a Presidential Postdoctoral Fellowship (021229-00001) from NTU Singapore and an Open Fund Young Investigator Research Grant (OF-YIRG) (MOH-001147) from the National Medical Research Council (NMRC), Singapore. Su Bin Lim is supported by the National Research Foundation (NRF) of Korea (Grant Nos. 2020R1A6A1A03043539, 2020M3A9D8037604, 2022R1C1C1004756) and a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (Grant No. HR22C1734).
Funding: Weaponry Equipment Pre-Research Foundation of PLA Equipment Ministry (No. 9140A06050409JB8102); Pre-Research Foundation of PLA University of Science and Technology (No. 2009JSJ11).
Funding: This work was supported in part by the National Natural Science Foundation of China (51435009), the Shanghai Sailing Program (19YF1401500), and the Fundamental Research Funds for the Central Universities (2232019D3-34).
Abstract: Industrial big data integration and sharing (IBDIS) is of great significance in managing and providing data for big data analysis in manufacturing systems. A novel fog-computing-based IBDIS approach called Fog-IBDIS is proposed to integrate and share industrial big data with high raw-data security and low network traffic loads by moving the integration task from the cloud to the edge of the network. First, a task flow graph (TFG) is designed to model the data analysis process. The TFG is composed of several tasks, which are executed by the data owners through the Fog-IBDIS platform in order to protect raw-data privacy. Second, the function of Fog-IBDIS to enable data integration and sharing is presented in five modules: TFG management, compilation and running control, the data integration model, the basic algorithm library, and the management component. Finally, a case study illustrates the implementation of Fog-IBDIS, which ensures raw-data security by deploying the analysis tasks executed by the data generators, and eases the network traffic load by greatly reducing the volume of transmitted data.
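The task flow graph described above can be pictured as a dependency DAG executed in topological order, so that each owner's task runs locally and only derived results cross the network. The sketch below is an illustration of that idea under invented task names, not the Fog-IBDIS implementation.

```python
# Illustrative sketch: a task flow graph (TFG) as a DAG, executed in
# dependency order. Each data owner runs its own task locally; the shared
# aggregation task only sees derived results, never raw data.
from graphlib import TopologicalSorter

def run_tfg(deps, tasks):
    """deps: task -> set of prerequisite tasks; tasks: task -> callable."""
    results = {}
    for name in TopologicalSorter(deps).static_order():
        inputs = {d: results[d] for d in deps.get(name, ())}
        results[name] = tasks[name](inputs)
    return results

# Toy flow: two owners summarize their raw data locally, then a shared
# task aggregates only the summaries.
deps = {"aggregate": {"clean_a", "clean_b"}}
tasks = {
    "clean_a": lambda _: sum([1, 2, 3]),           # owner A's local task
    "clean_b": lambda _: sum([4, 5]),              # owner B's local task
    "aggregate": lambda r: r["clean_a"] + r["clean_b"],
}
print(run_tfg(deps, tasks)["aggregate"])  # 15
```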
Funding: Supported by the Natural Science Foundation of China (60573091, 60273018), the National Basic Research and Development Program of China (2003CB317000), and the Key Project of the Ministry of Education of China (03044).
Abstract: With the rapid development of the Web, more and more Web databases are available for users to access. At the same time, job seekers often have difficulty first finding the right sources and then querying over them; an integrated job search system over Web databases has therefore become a Web application in high demand. Based on this consideration, we build a deep Web data integration system that supports unified access to multiple job Web sites as a job meta-search engine. In this paper, the architecture of the system is given first, and then the key components of the system are introduced.
Funding: Supported by the China Meteorological Administration Special Public Welfare Research Fund (GYHY201206012, GYHY201406016) and the Climate Change Foundation of the China Meteorological Administration (CCSF201338).
Abstract: This paper analyzes the status of existing resources through extensive research and international cooperation on the basis of four typical global monthly surface temperature datasets: the climate research dataset of the University of East Anglia (CRUTEM3), the dataset of the U.S. National Climatic Data Center (GHCN-V3), the dataset of the U.S. National Aeronautics and Space Administration (GISSTMP), and the Berkeley Earth surface temperature dataset (Berkeley). China's first global monthly temperature dataset over land was developed by integrating the four aforementioned global temperature datasets and several regional datasets from major countries or regions. This dataset contains information from 9,519 stations worldwide with at least 20 years of records for monthly mean temperature, 7,073 for maximum temperature, and 6,587 for minimum temperature. Compared with CRUTEM3 and GHCN-V3, the station density is much higher, particularly for South America, Africa, and Asia. Moreover, data from significantly more stations were available after 1990, which dramatically reduced the uncertainty of the estimated global temperature trend during 1990-2011. The integrated dataset can serve as a reliable data source for global climate change research.
Funding: Supported by the National Natural Science Foundation of China (31772556), the Local Innovative and Research Teams Project of Guangdong Province (2019BT02N630), the earmarked fund for the China Agriculture Research System (CARS-35), and the Science and Technology Innovation Strategy Projects of Guangdong Province (Grant No. 2018B020203002).
Abstract: Background: Presently, multi-omics data (e.g., genomics, transcriptomics, proteomics, and metabolomics) are available to improve genomic predictors. Omics data not only offer new data layers for genomic prediction but also provide a bridge between organismal phenotypes and genome variation that cannot be readily captured at the genome sequence level. Therefore, using multi-omics data to select feature markers is a feasible strategy to improve the accuracy of genomic prediction. In this study, simultaneously using whole-genome sequencing (WGS) and gene expression level data, four strategies for single-nucleotide polymorphism (SNP) preselection were investigated for genomic prediction in the Drosophila Genetic Reference Panel. Results: Using genomic best linear unbiased prediction (GBLUP) with complete WGS data, the prediction accuracies were 0.208±0.020 (0.181±0.022) for startle response and 0.272±0.017 (0.307±0.015) for starvation resistance in the female (male) lines. Compared with GBLUP using complete WGS data, neither GBLUP nor genomic feature BLUP (GFBLUP) improved the prediction accuracy using SNPs preselected from complete WGS data based on the results of genome-wide association studies (GWAS) or transcriptome-wide association studies (TWAS). Furthermore, using SNPs preselected from the WGS data based on the results of expression quantitative trait locus (eQTL) mapping of all genes, only the startle response achieved greater accuracy than GBLUP with the complete WGS data; the best accuracy values in the female and male lines were 0.243±0.020 and 0.220±0.022, respectively. Importantly, using SNPs preselected based on the results of eQTL mapping of significant genes from TWAS, both GBLUP and GFBLUP achieved high accuracy and small bias of genomic prediction. Compared with GBLUP using complete WGS data, the best accuracy values represented increases of 60.66% and 39.09% for starvation resistance and 27.40% and 35.36% for startle response in the female and male lines, respectively. Conclusions: Overall, multi-omics data can assist genomic feature preselection and improve the performance of genomic prediction. The new knowledge gained from this study will enrich the use of multi-omics in genomic prediction.
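The preselection-then-predict workflow above can be sketched in miniature. This is a hedged illustration only: the p-value threshold, the tiny genotype matrix, and the use of a VanRaden-style genomic relationship matrix (GRM) as the GBLUP ingredient are assumptions for demonstration, not the paper's data or settings.

```python
# Sketch: keep only SNPs mapped to significant genes (e.g., from eQTL/TWAS
# results), then build the GRM used by GBLUP from that subset.
def preselect(snp_pvalues, alpha=0.05):
    """Return indices of SNPs whose mapping p-value passes the threshold."""
    return [i for i, p in enumerate(snp_pvalues) if p < alpha]

def grm(genotypes, keep):
    """VanRaden-style GRM on the kept SNP columns; genotypes coded 0/1/2."""
    n = len(genotypes)
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in keep]
    denom = sum(2 * p * (1 - p) for p in freqs) or 1.0
    centred = [[row[j] - 2 * p for j, p in zip(keep, freqs)]
               for row in genotypes]
    return [[sum(a * b for a, b in zip(zi, zk)) / denom for zk in centred]
            for zi in centred]

pvals = [0.001, 0.6, 0.02, 0.9]        # invented per-SNP mapping p-values
keep = preselect(pvals)                # -> indices of preselected SNPs
G = grm([[0, 2, 1, 0], [2, 0, 1, 1], [1, 1, 2, 0]], keep)
```

In practice the GRM feeds a mixed-model solver; that step is omitted here.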
Abstract: With the deepening informatization of the Resources & Environment Remote Sensing geological survey, some potential problems and deficiencies are: (1) the shortage of a unified, planned running environment; (2) inconsistent methods of data integration; and (3) the disadvantages of disparate ways of performing data integration. This paper solves the above problems through overall planning and design, constructing a unified running environment, consistent data integration methods, and a system structure in order to advance informatization.
Funding: Supported by the National High Technology Research and Development Program of China under Grant No. 2015AA016902, the National Natural Science Foundation of China under Grant Nos. 61435013 and 61405188, and the K. C. Wong Education Foundation.
Abstract: An 8×10 GHz receiver optical sub-assembly (ROSA) consisting of an 8-channel arrayed waveguide grating (AWG) and an 8-channel PIN photodetector (PD) array is designed and fabricated based on silica hybrid integration technology. Multimode output waveguides in the silica AWG with 2% refractive index difference are used to obtain flat-top spectra. The output waveguide facet is polished to a 45° bevel to change the light propagation direction into the mesa-type PIN PD, which simplifies the packaging process. The experimental results show that the single-channel 1 dB bandwidth of the AWG ranges from 2.12 nm to 3.06 nm, the ROSA responsivity ranges from 0.097 A/W to 0.158 A/W, and the 3 dB bandwidth is up to 11 GHz. The device is promising for application in eight-lane WDM transmission systems for data center interconnection.
Funding: The Foundation for Innovative Research Groups of the National Natural Science Foundation of China (No. 40721001), the Ph.D. Programs Foundation of the Ministry of Education of China (No. 20070486001), the State Key Program of the National Natural Science Foundation of China (No. 40830530), and the National Natural Science Foundation of China (No. 60872132).
Abstract: This paper presents some key techniques for a multi-sensor integration system applied in the intelligent transportation and surveying and mapping industries, e.g. road surface condition detection and digital map making. The techniques are synchronization control of multiple sensors, a space-time benchmark for sensor data, and multi-sensor data fusion and mining. First, synchronization control of multiple sensors is achieved through a synchronization control system composed of a time synchronization controller and several synchronization sub-controllers. The time synchronization controller receives GPS time information from GPS satellites and relative distance information from a distance measuring instrument, and sends space-time information to the synchronization sub-controllers. The latter can work in three synchronization modes: active synchronization, passive synchronization, and time-service synchronization. Second, a space-time benchmark can be established based on GPS time and a global reference coordinate system, and can be obtained through the position and azimuth determining system (PADS) and the synchronization control system. Third, there are many types of data fusion and mining, e.g. GPS/gyro/DMI data fusion, fusion between stereophotogrammetry and PADS, fusion between laser scanner and PADS, and fusion between CCD camera and laser scanner. Finally, the solutions presented in this paper have been applied in two areas: a land-borne intelligent road detection and measurement system, which has been supplied to several highway engineering companies and successfully put into use, and a 3D measurement system based on an unmanned helicopter, which is ongoing research.
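The space-time benchmark described above amounts to stamping each sensor event with GPS time. A common way to do this, sketched below under invented tick/time values, is to interpolate the event's local counter value between the two nearest GPS pulse-per-second fixes; this is an illustration of the concept, not the paper's controller.

```python
# Illustrative sketch: assign a GPS time to a sensor event by linear
# interpolation between the two nearest GPS pulse-per-second fixes.
from bisect import bisect_right

def gps_time(ticks, times, event_tick):
    """Interpolate the GPS time of a local sensor counter value."""
    i = bisect_right(ticks, event_tick) - 1
    i = max(0, min(i, len(ticks) - 2))          # clamp to a valid segment
    t0, t1 = times[i], times[i + 1]
    k0, k1 = ticks[i], ticks[i + 1]
    return t0 + (event_tick - k0) * (t1 - t0) / (k1 - k0)

ticks = [0, 1000, 2000]        # local counter value at each GPS pulse
times = [10.0, 11.0, 12.0]     # GPS time (s) of each pulse
print(gps_time(ticks, times, 1500))  # 11.5
```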
Funding: Supported by the Research Fund of the Key GIS Lab of the Education Ministry (No. 200610).
Abstract: In this paper we propose a service-oriented architecture for spatial data integration (SOA-SDI) in the context of a large number of available spatial data sources physically located at different places, and develop Web-based GIS systems based on SOA-SDI, allowing client applications to pull in, analyze, and present spatial data from those sources. The proposed architecture logically comprises four layers or components: a layer of multiple data provider services, a data integration layer, a layer of backend services, and a front-end graphical user interface (GUI) for spatial data presentation. On the basis of the four-layered SOA-SDI framework, WebGIS applications can be quickly deployed, which demonstrates that SOA-SDI has the potential to reduce software development effort and shorten the development period.
Funding: Supported by the National Natural Science Foundation of China (No. 62073009).
Abstract: Accurately evaluating the lifespan of the printed circuit board (PCB) in airborne equipment is an essential issue for aircraft design and operation in the marine atmospheric environment. This paper presents a novel evaluation method that fuses accelerated degradation testing (ADT) data, degradation data, and life data from small samples based on an uncertain degradation process. An uncertain life model of the PCB in airborne equipment is constructed by employing an uncertain distribution that considers the acceleration factor of multiple environmental conditions such as temperature, humidity, and salinity. In addition, a degradation process model of the PCB is constructed by employing an uncertain process fusing ADT data and field data, in which the performance characteristics of dynamic cumulative change are included. Based on minimizing the pth sample moments, an integrated method for parameter estimation of the PCB in airborne equipment is proposed by fusing the multi-source data of life, degradation, and ADT. An engineering case illustrates the effectiveness and advantages of the proposed method.
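The moment-based estimation step above can be illustrated in a much-simplified form. The paper minimizes pth sample moments over the fused model; the sketch below only matches the first two sample moments of pooled lifetimes to a normal model, with invented numbers, to show the fusion-then-estimate shape of the method.

```python
# Simplified sketch of moment-based estimation over fused data sources.
def sample_moments(data):
    """First two sample moments: mean and (biased) variance."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    return mean, var

def estimate_normal(life_data, degradation_lives, adt_lives):
    """Fuse the three sources by pooling, then estimate (mu, sigma^2)."""
    pooled = list(life_data) + list(degradation_lives) + list(adt_lives)
    return sample_moments(pooled)

# Invented lifetimes (hours) from field life data, degradation-implied
# lives, and ADT-extrapolated lives.
mu, var = estimate_normal([100.0, 110.0], [105.0], [95.0, 120.0])
```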
基金The research was co-completed by School of Journalism and Communication of Hunan Normal University and Financial Big-Data Research Institute of Hunan University of Finance and Economics.This research was funded by the National Natural Science Foundation of China(No.72073041)Open Foundation for the University Innovation Platform in Hunan Province(No.18K103)+2 种基金2011 Collaborative Innovation Center for Development and Utilization of Finance and Economics Big Data Property,Universities of Hunan Province,Open Project(Nos.20181901CRP03,20181901CRP04,20181901CRP05)2020 Hunan Provincial Higher Education Teaching Reform Research Project(Nos.HNJG-2020-1130,HNJG-2020-1124)2020 General Project of Hunan Social Science Fund(No.20B16).
Abstract: With the advent of the big data era, traditional financial management can no longer meet the needs of modern enterprise business. Enterprises expect financial management to improve the accuracy of corporate financial data, assist management in making decisions that better fit the actual development of the company, and optimize corporate management systems, thereby comprehensively raising the overall level of the company and ensuring that, with the assistance of business-finance integration, the company can better improve and develop itself. Based on a survey of enterprises and universities, this article analyzes the problem of accounting talent training from both the demand and supply sides, puts forward suggestions for the teaching reform of big-data-oriented accounting integration in financial colleges and universities, and strives to promote the integration of business and finance. The optimal allocation of enterprise resources will gradually enhance the market competitiveness of enterprises, and the article explores application strategies for big data technology in the integration of enterprise business and finance.
Funding: The authors thank the Deanship of Graduate Studies and Scientific Research at Qassim University for financial support (QU-APC-2025).
Abstract: Data compression plays a vital role in data management and information theory by reducing redundancy. However, it lacks built-in security features such as secret keys or password-based access control, leaving sensitive data vulnerable to unauthorized access and misuse. With the exponential growth of digital data, robust security measures are essential. Data encryption, a widely used approach, ensures data confidentiality by making data unreadable and unalterable under secret-key control. Despite their individual benefits, both techniques require significant computational resources, and performing them separately on the same data increases complexity and processing time. Recognizing the need for integrated approaches that balance compression ratio and security level, this research proposes an integrated data compression and encryption algorithm, named IDCE, for enhanced security and efficiency. The algorithm operates on 128-bit blocks with a 256-bit secret key. It combines Huffman coding for compression with a Tent map for encryption, while an iterative Arnold cat map further enhances the cryptographic confusion properties. Experimental analysis validates the effectiveness of the proposed algorithm, showing competitive performance in terms of compression ratio, security, and overall efficiency compared to prior algorithms in the field.
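The encryption half of the scheme above rests on the Tent map, a simple chaotic map whose orbit can serve as a keystream. The sketch below shows only that idea, with made-up parameters; the Huffman compression stage and the Arnold cat map permutation mentioned in the abstract are omitted, and this is not the IDCE algorithm itself.

```python
# Hedged sketch: a Tent map generates a chaotic keystream from a secret
# initial condition (x0, mu), which is XORed with the (compressed) data.
def tent_keystream(x0, mu, n):
    """Iterate the tent map x -> mu*min(x, 1-x) and quantize to bytes."""
    x, out = x0, []
    for _ in range(n):
        x = mu * (x if x < 0.5 else 1.0 - x)
        out.append(int(x * 256) % 256)
    return out

def xor_bytes(data, key):
    return bytes(b ^ k for b, k in zip(data, key))

plain = b"multi-omics"
ks = tent_keystream(x0=0.3141, mu=1.9999, n=len(plain))  # illustrative key
cipher = xor_bytes(plain, ks)
assert xor_bytes(cipher, ks) == plain  # XOR with the same keystream decrypts
```

Note that a bare chaotic keystream like this is not secure on its own; the paper's block structure and permutation stage exist precisely to strengthen it.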
Funding: Supported by the Second Tibetan Plateau Scientific Expedition and Research (STEP) Program (2019QZKK0502), the National Natural Science Foundation of China (32322006), the Major Program for Basic Research Project of Yunnan Province (202103AF140005 and 202101BC070002), and the Practice Innovation Fund for Professional Degree Graduates of Yunnan University (ZC-22222401).
Abstract: The study of plant diversity is often hindered by the challenge of integrating data of different types from different sources. A standardized data system would facilitate detailed exploration of plant distribution patterns and dynamics for botanists, ecologists, conservation biologists, and biogeographers. This study proposes a gridded vector data integration method, combining grid-based techniques with vectorization to integrate diverse data types from multiple sources into grids of the same scale. We demonstrate the methodology by creating a comprehensive 1°×1° database of western China that includes plant distribution information and environmental factor data. This approach addresses the need for a standardized data system to facilitate exploration of plant distribution patterns and dynamic changes in the region.
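The core of the gridding step above is snapping heterogeneous records to a shared 1°×1° cell key. The sketch below illustrates that with invented occurrence and environmental records; it is a toy, not the study's pipeline.

```python
# Minimal sketch: bin records from different sources into 1-degree by
# 1-degree cells so heterogeneous data share one spatial key.
from collections import defaultdict
from math import floor

def cell(lon, lat, size=1.0):
    """Snap a coordinate to the ID of its grid cell (SW corner)."""
    return (floor(lon / size) * size, floor(lat / size) * size)

def gridify(records, size=1.0):
    grid = defaultdict(list)
    for lon, lat, payload in records:
        grid[cell(lon, lat, size)].append(payload)
    return grid

records = [
    (99.3, 27.8, "Rhododendron sp."),   # invented plant occurrence
    (99.7, 27.2, "elev=3200m"),         # invented environmental sample
    (101.1, 25.5, "Quercus sp."),
]
grid = gridify(records)
print(grid[(99.0, 27.0)])  # both sources land in the same 1x1 cell
```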
Funding: Supported by ZTE Industry-University-Institute Cooperation Funds under Grant No. IA20230628015 and the State Key Laboratory of Particle Detection and Electronics under Grant No. SKLPDE-KF-202314.
Abstract: Digital twin is a novel technology that has achieved significant progress in industrial manufacturing systems in recent years. In a digital twin environment, entities in the virtual space collect data from devices in the physical space to analyze their states. However, since many devices exist in the physical space, the digital twin system needs to aggregate data from multiple devices at the edge gateway. Homomorphic integrity and confidentiality protection are two important requirements for this data aggregation process. Unfortunately, existing homomorphic encryption algorithms do not support integrity protection, and existing homomorphic signing algorithms require all signers to use the same signing key, which is not feasible in the digital twin environment. Moreover, for both integrity and confidentiality protection, the homomorphic signing algorithm must be compatible with the aggregation manner of the homomorphic encryption algorithm. To address these issues, this paper designs a novel homomorphic aggregation scheme that allows multiple devices in the physical space to sign different data using different keys while supporting integrity and confidentiality protection. Finally, the security of the newly designed scheme is analyzed and its efficiency evaluated. Experimental results show that the scheme is feasible for real-world applications.
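To make the aggregation requirement above concrete, here is a toy additive-masking construction in which per-device masks cancel in the sum, so the gateway learns only the aggregate reading. This is a generic illustration of privacy-preserving aggregation, not the paper's homomorphic signing scheme, and the modulus and readings are invented.

```python
# Toy sketch of additively homomorphic aggregation: each device blinds its
# reading with a secret mask; the masks sum to zero, so the gateway's sum
# of blinded values equals the true aggregate.
import random

def make_masks(n, modulus):
    """Per-device masks that sum to zero modulo `modulus`."""
    masks = [random.randrange(modulus) for _ in range(n - 1)]
    masks.append((-sum(masks)) % modulus)
    return masks

def aggregate(readings, masks, modulus):
    blinded = [(r + m) % modulus for r, m in zip(readings, masks)]
    return sum(blinded) % modulus       # gateway sees only blinded values

MOD = 2 ** 32
readings = [17, 42, 5]                  # invented device readings
masks = make_masks(len(readings), MOD)
assert aggregate(readings, masks, MOD) == sum(readings) % MOD
```

A real scheme also needs integrity (signatures over the blinded values), which is exactly the gap the paper addresses.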
Abstract: To construct mediators for data integration systems that integrate structured and semi-structured data, and to facilitate the reformulation and decomposition of queries, the presented system uses the XML processing language (XPL) in the mediator. With XPL, it is easy to construct mediators for XML-based data integration, and the work in the mediator can be accelerated.
Funding: Supported by the China MOE-China Mobile Research Fund (MCM20121051, MCM20130651), the China MOE Doctoral Research Fund (20134407120017), the Natural Science Foundation of Guangdong Province (S2012030006242), the Guangdong Industry Development Fund (S2014-007), and the Guangzhou Industry Cooperation Fund (2014Y2-00004, 2014Y2-00006).
Abstract: Integrating and sharing data from different data sources is one of the trends towards making better use of data. However, data integration is hampered with respect to data confidentiality when each data source has its own access control policy. This paper discusses the issue of access control across multiple data sources when they are combined in the scenario of searching over the data. A method based on multilevel security for data integration is proposed. The proposed method allows the merging of policies and also tackles policy conflicts between different data sources.
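The multilevel-security idea above can be sketched as labelled records filtered by a clearance check. The levels, records, and the Bell-LaPadula-style "no read up" rule below are illustrative assumptions, not the paper's merged-policy model.

```python
# Minimal sketch: each merged record keeps the label of its source, and a
# search only returns records whose level the subject's clearance dominates.
LEVELS = {"public": 0, "internal": 1, "confidential": 2, "secret": 3}

def can_read(clearance, label):
    """Simple security property: read only at or below your clearance."""
    return LEVELS[clearance] >= LEVELS[label]

def search(records, clearance):
    return [data for data, label in records if can_read(clearance, label)]

merged = [                       # rows merged from differently-labelled sources
    ("price list", "public"),
    ("supplier terms", "confidential"),
    ("merger memo", "secret"),
]
print(search(merged, "confidential"))  # ['price list', 'supplier terms']
```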
Funding: Supported by the National Key Research and Development Program of China (2023YFB3307801), the National Natural Science Foundation of China (62394343, 62373155, 62073142), the Major Science and Technology Project of Xinjiang (No. 2022A01006-4), the Programme of Introducing Talents of Discipline to Universities (the 111 Project) under Grant B17017, the Fundamental Research Funds for the Central Universities, Science Foundation of China University of Petroleum, Beijing (No. 2462024YJRC011), and the Open Research Project of the State Key Laboratory of Industrial Control Technology, China (Grant No. ICT2024B70).
Abstract: The distillation process is an important chemical process, and data-driven modelling has the potential to reduce model complexity compared with mechanistic modelling, thus improving the efficiency of process optimization and monitoring studies. However, the distillation process is highly nonlinear and has multiple uncertainty perturbation intervals, which makes accurate data-driven modelling challenging. This paper proposes a systematic data-driven modelling framework to solve these problems. First, data segment variance is introduced into the K-means algorithm to form K-means data interval (KMDI) clustering, which clusters the data into perturbed and steady-state intervals for steady-state data extraction. Second, the maximal information coefficient (MIC) is employed to calculate the nonlinear correlation between variables and remove redundant features. Finally, extreme gradient boosting (XGBoost) is integrated as the base learner into adaptive boosting (AdaBoost), with an error threshold (ET) set to improve the weight update strategy, yielding a new ensemble learning algorithm, XGBoost-AdaBoost-ET. The superiority of the proposed framework is verified by applying it to a real industrial propylene distillation process.
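The intuition behind the KMDI step above is that steady-state operation shows low within-window variance while perturbations show high variance. The sketch below illustrates that intuition with a fixed threshold and an invented series; the paper instead clusters window statistics with K-means rather than thresholding.

```python
# Hedged sketch: label fixed-width windows of a process series as steady or
# perturbed by their variance, keeping only steady windows for modelling.
def window_variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def steady_windows(series, width, threshold):
    windows = [series[i:i + width] for i in range(0, len(series), width)]
    return [w for w in windows if window_variance(w) < threshold]

series = [5.0, 5.1, 4.9, 5.0,   # steady operation
          2.0, 8.0, 1.0, 9.0,   # perturbation interval
          5.2, 5.1, 5.3, 5.2]   # steady again
steady = steady_windows(series, width=4, threshold=1.0)
print(len(steady))  # 2 steady windows extracted
```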
Abstract: We propose a three-step technique to achieve this purpose. First, we utilize a collection of XML namespaces organized into a hierarchical structure as a medium for expressing data semantics. Second, we define the format of a resource descriptor for the information source discovery scheme, so that Web data sources can be dynamically registered and/or deregistered on the fly. Third, we employ an inverted-index mechanism to identify the subset of information sources relevant to a particular user query. We describe the design, architecture, and implementation of our approach, IWDS, and illustrate its use through case examples.
Keywords: integration; heterogeneity; Web data source; XML namespace. CLC number: TP311.13.
Funding: Supported by the National Key Technologies R&D Program of China (2002BA103A04).
Biography: WU Wei (1975-), male, Ph.D. candidate; research interests: information integration, distributed computing.
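The third step above, source discovery via an inverted index, can be sketched as a map from advertised terms to registered sources, so a query is routed only to relevant ones. The source names and descriptor terms below are invented for illustration; this is not the IWDS implementation.

```python
# Sketch: an inverted index from terms to registered Web data sources.
from collections import defaultdict

def build_index(descriptors):
    """descriptors: source name -> iterable of terms it advertises."""
    index = defaultdict(set)
    for source, terms in descriptors.items():
        for term in terms:
            index[term.lower()].add(source)
    return index

def relevant_sources(index, query_terms):
    """Sources advertising every term of the query."""
    sets = [index.get(t.lower(), set()) for t in query_terms]
    return set.intersection(*sets) if sets else set()

descriptors = {
    "jobs.example": ["job", "salary", "city"],
    "weather.example": ["city", "forecast"],
}
index = build_index(descriptors)
print(relevant_sources(index, ["job", "city"]))  # {'jobs.example'}
```

Dynamic registration and deregistration of a source then reduce to adding or removing its terms from the index.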