Viral infectious diseases, characterized by their intricate nature and wide-ranging diversity, pose substantial challenges in the domain of data management. The vast volume of data generated by these diseases, spanning from the molecular mechanisms within cells to large-scale epidemiological patterns, has surpassed the capabilities of traditional analytical methods. In the era of artificial intelligence (AI) and big data, there is an urgent necessity for the optimization of these analytical methods to more effectively handle and utilize the information. Despite the rapid accumulation of data associated with viral infections, the lack of a comprehensive framework for integrating, selecting, and analyzing these datasets has left numerous researchers uncertain about which data to select, how to access it, and how to utilize it most effectively in their research. This review endeavors to fill these gaps by exploring the multifaceted nature of viral infectious diseases and summarizing relevant data across multiple levels, from the molecular details of pathogens to broad epidemiological trends. The scope extends from the micro-scale to the macro-scale, encompassing pathogens, hosts, and vectors. In addition to data summarization, this review thoroughly investigates various dataset sources. It also traces the historical evolution of data collection in the field of viral infectious diseases, highlighting the progress achieved over time. Simultaneously, it evaluates the current limitations that impede data utilization. Furthermore, we propose strategies to surmount these challenges, focusing on the development and application of advanced computational techniques, AI-driven models, and enhanced data integration practices. By providing a comprehensive synthesis of existing knowledge, this review is designed to guide future research and contribute to more informed approaches in the surveillance, prevention, and control of viral infectious diseases, particularly within the context of the expanding big-data landscape.
Multi-fidelity Data Fusion (MDF) frameworks have emerged as a prominent approach to producing economical but accurate surrogate models for aerodynamic data modeling by integrating data with different fidelity levels. However, most existing MDF frameworks assume a uniform data structure between sampling data sources; thus, producing an accurate solution at the required level is challenging for cases of non-uniform data structures. To address this challenge, an Adaptive Multi-fidelity Data Fusion (AMDF) framework is proposed to produce a composite surrogate model that can efficiently model multi-fidelity data featuring non-uniform structures. Firstly, the design space of the input data with non-uniform data structures is decomposed into subdomains containing simplified structures. Secondly, different MDF frameworks and a rule-based selection process are adopted to construct multiple local models for the subdomain data. Thirdly, the Enhanced Local Fidelity Modeling (ELFM) method is proposed to combine the generated local models into a unique and continuous global model. Finally, the resulting model inherits the features of the local models and approximates a complete database for the whole design space. The proposed framework is validated to demonstrate its approximation capabilities in (A) four multi-dimensional analytical problems and (B) a practical engineering case study of constructing an F16C fighter aircraft’s aerodynamic database. The accuracy of models generated with the proposed AMDF framework is compared with that of conventional MDF approaches using a single global modeling algorithm, to reveal the adaptability of the proposed approach for fusing multi-fidelity data featuring non-uniform structures. The results indicate that the proposed framework outperforms the state-of-the-art MDF approach in the cases of non-uniform data.
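The decompose, fit locally, then merge idea summarized in the abstract above can be illustrated with a toy one-dimensional sketch. Everything in it is an assumption made for illustration: the functions `low_fidelity` and `high_fidelity`, the two fixed subdomains, the linear additive correction, and the linear blending rule are hypothetical stand-ins, not the AMDF selection process or the ELFM method from the paper.

```python
# Illustrative only: toy functions and a simple blending rule,
# NOT the AMDF framework or the ELFM method described in the abstract.

def low_fidelity(x):
    """Hypothetical cheap model, assumed available everywhere."""
    return x * x


def high_fidelity(x):
    """Hypothetical expensive model whose trend changes at x = 1."""
    if x < 1.0:
        return x * x + 1.0 + 0.5 * x
    return x * x + 2.5 - x


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_local_correction(sample_xs):
    """Fit a linear model of (high - low) on one subdomain's samples."""
    residuals = [high_fidelity(x) - low_fidelity(x) for x in sample_xs]
    return fit_line(sample_xs, residuals)


BOUNDARY = 1.0     # assumed split point between the two subdomains
HALF_WIDTH = 0.1   # assumed half-width of the blending zone

# One local correction model per subdomain, from a few high-fidelity samples.
left = fit_local_correction([0.0, 0.5, 1.0])
right = fit_local_correction([1.0, 1.5, 2.0])


def fused(x):
    """Global surrogate: low-fidelity model plus blended local corrections."""
    # Weight ramps linearly from 0 to 1 across the blending zone.
    w = (x - (BOUNDARY - HALF_WIDTH)) / (2.0 * HALF_WIDTH)
    w = min(1.0, max(0.0, w))
    d_left = left[0] * x + left[1]
    d_right = right[0] * x + right[1]
    return low_fidelity(x) + (1.0 - w) * d_left + w * d_right


if __name__ == "__main__":
    for x in (0.5, 0.95, 1.0, 1.5):
        print(x, fused(x), high_fidelity(x))
```

In this sketch the blended model is continuous across the subdomain boundary and matches the toy high-fidelity function away from it; inside the blending zone it trades a small local error for that continuity.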
One promising means to reduce building energy for a more sustainable environment is to conduct early-stage building energy optimization using simulation, yet today’s simulation engines are computationally intensive. Recently, machine learning (ML) energy prediction models have shown promise in replacing these simulation engines. However, it is often difficult to develop such ML models due to the lack of proper datasets. Synthetic datasets can provide a solution, but determining the optimal quantity and diversity of synthetic data remains a challenging task. Furthermore, there is a lack of understanding of the compatibility between different ML algorithms and the characteristics of synthetic datasets. To fill these gaps, this study conducted multiple ML experiments using residential buildings in Sweden to determine the best-performing ML algorithm, as well as the characteristics of the corresponding synthetic dataset. A parametric model was developed to generate a wide range of synthetic datasets varying in size and in building shape, the latter referred to as diversity. Five ML algorithms selected through a literature review were trained using the different datasets. Results show that the Support Vector Machine performed the best overall. Multiple Linear Regression performed well with small, low-diversity datasets, while the Artificial Neural Network performed well with large, high-diversity datasets. We conclude that, when generating synthetic training datasets, developers should focus more on increasing diversity than size once the dataset size reaches around 1440. This study offers insights for researchers and practitioners, such as software tool developers, when developing ML building energy prediction models in early-stage optimization.
The idea of accurately modeling life within a computer is no longer science fiction; it is becoming a reality through the rise of the virtual cell. Over the past few years, fueled by advances in single-cell and spatial omics, artificial intelligence (AI), and high-performance computing, virtual cells have rapidly evolved from abstract concepts into practical tools with the power to reshape biomedical research. Building on earlier, more constrained attempts at integration, today’s virtual cells can merge diverse data streams with sophisticated computational models, enabling comprehensive simulations of cellular structure, function, and behavior [1,2]. In doing so, they provide an unprecedented platform for reconstructing and manipulating life and open transformative opportunities for intelligent oncology. The core technical framework, data foundations, and key potential application areas of virtual cells in intelligent oncology are illustrated in Figure 1.
With the progression of modern information techniques, such as next generation sequencing (NGS), Internet of Everything (IoE) based smart sensors, and artificial intelligence algorithms, data-intensive research and applications are emerging as the fourth paradigm for scientific discovery. However, we face many challenges to the practical application of this paradigm. In this article, 10 challenges to data-intensive discovery and applications in precision medicine and healthcare are summarized, and the future perspectives on next generation medicine are discussed.
Funding (viral infectious disease data review): Supported by the National Natural Science Foundation of China (32370703), the CAMS Innovation Fund for Medical Sciences (CIFMS) (2022-I2M-1-021, 2021-I2M-1-061), and the Major Project of Guangzhou National Laboratory (GZNL2024A01015).
Funding (adaptive multi-fidelity data fusion study): Supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (No. 2020R1A6A1A03046811). This paper was also supported by the Konkuk University Researcher Fund in 2021.
Funding (data-intensive precision medicine article): This work was supported by the regional innovation cooperation between Sichuan and Guangxi Provinces (Grant No. 2020YFQ0019) and the National Natural Science Foundation of China (Grant No. 32070671).