This paper presents the synthetic estimation method for estimating reliability parameters from zero-failure data. For zero-failure data from a double-parameter exponential distribution, a hierarchical Bayesian estimate of the failure probability is presented. After failure information is introduced, hierarchical Bayesian and synthetic estimates of the failure probability, as well as a synthetic estimate of reliability, are given. Calculation and analysis are carried out on a practical problem in which the life distribution of an engine follows a double-parameter exponential distribution.
This paper introduces a new method, the E-Bayesian estimation method, for estimating reliability from zero-failure data. The definition of the E-Bayesian estimate of reliability is given. Based on this definition, formulas for the E-Bayesian and hierarchical Bayesian estimates of reliability are provided, and a property of the E-Bayesian estimate, namely its relation to the hierarchical Bayesian estimate, is discussed. Calculations on practical problems show that the proposed method is feasible and easy to apply.
Research on zero-failure data is a new field of recent years, but it is urgently needed in practical projects, so the work has both theoretical and practical value. In this paper, for zero-failure data (t_i, n_i) at time t_i, if the prior distribution of the failure probability p_i = P{T < t_i} is a quasi-exponential distribution, the author gives the Bayesian and hierarchical Bayesian estimates of p_i, and the reliability under the zero-failure data condition is also obtained.
For many products, the life distribution has an increasing failure rate on average (IFRA). For such distributions, this paper uses properties of the IFRA class to give a non-parametric method for processing zero-failure data. Estimates of reliability at any time are obtained first, and estimates of reliability indexes are then given based on a regression model of failure rates. Finally, a practical example is processed with this method.
In this paper, for zero-failure data (t_i, n_i) at time t_i, if the prior distribution of the failure probability p_i = P{T < t_i} is the incomplete Fisher-Z distribution Fisher-Z(0, λ_i; a, b), the author gives the hierarchical Bayesian estimate of p_i, and the estimate of reliability under the zero-failure data condition is also obtained. The author also gives a practical calculation example using the theory.
With the development of science and technology, product reliability is becoming higher and higher. For high-reliability products, zero-failure data therefore often arise in time-truncated reliability tests. In this paper, the hierarchical Bayesian estimate of product reliability is given under a binomial distribution with zero-failure data, with a quasi-Beta prior distribution for the reliability. The authors also give a practical calculation example using the theory.
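To make the zero-failure binomial setting above concrete, here is a minimal sketch. It does not reproduce the paper's hierarchical estimate or its quasi-Beta prior on the reliability. It substitutes the ordinary conjugate Beta(a, b) prior on the failure probability p, which is an assumption made only for illustration. With n units tested and zero failures, the likelihood is (1 - p)^n, so the posterior is Beta(a, b + n) and its mean is a / (a + b + n). The function names are hypothetical.

```python
from fractions import Fraction


def zero_failure_posterior_mean(n, a=1, b=1):
    """Posterior mean of the failure probability p after n trials with zero failures.

    Assumes a plain Beta(a, b) prior on p (illustrative choice, not the
    quasi-Beta prior of the abstract). The zero-failure likelihood (1 - p)^n
    turns the prior into a Beta(a, b + n) posterior, whose mean is a / (a + b + n).
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    a, b = Fraction(a), Fraction(b)
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive")
    return a / (a + b + n)


def reliability_estimate(n, a=1, b=1):
    """Point estimate of reliability, taken as 1 minus the posterior mean of p."""
    return 1 - zero_failure_posterior_mean(n, a, b)


if __name__ == "__main__":
    # Uniform prior (a = b = 1), 8 units tested, no failures observed.
    print(reliability_estimate(8))  # prints 9/10
```

Under the uniform prior the estimate never reaches 1 however many failure-free units are observed, which is the practical reason a Bayesian treatment is preferred over the naive estimate of zero failure probability.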
With the rapid growth of biomedical data, particularly multi-omics data including genomics, transcriptomics, proteomics, metabolomics, and epigenomics, medical research and clinical decision-making confront both new opportunities and obstacles. The huge and diversified nature of these datasets cannot always be managed using traditional data analysis methods. As a consequence, deep learning has emerged as a strong tool for analysing numerous omics data due to its ability to handle complex and non-linear relationships. This paper explores the fundamental concepts of deep learning and how they are used in multi-omics medical data mining. We demonstrate how autoencoders, variational autoencoders, multimodal models, attention mechanisms, transformers, and graph neural networks enable pattern analysis and recognition across all omics data. Deep learning has been found to be effective in illness classification, biomarker identification, gene network learning, and therapeutic efficacy prediction. We also consider critical problems such as data quality, model explainability, reproducibility of findings, and computational power requirements. We then consider future directions: combining omics with clinical and imaging data, explainable AI, federated learning, and real-time diagnostics. Overall, this study emphasises the need for collaboration across disciplines to advance deep learning-based multi-omics research for precision medicine and for understanding complicated disorders.
High-throughput transcriptomics has evolved from bulk RNA-seq to single-cell and spatial profiling, yet its clinical translation still depends on effective integration across diverse omics and data modalities. Emerging foundation models and multimodal learning frameworks are enabling scalable and transferable representations of cellular states, while advances in interpretability and real-world data integration are bridging the gap between discovery and clinical application. This paper outlines a concise roadmap for AI-driven, transcriptome-centered multi-omics integration in precision medicine (Figure 1).
Gastrointestinal tumors require personalized treatment strategies due to their heterogeneity and complexity. Multimodal artificial intelligence (AI) addresses this challenge by integrating diverse data sources, including computed tomography (CT), magnetic resonance imaging (MRI), endoscopic imaging, and genomic profiles, to enable intelligent decision-making for individualized therapy. This approach leverages AI algorithms to fuse imaging, endoscopic, and omics data, facilitating comprehensive characterization of tumor biology, prediction of treatment response, and optimization of therapeutic strategies. By combining CT and MRI for structural assessment, endoscopic data for real-time visual inspection, and genomic information for molecular profiling, multimodal AI enhances the accuracy of patient stratification and treatment personalization. The clinical implementation of this technology demonstrates potential for improving patient outcomes, advancing precision oncology, and supporting individualized care in gastrointestinal cancers. Ultimately, multimodal AI serves as a transformative tool in oncology, bridging data integration with clinical application to effectively tailor therapies.
Iced transmission line galloping poses a significant threat to the safety and reliability of power systems, leading directly to line tripping, disconnections, and power outages. Existing early warning methods for iced transmission line galloping suffer from issues such as reliance on a single data source, neglect of irregular time series, and lack of attention-based closed-loop feedback, resulting in high rates of missed and false alarms. To address these challenges, we propose an Internet of Things (IoT) empowered early warning method for transmission line galloping that integrates time series data from optical fiber sensing and weather forecasts. Initially, the method applies a primary adaptive weighted fusion to the IoT-empowered optical fiber real-time sensing data and weather forecast data, followed by a secondary fusion based on a Back Propagation (BP) neural network, and uses the K-medoids algorithm to cluster the fused data. Furthermore, an adaptive irregular time series perception adjustment module is introduced into the traditional Gated Recurrent Unit (GRU) network, and closed-loop feedback based on an attention mechanism is employed to update network parameters through gradient feedback of the loss function, enabling closed-loop training and time series data prediction of the GRU network model. Subsequently, considering various types of prediction data and the duration of icing, an iced transmission line galloping risk coefficient is established, and warnings are categorized based on this coefficient. Finally, using an IoT-driven realistic dataset of iced transmission line galloping, the effectiveness of the proposed method is validated through multi-dimensional simulation scenarios.
The Intelligent Internet of Things (IIoT) involves real-world things that communicate or interact with each other through networking technologies by collecting data from these “things” and using intelligent approaches, such as Artificial Intelligence (AI) and machine learning, to make accurate decisions. Data science is the science of dealing with data and its relationships through intelligent approaches. Most state-of-the-art research focuses independently on either data science or IIoT, rather than exploring their integration. Therefore, to address the gap, this article provides a comprehensive survey on the advances and integration of data science with the Intelligent IoT (IIoT) system by classifying the existing IoT-based data science techniques and presenting a summary of various characteristics. The paper analyzes the data science or big data security and privacy features, including network architecture, data protection, and continuous monitoring of data, which face challenges in various IoT-based systems. Extensive insights into IoT data security, privacy, and challenges are visualized in the context of data science for IoT. In addition, this study reveals the current opportunities to enhance data science and IoT market development. The current gap and challenges faced in the integration of data science and IoT are comprehensively presented, followed by the future outlook and possible solutions.
Viral infectious diseases, characterized by their intricate nature and wide-ranging diversity, pose substantial challenges in the domain of data management. The vast volume of data generated by these diseases, spanning from the molecular mechanisms within cells to large-scale epidemiological patterns, has surpassed the capabilities of traditional analytical methods. In the era of artificial intelligence (AI) and big data, there is an urgent necessity for the optimization of these analytical methods to more effectively handle and utilize the information. Despite the rapid accumulation of data associated with viral infections, the lack of a comprehensive framework for integrating, selecting, and analyzing these datasets has left numerous researchers uncertain about which data to select, how to access it, and how to utilize it most effectively in their research. This review endeavors to fill these gaps by exploring the multifaceted nature of viral infectious diseases and summarizing relevant data across multiple levels, from the molecular details of pathogens to broad epidemiological trends. The scope extends from the micro-scale to the macro-scale, encompassing pathogens, hosts, and vectors. In addition to data summarization, this review thoroughly investigates various dataset sources. It also traces the historical evolution of data collection in the field of viral infectious diseases, highlighting the progress achieved over time. Simultaneously, it evaluates the current limitations that impede data utilization. Furthermore, we propose strategies to surmount these challenges, focusing on the development and application of advanced computational techniques, AI-driven models, and enhanced data integration practices. By providing a comprehensive synthesis of existing knowledge, this review is designed to guide future research and contribute to more informed approaches in the surveillance, prevention, and control of viral infectious diseases, particularly within the context of the expanding big-data landscape.
Air pollution in China covers a large area with complex sources and formation mechanisms, making it a unique place to conduct air pollution and atmospheric chemistry research. The National Natural Science Foundation of China’s Major Research Plan entitled “Fundamental Researches on the Formation and Response Mechanism of the Air Pollution Complex in China” (or the Plan) has funded 76 research projects to explore the causes of air pollution in China, and the key processes of air pollution in atmospheric physics and atmospheric chemistry. In order to summarize the abundant data from the Plan and exhibit the long-term impacts domestically and internationally, an integration project is responsible for collecting the various types of data generated by the 76 projects of the Plan. This project has classified and integrated these data, forming eight categories containing 258 datasets and 15 technical reports in total. The integration project has led to the successful establishment of the China Air Pollution Data Center (CAPDC) platform, providing storage, retrieval, and download services for the eight categories. This platform has distinct features including data visualization, related project information querying, and bilingual services in both English and Chinese, which allow for rapid searching and downloading of data and provide a solid foundation of data and support for future related research. Air pollution control in China, especially in the past decade, is undeniably a global exemplar, and this data center is the first in China to focus on research into the country’s air pollution complex.
Imputation of missing data has long been an important topic and an essential application for intelligent transportation systems (ITS) in the real world. As a state-of-the-art generative model, the diffusion model has proven highly successful in image generation, speech generation, time series modelling, etc., and now opens a new avenue for traffic data imputation. In this paper, we propose a conditional diffusion model, called the implicit-explicit diffusion model, for traffic data imputation. This model exploits both the implicit and explicit features of the data simultaneously. More specifically, we design two types of feature extraction modules, one to capture the implicit dependencies hidden in the raw data at multiple time scales and the other to obtain the long-term temporal dependencies of the time series. This approach not only inherits the advantages of the diffusion model for estimating missing data, but also takes into account the multiscale correlation inherent in traffic data. To illustrate the performance of the model, extensive experiments are conducted on three real-world time series datasets using different missing rates. The experimental results demonstrate that the model improves imputation accuracy and generalization capability.
As a new type of production factor in healthcare, healthcare data elements have been rapidly integrated into various health production processes, such as clinical assistance, health management, biological testing, and operation and supervision [1,2]. Healthcare data elements include biological and clinical data that are related to disease, environmental health data that are associated with life, and operational and healthcare management data that are related to healthcare activities (Figure 1). Activities such as the construction of a data value assessment system, the development of a data circulation and sharing platform, and the authorization of data compliance and operation products support the strong growth momentum of the market for healthcare data elements in China [3].
As smart grid technology rapidly advances, the vast amount of user data collected by smart meters presents significant challenges in data security and privacy protection. Current research emphasizes data security and user privacy concerns within smart grids. However, existing methods struggle with efficiency and security when processing large-scale data. Balancing efficient data processing with stringent privacy protection during data aggregation in smart grids remains an urgent challenge. This paper proposes an AI-based multi-type data aggregation method designed to enhance aggregation efficiency and security by standardizing and normalizing various data modalities. The approach optimizes data preprocessing, integrates Long Short-Term Memory (LSTM) networks for handling time-series data, and employs homomorphic encryption to safeguard user privacy. It also explores the application of Boneh-Lynn-Shacham (BLS) signatures for user authentication. The proposed scheme’s efficiency, security, and privacy protection capabilities are validated through rigorous security proofs and experimental analysis.
Data as a production element is driving profound transformations in the real economy across production objects, methods, and tools, generating significant economic effects such as industrial structure upgrading. This paper aims to reveal the mechanism by which data elements affect the “three transformations” (high-end, intelligent, and green) in the manufacturing sector, theoretically elucidating the intrinsic mechanisms by which data elements influence these transformations. The study finds that data elements significantly enhance the high-end, intelligent, and green levels of China's manufacturing industry. In terms of the pathways of impact, data elements primarily influence the development of high-tech industries and overall green technological innovation, thereby affecting the high-end, intelligent, and green transformation of the industry.
Earth’s internal core and crustal magnetic fields, as measured by geomagnetic satellites like MSS-1 (Macao Science Satellite-1) and Swarm, are vital for understanding core dynamics and tectonic evolution. To model these internal magnetic fields accurately, data selection based on specific criteria is often employed to minimize the influence of rapidly changing current systems in the ionosphere and magnetosphere. However, the quantitative impact of various data selection criteria on internal geomagnetic field modeling is not well understood. This study aims to address this issue and provide a reference for constructing and applying geomagnetic field models. First, we collect the latest MSS-1 and Swarm satellite magnetic data and summarize widely used data selection criteria in geomagnetic field modeling. Second, we briefly describe the method to co-estimate the core, crustal, and large-scale magnetospheric fields using satellite magnetic data. Finally, we conduct a series of field modeling experiments with different data selection criteria to quantitatively estimate their influence. Our numerical experiments confirm that without selecting data from dark regions and geomagnetically quiet times, the resulting internal field differences at the Earth’s surface can range from tens to hundreds of nanotesla (nT). Additionally, we find that the uncertainties introduced into field models by different data selection criteria are significantly larger than the measurement accuracy of modern geomagnetic satellites. These uncertainties should be considered when utilizing constructed magnetic field models for scientific research and applications.
Many fields, such as neuroscience, are experiencing the vast proliferation of cellular data, underscoring the need for organizing and interpreting large datasets. A popular approach partitions data into manageable subsets via hierarchical clustering, but objective methods to determine the appropriate classification granularity are missing. We recently introduced a technique to systematically identify when to stop subdividing clusters based on the fundamental principle that cells must differ more between than within clusters. Here we present the corresponding protocol to classify cellular datasets by combining data-driven unsupervised hierarchical clustering with statistical testing. These general-purpose functions are applicable to any cellular dataset that can be organized as two-dimensional matrices of numerical values, including molecular, physiological, and anatomical datasets. We demonstrate the protocol using cellular data from the Janelia MouseLight project to characterize morphological aspects of neurons.
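The stopping principle stated above, that cells must differ more between clusters than within them, can be sketched with a generic permutation test. This is not the authors' protocol: the abstract does not name the statistical test, so the test statistic, the Euclidean metric, and the function names `separation` and `split_p_value` are all assumptions chosen for illustration. Each cell is a tuple of numerical features, and each candidate sub-cluster is assumed to hold at least two cells.

```python
import random
from itertools import combinations
from math import dist
from statistics import mean


def separation(a, b):
    """Mean between-cluster distance minus mean within-cluster distance."""
    between = [dist(x, y) for x in a for y in b]
    within = [dist(x, y) for group in (a, b) for x, y in combinations(group, 2)]
    return mean(between) - mean(within)


def split_p_value(a, b, n_perm=999, seed=0):
    """Permutation p-value for the hypothesis that the split a/b is arbitrary.

    Cluster labels are shuffled n_perm times; the p-value is the share of
    relabelings whose separation is at least as large as the observed one.
    A small value suggests the split is real and subdivision may continue.
    """
    rng = random.Random(seed)
    observed = separation(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Small tolerance so a relabeling identical to the original still counts.
        if separation(pooled[:len(a)], pooled[len(a):]) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    left = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
    right = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
    print(split_p_value(left, right))
```

In a hierarchical clustering workflow, a check of this kind would be applied at each candidate branch point, and subdivision would stop where the split is no longer distinguishable from a random relabeling.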
On October 18, 2017, the 19th National Congress Report called for the implementation of the Healthy China Strategy. The development of biomedical data plays a pivotal role in advancing this strategy. Since the 18th National Congress of the Communist Party of China, China has vigorously promoted the integration and implementation of the Healthy China and Digital China strategies. The National Health Commission has prioritized the development of health and medical big data, issuing policies to promote standardized applications and foster innovation in "Internet+Healthcare." Biomedical data has significantly contributed to precision medicine, personalized health management, drug development, disease diagnosis, public health monitoring, and epidemic prediction capabilities.
Funding: Ningbo University of Technology Science Foundation and Ningbo Natural Science Foundation (No. 2013A610108).
Funding: Supported by Xuhui District Health Commission, No. SHXH202214.
Funding: This research was funded by the Science and Technology Project of State Grid Corporation of China under grant number 5200-202319382A-2-3-XG.
Funding: Supported in part by the National Natural Science Foundation of China under Grant 62371181; in part by the Changzhou Science and Technology International Cooperation Program under Grant CZ20230029; by a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (2021R1A2B5B02087169); and under the framework of the international cooperation program managed by the National Research Foundation of Korea (2022K2A9A1A01098051).
Abstract: The Intelligent Internet of Things (IIoT) involves real-world things that communicate or interact with each other through networking technologies by collecting data from these "things" and using intelligent approaches, such as Artificial Intelligence (AI) and machine learning, to make accurate decisions. Data science is the science of dealing with data and its relationships through intelligent approaches. Most state-of-the-art research focuses independently on either data science or IIoT, rather than exploring their integration. To address this gap, this article provides a comprehensive survey on the advances in, and integration of, data science with IIoT systems by classifying the existing IoT-based data science techniques and summarizing their characteristics. The paper analyzes the security and privacy features of data science and big data, including network architecture, data protection, and continuous monitoring of data, which face challenges in various IoT-based systems. Extensive insights into IoT data security, privacy, and challenges are visualized in the context of data science for IoT. In addition, this study reveals the current opportunities to enhance data science and IoT market development. The current gaps and challenges in integrating data science and IoT are comprehensively presented, followed by the future outlook and possible solutions.
Funding: Supported by the National Natural Science Foundation of China (32370703), the CAMS Innovation Fund for Medical Sciences (CIFMS) (2022-I2M-1-021, 2021-I2M-1-061), and the Major Project of Guangzhou National Laboratory (GZNL2024A01015).
Abstract: Viral infectious diseases, characterized by their intricate nature and wide-ranging diversity, pose substantial challenges in the domain of data management. The vast volume of data generated by these diseases, spanning from the molecular mechanisms within cells to large-scale epidemiological patterns, has surpassed the capabilities of traditional analytical methods. In the era of artificial intelligence (AI) and big data, there is an urgent need to optimize these analytical methods so that the information can be handled and used more effectively. Despite the rapid accumulation of data associated with viral infections, the lack of a comprehensive framework for integrating, selecting, and analyzing these datasets has left numerous researchers uncertain about which data to select, how to access it, and how to utilize it most effectively in their research. This review endeavors to fill these gaps by exploring the multifaceted nature of viral infectious diseases and summarizing relevant data across multiple levels, from the molecular details of pathogens to broad epidemiological trends. The scope extends from the micro-scale to the macro-scale, encompassing pathogens, hosts, and vectors. In addition to data summarization, this review thoroughly investigates various dataset sources. It also traces the historical evolution of data collection in the field of viral infectious diseases, highlighting the progress achieved over time. Simultaneously, it evaluates the current limitations that impede data utilization. Furthermore, we propose strategies to surmount these challenges, focusing on the development and application of advanced computational techniques, AI-driven models, and enhanced data integration practices. By providing a comprehensive synthesis of existing knowledge, this review is designed to guide future research and contribute to more informed approaches in the surveillance, prevention, and control of viral infectious diseases, particularly within the context of the expanding big-data landscape.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 92044303).
Abstract: Air pollution in China covers a large area with complex sources and formation mechanisms, making it a unique place to conduct air pollution and atmospheric chemistry research. The National Natural Science Foundation of China’s Major Research Plan entitled “Fundamental Researches on the Formation and Response Mechanism of the Air Pollution Complex in China” (or the Plan) has funded 76 research projects to explore the causes of air pollution in China and the key processes of air pollution in atmospheric physics and atmospheric chemistry. To summarize the abundant data from the Plan and exhibit its long-term impacts domestically and internationally, an integration project is responsible for collecting the various types of data generated by the 76 projects of the Plan. This project has classified and integrated these data, forming eight categories containing 258 datasets and 15 technical reports in total. The integration project has led to the successful establishment of the China Air Pollution Data Center (CAPDC) platform, providing storage, retrieval, and download services for the eight categories. This platform has distinct features including data visualization, related project information querying, and bilingual services in both English and Chinese, which allow for rapid searching and downloading of data and provide a solid foundation of data and support for future related research. Air pollution control in China, especially in the past decade, is undeniably a global exemplar, and this data center is the first in China to focus on research into the country’s air pollution complex.
Funding: Partially supported by the National Natural Science Foundation of China (62271485) and the SDHS Science and Technology Project (HS2023B044).
Abstract: Imputation of missing data has long been an important topic and an essential application for intelligent transportation systems (ITS) in the real world. As a state-of-the-art generative model, the diffusion model has proven highly successful in image generation, speech generation, time series modelling, etc., and now opens a new avenue for traffic data imputation. In this paper, we propose a conditional diffusion model, called the implicit-explicit diffusion model, for traffic data imputation. This model exploits both the implicit and explicit features of the data simultaneously. More specifically, we design two types of feature extraction modules, one to capture the implicit dependencies hidden in the raw data at multiple time scales and the other to obtain the long-term temporal dependencies of the time series. This approach not only inherits the advantages of the diffusion model for estimating missing data, but also takes into account the multiscale correlation inherent in traffic data. To illustrate the performance of the model, extensive experiments are conducted on three real-world time series datasets using different missing rates. The experimental results demonstrate that the model improves imputation accuracy and generalization capability.
Funding: Supported by the National Natural Science Foundation of China (Grants 72474022, 71974011, 72174022, 71972012, 71874009) and the "BIT think tank" Promotion Plan of the Science and Technology Innovation Program of Beijing Institute of Technology (Grants 2024CX14017, 2023CX13029).
Abstract: As a new type of production factor in healthcare, healthcare data elements have been rapidly integrated into various health production processes, such as clinical assistance, health management, biological testing, and operation and supervision [1,2]. Healthcare data elements include biological and clinical data that are related to disease, environmental health data that are associated with life, and operational and healthcare management data that are related to healthcare activities (Figure 1). Activities such as the construction of a data value assessment system, the development of a data circulation and sharing platform, and the authorization of data compliance and operation products support the strong growth momentum of the market for healthcare data elements in China [3].
Funding: Supported by the National Key R&D Program of China (No. 2023YFB2703700); the National Natural Science Foundation of China (Nos. U21A20465, 62302457, 62402444, 62172292); the Fundamental Research Funds of Zhejiang Sci-Tech University (Nos. 23222092-Y, 22222266-Y); the Program for Leading Innovative Research Team of Zhejiang Province (No. 2023R01001); the Zhejiang Provincial Natural Science Foundation of China (Nos. LQ24F020008, LQ24F020012); the Foundation of State Key Laboratory of Public Big Data (No. [2022]417); and the “Pioneer” and “Leading Goose” R&D Program of Zhejiang (No. 2023C01119).
Abstract: As smart grid technology rapidly advances, the vast amount of user data collected by smart meters presents significant challenges in data security and privacy protection. Current research emphasizes data security and user privacy concerns within smart grids. However, existing methods struggle with efficiency and security when processing large-scale data. Balancing efficient data processing with stringent privacy protection during data aggregation in smart grids remains an urgent challenge. This paper proposes an AI-based multi-type data aggregation method designed to enhance aggregation efficiency and security by standardizing and normalizing various data modalities. The approach optimizes data preprocessing, integrates Long Short-Term Memory (LSTM) networks for handling time-series data, and employs homomorphic encryption to safeguard user privacy. It also explores the application of Boneh-Lynn-Shacham (BLS) signatures for user authentication. The proposed scheme’s efficiency, security, and privacy protection capabilities are validated through rigorous security proofs and experimental analysis.
Abstract: Data as a production element is driving profound transformations in the real economy across production objects, methods, and tools, generating significant economic effects such as industrial structure upgrading. This paper aims to reveal the mechanism by which data elements affect the “three transformations” (high-end, intelligent, and green) in the manufacturing sector, and elucidates that mechanism theoretically. The study finds that data elements significantly raise the high-end, intelligent, and green levels of China's manufacturing industry. In terms of the pathways of impact, data elements primarily influence the development of high-tech industries and overall green technological innovation, thereby affecting the high-end, intelligent, and green transformation of the industry.
Funding: Supported by the National Natural Science Foundation of China (42250101) and the Macao Foundation.
Abstract: Earth’s internal core and crustal magnetic fields, as measured by geomagnetic satellites like MSS-1 (Macao Science Satellite-1) and Swarm, are vital for understanding core dynamics and tectonic evolution. To model these internal magnetic fields accurately, data selection based on specific criteria is often employed to minimize the influence of rapidly changing current systems in the ionosphere and magnetosphere. However, the quantitative impact of various data selection criteria on internal geomagnetic field modeling is not well understood. This study aims to address this issue and provide a reference for constructing and applying geomagnetic field models. First, we collect the latest MSS-1 and Swarm satellite magnetic data and summarize widely used data selection criteria in geomagnetic field modeling. Second, we briefly describe the method to co-estimate the core, crustal, and large-scale magnetospheric fields using satellite magnetic data. Finally, we conduct a series of field modeling experiments with different data selection criteria to quantitatively estimate their influence. Our numerical experiments confirm that without selecting data from dark regions and geomagnetically quiet times, the resulting internal field differences at the Earth’s surface can range from tens to hundreds of nanotesla (nT). Additionally, we find that the uncertainties introduced into field models by different data selection criteria are significantly larger than the measurement accuracy of modern geomagnetic satellites. These uncertainties should be considered when utilizing constructed magnetic field models for scientific research and applications.
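To make the idea of selecting dark-region, quiet-time data concrete, the following is a minimal sketch of such a filter. The `Sample` record, its field names, and the default thresholds (sun at least 10 degrees below the horizon, Kp no greater than 2) are illustrative assumptions and are not the criteria evaluated in the paper.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One hypothetical satellite magnetic measurement."""
    sun_elevation_deg: float  # negative when the sun is below the horizon
    kp: float                 # planetary geomagnetic activity index
    field_nT: float           # measured field intensity in nanotesla


def select_quiet_dark(samples, max_sun_elevation_deg=-10.0, max_kp=2.0):
    """Keep samples taken in darkness during geomagnetically quiet times.

    Both thresholds are assumed defaults chosen for illustration only.
    """
    return [
        s for s in samples
        if s.sun_elevation_deg <= max_sun_elevation_deg and s.kp <= max_kp
    ]


# Hypothetical records: only the first passes both criteria.
records = [
    Sample(sun_elevation_deg=-25.0, kp=1.0, field_nT=45210.0),
    Sample(sun_elevation_deg=15.0, kp=1.0, field_nT=45190.0),   # sunlit
    Sample(sun_elevation_deg=-30.0, kp=4.3, field_nT=45330.0),  # disturbed
]
selected = select_quiet_dark(records)
```

Relaxing either threshold admits more data but also more signal from ionospheric and magnetospheric currents, which is the trade-off the abstract quantifies.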
Funding: Supported in part by NIH grants R01NS39600, U01MH114829, and RF1MH128693 (to GAA).
Abstract: Many fields, such as neuroscience, are experiencing the vast proliferation of cellular data, underscoring the need for organizing and interpreting large datasets. A popular approach partitions data into manageable subsets via hierarchical clustering, but objective methods to determine the appropriate classification granularity are missing. We recently introduced a technique to systematically identify when to stop subdividing clusters based on the fundamental principle that cells must differ more between than within clusters. Here we present the corresponding protocol to classify cellular datasets by combining data-driven unsupervised hierarchical clustering with statistical testing. These general-purpose functions are applicable to any cellular dataset that can be organized as two-dimensional matrices of numerical values, including molecular, physiological, and anatomical datasets. We demonstrate the protocol using cellular data from the Janelia MouseLight project to characterize morphological aspects of neurons.
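The stopping principle stated in the abstract, that cells must differ more between than within clusters, can be illustrated with a small permutation test on one candidate split. This is a stand-in sketch and not the authors' protocol: the test statistic, the function names, the 0.05 threshold, and the sample points are all assumptions.

```python
import random
from itertools import combinations
from math import dist


def mean_within(a, b):
    """Mean pairwise distance inside clusters a and b (2+ points each)."""
    pairs = list(combinations(a, 2)) + list(combinations(b, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


def mean_between(a, b):
    """Mean distance over all pairs with one point in a and one in b."""
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))


def split_is_supported(a, b, n_perm=500, alpha=0.05, seed=0):
    """Permutation test: is (between - within) larger than chance?

    Returns (supported, p_value). Labels are shuffled n_perm times
    while keeping the two cluster sizes fixed.
    """
    rng = random.Random(seed)
    observed = mean_between(a, b) - mean_within(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if mean_between(pa, pb) - mean_within(pa, pb) >= observed:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)
    return p_value < alpha, p_value


# Hypothetical 2-D feature vectors for two well-separated groups of cells.
group_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
group_b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0), (10.5, 10.5)]
supported, p_value = split_is_supported(group_a, group_b)
```

In a hierarchical procedure, a check of this kind would be applied at each candidate subdivision, and a branch would stop being split once the test no longer supports the division.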
Abstract: On October 18, 2017, the 19th National Congress Report called for the implementation of the Healthy China Strategy. The development of biomedical data plays a pivotal role in advancing this strategy. Since the 18th National Congress of the Communist Party of China, China has vigorously promoted the integration and implementation of the Healthy China and Digital China strategies. The National Health Commission has prioritized the development of health and medical big data, issuing policies to promote standardized applications and foster innovation in "Internet+Healthcare." Biomedical data has significantly contributed to precision medicine, personalized health management, drug development, disease diagnosis, public health monitoring, and epidemic prediction capabilities.