With the rapid growth of biomedical data, particularly multi-omics data spanning genomics, transcriptomics, proteomics, metabolomics, and epigenomics, medical research and clinical decision-making confront both new opportunities and obstacles. The huge and diversified nature of these datasets cannot always be managed using traditional data analysis methods. As a consequence, deep learning has emerged as a strong tool for analysing multiple types of omics data due to its ability to handle complex and non-linear relationships. This paper explores the fundamental concepts of deep learning and how they are used in multi-omics medical data mining. We demonstrate how autoencoders, variational autoencoders, multimodal models, attention mechanisms, transformers, and graph neural networks enable pattern analysis and recognition across all omics data. Deep learning has been found to be effective in disease classification, biomarker identification, gene network learning, and therapeutic efficacy prediction. We also consider critical problems such as data quality, model explainability, reproducibility of findings, and computational requirements. We then consider future directions: combining omics with clinical and imaging data, explainable AI, federated learning, and real-time diagnostics. Overall, this study emphasises the need for cross-disciplinary collaboration to advance deep learning-based multi-omics research for precision medicine and for understanding complex disorders.
High-throughput transcriptomics has evolved from bulk RNA-seq to single-cell and spatial profiling, yet its clinical translation still depends on effective integration across diverse omics and data modalities. Emerging foundation models and multimodal learning frameworks are enabling scalable and transferable representations of cellular states, while advances in interpretability and real-world data integration are bridging the gap between discovery and clinical application. This paper outlines a concise roadmap for AI-driven, transcriptome-centered multi-omics integration in precision medicine (Figure 1).
Gastrointestinal tumors require personalized treatment strategies due to their heterogeneity and complexity. Multimodal artificial intelligence (AI) addresses this challenge by integrating diverse data sources, including computed tomography (CT), magnetic resonance imaging (MRI), endoscopic imaging, and genomic profiles, to enable intelligent decision-making for individualized therapy. This approach leverages AI algorithms to fuse imaging, endoscopic, and omics data, facilitating comprehensive characterization of tumor biology, prediction of treatment response, and optimization of therapeutic strategies. By combining CT and MRI for structural assessment, endoscopic data for real-time visual inspection, and genomic information for molecular profiling, multimodal AI enhances the accuracy of patient stratification and treatment personalization. The clinical implementation of this technology demonstrates potential for improving patient outcomes, advancing precision oncology, and supporting individualized care in gastrointestinal cancers. Ultimately, multimodal AI serves as a transformative tool in oncology, bridging data integration with clinical application to effectively tailor therapies.
By employing the unique phenological feature of winter wheat extracted from the peak before winter (PBW) and the advantages of moderate resolution imaging spectroradiometer (MODIS) data, with its high temporal resolution and intermediate spatial resolution, a remote sensing-based model for mapping winter wheat on the North China Plain was built through integration with Landsat images and land-use data. First, a phenological window, the PBW, was drawn from time-series MODIS data. Next, feature extraction was performed on the PBW to reduce the feature dimension and enhance its information. Finally, a regression model was built to model the relationship between the phenological feature and the sample data. The amount of information in the PBW was evaluated and compared with that of the main peak (MP). The relative precision of the mapping reached up to 92% in comparison to the Landsat sample data, and ranged between 87% and 96% in comparison to the statistical data. These results were sufficient to satisfy the accuracy requirements for winter wheat mapping at a large scale. Moreover, the proposed method can obtain the distribution information for winter wheat earlier in the season than previous studies. This study could shed light on the monitoring of winter wheat in China using its unique phenological features.
Clustering is used to gain an intuition of the structures in the data. Most current clustering algorithms produce a clustering structure even on data that do not possess such structure. In these cases, the algorithms force a structure onto the data instead of discovering one. To avoid false structures in the relations of data, a novel clusterability assessment method called the density-based clusterability measure is proposed in this paper. It measures the prominence of clustering structure in the data to evaluate whether a cluster analysis could produce meaningful insight into the relationships in the data. This is especially useful for time-series data, since visualizing the structure in time-series data is hard. The performance of the clusterability measure is evaluated against several synthetic data sets and time-series data sets, which illustrate that the density-based clusterability measure can successfully indicate the clustering structure of time-series data.
Based on the 16-day composite MODIS (moderate resolution imaging spectroradiometer) NDVI (normalized difference vegetation index) time-series data for 2004, vegetation on the North Tibet Plateau was classified, and seasonal variations of pixels selected from different vegetation types were analyzed. The Savitzky-Golay filtering algorithm was applied to smooth the MODIS-NDVI time-series data. The processed time-series curves reflect the real variation trend of vegetation growth. The NDVI time-series curves of coniferous forest, high-cold meadow, high-cold meadow steppe, and high-cold steppe all show a single-peak pattern during vegetation growth, with the maximum occurring in August. A decision-tree classification model was established according to either NDVI time-series data or land surface temperature data. Vegetation was then classified and processed through the model based on NDVI time-series curves. An accuracy test shows that the classification results are of high accuracy and credibility, and that the model is useful for studying climate variation and estimating vegetation production at regional and even global scales.
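The Savitzky-Golay smoothing step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not state the window length or polynomial order, so the 5-point quadratic kernel, the edge handling, the function name `savgol5`, and the NDVI values below are all assumptions chosen for illustration, not the authors' settings.

```python
# 5-point quadratic Savitzky-Golay smoothing kernel (standard tabulated coefficients).
SG5 = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)


def savgol5(series):
    """Smooth a sequence with a 5-point quadratic Savitzky-Golay filter.

    The first two and last two samples are returned unchanged
    (an assumed, deliberately simple edge policy).
    """
    values = list(series)
    smoothed = values[:]
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        smoothed[i] = sum(c * v for c, v in zip(SG5, window))
    return smoothed


# Hypothetical NDVI values for one pixel over a season, with one noisy dip at index 3.
ndvi = [0.20, 0.25, 0.33, 0.10, 0.52, 0.60, 0.55, 0.45, 0.30]
print(savgol5(ndvi))
```

Because the kernel is a local least-squares quadratic fit, it damps isolated dips such as cloud-contaminated composites while preserving the overall seasonal peak better than a plain moving average would.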
Underground coal fires are one of the most common and serious geohazards in most coal-producing countries in the world. Monitoring their spatio-temporal changes plays an important role in controlling and preventing the effects of coal fires and their environmental impact. In this study, the spatio-temporal changes of underground coal fires in the Khanh Hoa coal field (North-East of Viet Nam) were analyzed using Landsat time-series data during the 2008-2016 period. Based on land surface temperatures retrieved from Landsat thermal data, underground coal fires related to thermal anomalies were identified using the MEDIAN + 1.5 × IQR (IQR: interquartile range) threshold technique. The locations of underground coal fires were validated using a coal fire map produced from field survey data and cross-validated using daytime ASTER thermal infrared (TIR) imagery. Based on the fires extracted from seven Landsat thermal images, the spatio-temporal changes of underground coal fire areas were analyzed. The results showed that the thermally anomalous zones correlated with known coal fires. Cross-validation of coal fires using ASTER TIR data showed a high consistency of 79.3%. The largest coal fire area, 184.6 hectares, was detected in 2010, followed by 2014 (181.1 hectares) and 2016 (178.5 hectares). Smaller coal fire areas of 133.6 and 152.5 hectares were extracted in 2011 and 2009, respectively. Underground coal fires were mainly detected in the northern and southern parts of the coal field and tend to spread to its north-west.
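The MEDIAN + 1.5 × IQR rule named above can be sketched as follows. This is only an illustration of the thresholding arithmetic: the function names, the quartile interpolation method, and the temperature values are assumptions, and the abstract does not say how the quartiles were computed.

```python
import statistics


def anomaly_threshold(temperatures, k=1.5):
    """Return median + k * IQR for a collection of land surface temperatures."""
    # "inclusive" quartiles treat the sample as the whole population (an assumed choice).
    q1, _, q3 = statistics.quantiles(temperatures, n=4, method="inclusive")
    return statistics.median(temperatures) + k * (q3 - q1)


def flag_thermal_anomalies(temperatures, k=1.5):
    """Mark each pixel whose temperature exceeds the median + k * IQR threshold."""
    threshold = anomaly_threshold(temperatures, k)
    return [t > threshold for t in temperatures]


# Hypothetical land surface temperatures (degrees Celsius) for seven pixels.
lst = [20, 21, 22, 23, 24, 25, 40]
print(anomaly_threshold(lst))        # 27.5
print(flag_thermal_anomalies(lst))   # only the last pixel is flagged
```

Using the median and IQR rather than the mean and standard deviation keeps the threshold from being pulled upward by the hot pixels it is meant to detect.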
In the age of information sharing, logistics information sharing also faces the risk of privacy leakage. To address the privacy leakage of time-series location information in the field of logistics, this paper proposes a method based on differential privacy for time-series location data publication. First, it constructs time-related public regions of interest (PROI) using a clustering optimization algorithm, and it uses the centroid point as the public interest point (PIP) representing the location of each public interest zone. Second, from the PIPs, it constructs a location search tree (LST), a commonly used index structure for spatial data, in order to preserve the inherent relations among location data. Third, it adds Laplace noise to the nodes of the LST, which requires adding Laplace noise fewer times than perturbing the original data set and thus preserves data availability. Finally, experiments show that this method not only ensures the security of sequential location data publishing, but also has better data availability than the general differential privacy method, achieving a good balance between the security and availability of data.
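The noise-addition step can be illustrated with the standard Laplace mechanism. The sketch below assumes that each LST node stores a visit count with sensitivity 1; the abstract does not specify the node contents, the privacy budget, or how the budget is split across tree levels, so the names `laplace_noise` and `noisy_counts` and all values are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(counts, epsilon, sensitivity=1.0, seed=None):
    """Perturb each count with Laplace noise of scale sensitivity / epsilon."""
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return [c + laplace_noise(scale, rng) for c in counts]


# Hypothetical visit counts stored at three tree nodes.
node_counts = [120, 45, 8]
print(noisy_counts(node_counts, epsilon=0.5, seed=42))
```

A smaller `epsilon` gives stronger privacy but a larger noise scale, which is the security versus availability trade-off the abstract refers to.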
Accurate information about phenological stages is essential for canola field management practices such as irrigation, fertilization, and harvesting. Previous studies in canola phenology monitoring focused mainly on the flowering stage, using its apparent structural features and colors. Additional phenological stages have been largely overlooked. The objective of this study was to improve a shape-model method (SMM) for extracting winter canola phenological stages from time-series top-of-canopy reflectance images collected by an unmanned aerial vehicle (UAV). The transformation equation of the SMM was refined to account for the multi-peak features of the temporal dynamics of three vegetation indices (VIs): NDVI, EVI, and CI. An experiment with various seeding scenarios was conducted, including four different seeding dates and three seeding densities. Three mathematical functions, the asymmetric Gaussian function (AGF), the Fourier function, and the double logistic function, were employed to fit the time-series vegetation indices and extract information about phenological stages. The refined SMM effectively estimated the phenological stages of canola, with a minimum root mean square error (RMSE) of 3.7 days for all phenological stages. The AGF provided the best fitting performance, as it captured multiple peaks in the growth dynamics for all seeding date scenarios using four scaling parameters. Of the three selected VIs, CIred-edge achieved the greatest accuracy in estimating the phenological stage dates. This study demonstrates the high potential of the refined SMM for estimating winter canola phenology.
This paper combines the Defense Meteorological Satellite Program Operational Linescan System (DMSP-OLS) nighttime light data and the Visible Infrared Imaging Radiometer Suite (VIIRS) nighttime light data into a "synthetic DMSP" dataset covering 1992 to 2020 to retrieve the spatio-temporal variations in energy-related carbon emissions in Xinjiang, China. It then analyzes several factors influencing the spatial differentiation of carbon emissions in Xinjiang using the geographical detector technique. Results reveal that (1) total carbon emissions continued to grow, while the growth rate slowed in the past five years. (2) Large differences exist in total carbon emissions across regions. In descending order of total carbon emissions, the regions are the northern slope of the Tianshan Mountains > the southern slope of the Tianshan Mountains > the three prefectures in southern Xinjiang > the northern part of Xinjiang. (3) Economic growth, population size, and energy consumption intensity are the most important factors in the spatial differentiation of carbon emissions. The interactions between economic growth and population size, and between economic growth and energy consumption intensity, also enhance the explanatory power for the spatial differentiation of carbon emissions. This paper aims to help formulate differentiated carbon reduction targets and strategies for cities at different stages of economic development and with different carbon intensities, so as to achieve the carbon peak goals in stages.
Electric kickboards have been popularized and promoted primarily because of their clean and efficient features, and they are gradually growing in popularity in tourist and education-centric localities. As electric kickboards become more widespread, deploying a customer rental service is essential. Owing to its free-floating nature, the shared electric kickboard is a common and practical means of transportation. Relocation plans for shared electric kickboards are required to increase the quality of service, and forecasting demand for their use in a specific region is crucial. Predicting demand accurately with small data is difficult, since extensive data is necessary for training machine learning algorithms for effective prediction. Data generation is a method for expanding the amount of data available for training. In this work, we propose a model that takes time-series customer electric kickboard demand data as input, pre-processes it, and generates synthetic data according to the original data distribution using generative adversarial networks (GAN). The electric kickboard mobility demand prediction error was reduced when we combined synthetic data with the original data. We propose Tabular-GAN-Modified-WGAN-GP for generating synthetic data for better prediction results. We modified the Wasserstein GAN with gradient penalty (WGAN-GP) to use the RMSprop optimizer and then employed spectral normalization (SN) to improve training stability and speed up convergence. Finally, we applied a regression-based blending ensemble technique to improve demand prediction performance. We used various evaluation criteria and visual representations to compare our proposed model's performance. Synthetic data generated by our proposed GAN model is also evaluated. The TGAN-Modified-WGAN-GP model mitigates the overfitting and mode collapse problems, and it also converges faster than previous GAN models for synthetic data creation. The presented model's performance is compared to existing ensemble and baseline models. The experimental findings imply that combining synthetic and actual data can significantly reduce prediction error, reaching a mean absolute percentage error (MAPE) of 4.476, and increase prediction accuracy.
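For readers unfamiliar with the metric reported above, MAPE can be computed as in the following minimal sketch. The function name and the demand figures are hypothetical and are not taken from the paper.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical hourly rental demand versus a model's forecasts.
observed = [100, 200, 50]
forecast = [110, 180, 45]
print(mape(observed, forecast))  # each forecast is off by 10%, so MAPE is about 10
```

Because each error is divided by the observed value, MAPE is scale-free but becomes unstable for hours with very low demand, which is worth keeping in mind when comparing models on sparse rental data.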
Accurate mapping and timely monitoring of urban redevelopment are pivotal for urban studies and decision-makers to foster sustainable urban development. Traditional mapping methods heavily depend on field surveys and subjective questionnaires, yielding less objective, reliable, and timely data. Recent advancements in Geographic Information Systems (GIS) and remote-sensing technologies have improved the identification and mapping of urban redevelopment through quantitative analysis using satellite-based observations. Nonetheless, challenges persist, particularly concerning accuracy and significant temporal delays. This study introduces a novel approach to modeling urban redevelopment, leveraging machine learning algorithms and remote-sensing data. This methodology can facilitate the accurate and timely identification of urban redevelopment activities. The study's machine learning model can analyze time-series remote-sensing data to identify spatio-temporal and spectral patterns related to urban redevelopment. The model is thoroughly evaluated, and the results indicate that it can accurately capture the time-series patterns of urban redevelopment. This research's findings are useful for evaluating urban demographic and economic changes, informing policymaking and urban planning, and contributing to sustainable urban development. The model can also serve as a foundation for future research on early-stage urban redevelopment detection and evaluation of the causes and impacts of urban redevelopment.
In this study, we developed software for vehicle big data analysis to analyze the time-series data of connected vehicles. We designed two software modules: the first derives Pearson correlation coefficients to analyze the collected data, and the second conducts exploratory data analysis of the collected vehicle data. In particular, we analyzed the dangerous driving patterns of motorists based on the safety standards of the Korea Transportation Safety Authority. We also analyzed the seasonal fuel efficiency (four seasons) and mileage of vehicles, and identified rapid acceleration, rapid deceleration, sudden stopping (harsh braking), quick starting, sudden left turn, sudden right turn, and sudden U-turn driving patterns. We implemented the density-based spatial clustering of applications with noise (DBSCAN) algorithm for trajectory analysis based on GPS (Global Positioning System) data, and designed a long short-term memory (LSTM) algorithm and an auto-regressive integrated moving average (ARIMA) model for time-series data analysis. In this paper, we mainly describe the development environment of the analysis software, the structure and data flow of the overall analysis platform, the configuration of the collected vehicle data, and the various algorithms used in the analysis. Finally, we present illustrative results of our analysis, such as the dangerous driving patterns that were detected.
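The Pearson correlation coefficient used by the first module can be computed as in the sketch below. The function name and the speed and fuel-rate values are hypothetical, and the paper's actual data fields are not given in this abstract.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-trip averages: vehicle speed (km/h) and fuel rate (L/h).
speed = [30, 45, 60, 80, 100]
fuel_rate = [2.1, 2.9, 3.8, 5.2, 6.9]
print(pearson(speed, fuel_rate))
```

A value near +1 or -1 indicates a strong linear relationship between two vehicle signals, while a value near 0 only rules out a linear one, so exploratory plots remain useful alongside the coefficient.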
Iced transmission line galloping poses a significant threat to the safety and reliability of power systems, leading directly to line tripping, disconnections, and power outages. Existing early warning methods for iced transmission line galloping suffer from issues such as reliance on a single data source, neglect of irregular time series, and lack of attention-based closed-loop feedback, resulting in high rates of missed and false alarms. To address these challenges, we propose an Internet of Things (IoT)-empowered early warning method for transmission line galloping that integrates time series data from optical fiber sensing and weather forecasts. Initially, the method applies a primary adaptive weighted fusion to the IoT-empowered optical fiber real-time sensing data and weather forecast data, followed by a secondary fusion based on a Back Propagation (BP) neural network, and uses the K-medoids algorithm to cluster the fused data. Furthermore, an adaptive irregular time series perception adjustment module is introduced into the traditional Gated Recurrent Unit (GRU) network, and closed-loop feedback based on an attention mechanism is employed to update network parameters through gradient feedback of the loss function, enabling closed-loop training and time series data prediction of the GRU network model. Subsequently, considering various types of prediction data and the duration of icing, an iced transmission line galloping risk coefficient is established, and warnings are categorized based on this coefficient. Finally, using an IoT-driven realistic dataset of iced transmission line galloping, the effectiveness of the proposed method is validated through multi-dimensional simulation scenarios.
The Intelligent Internet of Things (IIoT) involves real-world things that communicate or interact with each other through networking technologies by collecting data from these "things" and using intelligent approaches, such as Artificial Intelligence (AI) and machine learning, to make accurate decisions. Data science is the science of dealing with data and its relationships through intelligent approaches. Most state-of-the-art research focuses independently on either data science or IIoT, rather than exploring their integration. Therefore, to address this gap, this article provides a comprehensive survey of the advances in and integration of data science with the IIoT system by classifying the existing IoT-based data science techniques and presenting a summary of their various characteristics. The paper analyzes data science and big data security and privacy features, including network architecture, data protection, and continuous monitoring of data, which face challenges in various IoT-based systems. Extensive insights into IoT data security, privacy, and challenges are visualized in the context of data science for IoT. In addition, this study reveals the current opportunities to enhance data science and IoT market development. The current gaps and challenges faced in the integration of data science and IoT are comprehensively presented, followed by the future outlook and possible solutions.
Viral infectious diseases, characterized by their intricate nature and wide-ranging diversity, pose substantial challenges in the domain of data management. The vast volume of data generated by these diseases, spanning from the molecular mechanisms within cells to large-scale epidemiological patterns, has surpassed the capabilities of traditional analytical methods. In the era of artificial intelligence (AI) and big data, there is an urgent need to optimize these analytical methods to handle and utilize the information more effectively. Despite the rapid accumulation of data associated with viral infections, the lack of a comprehensive framework for integrating, selecting, and analyzing these datasets has left numerous researchers uncertain about which data to select, how to access it, and how to utilize it most effectively in their research. This review endeavors to fill these gaps by exploring the multifaceted nature of viral infectious diseases and summarizing relevant data across multiple levels, from the molecular details of pathogens to broad epidemiological trends. The scope extends from the micro-scale to the macro-scale, encompassing pathogens, hosts, and vectors. In addition to data summarization, this review thoroughly investigates various dataset sources. It also traces the historical evolution of data collection in the field of viral infectious diseases, highlighting the progress achieved over time. Simultaneously, it evaluates the current limitations that impede data utilization. Furthermore, we propose strategies to surmount these challenges, focusing on the development and application of advanced computational techniques, AI-driven models, and enhanced data integration practices. By providing a comprehensive synthesis of existing knowledge, this review is designed to guide future research and contribute to more informed approaches in the surveillance, prevention, and control of viral infectious diseases, particularly within the context of the expanding big-data landscape.
Air pollution in China covers a large area with complex sources and formation mechanisms, making it a unique place to conduct air pollution and atmospheric chemistry research. The National Natural Science Foundation of China's Major Research Plan entitled "Fundamental Researches on the Formation and Response Mechanism of the Air Pollution Complex in China" (or the Plan) has funded 76 research projects to explore the causes of air pollution in China and the key processes of air pollution in atmospheric physics and atmospheric chemistry. In order to summarize the abundant data from the Plan and exhibit its long-term impacts domestically and internationally, an integration project is responsible for collecting the various types of data generated by the 76 projects of the Plan. This project has classified and integrated these data, forming eight categories containing 258 datasets and 15 technical reports in total. The integration project has led to the successful establishment of the China Air Pollution Data Center (CAPDC) platform, providing storage, retrieval, and download services for the eight categories. This platform has distinct features including data visualization, related project information querying, and bilingual services in both English and Chinese, which allow for rapid searching and downloading of data and provide a solid foundation of data and support for future related research. Air pollution control in China, especially in the past decade, is undeniably a global exemplar, and this data center is the first in China to focus on research into the country's air pollution complex.
The reverse design of solid rocket motor (SRM) propellant grain involves determining the grain geometry to closely match a predefined internal ballistic curve. While existing reverse design methods are feasible, they often face challenges such as lengthy computation times and limited accuracy. To achieve rapid and accurate matching between the targeted ballistic curve and complex grain shape, this paper proposes a novel reverse design method for SRM propellant grain based on time-series data imaging and a convolutional neural network (CNN). First, a dataset pairing finocyl grain shapes with internal ballistic curves is created using parametric modeling techniques to comprehensively cover the design space. Next, the internal ballistic time-series data is encoded into three-channel images, establishing a potential relationship between the ballistic curves and their image representations. A CNN is then constructed and trained using these encoded images. Once trained, the model enables efficient inference of propellant grain dimensions from a target internal ballistic curve. This paper conducts comparative experiments across various neural network models, validating the effectiveness of the feature extraction method that transforms internal ballistic time-series data into images, as well as its generalization capability across different CNN architectures. Ignition tests were performed based on the predicted propellant grain. The results demonstrate that the relative error between the experimental internal ballistic curves and the target curves is less than 5%, confirming the validity and feasibility of the proposed reverse design methodology.
Imputation of missing data has long been an important topic and an essential application for intelligent transportation systems (ITS) in the real world. As a state-of-the-art generative model, the diffusion model has proven highly successful in image generation, speech generation, time series modelling, and other tasks, and it now opens a new avenue for traffic data imputation. In this paper, we propose a conditional diffusion model, called the implicit-explicit diffusion model, for traffic data imputation. This model exploits both the implicit and explicit features of the data simultaneously. More specifically, we design two types of feature extraction modules, one to capture the implicit dependencies hidden in the raw data at multiple time scales and the other to obtain the long-term temporal dependencies of the time series. This approach not only inherits the advantages of the diffusion model for estimating missing data, but also takes into account the multiscale correlation inherent in traffic data. To illustrate the performance of the model, extensive experiments are conducted on three real-world time series datasets using different missing rates. The experimental results demonstrate that the model improves imputation accuracy and generalization capability.
文摘The rapid growth of biomedical data,particularly multi-omics data including genomes,transcriptomics,proteomics,metabolomics,and epigenomics,medical research and clinical decision-making confront both new opportunities and obstacles.The huge and diversified nature of these datasets cannot always be managed using traditional data analysis methods.As a consequence,deep learning has emerged as a strong tool for analysing numerous omics data due to its ability to handle complex and non-linear relationships.This paper explores the fundamental concepts of deep learning and how they are used in multi-omics medical data mining.We demonstrate how autoencoders,variational autoencoders,multimodal models,attention mechanisms,transformers,and graph neural networks enable pattern analysis and recognition across all omics data.Deep learning has been found to be effective in illness classification,biomarker identification,gene network learning,and therapeutic efficacy prediction.We also consider critical problems like as data quality,model explainability,whether findings can be repeated,and computational power requirements.We now consider future elements of combining omics with clinical and imaging data,explainable AI,federated learning,and real-time diagnostics.Overall,this study emphasises the need of collaborating across disciplines to advance deep learning-based multi-omics research for precision medicine and comprehending complicated disorders.
Abstract: High-throughput transcriptomics has evolved from bulk RNA-seq to single-cell and spatial profiling, yet its clinical translation still depends on effective integration across diverse omics and data modalities. Emerging foundation models and multimodal learning frameworks are enabling scalable and transferable representations of cellular states, while advances in interpretability and real-world data integration are bridging the gap between discovery and clinical application. This paper outlines a concise roadmap for AI-driven, transcriptome-centered multi-omics integration in precision medicine (Figure 1).
Funding: Supported by Xuhui District Health Commission, No. SHXH202214.
Abstract: Gastrointestinal tumors require personalized treatment strategies due to their heterogeneity and complexity. Multimodal artificial intelligence (AI) addresses this challenge by integrating diverse data sources, including computed tomography (CT), magnetic resonance imaging (MRI), endoscopic imaging, and genomic profiles, to enable intelligent decision-making for individualized therapy. This approach leverages AI algorithms to fuse imaging, endoscopic, and omics data, facilitating comprehensive characterization of tumor biology, prediction of treatment response, and optimization of therapeutic strategies. By combining CT and MRI for structural assessment, endoscopic data for real-time visual inspection, and genomic information for molecular profiling, multimodal AI enhances the accuracy of patient stratification and treatment personalization. The clinical implementation of this technology demonstrates potential for improving patient outcomes, advancing precision oncology, and supporting individualized care in gastrointestinal cancers. Ultimately, multimodal AI serves as a transformative tool in oncology, bridging data integration with clinical application to effectively tailor therapies.
Funding: Supported by the open research fund of the Key Laboratory of Agri-informatics, Ministry of Agriculture, and the fund of Outstanding Agricultural Researcher, Ministry of Agriculture, China.
Abstract: By employing the unique phenological feature of winter wheat extracted from the peak before winter (PBW) and the advantages of moderate resolution imaging spectroradiometer (MODIS) data, which offer high temporal resolution and intermediate spatial resolution, a remote sensing-based model for mapping winter wheat on the North China Plain was built through integration with Landsat images and land-use data. First, a phenological window, the PBW, was drawn from time-series MODIS data. Next, feature extraction was performed on the PBW to reduce the feature dimension and enhance its information. Finally, a regression model was built to model the relationship between the phenological feature and the sample data. The amount of information in the PBW was evaluated and compared with that of the main peak (MP). The relative precision of the mapping reached up to 92% in comparison to the Landsat sample data, and ranged between 87% and 96% in comparison to the statistical data. These results were sufficient to satisfy the accuracy requirements for winter wheat mapping at a large scale. Moreover, the proposed method can obtain distribution information for winter wheat earlier in the season than previous studies. This study could shed light on the monitoring of winter wheat in China using the unique phenological feature of winter wheat.
Abstract: Clustering is used to gain an intuition of the structures in the data. Most current clustering algorithms produce a clustering structure even on data that do not possess such structure. In these cases, the algorithms force a structure onto the data instead of discovering one. To avoid false structures in the relations of data, a novel clusterability assessment method called the density-based clusterability measure is proposed in this paper. It measures the prominence of clustering structure in the data to evaluate whether a cluster analysis could produce meaningful insight into the relationships in the data. This is especially useful for time-series data, since visualizing the structure in time-series data is hard. The performance of the clusterability measure is evaluated against several synthetic data sets and time-series data sets, which illustrate that the density-based clusterability measure can successfully indicate the clustering structure of time-series data.
Funding: The Frontier Program of the Knowledge Innovation Program of Chinese Academy of Sciences.
Abstract: Based on 16-day composite MODIS (moderate resolution imaging spectroradiometer) NDVI (normalized difference vegetation index) time-series data from 2004, vegetation on the North Tibet Plateau was classified, and seasonal variations were analyzed for pixels selected from different vegetation types. The Savitzky-Golay filtering algorithm was applied to smooth the MODIS-NDVI time-series data. The processed time-series curves reflect the real variation trend of vegetation growth. The NDVI time-series curves of coniferous forest, high-cold meadow, high-cold meadow steppe, and high-cold steppe all show a single-peak pattern during vegetation growth, with the maximum occurring in August. A decision-tree classification model was established according to either NDVI time-series data or land surface temperature data. Vegetation was then classified and processed with the model based on NDVI time-series curves. An accuracy test illustrates that the classification results are of high accuracy and credibility, and that the model is useful for studying climate variation and estimating vegetation production at regional and even global scales.
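The Savitzky-Golay smoothing step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not state the window length or polynomial order, so the 5-point quadratic window below (with the standard coefficients -3, 12, 17, 12, -3 over 35) is an assumption, as are the function name `savgol5` and the choice to leave the two samples at each end unsmoothed.

```python
# Assumed configuration: 5-point window, quadratic fit (not stated in the abstract).
SG5_QUADRATIC = (-3.0, 12.0, 17.0, 12.0, -3.0)
SG5_NORM = 35.0


def savgol5(series):
    """Smooth a sequence with a 5-point quadratic Savitzky-Golay filter.

    The first and last two samples are copied unchanged, which is one simple
    edge-handling choice among several.
    """
    values = [float(v) for v in series]
    smoothed = list(values)
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        smoothed[i] = sum(c * v for c, v in zip(SG5_QUADRATIC, window)) / SG5_NORM
    return smoothed


# Hypothetical NDVI-like series with one noisy dip at index 4.
ndvi = [0.20, 0.25, 0.32, 0.41, 0.30, 0.58, 0.63, 0.60, 0.52, 0.40]
ndvi_smoothed = savgol5(ndvi)
```

A filter of this kind reproduces any quadratic trend exactly at interior points, which is why it tends to preserve the shape of a seasonal peak better than a plain moving average.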
Funding: Funded by the Ministry-level Scientific and Technological Key Programs of the Ministry of Natural Resources and Environment of Viet Nam, "Application of thermal infrared remote sensing and GIS for mapping underground coal fires in Quang Ninh coal basin" (Grant No. TNMT.2017.08.06).
Abstract: Underground coal fires are one of the most common and serious geohazards in most coal-producing countries in the world. Monitoring their spatio-temporal changes plays an important role in controlling and preventing the effects of coal fires and their environmental impact. In this study, the spatio-temporal changes of underground coal fires in the Khanh Hoa coal field (North-East of Viet Nam) were analyzed using Landsat time-series data from the 2008-2016 period. Based on land surface temperatures retrieved from Landsat thermal data, underground coal fires related to thermal anomalies were identified using the MEDIAN+1.5×IQR (IQR: interquartile range) threshold technique. The locations of underground coal fires were validated using a coal fire map produced from field survey data and cross-validated using daytime ASTER thermal infrared imagery. Based on the fires extracted from seven Landsat thermal images, the spatio-temporal changes of underground coal fire areas were analyzed. The results showed that the thermally anomalous zones correlated with known coal fires. Cross-validation of coal fires using ASTER TIR data showed a high consistency of 79.3%. The largest coal fire area, 184.6 hectares, was detected in 2010, followed by 2014 (181.1 hectares) and 2016 (178.5 hectares). Smaller coal fire areas of 133.6 and 152.5 hectares were extracted in 2011 and 2009, respectively. Underground coal fires were mainly detected in the northern and southern parts of the coal field and tend to spread to its north-west.
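The MEDIAN+1.5×IQR threshold named in this abstract can be sketched in a few lines. This is an illustration only: the abstract does not say which quartile convention was used, so the "inclusive" method of `statistics.quantiles` is an assumption, and the function names and temperature values are hypothetical.

```python
import statistics


def thermal_anomaly_threshold(temps):
    """Return MEDIAN + 1.5 * IQR for a sequence of land surface temperatures."""
    # Quartile convention is an assumption; the abstract does not specify one.
    q1, _, q3 = statistics.quantiles(temps, n=4, method="inclusive")
    return statistics.median(temps) + 1.5 * (q3 - q1)


def flag_anomalies(temps):
    """Mark each value that exceeds the MEDIAN + 1.5 * IQR threshold."""
    threshold = thermal_anomaly_threshold(temps)
    return [t > threshold for t in temps]


# Hypothetical surface temperatures in degrees Celsius; one pixel is much hotter.
lst = [30.0, 31.0, 32.0, 33.0, 34.0, 35.0, 36.0, 37.0, 60.0]
flags = flag_anomalies(lst)
```

With these sample values the median is 34.0 and the IQR is 4.0, so the threshold is 40.0 and only the 60.0 pixel is flagged.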
Funding: Supported by the Social Science Foundation of Beijing (15JGB099, 15ZHA004), the National Natural Science Foundation of China (61370139), and the "Information+" Special Fund (5111823610).
Abstract: In the age of information sharing, logistics information sharing also faces the risk of privacy leakage. To address the privacy leakage of time-series location information in the field of logistics, this paper proposes a method based on differential privacy for time-series location data publication. First, it constructs time-related public regions of interest (PROI) using an optimized clustering algorithm, and it adopts the centroid method so that a public interest point (PIP) represents the location of each public interest zone. Second, from the PIPs, a location search tree (LST), a commonly used index structure for spatial data, is constructed in order to preserve the inherent relations among location data. Third, Laplace noise is added to the nodes of the LST, which requires fewer noise additions than perturbing the original data set and preserves data availability. Finally, experiments show that this method not only ensures the security of sequential location data publishing, but also offers better data availability than the general differential privacy method, achieving a good balance between the security and availability of data.
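The third step, adding Laplace noise to tree nodes, follows the general Laplace mechanism of differential privacy. The sketch below shows that general mechanism on a flat dictionary of node counts; it is not the paper's LST construction. The node names, the sensitivity of 1, and the function names are all hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent Exp(1) variables is Laplace(0, 1).
    """
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def perturb_counts(node_counts, epsilon, sensitivity=1.0, seed=None):
    """Add Laplace noise with scale sensitivity / epsilon to each node count."""
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return {node: count + laplace_noise(scale, rng)
            for node, count in node_counts.items()}


# Hypothetical visit counts for a root node and two public interest points.
counts = {"root": 120, "PIP_A": 70, "PIP_B": 50}
noisy_counts = perturb_counts(counts, epsilon=1.0, seed=0)
```

A smaller `epsilon` gives a larger noise scale and therefore stronger privacy at the cost of accuracy, which is the security and availability trade-off the abstract refers to.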
Funding: Supported by the National Natural Science Foundation of China (51909228), the Postdoctoral Science Foundation of China (2020M671623), and the "Blue Project" of Yangzhou University.
Abstract: Accurate information about phenological stages is essential for canola field management practices such as irrigation, fertilization, and harvesting. Previous studies in canola phenology monitoring focused mainly on the flowering stage, using its apparent structural features and colors. Additional phenological stages have been largely overlooked. The objective of this study was to improve a shape-model method (SMM) for extracting winter canola phenological stages from time-series top-of-canopy reflectance images collected by an unmanned aerial vehicle (UAV). The transformation equation of the SMM was refined to account for the multi-peak features of the temporal dynamics of three vegetation indices (VIs) (NDVI, EVI, and CI). An experiment with various seeding scenarios was conducted, including four different seeding dates and three seeding densities. Three mathematical functions, the asymmetric Gaussian function (AGF), the Fourier function, and the double logistic function, were employed to fit time-series vegetation indices to extract information about phenological stages. The refined SMM effectively estimated the phenological stages of canola, with a minimum root mean square error (RMSE) of 3.7 days for all phenological stages. The AGF provided the best fitting performance, as it captured multiple peaks in the growth dynamics for all seeding date scenarios using four scaling parameters. Of the three selected VIs, CIred-edge achieved the greatest accuracy in estimating the phenological stage dates. This study demonstrates the high potential of the refined SMM for estimating winter canola phenology.
Funding: The Third Xinjiang Scientific Expedition Program (2021xjkk0905), GDAS Special Project of Science and Technology Development (2020GDASYL-20200301003), GDAS Special Project of Science and Technology Development (2020GDASYL-20200102002), National Natural Science Foundation of China (41501144), and Project of Department of Natural Resources of Guangdong Province (GDZRZYKJ2022005).
Abstract: This paper combines the Defense Meteorological Satellite Program Operational Linescan System (DMSP-OLS) nighttime light data and the Visible Infrared Imaging Radiometer Suite (VIIRS) nighttime light data into a "synthetic DMSP" dataset covering 1992 to 2020 to retrieve the spatio-temporal variations in energy-related carbon emissions in Xinjiang, China. It then analyzes several factors influencing the spatial differentiation of carbon emissions in Xinjiang using the geographical detector technique. Results reveal that (1) total carbon emissions continued to grow, while the growth rate slowed down in the past five years. (2) Large differences exist in total carbon emissions across regions. In descending order of total carbon emissions, the regions are the northern slope of the Tianshan Mountains, the southern slope of the Tianshan, the three prefectures in southern Xinjiang, and the northern part of Xinjiang. (3) Economic growth, population size, and energy consumption intensity are the most important factors in the spatial differentiation of carbon emissions. The interaction between economic growth and population size, as well as between economic growth and energy consumption intensity, also enhances the explanatory power for the spatial differentiation of carbon emissions. This paper aims to help formulate differentiated carbon reduction targets and strategies for cities at different stages of economic development and with different carbon intensities, so as to achieve the carbon peak goals in stages.
Funding: This work was supported by a Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea Government (MOTIE) (P0016977, The Establishment Project of Industry-University Fusion District).
Abstract: Electric kickboard vehicles have become increasingly widespread, promoted primarily for their clean and efficient operation. Electric kickboards are gradually growing in popularity in tourist and education-centric localities. With the arrival of electric kickboard vehicles, deploying a customer rental service is essential. Due to its free-floating nature, the shared electric kickboard is a common and practical means of transportation. Relocation plans for shared electric kickboards are required to increase the quality of service, and forecasting demand for their use in a specific region is crucial. Predicting demand accurately with small data is troublesome. Extensive data is necessary for training machine learning algorithms for effective prediction. Data generation is a method for expanding the amount of data available for training. In this work, we proposed a model that takes time-series customer electric kickboard demand data as input, pre-processes it, and generates synthetic data according to the original data distribution using generative adversarial networks (GAN). The electric kickboard mobility demand prediction error was reduced when we combined synthetic data with the original data. We proposed Tabular-GAN-Modified-WGAN-GP for generating synthetic data for better prediction results. We modified the Wasserstein GAN-gradient penalty (GP) with the RMSprop optimizer and then employed Spectral Normalization (SN) to improve training stability and speed up convergence. Finally, we applied a regression-based blending ensemble technique to improve the performance of demand prediction. We used various evaluation criteria and visual representations to compare our proposed model's performance. Synthetic data generated by our suggested GAN model is also evaluated. The TGAN-Modified-WGAN-GP model mitigates the overfitting and mode collapse problems, and it also converges faster than previous GAN models for synthetic data creation. The presented model's performance is compared to existing ensemble and baseline models. The experimental findings imply that combining synthetic and actual data can significantly reduce prediction error rates, achieving a mean absolute percentage error (MAPE) of 4.476, and increase prediction accuracy.
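For readers unfamiliar with the metric reported in this abstract, mean absolute percentage error can be sketched as follows. This shows only the standard definition expressed in percent; the function name and the demand figures are hypothetical and are not taken from the paper.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero, so that case raises an error.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical hourly rental demand and a model's forecasts.
demand = [100, 200, 400]
forecast = [110, 190, 400]
error_pct = mape(demand, forecast)  # relative errors 10%, 5%, 0% average to about 5
```

Because each error is divided by the actual value, periods with very low demand can dominate the score, which is worth keeping in mind when comparing MAPE across datasets.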
Abstract: Accurate mapping and timely monitoring of urban redevelopment are pivotal for urban studies and decision-makers to foster sustainable urban development. Traditional mapping methods heavily depend on field surveys and subjective questionnaires, yielding less objective, reliable, and timely data. Recent advancements in Geographic Information Systems (GIS) and remote-sensing technologies have improved the identification and mapping of urban redevelopment through quantitative analysis using satellite-based observations. Nonetheless, challenges persist, particularly concerning accuracy and significant temporal delays. This study introduces a novel approach to modeling urban redevelopment, leveraging machine learning algorithms and remote-sensing data. This methodology can facilitate the accurate and timely identification of urban redevelopment activities. The study's machine learning model can analyze time-series remote-sensing data to identify spatio-temporal and spectral patterns related to urban redevelopment. The model is thoroughly evaluated, and the results indicate that it can accurately capture the time-series patterns of urban redevelopment. This research's findings are useful for evaluating urban demographic and economic changes, informing policymaking and urban planning, and contributing to sustainable urban development. The model can also serve as a foundation for future research on early-stage urban redevelopment detection and evaluation of the causes and impacts of urban redevelopment.
Funding: Supported by the Technology Innovation Program (10083633, Development on Big Data Analysis Technology and Business Service for Connected Vehicles) funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea).
Abstract: In this study, we developed software for vehicle big data analysis to analyze the time-series data of connected vehicles. We designed two software modules: the first derives Pearson correlation coefficients to analyze the collected data, and the second conducts exploratory data analysis of the collected vehicle data. In particular, we analyzed the dangerous driving patterns of motorists based on the safety standards of the Korea Transportation Safety Authority. We also analyzed seasonal fuel efficiency (four seasons) and mileage of vehicles, and identified rapid acceleration, rapid deceleration, sudden stopping (harsh braking), quick starting, sudden left turn, sudden right turn, and sudden U-turn driving patterns of vehicles. We implemented the density-based spatial clustering of applications with noise (DBSCAN) algorithm for trajectory analysis based on GPS (Global Positioning System) data, and designed a long short-term memory algorithm and an auto-regressive integrated moving average model for time-series data analysis. In this paper, we mainly describe the development environment of the analysis software, the structure and data flow of the overall analysis platform, the configuration of the collected vehicle data, and the various algorithms used in the analysis. Finally, we present illustrative results of our analysis, such as the dangerous driving patterns that were detected.
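The first module described in this abstract derives Pearson correlation coefficients. As a rough illustration of that statistic only, not of the authors' software, a plain-Python version might look like the sketch below; the function name and the speed and fuel values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-trip average speed (km/h) and fuel use (L/100 km).
speed = [40.0, 60.0, 80.0, 100.0]
fuel = [9.0, 7.5, 6.5, 6.0]
r = pearson_r(speed, fuel)
```

A value near -1 or +1 indicates a strong linear relationship between two recorded vehicle signals, while a value near 0 indicates little linear association.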
Funding: This research was funded by the Science and Technology Project of State Grid Corporation of China under grant number 5200-202319382A-2-3-XG.
Abstract: Iced transmission line galloping poses a significant threat to the safety and reliability of power systems, leading directly to line tripping, disconnections, and power outages. Existing early warning methods for iced transmission line galloping suffer from issues such as reliance on a single data source, neglect of irregular time series, and lack of attention-based closed-loop feedback, resulting in high rates of missed and false alarms. To address these challenges, we propose an Internet of Things (IoT) empowered early warning method for transmission line galloping that integrates time-series data from optical fiber sensing and weather forecasts. Initially, the method applies a primary adaptive weighted fusion to the IoT empowered optical fiber real-time sensing data and weather forecast data, followed by a secondary fusion based on a Back Propagation (BP) neural network, and uses the K-medoids algorithm to cluster the fused data. Furthermore, an adaptive irregular time series perception adjustment module is introduced into the traditional Gated Recurrent Unit (GRU) network, and closed-loop feedback based on an attention mechanism is employed to update network parameters through gradient feedback of the loss function, enabling closed-loop training and time-series data prediction of the GRU network model. Subsequently, considering various types of prediction data and the duration of icing, an iced transmission line galloping risk coefficient is established, and warnings are categorized based on this coefficient. Finally, using an IoT-driven realistic dataset of iced transmission line galloping, the effectiveness of the proposed method is validated through multi-dimensional simulation scenarios.
Funding: Supported in part by the National Natural Science Foundation of China under Grant 62371181, in part by the Changzhou Science and Technology International Cooperation Program under Grant CZ20230029, by a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (2021R1A2B5B02087169), and under the framework of the international cooperation program managed by the National Research Foundation of Korea (2022K2A9A1A01098051).
Abstract: The Intelligent Internet of Things (IIoT) involves real-world things that communicate or interact with each other through networking technologies by collecting data from these "things" and using intelligent approaches, such as Artificial Intelligence (AI) and machine learning, to make accurate decisions. Data science is the science of dealing with data and its relationships through intelligent approaches. Most state-of-the-art research focuses independently on either data science or IIoT, rather than exploring their integration. Therefore, to address this gap, this article provides a comprehensive survey on the advances and integration of data science with the Intelligent IoT (IIoT) system by classifying the existing IoT-based data science techniques and presenting a summary of various characteristics. The paper analyzes data science and big data security and privacy features, including network architecture, data protection, and continuous monitoring of data, which face challenges in various IoT-based systems. Extensive insights into IoT data security, privacy, and challenges are visualized in the context of data science for IoT. In addition, this study reveals the current opportunities to enhance data science and IoT market development. The current gaps and challenges faced in the integration of data science and IoT are comprehensively presented, followed by the future outlook and possible solutions.
Funding: Supported by the National Natural Science Foundation of China (32370703), the CAMS Innovation Fund for Medical Sciences (CIFMS) (2022-I2M-1-021, 2021-I2M-1-061), and the Major Project of Guangzhou National Laboratory (GZNL2024A01015).
Abstract: Viral infectious diseases, characterized by their intricate nature and wide-ranging diversity, pose substantial challenges in the domain of data management. The vast volume of data generated by these diseases, spanning from the molecular mechanisms within cells to large-scale epidemiological patterns, has surpassed the capabilities of traditional analytical methods. In the era of artificial intelligence (AI) and big data, there is an urgent need to optimize these analytical methods to handle and utilize the information more effectively. Despite the rapid accumulation of data associated with viral infections, the lack of a comprehensive framework for integrating, selecting, and analyzing these datasets has left numerous researchers uncertain about which data to select, how to access it, and how to utilize it most effectively in their research. This review endeavors to fill these gaps by exploring the multifaceted nature of viral infectious diseases and summarizing relevant data across multiple levels, from the molecular details of pathogens to broad epidemiological trends. The scope extends from the micro-scale to the macro-scale, encompassing pathogens, hosts, and vectors. In addition to data summarization, this review thoroughly investigates various dataset sources. It also traces the historical evolution of data collection in the field of viral infectious diseases, highlighting the progress achieved over time. Simultaneously, it evaluates the current limitations that impede data utilization. Furthermore, we propose strategies to surmount these challenges, focusing on the development and application of advanced computational techniques, AI-driven models, and enhanced data integration practices. By providing a comprehensive synthesis of existing knowledge, this review is designed to guide future research and contribute to more informed approaches in the surveillance, prevention, and control of viral infectious diseases, particularly within the context of the expanding big-data landscape.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 92044303).
Abstract: Air pollution in China covers a large area with complex sources and formation mechanisms, making it a unique place to conduct air pollution and atmospheric chemistry research. The National Natural Science Foundation of China's Major Research Plan entitled "Fundamental Researches on the Formation and Response Mechanism of the Air Pollution Complex in China" (or the Plan) has funded 76 research projects to explore the causes of air pollution in China and the key processes of air pollution in atmospheric physics and atmospheric chemistry. In order to summarize the abundant data from the Plan and exhibit its long-term impacts domestically and internationally, an integration project is responsible for collecting the various types of data generated by the 76 projects of the Plan. This project has classified and integrated these data, forming eight categories containing 258 datasets and 15 technical reports in total. The integration project has led to the successful establishment of the China Air Pollution Data Center (CAPDC) platform, providing storage, retrieval, and download services for the eight categories. This platform has distinct features including data visualization, related project information querying, and bilingual services in both English and Chinese, which allow for rapid searching and downloading of data and provide a solid foundation of data and support for future related research. Air pollution control in China, especially in the past decade, is undeniably a global exemplar, and this data center is the first in China to focus on research into the country's air pollution complex.
Abstract: The reverse design of solid rocket motor (SRM) propellant grain involves determining the grain geometry to closely match a predefined internal ballistic curve. While existing reverse design methods are feasible, they often face challenges such as lengthy computation times and limited accuracy. To achieve rapid and accurate matching between the targeted ballistic curve and a complex grain shape, this paper proposes a novel reverse design method for SRM propellant grain based on time-series data imaging and a convolutional neural network (CNN). First, a finocyl grain shape-internal ballistic curve dataset is created using parametric modeling techniques to comprehensively cover the design space. Next, the internal ballistic time-series data is encoded into three-channel images, establishing a potential relationship between the ballistic curves and their image representations. A CNN is then constructed and trained using these encoded images. Once trained, the model enables efficient inference of propellant grain dimensions from a target internal ballistic curve. This paper conducts comparative experiments across various neural network models, validating the effectiveness of the feature extraction method that transforms internal ballistic time-series data into images, as well as its generalization capability across different CNN architectures. Ignition tests were performed based on the predicted propellant grain. The results demonstrate that the relative error between the experimental internal ballistic curves and the target curves is less than 5%, confirming the validity and feasibility of the proposed reverse design methodology.
Funding: Partially supported by the National Natural Science Foundation of China (62271485) and the SDHS Science and Technology Project (HS2023B044).
Abstract: Imputation of missing data has long been an important topic and an essential application for intelligent transportation systems (ITS) in the real world. As a state-of-the-art generative model, the diffusion model has proven highly successful in image generation, speech generation, time series modelling, and other tasks, and now opens a new avenue for traffic data imputation. In this paper, we propose a conditional diffusion model, called the implicit-explicit diffusion model, for traffic data imputation. This model exploits both the implicit and explicit features of the data simultaneously. More specifically, we design two types of feature extraction modules, one to capture the implicit dependencies hidden in the raw data at multiple time scales and the other to obtain the long-term temporal dependencies of the time series. This approach not only inherits the advantages of the diffusion model for estimating missing data, but also takes into account the multiscale correlation inherent in traffic data. To illustrate the performance of the model, extensive experiments are conducted on three real-world time series datasets using different missing rates. The experimental results demonstrate that the model improves imputation accuracy and generalization capability.