Funding: Supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDC02070600).
Abstract: Graph embedding aims to map high-dimensional nodes to a low-dimensional space and to learn graph relationships from the resulting latent representations. Most existing graph embedding methods focus on the topological structure of graph data but ignore its semantic information, which leads to unsatisfactory performance in practical applications. To overcome this problem, this paper proposes a novel deep convolutional adversarial graph autoencoder (GAE) model. To embed the semantic information between nodes, a random walk strategy is first used to construct the positive pointwise mutual information (PPMI) matrix; a graph convolutional network (GCN) is then employed to encode the PPMI matrix and node content into the latent representation. Finally, the decoder uses the learned latent representation to reconstruct the topological structure of the graph data. Furthermore, a deep convolutional adversarial training algorithm is introduced so that the learned latent representation conforms better to the prior distribution. State-of-the-art experimental results on three standard datasets, Cora, Citeseer and Pubmed, validate the effectiveness of the proposed model in link prediction, node clustering and graph visualization tasks.
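The abstract does not say how the PPMI matrix is derived from the random walks. The following is a minimal sketch under stated assumptions: uniform random walks on an undirected graph, symmetric co-occurrence counts inside a sliding window, and a sparse dictionary in place of a dense matrix. The function names, the toy graph and every parameter value are illustrative and do not come from the paper.

```python
import math
import random
from collections import defaultdict


def random_walks(adj, walk_len, walks_per_node, seed=0):
    """Sample uniform random walks from every node (assumed sampling scheme)."""
    rng = random.Random(seed)
    walks = []
    for start in adj:
        for _ in range(walks_per_node):
            walk = [start]
            # Stop early only if the current node has no neighbours.
            while len(walk) < walk_len and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks


def ppmi_from_walks(walks, window):
    """Return a sparse PPMI map {(u, v): value} from windowed co-occurrences."""
    counts = defaultdict(float)
    for walk in walks:
        for i, u in enumerate(walk):
            for v in walk[i + 1 : i + 1 + window]:
                counts[(u, v)] += 1.0
                counts[(v, u)] += 1.0  # keep the counts symmetric
    total = sum(counts.values())
    row = defaultdict(float)
    for (u, _), c in counts.items():
        row[u] += c
    ppmi = {}
    for (u, v), c in counts.items():
        # PMI = log( p(u, v) / (p(u) * p(v)) ); only positive values are kept.
        pmi = math.log(c * total / (row[u] * row[v]))
        if pmi > 0:
            ppmi[(u, v)] = pmi
    return ppmi


# Hypothetical toy graph: two triangles joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
walks = random_walks(adj, walk_len=10, walks_per_node=20)
ppmi = ppmi_from_walks(walks, window=2)
```

In the pipeline the abstract describes, a matrix of this kind would be passed to the GCN encoder together with the node content; that step is not sketched here.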
Funding: Supported by the National Natural Science Foundation of China [grant number 41531180], the National Natural Science Foundation of China [grant number 42071450], and the China Scholarship Council (CSC) [grant number 202206270076].
Abstract: Similarity measurement has been a prevailing research topic in geographic information science. Geometric similarity measurement in scaling transformation (GSM_ST) is critical to ensuring spatial data quality while balancing detailed information with distinctive features. However, GSM_ST is an uncertain problem because of subjective spatial cognition, global and local concerns, and geometric complexity. Traditional rule-based methods that consider multiple consistency conditions require subjective adjustments to characteristics and weights, leading to poor robustness in addressing GSM_ST. This study proposes an unsupervised representation learning framework for automated GSM_ST, using a Graph Autoencoder Network (GAE) and taking drainage networks as an example. The framework involves constructing a drainage graph, designing the GAE architecture for GSM_ST, and using cosine similarity to measure similarity based on the GAE-derived drainage embeddings at different scales. We perform extensive experiments and compare methods across 71 drainage networks during five scaling transformations. The results show that the proposed GAE method outperforms other methods, with a satisfaction ratio of around 88%, and has strong robustness. Moreover, the proposed method can also be applied to other scenarios, such as measuring similarity between geographical entities at different times and between data from different datasets.
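The final step of the framework, comparing embeddings with cosine similarity, can be illustrated with a short sketch. The vectors below are hypothetical stand-ins for GAE-derived drainage embeddings at two scales. How the study pools node embeddings into one vector per network is not stated in the abstract, so a single vector per network is assumed here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Hypothetical embeddings of the same drainage network at two map scales.
embedding_large_scale = [0.8, 0.1, 0.3, 0.5]
embedding_small_scale = [0.7, 0.2, 0.2, 0.6]
similarity = cosine_similarity(embedding_large_scale, embedding_small_scale)
```

Cosine similarity ignores vector magnitude, so it compares the direction of the embeddings rather than their scale.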
Funding: This work was supported by the National Natural Science Foundation of China under Grant No. 62072210 and the Project of the Development and Reform Commission of Jilin Province of China under Grant No. 2019C053-6.
Abstract: Unlike traditional clustering analysis, a biclustering algorithm works simultaneously on the two dimensions of samples (rows) and variables (columns). In recent years, biclustering methods have developed rapidly and have been widely applied in biological data analysis, text clustering, recommendation systems and other fields. Traditional clustering algorithms are not well suited to processing high-dimensional and/or large-scale data. At present, most biclustering algorithms are designed for differentially expressed big biological data, and there is little discussion of clustering binary data such as miRNA-targeted gene data. Here, we propose GAEBic, a novel biclustering method for miRNA-targeted gene data based on a graph autoencoder. GAEBic applies the graph autoencoder to capture the similarity of sample sets or variable sets, and adopts a new irregular clustering strategy to mine biclusters with excellent generalization. Based on the miRNA-targeted gene data of soybean, we benchmark several different types of biclustering algorithm and find that GAEBic performs better than Bimax, Bibit and the Spectral Biclustering algorithm in terms of target gene enrichment. This biclustering method achieves comparable performance on the high-throughput miRNA data of soybean, and it can also be used for other species.
Funding: Youth Foundation of National Natural Science Foundation of China (No. 52204020); Distinguished Young Foundation of National Natural Science Foundation of China (No. 52125401).
Abstract: As oil and gas exploration continues to progress into deeper and unconventional reservoirs, the likelihood of kick risk increases, making kick warning a critical factor in ensuring drilling safety and efficiency. Because kick samples are scarce, traditional supervised models perform poorly, and significant fluctuations in field data lead to high false alarm rates. This study proposes an unsupervised graph autoencoder (GAE)-based kick warning method, which reduces false alarms by eliminating the influence of field engineer operations and by incorporating real-time model updates. The method uses the GAE model to process time-series data during drilling, identifying kick risk while overcoming challenges related to small sample sizes and missing features. To further reduce false alarms, the weighted dynamic time warping (WDTW) algorithm is introduced to identify fluctuations in logging data caused by field engineer operations during drilling, with real-time updates applied to prevent normal conditions from being misclassified as kick risk. Experimental results show that the GAE-based kick warning method achieves an accuracy of 92.7% and significantly reduces the false alarm rate. The GAE model continues to operate effectively even when features are missing, and it issues kick warnings 4 min earlier than field engineers, demonstrating its high sensitivity and robustness. After the WDTW algorithm and real-time updates are integrated, the false alarm rate is reduced from 17.3% to 5.6%, further improving the accuracy of kick warnings. The proposed method provides an efficient and reliable approach for kick warning in drilling operations, offering strong practical value and technical support for the intelligent management of future drilling operations.
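The abstract names weighted dynamic time warping (WDTW) without giving its formulation. The sketch below follows one common formulation, in which each point-to-point cost is scaled by a logistic weight that grows with the phase difference |i - j|. The weight function, its parameters `g` and `w_max`, and the toy sequences are assumptions for illustration and may differ from what the study uses.

```python
import math


def logistic_weight(phase_diff, length, g=0.05, w_max=1.0):
    """Assumed logistic weight: larger phase differences are penalised more."""
    return w_max / (1.0 + math.exp(-g * (phase_diff - length / 2.0)))


def wdtw_distance(a, b, g=0.05):
    """Weighted DTW cost between two 1-D sequences via dynamic programming."""
    n, m = len(a), len(b)
    length = max(n, m)
    inf = float("inf")
    # cost[i][j] is the best cumulative cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            weight = logistic_weight(abs(i - j), length, g)
            local = weight * (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# Hypothetical logging-data segments: a reference pattern and a live window.
reference_pattern = [0.0, 0.2, 0.9, 1.0, 0.4, 0.1]
live_window = [0.0, 0.1, 0.3, 1.0, 0.9, 0.2]
distance = wdtw_distance(reference_pattern, live_window)
```

A small distance to a known operation-induced pattern could then be used to flag a fluctuation as caused by an operation rather than a kick. Any such threshold would need to be set from field data, and none is given in the abstract.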
Funding: Supported by the Shanghai Municipal Science and Technology Major Project [grant number 2021SHZDZX0100] and the Fundamental Research Funds for the Central Universities [grant number 2021SHZDZX0100].
Abstract: The outbreak and subsequent recurring waves of COVID-19 pose threats to emergency management and people's daily lives, while large-scale spatio-temporal epidemiological data have proved useful in epidemic surveillance. Nonetheless, challenges remain in multi-source heterogeneous data fusion, deep mining, and comprehensive applications. Spatio-Temporal Artificial Intelligence (STAI) technology, which focuses on integrating spatially related time-series data, artificial intelligence models, and digital tools to provide intelligent computing platforms and applications, opens up new opportunities for scientific epidemic control. To this end, we leverage STAI and long-term experience in location-based intelligent services in this work. Specifically, we devise and develop a STAI-driven digital infrastructure, the WAYZ Disease Control Intelligent Platform (WDCIP), which consists of a systematic framework for building pipelines that run from automatic spatio-temporal data collection and processing to AI-based analysis and inference, providing applications that serve various epidemic scenarios. Following the platform implementation logic, our work can be summarized in three aspects: (1) a STAI-driven integrated system; (2) a hybrid GNN-based approach for hierarchical risk assessment (the core algorithm of WDCIP); and (3) comprehensive applications for social epidemic containment. This work makes a pivotal contribution to facilitating the aggregation and full utilization of spatio-temporal epidemic data from multiple sources, in which the real-time human mobility data generated by high-precision mobile positioning plays a vital role in sensing the spread of the epidemic. So far, WDCIP has accumulated more than 200 million users, who have been served in life convenience and decision-making during the pandemic.
Abstract: Wind power is one of the fastest-growing renewable energy sectors and is instrumental in the ongoing decarbonization process. However, wind turbines are subjected to a wide range of dynamic loads, which can cause more frequent failures and downtime periods, leading to ever-increasing attention to effective Condition Monitoring strategies. In this paper, we propose a novel unsupervised deep anomaly detection framework to detect anomalies in wind turbines based on SCADA data. We introduce a promising neural architecture, namely a Graph Convolutional Autoencoder for Multivariate Time series, to model the sensor network as a dynamical functional graph. This structure improves the unsupervised learning capabilities of Autoencoders by considering individual sensor measurements together with the nonlinear correlations existing among signals. On this basis, we developed a deep anomaly detection framework that was validated on 12 failure events that occurred during 20 months of operation of four wind turbines. The results show that the proposed framework successfully detects anomalies and anticipates SCADA alarms, outperforming two other recent neural approaches.
Funding: Supported in part by the National Natural Science Foundation of China under Grants 52372393 and 62003238, and in part by the Dongfeng Technology Center (Research and Application of Next-Generation Low-Carbon Intelligent Architecture Technology).
Abstract: Ensuring the safe and efficient operation of self-driving vehicles relies heavily on accurately predicting their future trajectories. Existing approaches commonly employ an encoder-decoder neural network structure to enhance information extraction during the encoding phase. However, these methods often neglect road rule constraints when trajectories are formulated in the decoding phase. This paper proposes a novel method that combines neural networks and rule-based constraints in the decoder stage to improve trajectory prediction accuracy while ensuring compliance with vehicle kinematics and road rules. The approach separates vehicle trajectories into lateral and longitudinal routes and uses a conditional variational autoencoder (CVAE) to capture trajectory uncertainty. The evaluation results demonstrate reductions of 32.4% and 27.6% in the average displacement error (ADE) for predicting the top five and top ten trajectories, respectively, compared with the baseline method.
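The average displacement error (ADE) reported above can be made concrete with a short sketch. ADE is the mean Euclidean distance between predicted and ground-truth positions over the prediction horizon. The abstract does not state how the top-five and top-ten figures are aggregated; taking the minimum ADE over the k candidates, as below, is a common convention and is an assumption here. The trajectories are made-up values.

```python
import math


def ade(predicted, truth):
    """Mean Euclidean distance between matching points of two trajectories."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("trajectories must be non-empty and of equal length")
    return sum(math.dist(p, t) for p, t in zip(predicted, truth)) / len(truth)


def min_ade(candidates, truth):
    """Best (lowest) ADE among k candidate trajectories (assumed aggregation)."""
    return min(ade(candidate, truth) for candidate in candidates)


# Hypothetical ground truth and two candidate predictions, as (x, y) in metres.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
candidates = [
    [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],  # constant 1 m lateral offset
    [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)],  # drifts away over time
]
best = min_ade(candidates, truth)
```

Under this convention, adding more candidates can only lower or preserve the score, which is why top-five and top-ten results are reported separately.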
Funding: Supported by the National Natural Science Foundation of China under Grant No. 32300523, the Shanghai Sailing Program under Grant No. 22YF1401700, the Fundamental Research Funds for the Central Universities of China under Grant No. 2232022D, and the Shanghai Science and Technology Program under Grant No. 20DZ2251400.
Abstract: Joint analysis of the multiple modalities in spatial mass spectrometry imaging (SMSI) data, including histology, spatial location, and molecule data, allows us to gain novel insights into tissue structures. However, significant differences in characteristics such as scale and heterogeneity among the multimodal data, coupled with the high noise levels and uneven quality of MSI data, severely hinder their comprehensive analysis. Here, we introduce a cross-graph cycle attention model, MSCG, to learn efficient joint embeddings for the modalities of SMSI data by integrating graph attention autoencoders and attention transfer. Specifically, MSCG makes it possible to leverage one modality (e.g., histology) to fine-tune the graph neural network trained for another modality (e.g., MSI). Our study on real datasets from different platforms highlights the superior capacity of MSCG in dissecting cellular heterogeneity, as well as in denoising and aggregating MSI data. Notably, MSCG demonstrates versatile applicability across MSI data from various platforms, showcasing its potential for broad utility in this field.
Funding: Supported by grants from the National Key R&D Program of China (2021YFF1200903), the National Natural Science Foundation of China (62273364, 11931019, 11871070, and 62362062), the Guangdong Basic and Applied Basic Research Foundation (2020B1515020047), the Fundamental Research Funds for the Central Universities, Sun Yat-sen University (231lgbj025), and the open fund of Information Materials and Intelligent Sensing Laboratory of Anhui Province (grant no. IMIS202105).
Abstract: Recent advancements in spatial transcriptomics (ST) technologies offer unprecedented opportunities to unveil the spatial heterogeneity of gene expression and cell states within tissues. Despite these capabilities of ST data, accurately dissecting spatiotemporal structures (e.g., spatial domains, temporal trajectories, and functional interactions) remains challenging. Here, we introduce a computational framework, PearlST (partial differential equation [PDE]-enhanced adversarial graph autoencoder of ST), for accurate inference of spatiotemporal structures from ST data. PearlST employs contrastive learning to extract histological image features, integrates a PDE-based diffusion model to enhance the characterization of spatial features at domain boundaries, and learns latent low-dimensional embeddings via Wasserstein adversarial regularized graph autoencoders. Comparative analyses across multiple ST datasets with varying resolutions demonstrate that PearlST outperforms existing methods in spatial clustering, trajectory inference, and pseudotime analysis. Furthermore, PearlST elucidates functional regulation of the latent features by linking intercellular ligand-receptor interactions to the genes that contribute most to the low-dimensional embeddings, as illustrated in a human breast cancer dataset. Overall, PearlST proves to be a powerful tool for extracting interpretable latent features and dissecting intricate spatiotemporal structures in ST data across various biological contexts.