Detecting overlapping communities in attributed networks remains a significant challenge because of the complexity of jointly modeling topological structure and node attributes, the unknown number of communities, and the need to capture nodes with multiple memberships. To address these issues, we propose a novel framework named density peaks clustering with neutrosophic C-means. First, we construct a consensus embedding by aligning structure-based and attribute-based representations using spectral decomposition and canonical correlation analysis. Then, an improved density peaks algorithm automatically estimates the number of communities and selects initial cluster centers based on a newly designed cluster strength metric. Finally, a neutrosophic C-means algorithm refines the community assignments, modeling uncertainty and overlap explicitly. Experimental results on synthetic and real-world networks demonstrate that the proposed method achieves superior detection accuracy and stability and a superior ability to identify overlapping structures.
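To make the density peaks step in the abstract above concrete, here is a minimal sketch of the classical density peaks rule, not the improved variant or the cluster strength metric the authors propose. The function names, the cutoff `d_c`, and the product score `gamma` are illustrative assumptions.

```python
import math


def density_peaks(points, d_c):
    """Return (rho, delta) per point under the classical density peaks rule."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density rho: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        # Distance to the nearest point that has a strictly higher density.
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # Points with no denser neighbor get their largest distance instead.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


def pick_centers(points, d_c, k):
    """Pick the k points with the largest rho * delta as candidate centers."""
    rho, delta = density_peaks(points, d_c)
    gamma = [r * d for r, d in zip(rho, delta)]
    order = sorted(range(len(points)), key=lambda i: gamma[i], reverse=True)
    return order[:k]
```

Points that are both dense and far from any denser point receive a large score, so in practice the number of centers can be read off from a gap in the sorted scores rather than fixed in advance as `k` is here.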
Drug repurposing offers a promising alternative to traditional drug development and significantly reduces costs and timelines by identifying new therapeutic uses for existing drugs. However, current approaches often rely on limited data sources and simplistic hypotheses, which restrict their ability to capture the multi-faceted nature of biological systems. This study introduces adaptive multi-view learning (AMVL), a novel methodology that integrates chemical-induced transcriptional profiles (CTPs), knowledge graph (KG) embeddings, and large language model (LLM) representations to enhance drug repurposing predictions. AMVL incorporates an innovative similarity matrix expansion strategy and leverages multi-view learning (MVL), matrix factorization, and ensemble optimization techniques to integrate heterogeneous multi-source data. Comprehensive evaluations on benchmark datasets (Fdataset, Cdataset, and Ydataset) and the large-scale iDrug dataset demonstrate that AMVL outperforms state-of-the-art (SOTA) methods, achieving superior accuracy in predicting drug-disease associations across multiple metrics. Literature-based validation further confirmed the model's predictive capabilities, with seven of the top ten predictions corroborated by post-2011 evidence. To promote transparency and reproducibility, all data and code used in this study were open-sourced, providing resources for processing CTPs, KG, and LLM-based similarity calculations, along with the complete AMVL algorithm and benchmarking procedures. By unifying diverse data modalities, AMVL offers a robust and scalable solution for accelerating drug discovery, fostering advancements in translational medicine and integrating multi-omics data. We aim to inspire further innovations in multi-source data integration and support the development of more precise and efficient strategies for advancing drug discovery and translational medicine.
Contrastive self-supervised representation learning on attributed graph networks with Graph Neural Networks has attracted considerable research interest recently. However, two challenges remain. First, most real-world systems involve multiple relations: entities are linked by different types of relations, and each relation is a view of the graph network. Second, the rich multi-scale information (structure-level and feature-level) of the graph network can serve as self-supervised signals, but it is not fully exploited. This study presents a novel contrastive self-supervised representation learning framework on attributed multiplex graph networks with multi-scale information (named CoLM²S). It mainly contains two components: intra-relation contrastive learning and inter-relation contrastive learning. Specifically, we first introduce the contrastive self-supervised representation learning framework on attributed single-layer graph networks with multi-scale information (CoLMS), which uses a graph convolutional network as the encoder to capture intra-relation information with multi-scale structure-level and feature-level self-supervised signals. The structure-level information includes the edge structure and sub-graph structure, and the feature-level information is the output of different graph convolutional layers. Second, according to the consensus assumption among inter-relations, the CoLM²S framework is proposed to jointly learn the various graph relations in an attributed multiplex graph network and obtain a global consensus node embedding. The proposed method can fully distil the graph information. Extensive experiments on unsupervised node clustering and graph visualisation tasks demonstrate the effectiveness of our methods, which outperform existing competitive baselines.
In the multi-view image localization task, the features of images captured from different views should be fused properly. This paper considers the classification-based image localization problem. We propose the relational graph location network (RGLN) to perform this task. In this network, we propose a heterogeneous graph construction approach for graph classification tasks, which aims to describe the location in a more appropriate way, thereby improving the expression ability of the location representation module. Experiments show that the expression ability of the proposed graph construction approach outperforms the compared methods by a large margin. In addition, the proposed localization method outperforms the compared localization methods by around 1.7% in terms of meter-level accuracy.
Network modeling is an important approach to analyzing complex systems in many fields. Recently, a new series of methods has emerged that uses the Kronecker product and similar tools to model real systems. One such approach is the multiplicative attribute graph (MAG) model, which generates networks based on the categorical attributes of nodes. In this paper we extend this model to a continuous one, give an overview of its properties, and discuss some special cases related to real-world networks, as well as the respective influences of the attribute distribution and the affinity function.
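For readers unfamiliar with the categorical MAG model that the abstract above takes as its starting point, the following sketch shows its usual formulation: the probability of an edge is the product, over attributes, of an affinity-matrix entry indexed by the two nodes' attribute values. It does not implement the continuous extension proposed in the paper, and the function names and the `seed` parameter are hypothetical.

```python
import random


def edge_probability(attrs_u, attrs_v, affinities):
    """Multiply one affinity entry per attribute to get P(u -> v)."""
    p = 1.0
    for a_u, a_v, theta in zip(attrs_u, attrs_v, affinities):
        # theta[a_u][a_v] is the affinity between the two attribute values.
        p *= theta[a_u][a_v]
    return p


def sample_mag_graph(node_attrs, affinities, seed=0):
    """Sample a directed graph by flipping one biased coin per ordered pair."""
    rng = random.Random(seed)
    n = len(node_attrs)
    edges = []
    for u in range(n):
        for v in range(n):
            if u == v:
                continue  # no self-loops in this sketch
            p = edge_probability(node_attrs[u], node_attrs[v], affinities)
            if rng.random() < p:
                edges.append((u, v))
    return edges
```

With a single binary attribute and an affinity matrix whose diagonal entries are larger than its off-diagonal entries, nodes that share the attribute value link more often, which gives a simple homophily pattern.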
Multi-view clustering is a critical research area in computer science aimed at effectively extracting meaningful patterns from complex, high-dimensional data that single-view methods cannot capture. Traditional fuzzy clustering techniques, such as Fuzzy C-Means (FCM), face significant challenges in handling uncertainty and the dependencies between different views. To overcome these limitations, we introduce a new multi-view fuzzy clustering approach, termed Multi-view Picture Fuzzy Clustering (MPFC), that integrates picture fuzzy sets with a dual-anchor graph method for multi-view data, aiming to enhance clustering accuracy and robustness. In particular, picture fuzzy set theory extends the capability to represent uncertainty by modeling three membership levels: membership degrees, neutral degrees, and refusal degrees. This allows a more flexible representation of uncertain and conflicting data than traditional fuzzy models. Meanwhile, dual-anchor graphs exploit the similarity relationships between data points and integrate information across views. This combination improves stability, scalability, and robustness when handling noisy and heterogeneous data. Experimental results on several benchmark datasets demonstrate significant improvements in clustering accuracy and efficiency over traditional methods. Specifically, the MPFC algorithm attains a Purity (PUR) score of 0.6440 and an Accuracy (ACC) score of 0.6213 on the 3 Sources dataset, underscoring its robustness and efficiency. The proposed approach contributes to fields such as pattern recognition, multi-view relational data analysis, and large-scale clustering problems. Future work will focus on extending the method to semi-supervised multi-view clustering, aiming to enhance adaptability, scalability, and performance in real-world applications.
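As background for the comparison with Fuzzy C-Means drawn in the abstract above, this sketch shows the standard single-view FCM membership update, u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)). It is the traditional baseline only, not the picture fuzzy or dual-anchor graph method of MPFC; the function name and the `eps` guard against zero distances are assumptions.

```python
import math


def fcm_memberships(points, centers, m=2.0, eps=1e-12):
    """Return one row of cluster memberships per point (each row sums to 1)."""
    exponent = 2.0 / (m - 1.0)  # m > 1 is the fuzzifier
    memberships = []
    for p in points:
        # Clamp distances so a point lying exactly on a center is handled.
        d = [max(math.dist(p, c), eps) for c in centers]
        row = [1.0 / sum((d_j / d_k) ** exponent for d_k in d) for d_j in d]
        memberships.append(row)
    return memberships
```

Each row carries a single degree per cluster. Picture fuzzy sets, as the abstract describes, add neutral and refusal degrees on top of this membership degree.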
Computational analysis can detect drug-gene interactions (DGIs) accurately and cost-effectively. However, research has concentrated on transductive learning models, which perform well on unknown DGIs (where both the drug and the gene are present in the training model), and has paid little attention to unseen DGIs (where both the drug and the gene are absent from the training model). In view of this, this study proposes, for the first time, an inductive learning-based model for the precise identification of unseen DGIs. By integrating disease nodes to avoid data sparsity, we constructed a multi-relational drug-disease-gene (DDG) graph that fuses data on DDG intra-relationships and interactions. After extracting graph features with graph embedding algorithms, we retrieved the attributes of individual gene and drug nodes. A hybrid feature characterization was then obtained by integrating graph features and node attributes. Machine learning (ML) models were built on these features, enabling transductive predictions of unknown DGIs. To realize inductive learning, this study transforms known node vectors derived from the DDG graph into representations of unseen nodes using node similarities as weights, enabling inductive predictions for unseen DGIs. The final model was superior to existing models, with significant improvement in predicting both external unknown and unseen DGIs. The practical feasibility of our model was further confirmed through a case study and molecular docking. In summary, this study establishes an efficient data-driven approach that is a promising tool for accelerating drug discovery and repurposing.
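The inductive step described in the abstract above, building a representation for an unseen node from known node vectors with node similarities as weights, can be sketched as a weighted average. The abstract does not say how the weights are normalized or which similarity measure is used, so the normalization below and the name `embed_unseen` are assumptions for illustration only.

```python
def embed_unseen(similarities, known_vectors):
    """Weighted average of known vectors; weights are the given similarities.

    Assumption: weights are normalized to sum to one (not stated in the source).
    """
    total = sum(similarities)
    if total <= 0:
        raise ValueError("need at least one positive similarity")
    dim = len(known_vectors[0])
    return [
        sum(s * vec[k] for s, vec in zip(similarities, known_vectors)) / total
        for k in range(dim)
    ]
```

For example, an unseen drug that is three times as similar to the second known drug as to the first ends up three quarters of the way towards the second drug's vector.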
Cross-domain graph anomaly detection (CD-GAD) is a promising task that leverages knowledge from a labelled source graph to guide anomaly detection on an unlabelled target graph. CD-GAD classifies anomalies as unique or common based on whether they are present in both the source and target graphs. However, existing models often fail to fully explore domain-unique knowledge of the target graph for detecting unique anomalies. Additionally, they tend to focus solely on node-level differences, overlooking structural-level differences that provide complementary information for common anomaly detection. To address these issues, we propose a novel method, Synthetic Graph Anomaly Detection via Graph Transfer and Graph Decouple (GTGD), which effectively detects common and unique anomalies in the target graph. Specifically, our approach ensures deeper learning of domain-unique knowledge by decoupling the reconstruction graphs of common and unique features. Moreover, we simultaneously consider node-level and structural-level differences by transferring node and edge information from the source graph to the target graph, enabling comprehensive domain-common knowledge representation. Anomalies are detected using both common and unique features, with their synthetic score serving as the final result. Extensive experiments demonstrate the effectiveness of our approach, which improves average AUC-PR by 12.6% compared to state-of-the-art methods.
Graph similarity learning aims to calculate the similarity between pairs of graphs. Existing unsupervised graph similarity learning methods based on contrastive learning encounter challenges related to random graph augmentation strategies, which can harm the semantic and structural information of graphs and overlook the rich structural information present in subgraphs. To address these issues, we propose a graph similarity learning model based on learnable augmentation and multi-level contrastive learning. First, to tackle the problem of random augmentation disrupting the semantics and structure of the graph, we design a learnable augmentation method that selectively chooses nodes and edges within the graph. To enrich the contrastive hierarchy, we employ a biased random walk method to generate corresponding subgraphs. Second, to address the lack of multi-level contrastive learning in previous work, we use graph convolutional networks to learn node representations of the augmented views and the original graph, and calculate the interaction information between the attribute-augmented and structure-augmented views and the original graph. The goal is to maximize node consistency between different views and learn node matching between different graphs, resulting in node-level representations for each graph. Subgraph representations are then obtained through pooling operations, and we conduct contrastive learning using both node and subgraph representations. Finally, the graph similarity score is computed according to the downstream task. We conducted three sets of experiments across eight datasets, and the results demonstrate that the proposed model effectively mitigates both the damage that random augmentation does to the original graph's semantics and structure and the insufficiency of contrastive levels. Additionally, the model achieves the best overall performance.
Existing multi-view deep subspace clustering methods aim to learn a unified representation from multi-view data, but the learned representation struggles to preserve the underlying structure hidden in the original samples, especially the high-order neighbor relationships between samples. To overcome these challenges, this paper proposes a novel multi-order neighborhood fusion based multi-view deep subspace clustering model. We integrate the multi-order proximity graph structures of different views into the self-expressive layer through a multi-order neighborhood fusion module. With this design, the multi-order Laplacian matrix supervises the learning of the view-consistent self-representation affinity matrix; we can then obtain an optimal global affinity matrix in which each connected node belongs to one cluster. In addition, a discriminative constraint between views is designed to further improve clustering performance. Experiments on six public datasets demonstrate that the method performs better than other advanced multi-view clustering methods. The code is available at https://github.com/songzuolong/MNF-MDSC (accessed on 25 December 2024).
During software development, developers tend to tangle multiple concerns into a single commit, resulting in many composite commits. This paper studies the problem of detecting and untangling composite commits, so as to improve the maintainability and understandability of software. Our approach is built upon the observation that both the textual content of code statements and the dependencies between code statements are helpful in comprehending a code commit. Based on this observation, we first construct an attributed graph for each commit, where code statements and various code dependencies are modeled as nodes and edges, respectively, and the textual bodies of code statements are maintained as node attributes. Based on the attributed graph, we propose graph-based learning algorithms that first detect whether a given commit is a composite commit, and then untangle the composite commit into atomic ones. We evaluate our approach on nine C# projects, and the results demonstrate its effectiveness and efficiency.
In this paper we propose a novel model, the "recursive directed graph", based on feature structure, and apply it to represent the semantic relations of postpositive attributive structures in biomedical texts. The usages of postpositive attributives are complex and variable, especially in three categories: present participle phrases, past participle phrases, and preposition phrases used as postpositive attributives, which make automatic parsing difficult. We summarize these categories and annotate their semantic information. Compared with dependency structure, feature structure, being a recursive directed graph, enhances semantic information extraction in the biomedical field. The annotation results show that the recursive directed graph is more suitable for extracting complex semantic relations for biomedical text mining.
The identification of design pattern instances is important for program understanding and software maintenance. Aiming at the mining of design patterns in existing systems, this paper proposes a subgraph isomorphism approach to discover several design patterns in a legacy system at a time. The attributed relational graph is used to describe design patterns and legacy systems. The subgraph isomorphism approach consists of a decomposition process and a composition process. During the decomposition process, graphs corresponding to the design patterns are decomposed into subgraphs, some of which are graphs corresponding to the elemental design patterns. The composition process tries to obtain the subgraph isomorphism of the matched graph once the subgraph isomorphism of each subgraph is obtained. Because design patterns share common structures, the proposed approach can reduce the number of times entities and relations are matched. Compared with existing methods, the proposed algorithm is not linearly dependent on the number of design pattern graphs.
Key words: design pattern mining; attributed relational graph; subgraph isomorphism
CLC number: TP 311.5
Foundation item: Supported by the National Natural Science Foundation of China (60273075) and the Science Foundation of Naval University of Engineering (HGDJJ03019)
Biography: LI Qing-hua (1940-), male, Professor, research direction: parallel computing.
Graph transformation systems have become a general formal modeling language for describing many models in the software development process. Behavioral modeling of dynamic systems and model-to-model transformations are only a few examples in which graphs have been used in software development. But even the perfect graph transformation system must be equipped with automated analysis capabilities to let users understand whether such a formal specification fulfills their requirements. In this paper, we present a new solution for verifying graph transformation systems using the Bogor model checker. Attributed graph grammar (AGG)-like graph transformation systems are translated to Bandera intermediate representation (BIR), the input language of Bogor, and Bogor verifies the model against some interesting properties defined by combining linear temporal logic (LTL) and special-purpose graph rules. Experimental results are encouraging, showing that in most cases our solution improves on existing approaches in terms of both performance and expressiveness.
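Existing multi-view attributed graph clustering methods usually learn consistent and complementary information from a unified representation obtained by fusing multiple views. However, this fuse-then-learn approach loses the view-specific information of the original views, and the unified representation struggles to balance consistency and complementarity. To preserve the original information of each view, we adopt a learn-then-fuse approach: the shared representation and the specific representation of each view are learned separately and then fused, so that the consistent and complementary information of the views is learned at a finer granularity. On this basis we build a multi-view attribute graph clustering model based on shared and specific representation (MSAGC). Specifically, a multi-view encoder first produces a primary representation of each view, from which the shared information and specific information of each view are obtained. The shared information is then aligned across views to learn the consistent information, the view-specific information is combined to exploit the complementary information, and a difference constraint handles redundant information. Next, a multi-view decoder is trained to reconstruct the topology and the attribute feature matrix of the graph. Finally, an additional self-supervised clustering module aligns the learning of graph representations with the clustering task. The effectiveness of MSAGC is verified on real-world multi-view attributed graph datasets.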
Funding (density peaks clustering with neutrosophic C-means): supported by the Natural Science Foundation of China (Grant No. 72571150).
Funding (AMVL drug repurposing): supported by the National Natural Science Foundation of China (Grant No. 62101087); the China Postdoctoral Science Foundation (Grant No. 2021MD703942); the Chongqing Postdoctoral Research Project Special Funding, China (Grant No. 2021XM2016); the Science Foundation of Chongqing Municipal Commission of Education, China (Grant No. KJQN202100642); and the Chongqing Natural Science Foundation, China (Grant No. cstc2021jcyj-msxmX0834).
Funding (CoLM²S contrastive learning on multiplex graphs): supported by the National Natural Science Foundation of China (NSFC) under grant number 61873274.
Funding (continuous MAG model): the National Natural Science Foundation of China (No. 61379074) and the Zhejiang Provincial Natural Science Foundation of China (No. LZ12F02003).
Funding (MPFC multi-view picture fuzzy clustering): funded by Research Project THTETN.05/24-25, Vietnam Academy of Science and Technology.
Funding (inductive DGI prediction): funded by the National Natural Science Foundation of China (Grant No. 22173065) and the Sichuan International Science and Technology Innovation Cooperation Project, China (Grant No. 24GJHZ0431).
Funding: Supported by the National Natural Science Foundation of China (Grant/Award Numbers: 62337001, 62037001) and the “Pioneer” and “Leading Goose” R&D Program of Zhejiang (Grant/Award Number: 2022C03106).
Abstract: Cross-domain graph anomaly detection (CD-GAD) is a promising task that leverages knowledge from a labelled source graph to guide anomaly detection on an unlabelled target graph. CD-GAD classifies anomalies as unique or common based on their presence in both the source and target graphs. However, existing models often fail to fully explore domain-unique knowledge of the target graph for detecting unique anomalies. Additionally, they tend to focus solely on node-level differences, overlooking structural-level differences that provide complementary information for common anomaly detection. To address these issues, we propose a novel method, Synthetic Graph Anomaly Detection via Graph Transfer and Graph Decouple (GTGD), which effectively detects common and unique anomalies in the target graph. Specifically, our approach ensures deeper learning of domain-unique knowledge by decoupling the reconstruction graphs of common and unique features. Moreover, we simultaneously consider node-level and structural-level differences by transferring node and edge information from the source graph to the target graph, enabling comprehensive domain-common knowledge representation. Anomalies are detected using both common and unique features, with their synthetic score serving as the final result. Extensive experiments demonstrate the effectiveness of our approach, which improves average AUC-PR by 12.6% compared to state-of-the-art methods.
Abstract: Graph similarity learning aims to calculate the similarity between pairs of graphs. Existing unsupervised graph similarity learning methods based on contrastive learning encounter challenges related to random graph augmentation strategies, which can harm the semantic and structural information of graphs and overlook the rich structural information present in subgraphs. To address these issues, we propose a graph similarity learning model based on learnable augmentation and multi-level contrastive learning. First, to tackle the problem of random augmentation disrupting the semantics and structure of the graph, we design a learnable augmentation method to selectively choose nodes and edges within the graph. To enhance contrastive levels, we employ a biased random walk method to generate corresponding subgraphs, enriching the contrastive hierarchy. Second, to solve the issue of previous work not considering multi-level contrastive learning, we utilize graph convolutional networks to learn node representations of augmented views and the original graph and calculate the interaction information between the attribute-augmented and structure-augmented views and the original graph. The goal is to maximize node consistency between different views and learn node matching between different graphs, resulting in node-level representations for each graph. Subgraph representations are then obtained through pooling operations, and we conduct contrastive learning utilizing both node and subgraph representations. Finally, the graph similarity score is computed according to different downstream tasks. We conducted three sets of experiments across eight datasets, and the results demonstrate that the proposed model effectively mitigates the issues of random augmentation damaging the original graph’s semantics and structure, as well as the insufficiency of contrastive levels. Additionally, the model achieves the best overall performance.
Funding: Supported by the National Key R&D Program of China (2023YFC3304600).
Abstract: Existing multi-view deep subspace clustering methods aim to learn a unified representation from multi-view data, but the learned representation struggles to preserve the underlying structure hidden in the original samples, especially the high-order neighbor relationships between samples. To overcome these challenges, this paper proposes a novel multi-view deep subspace clustering model based on multi-order neighborhood fusion. A multi-order neighborhood fusion module integrates the multi-order proximity graph structures of the different views into the self-expressive layer. With this design, the multi-order Laplacian matrix supervises the learning of the view-consistent self-representation affinity matrix, which yields an optimal global affinity matrix in which each connected node belongs to one cluster. In addition, a discriminative constraint between views is designed to further improve clustering performance. Experiments on six public datasets demonstrate that the method performs better than other advanced multi-view clustering methods. The code is available at https://github.com/songzuolong/MNF-MDSC (accessed on 25 December 2024).
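As a rough illustration of what a multi-order proximity graph captures, the sketch below fuses the first few powers of a row-normalized adjacency matrix with fixed weights. This is an assumed, generic construction and not the fusion module of the paper (the linked repository holds the authors' code); the helper names and the fixed `weights` argument are hypothetical.

```python
def row_normalize(adj):
    """Divide each row by its sum so rows act as transition probabilities."""
    out = []
    for row in adj:
        s = sum(row)
        out.append([x / s if s else 0.0 for x in row])
    return out


def matmul(a, b):
    """Plain dense matrix product for small illustrative matrices."""
    inner, cols = len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(a))]


def multi_order_proximity(adj, weights):
    """Weighted sum of the 1st..len(weights)-th order proximity matrices.

    Assumption: the k-th order proximity is the k-th power of the
    row-normalized adjacency matrix, and weights[k-1] is its weight.
    """
    p = row_normalize(adj)
    n = len(adj)
    fused = [[0.0] * n for _ in range(n)]
    power = p
    for order, w in enumerate(weights):
        if order > 0:
            power = matmul(power, p)  # move up to the next order
        for i in range(n):
            for j in range(n):
                fused[i][j] += w * power[i][j]
    return fused


if __name__ == "__main__":
    # Path graph 0 - 1 - 2: nodes 0 and 2 are only second-order neighbors.
    path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    for row in multi_order_proximity(path, [0.5, 0.5]):
        print(row)
```

In the path graph, nodes 0 and 2 share no edge, yet the fused matrix gives them a non-zero proximity through the second-order term, which is the kind of high-order neighbor relationship the abstract refers to.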
Funding: Supported by the National Natural Science Foundation of China under Grant No. 62025202 and the Fundamental Research Funds for the Central Universities under Grant No. 020214380102.
Abstract: During software development, developers tend to tangle multiple concerns into a single commit, resulting in many composite commits. This paper studies the problem of detecting and untangling composite commits, so as to improve the maintainability and understandability of software. Our approach is built upon the observation that both the textual content of code statements and the dependencies between code statements are helpful in comprehending a code commit. Based on this observation, we first construct an attributed graph for each commit, where code statements and various code dependencies are modeled as nodes and edges, respectively, and the textual bodies of code statements are maintained as node attributes. Based on the attributed graph, we propose graph-based learning algorithms that first detect whether the given commit is a composite commit, and then untangle the composite commit into atomic ones. We evaluate our approach on nine C# projects, and the results demonstrate the effectiveness and efficiency of our approach.
Funding: Supported by the National Natural Science Foundation of China (61202193, 61202304), the Major Projects of Chinese National Social Science Foundation (11&ZD189), and the Chinese Postdoctoral Science Foundation (2013M540593, 2014T70722).
Abstract: In this paper we propose a novel model, the "recursive directed graph", based on feature structure, and apply it to represent the semantic relations of postpositive attributive structures in biomedical texts. The usages of postpositive attributives are complex and variable, especially in three categories: present participle phrases, past participle phrases, and preposition phrases used as postpositive attributives, which make automatic parsing difficult. We summarize these categories and annotate their semantic information. Compared with the dependency structure, the feature structure, represented as a recursive directed graph, enhances semantic information extraction in the biomedical field. The annotation results show that the recursive directed graph is better suited to extracting complex semantic relations for biomedical text mining.
Abstract: The identification of design pattern instances is important for program understanding and software maintenance. Aiming at the mining of design patterns in existing systems, this paper proposes a subgraph isomorphism approach to discover several design patterns in a legacy system at a time. The attributed relational graph is used to describe design patterns and legacy systems. The subgraph isomorphism approach consists of a decomposition process and a composition process. During the decomposition process, graphs corresponding to the design patterns are decomposed into subgraphs, some of which are graphs corresponding to the elemental design patterns. The composition process tries to obtain the subgraph isomorphism of the matched graph once the subgraph isomorphism of each subgraph has been obtained. Because design patterns share common structures, the proposed approach can reduce the number of times entities and relations are matched. Compared with existing methods, the proposed algorithm is not linearly dependent on the number of design pattern graphs. Key words: design pattern mining; attributed relational graph; subgraph isomorphism. CLC number: TP 311.5. Foundation item: Supported by the National Natural Science Foundation of China (60273075) and the Science Foundation of Naval University of Engineering (HGDJJ03019). Biography: LI Qing-hua (1940-), male, Professor, research direction: parallel computing.
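For readers unfamiliar with matching a pattern graph against an attributed relational graph, the sketch below shows a plain backtracking matcher. It is not the decomposition and composition procedure of the paper: it matches one whole pattern at a time, checks only that every pattern edge is present in the target (a non-induced match), and all names and the toy graphs are hypothetical.

```python
def find_matches(pattern_nodes, pattern_edges, target_nodes, target_edges):
    """Enumerate mappings of pattern nodes onto target nodes.

    Nodes are given as {name: kind}; edges as (source, destination,
    relation) triples. A mapping is valid when node kinds agree, it is
    one-to-one, and every pattern edge exists in the target with the
    same relation. Extra target edges are allowed (non-induced match).
    """
    order = list(pattern_nodes)
    target_edge_set = set(target_edges)
    matches = []

    def consistent(mapping):
        # Check each pattern edge whose endpoints are both mapped so far.
        for src, dst, rel in pattern_edges:
            if src in mapping and dst in mapping:
                if (mapping[src], mapping[dst], rel) not in target_edge_set:
                    return False
        return True

    def extend(index, mapping):
        if index == len(order):
            matches.append(dict(mapping))
            return
        p = order[index]
        for t, kind in target_nodes.items():
            if kind != pattern_nodes[p] or t in mapping.values():
                continue
            mapping[p] = t
            if consistent(mapping):
                extend(index + 1, mapping)
            del mapping[p]  # backtrack

    extend(0, {})
    return matches


if __name__ == "__main__":
    # Toy pattern: class B inherits from class A.
    pattern_nodes = {"A": "class", "B": "class"}
    pattern_edges = [("B", "A", "inherits")]
    target_nodes = {"Shape": "class", "Circle": "class",
                    "Square": "class", "draw": "method"}
    target_edges = [("Circle", "Shape", "inherits"),
                    ("Square", "Shape", "inherits"),
                    ("Shape", "draw", "declares")]
    for m in find_matches(pattern_nodes, pattern_edges,
                          target_nodes, target_edges):
        print(m)
```

Running this matcher once per design pattern is exactly the cost the abstract aims to avoid: by decomposing patterns into shared subgraphs, common structures would be matched once and then reused across patterns.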
Abstract: Graph transformation systems have become a general formal modeling language for describing many models in the software development process. Behavioral modeling of dynamic systems and model-to-model transformations are only a few examples in which graphs have been applied to software development. But even a perfect graph transformation system must be equipped with automated analysis capabilities to let users understand whether such a formal specification fulfills their requirements. In this paper, we present a new solution to verify graph transformation systems using the Bogor model checker. The attributed graph grammars (AGG)-like graph transformation systems are translated to Bandera intermediate representation (BIR), the input language of Bogor, and Bogor verifies the model against some interesting properties defined by combining linear temporal logic (LTL) and special-purpose graph rules. Experimental results are encouraging, showing that in most cases our solution improves on existing approaches in terms of both performance and expressiveness.
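Abstract: Existing multi-view attributed graph clustering methods usually learn consistent and complementary information within a unified representation that fuses multiple views. However, this fuse-then-learn approach loses the view-specific information of the original views, and the unified representation struggles to balance consistency and complementarity. To preserve the original information of each view, we adopt a learn-then-fuse strategy: the shared and specific representations of each view are learned separately and then fused, so that the consistent and complementary information across views is learned at a finer granularity. On this basis, we build a multi-view attribute graph clustering model based on shared and specific representation (MSAGC). Specifically, a multi-view encoder first produces a primary representation of each view, from which the shared and specific information of each view is obtained. The shared information is then aligned across views to learn the consistent information, the view-specific information is combined to exploit the complementary information, and a difference constraint is used to handle redundant information. Next, a multi-view decoder is trained to reconstruct the topological structure and the attribute feature matrix of the graph. Finally, an additional self-supervised clustering module brings the learning of graph representations and the clustering task into agreement. The effectiveness of MSAGC is well verified on real-world multi-view attributed graph datasets.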