Funding: Supported by the National Natural Science Foundation of China (62371423, 62450002, 62425107) and the China Postdoctoral Science Foundation (2020M682357).
Abstract: Contrastive graph clustering (CGC) has become a prominent method for self-supervised representation learning that contrasts pairs of augmented graph data. However, the performance of CGC methods depends critically on the choice of data augmentation, which often limits the generalization capacity of the network. Moreover, most existing methods define positive and negative samples from the nodes themselves, ignoring the influence that neighbors at different hop distances exert on a node. In this study, a novel self-cumulative contrastive graph clustering (SC-CGC) method is devised that dynamically adjusts the influence of neighbors at different hops. Our intuition is that, in the feature space, nearer (better) neighbors should lie closer to a node and distant ones farther away, so neighbor contrasting can be performed without data augmentation. Specifically, SC-CGC relies on two neural networks, an autoencoder (AE) and a graph autoencoder (GAE), which encode the node information and the graph structure, respectively. To make the two networks interact and learn from each other, a dynamic fusion mechanism transfers the knowledge learned by the AE to the corresponding GAE layer by layer. Then, a self-cumulative contrastive loss function is designed to characterize the structural information by dynamically accumulating the influence of nodes at different hops. Finally, our approach jointly refines the representation learning and the clustering assignments in a self-supervised manner. Extensive experiments on 8 real-world datasets demonstrate that SC-CGC consistently outperforms state-of-the-art (SOTA) techniques. The code is available at https://github.com/Xiaoqiang-Yan/JAS-SCCGC.
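The abstract does not give the exact form of the self-cumulative loss. As a rough, hypothetical illustration of contrasting neighbors without augmentation, the Python sketch below treats every node within `max_hop` hops as a positive, weights it by a per-hop factor `alpha[k-1]`, and uses all other nodes as negatives in an InfoNCE-style loss. The function names, the fixed hop weights (SC-CGC adjusts the hop influence dynamically), and the temperature `tau` are all assumptions.

```python
import math
from collections import deque


def hop_distances(adj, source, max_hop):
    """BFS from `source`; return {node: hop} for nodes 1..max_hop hops away."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hop:
            continue  # do not expand beyond the hop limit
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    del dist[source]
    return dist


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def hop_weighted_contrastive_loss(z, adj, max_hop=2, alpha=(1.0, 0.5), tau=0.5):
    """Hypothetical InfoNCE-style loss: k-hop neighbors are positives weighted
    by alpha[k-1] (assumed fixed, in (0, 1]); all other nodes are negatives."""
    losses = []
    for i in range(len(z)):
        hops = hop_distances(adj, i, max_hop)
        if not hops:
            continue  # isolated node: nothing to contrast against
        sims = [0.0 if j == i else math.exp(cosine(z[i], z[j]) / tau) for j in range(len(z))]
        positive = sum(alpha[h - 1] * sims[j] for j, h in hops.items())
        losses.append(-math.log(positive / sum(sims)))
    return sum(losses) / len(losses) if losses else 0.0
```

Embeddings that place graph neighbors together should receive a lower loss than embeddings that separate them, which is the behavior the intuition above asks for.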
Funding: Supported by the National Natural Science Foundation of China (Nos. 62272015, 62441232).
Abstract: Attribute-graph clustering aims to divide graph nodes into distinct clusters in an unsupervised manner, usually by encoding the node attribute features and the corresponding graph structure into a latent feature space. However, traditional attribute-graph clustering methods often neglect the effect of neighbor information on clustering. Because they fail to fully leverage the rich contextual information provided by neighboring nodes, which is crucial for capturing the intrinsic relationships between nodes and improving clustering performance, they produce suboptimal clustering results. In this paper, we propose a novel Neighbor Dual-Consistency Constrained Attribute-Graph Clustering method that leverages information from neighboring nodes in two respects: neighbor feature consistency and neighbor distribution consistency. To enhance feature consistency between nodes and their neighbors, we introduce a neighbor contrastive loss that pulls the embedding of each node closer to those of its similar neighbors in the feature space while pushing it away from those of dissimilar neighbors, which helps the model better capture local feature information. Furthermore, to ensure consistent cluster assignments between nodes and their neighbors, we introduce a neighbor distribution consistency module that combines the structural information of the graph with attribute similarity to align the cluster assignments of nodes and their neighbors. By integrating local structural information with global attribute information, our approach effectively captures comprehensive patterns within the graph and achieves state-of-the-art clustering results on multiple datasets.
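One possible reading of the neighbor distribution consistency idea is sketched below under stated assumptions: each node's target distribution is the average of its graph neighbors' soft cluster assignments, weighted by `exp` of attribute cosine similarity, and the loss is the mean KL divergence between that target and the node's own assignment. The weighting scheme, the KL form, and all function names are hypothetical and are not the paper's module.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def neighbor_target(q, x, adj, i):
    """Average the soft assignments of i's graph neighbors (structure), weighting
    each neighbor by exp(attribute cosine similarity) (a hypothetical weighting)."""
    weights = [math.exp(cosine(x[i], x[j])) for j in adj[i]]
    total = sum(weights)
    return [sum(w * q[j][c] for w, j in zip(weights, adj[i])) / total for c in range(len(q[i]))]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) with a small epsilon to avoid log(0)."""
    return sum(pc * math.log((pc + eps) / (qc + eps)) for pc, qc in zip(p, q) if pc > 0)


def neighbor_distribution_loss(q, x, adj):
    """Mean KL(target_i || q_i) over nodes that have at least one neighbor."""
    terms = [kl_divergence(neighbor_target(q, x, adj, i), q[i]) for i in range(len(q)) if adj[i]]
    return sum(terms) / len(terms) if terms else 0.0
```

When a node and its neighbors already share the same assignment the loss is essentially zero; disagreeing assignments are penalized.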
Funding: Supported by the 211 Project of the Ministry of Education of China.
Abstract: Finding communities is an important research problem. Existing community discovery methods are mainly categorized into the HITS algorithm, bipartite-core algorithms, and the maximum-flow/minimum-cut framework. In this paper, we propose a new method for extracting communities. It uses the Markov Cluster Algorithm (MCL), a fast and scalable unsupervised clustering algorithm, to extract communities. By applying a mirror-deletion procedure after graph clustering, we considerably reduce the comparison cost. After MCL and mirror deletion, a community member selection algorithm produces the sets of community candidates. Experimental results show that the new method works effectively.
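The Markov Cluster Algorithm named in the abstract alternates expansion (matrix powers that spread random-walk flow) and inflation (elementwise powers followed by column re-normalization, which strengthen intra-cluster flow). The pure-Python sketch below follows that standard scheme on a 0/1 adjacency matrix; the parameter defaults and the pruning threshold are assumptions, and the paper's mirror-deletion and member-selection steps are not modeled.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic flow matrix)."""
    n = len(m)
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                m[i][j] /= s
    return m


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adj, expansion=2, inflation=2.0, max_iter=100, tol=1e-9, prune=1e-6):
    """Standard MCL on a 0/1 adjacency matrix; defaults are assumed values."""
    n = len(adj)
    # Add self-loops, then normalize so column j holds transition probabilities out of j.
    m = normalize_columns([[float(adj[i][j]) + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        prev = m
        # Expansion: M^e spreads flow along longer paths.
        e = m
        for _ in range(expansion - 1):
            e = matmul(e, m)
        # Inflation: elementwise power boosts strong flow; tiny entries are pruned to 0.
        m = normalize_columns([[v ** inflation if v > prune else 0.0 for v in row] for row in e])
        if max(abs(m[i][j] - prev[i][j]) for i in range(n) for j in range(n)) < tol:
            break
    # Each attractor row i with non-zero entries defines the cluster {j : M[i][j] > 0}.
    clusters = {frozenset(j for j in range(n) if m[i][j] > prune) for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)
```

For example, on two disconnected triangles this returns the two triangles as clusters, since flow can never cross between components. Raising `inflation` generally yields finer clusters.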
Funding: Supported by the Fundamental Research Funds for the Central Universities (No. 2020JS005).
Abstract: Motif-based graph local clustering (MGLC) algorithms are generally designed within a two-phase framework: they first compute the motif weight of every edge and then run a local clustering algorithm on the weighted graph to output the result. Although correct, this framework has limitations in both practice and theory and is less applicable to real interactive settings. This research develops a purely local and index-adaptive method, Index-adaptive Triangle-based Graph Local Clustering (TGLC+), to solve the MGLC problem with respect to triangles. TGLC+ adaptively combines an approximate Monte Carlo method, Triangle-based Random Walk (TRW), with a deterministic brute-force method, Triangle-based Forward Push (TFP), to estimate the Personalized PageRank (PPR) vector without calculating the exact triangle-weighted transition probabilities, and then outputs the clustering result through the standard sweep procedure. This paper establishes the efficiency of TGLC+ through theoretical analysis and demonstrates its effectiveness through extensive experiments. To our knowledge, TGLC+ is the first method to solve the MGLC problem without computing motif weights beforehand, thus achieving better efficiency with comparable effectiveness. TGLC+ is suitable for large-scale and interactive graph analysis tasks, including visualization, system optimization, and decision-making.
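TGLC+ estimates PPR with triangle-based variants of random walk and forward push. For orientation only, the sketch below shows the ordinary edge-based forward push (not the triangle-weighted TFP) followed by the standard sweep procedure, which orders nodes by PPR mass divided by degree and keeps the prefix with the lowest conductance. The teleport probability `alpha`, the tolerance `eps`, and the function names are assumptions.

```python
from collections import deque


def forward_push(adj, seed, alpha=0.15, eps=1e-4):
    """Approximate PPR from `seed` on an unweighted graph {node: [neighbors]}.
    Plain edge-based push (not TFP); assumes every node has a neighbor."""
    p, r = {}, {seed: 1.0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        du, ru = len(adj[u]), r.get(u, 0.0)
        if ru <= eps * du:
            continue  # residual too small to be worth pushing
        p[u] = p.get(u, 0.0) + alpha * ru  # keep the alpha share as the estimate
        r[u] = 0.0
        share = (1 - alpha) * ru / du  # spread the rest evenly to neighbors
        for v in adj[u]:
            old = r.get(v, 0.0)
            r[v] = old + share
            if old <= eps * len(adj[v]) < r[v]:
                queue.append(v)  # v just crossed the push threshold
    return p


def sweep_cut(adj, p):
    """Scan prefixes of nodes sorted by p[u]/deg(u); return the prefix with the
    lowest conductance cut(S) / min(vol(S), vol(V) - vol(S))."""
    order = sorted(p, key=lambda u: p[u] / len(adj[u]), reverse=True)
    total_vol = sum(len(nbrs) for nbrs in adj.values())
    in_set, vol, cut = set(), 0, 0
    best, best_phi = [], float("inf")
    for k, u in enumerate(order, 1):
        in_set.add(u)
        vol += len(adj[u])
        inside = sum(1 for v in adj[u] if v in in_set)
        cut += len(adj[u]) - 2 * inside  # edges into S leave the cut, the rest join it
        denom = min(vol, total_vol - vol)
        if denom > 0 and cut / denom < best_phi:
            best, best_phi = order[:k], cut / denom
    return best, best_phi
```

On two triangles joined by a single edge, seeding at node 0 recovers node 0's triangle, whose conductance is 1/7.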
Funding: Supported by the National Pre-research Project (513150601).
Abstract: Identifying composite crosscutting concerns (CCs) is a research task and challenge in aspect mining. In this paper, we propose a scatter-based graph clustering approach to identify composite CCs. Inspired by state-of-the-art link analysis techniques, we propose a two-state model that approximates how CCs tangle with core modules. Based on this model, we obtain scatter and centralization scores for each program element; in particular, the scatter scores are used to select CC seeds. Furthermore, to identify composite CCs, we adopt a novel similarity measure and develop an undirected graph clustering algorithm to group these seeds. Finally, we compare the approach with previous work and illustrate its effectiveness in identifying composite CCs.
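The abstract does not define the scatter score or the similarity measure, so the sketch below only illustrates the overall shape of the pipeline under hypothetical choices: program elements whose precomputed scatter score passes a threshold become seeds, two seeds are linked when the Jaccard similarity of their caller sets reaches a second threshold, and the connected components of that undirected graph form candidate composite concerns. Jaccard similarity, both thresholds, and the element names are stand-ins, not the paper's measure.

```python
def jaccard(a, b):
    """Stand-in similarity (assumption): overlap of two caller sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def group_seeds(scatter, callers, seed_threshold=0.5, sim_threshold=0.3):
    """scatter: {element: score}; callers: {element: set of callers}.
    Seeds above seed_threshold are grouped by connected components (union-find)."""
    seeds = sorted(e for e, s in scatter.items() if s >= seed_threshold)
    parent = {s: s for s in seeds}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(seeds):
        for b in seeds[i + 1:]:
            if jaccard(callers[a], callers[b]) >= sim_threshold:
                parent[find(a)] = find(b)  # undirected edge between similar seeds
    groups = {}
    for s in seeds:
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values())
```

For instance, two logging-like elements called from mostly the same places end up in one group, while an unrelated seed stays on its own.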
Abstract: To construct a highly efficient text clustering algorithm, this paper discusses the multilevel graph model and the refinement algorithm used in the uncoarsening phase, and applies the model to text clustering. The refinement algorithm is applied to improve the performance of the clustering algorithm. Experimental results demonstrate that the multilevel graph text clustering algorithm is feasible.
Key words: text clustering; multilevel coarsening graph model; refinement algorithm; high-dimensional clustering
CLC number: TP301
Foundation item: Supported by the National Natural Science Foundation of China (60173051)
Biography: CHEN Jian-bin (1970-), male, associate professor, Ph.D., research direction: data mining.
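Multilevel graph schemes typically coarsen the graph level by level before clustering and then refine the result while uncoarsening. The abstract does not say which coarsening rule is used; the sketch below shows one level of heavy-edge matching (a rule used in METIS), applied to a weighted graph such as a document-similarity graph. Treat the matching rule, the visiting order, and the function names as assumptions.

```python
def heavy_edge_matching(graph):
    """graph: undirected weighted graph {u: {v: weight}}. Visit nodes in sorted
    order and merge each unmatched node with its unmatched neighbor of largest
    edge weight (assumed coarsening rule). Returns {node: coarse_id}."""
    mapping, next_id = {}, 0
    for u in sorted(graph):
        if u in mapping:
            continue
        candidates = [(w, v) for v, w in graph[u].items() if v not in mapping and v != u]
        mapping[u] = next_id
        if candidates:
            mapping[max(candidates)[1]] = next_id  # heaviest remaining edge
        next_id += 1
    return mapping


def contract(graph, mapping):
    """Build the coarse graph: merged nodes become one, parallel edges add up,
    and edges inside a merged pair disappear."""
    coarse = {c: {} for c in set(mapping.values())}
    for u, nbrs in graph.items():
        for v, w in nbrs.items():
            cu, cv = mapping[u], mapping[v]
            if cu != cv:
                coarse[cu][cv] = coarse[cu].get(cv, 0) + w
    return coarse
```

Repeating these two steps produces the hierarchy of coarser graphs; a refinement pass (not shown) would then adjust cluster membership as each level is projected back.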
Abstract: Graphs can hold large amounts of data and are heavily used for pattern recognition and matching tasks such as symbol recognition, information retrieval, and data mining. In all these applications, the objects or underlying data are represented as graphs, and graph-based matching is performed. Conventional graph matching algorithms have high complexity, because most applications involve a large number of subgraphs, and matching these subgraphs is computationally expensive. In this paper, we propose a novel graph-based algorithm for fingerprint recognition. We perform graph-based clustering, which greatly reduces the computational complexity; specifically, we exploit structural features of the fingerprint for K-means clustering of the database. The proposed algorithm is evaluated on a real-time fingerprint database, and the simulation results show that it outperforms the existing algorithm for the same task.
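The idea of clustering the database so that a query is compared only with templates in one cluster can be sketched as follows, assuming each fingerprint has already been reduced to a fixed-length numeric feature vector (the abstract does not specify the structural features). The K-means initialization, the squared Euclidean distance, and the function names are assumptions.

```python
def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50):
    """Lloyd's algorithm; initial centroids are the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster becomes empty.
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return centroids, labels


def match(query, points, centroids, labels):
    """Compare the query only with templates in its nearest cluster.
    Returns (index of best template, number of comparisons made)."""
    c = min(range(len(centroids)), key=lambda i: dist2(query, centroids[i]))
    candidates = [i for i, lab in enumerate(labels) if lab == c]
    return min(candidates, key=lambda i: dist2(query, points[i])), len(candidates)
```

With two well-separated groups of three templates each, a query is matched after only three comparisons instead of six, which is the source of the savings the abstract describes.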
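Abstract: Knowledge graphs are an interdisciplinary research topic that plays a key role in advancing semantic understanding, intelligent reasoning, and the development of large models. This article retrieves 1,703 research papers on knowledge graphs published from 2010 to 2024 from the Web of Science Core Collection and identifies 78 highly cited papers using the h-index. VOSviewer is then used to construct a co-citation network and perform cluster analysis, and finally, combined with content analysis, the 50 identified core papers are thematically summarized and interpreted in depth. The study finds that the document co-citation method can effectively identify clusters of core literature in a field. The 50 selected core papers in the knowledge graph field can be grouped into four main research directions: theoretical foundations and construction methods of knowledge graphs, knowledge graph embedding, knowledge-graph-based knowledge reasoning, and knowledge-graph-based recommender systems.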
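The two bibliometric steps in this abstract, selecting highly cited papers by the h-index and counting co-citations, can be sketched in Python. Treating the h-core (the h most-cited papers) as the highly cited set is an assumption about how the 78 papers were chosen, and VOSviewer's own normalization and clustering are not reproduced.

```python
from collections import Counter
from itertools import combinations


def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    h = 0
    for rank, c in enumerate(sorted(citations, reverse=True), 1):
        if c < rank:
            break
        h = rank
    return h


def h_core(papers):
    """papers: {paper_id: citation_count}. Assumption: the 'highly cited' set is
    the h most-cited papers (ties at the boundary broken arbitrarily)."""
    h = h_index(papers.values())
    return sorted(papers, key=papers.get, reverse=True)[:h]


def co_citation_counts(reference_lists):
    """Count how often each pair of references is cited together by one paper."""
    counts = Counter()
    for refs in reference_lists:
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts
```

The pair counts are the edge weights of a co-citation network; a tool such as VOSviewer then clusters that network.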