Knowledge graph completion(KGC)aims to fill in missing entities and relations within knowledge graphs(KGs)to address their incompleteness.Most existing KGC models suffer from knowledge coverage as they are designed to...Knowledge graph completion(KGC)aims to fill in missing entities and relations within knowledge graphs(KGs)to address their incompleteness.Most existing KGC models suffer from knowledge coverage as they are designed to operate within a single KG.In contrast,Multilingual KGC(MKGC)leverages seed pairs from different language KGs to facilitate knowledge transfer and enhance the completion of the target KG.Previous studies on MKGC based on graph neural networks(GNNs)have primarily focused on using relationaware GNNs to capture the combined features of neighboring entities and relations.However,these studies still have some shortcomings,particularly in the context of MKGCs.First,each language’s specific semantics,structures,and expressions contribute to the increased heterogeneity of the KG.Therefore,the completion of MKGCs necessitates a thorough consideration of the heterogeneity of the KG and the effective integration of its heterogeneous features.Second,MKGCs typically have a large graph scale due to the need to store and manage information from multiple languages.However,current relation-aware GNNs often inherit complex GNN operations,resulting in unnecessary complexity.Therefore,it is necessary to simplify GNN operations.To address these limitations,we propose a Simplified Multi-view Graph Neural Network(SMGNN)for MKGC.SM-GNN incorporates two simplified multiview GNNs as components.One GNN is utilized for learning multi-view graph features to complete the KG.The other generates new alignment pairs,facilitating knowledge transfer between different views of the KG.We simplify the two multiview GNNs by retaining feature propagation while discarding linear transformation and nonlinear activation to reduce unnecessary complexity and effectively leverage graph contextual information.Extensive experiments demonstrate that our proposed model outperforms competing baselines.The code and dataset are available at the website of github.com/dbbice/SM-GNN.展开更多
Drug repurposing offers a promising alternative to traditional drug development and significantly re-duces costs and timelines by identifying new therapeutic uses for existing drugs.However,the current approaches ofte...Drug repurposing offers a promising alternative to traditional drug development and significantly re-duces costs and timelines by identifying new therapeutic uses for existing drugs.However,the current approaches often rely on limited data sources and simplistic hypotheses,which restrict their ability to capture the multi-faceted nature of biological systems.This study introduces adaptive multi-view learning(AMVL),a novel methodology that integrates chemical-induced transcriptional profiles(CTPs),knowledge graph(KG)embeddings,and large language model(LLM)representations,to enhance drug repurposing predictions.AMVL incorporates an innovative similarity matrix expansion strategy and leverages multi-view learning(MVL),matrix factorization,and ensemble optimization techniques to integrate heterogeneous multi-source data.Comprehensive evaluations on benchmark datasets(Fdata-set,Cdataset,and Ydataset)and the large-scale iDrug dataset demonstrate that AMVL outperforms state-of-the-art(SOTA)methods,achieving superior accuracy in predicting drug-disease associations across multiple metrics.Literature-based validation further confirmed the model's predictive capabilities,with seven out of the top ten predictions corroborated by post-2011 evidence.To promote transparency and reproducibility,all data and codes used in this study were open-sourced,providing resources for pro-cessing CTPs,KG,and LLM-based similarity calculations,along with the complete AMVL algorithm and benchmarking procedures.By unifying diverse data modalities,AMVL offers a robust and scalable so-lution for accelerating drug discovery,fostering advancements in translational medicine and integrating multi-omics data.We aim to inspire further innovations in multi-source data integration and support the development of more precise and efficient strategies for advancing drug discovery and translational medicine.展开更多
The existing multi-view subspace clustering algorithms based on tensor singular value decomposition(t-SVD)predominantly utilize tensor nuclear norm to explore the intra view correlation between views of the same sampl...The existing multi-view subspace clustering algorithms based on tensor singular value decomposition(t-SVD)predominantly utilize tensor nuclear norm to explore the intra view correlation between views of the same samples,while neglecting the correlation among the samples within different views.Moreover,the tensor nuclear norm is not fully considered as a convex approximation of the tensor rank function.Treating different singular values equally may result in suboptimal tensor representation.A hypergraph regularized multi-view subspace clustering algorithm with dual tensor log-determinant(HRMSC-DTL)was proposed.The algorithm used subspace learning in each view to learn a specific set of affinity matrices,and introduced a non-convex tensor log-determinant function to replace the tensor nuclear norm to better improve global low-rankness.It also introduced hyper-Laplacian regularization to preserve the local geometric structure embedded in the high-dimensional space.Furthermore,it rotated the original tensor and incorporated a dual tensor mechanism to fully exploit the intra view correlation of the original tensor and the inter view correlation of the rotated tensor.At the same time,an alternating direction of multipliers method(ADMM)was also designed to solve non-convex optimization model.Experimental evaluations on seven widely used datasets,along with comparisons to several state-of-the-art algorithms,demonstrated the superiority and effectiveness of the HRMSC-DTL algorithm in terms of clustering performance.展开更多
In the international shipping industry, digital intelligence transformation has become essential, with both governments and enterprises actively working to integrate diverse datasets. The domain of maritime and shippi...In the international shipping industry, digital intelligence transformation has become essential, with both governments and enterprises actively working to integrate diverse datasets. The domain of maritime and shipping is characterized by a vast array of document types, filled with complex, large-scale, and often chaotic knowledge and relationships. Effectively managing these documents is crucial for developing a Large Language Model (LLM) in the maritime domain, enabling practitioners to access and leverage valuable information. A Knowledge Graph (KG) offers a state-of-the-art solution for enhancing knowledge retrieval, providing more accurate responses and enabling context-aware reasoning. This paper presents a framework for utilizing maritime and shipping documents to construct a knowledge graph using GraphRAG, a hybrid tool combining graph-based retrieval and generation capabilities. The extraction of entities and relationships from these documents and the KG construction process are detailed. Furthermore, the KG is integrated with an LLM to develop a Q&A system, demonstrating that the system significantly improves answer accuracy compared to traditional LLMs. Additionally, the KG construction process is up to 50% faster than conventional LLM-based approaches, underscoring the efficiency of our method. This study provides a promising approach to digital intelligence in shipping, advancing knowledge accessibility and decision-making.展开更多
The increasing popularity of the Internet and the widespread use of information technology have led to a rise in the number and sophistication of network attacks and security threats.Intrusion detection systems are cr...The increasing popularity of the Internet and the widespread use of information technology have led to a rise in the number and sophistication of network attacks and security threats.Intrusion detection systems are crucial to network security,playing a pivotal role in safeguarding networks from potential threats.However,in the context of an evolving landscape of sophisticated and elusive attacks,existing intrusion detection methodologies often overlook critical aspects such as changes in network topology over time and interactions between hosts.To address these issues,this paper proposes a real-time network intrusion detection method based on graph neural networks.The proposedmethod leverages the advantages of graph neural networks and employs a straightforward graph construction method to represent network traffic as dynamic graph-structured data.Additionally,a graph convolution operation with a multi-head attention mechanism is utilized to enhance the model’s ability to capture the intricate relationships within the graph structure comprehensively.Furthermore,it uses an integrated graph neural network to address dynamic graphs’structural and topological changes at different time points and the challenges of edge embedding in intrusion detection data.The edge classification problem is effectively transformed into node classification by employing a line graph data representation,which facilitates fine-grained intrusion detection tasks on dynamic graph node feature representations.The efficacy of the proposed method is evaluated using two commonly used intrusion detection datasets,UNSW-NB15 and NF-ToN-IoT-v2,and results are compared with previous studies in this field.The experimental results demonstrate that our proposed method achieves 99.3%and 99.96%accuracy on the two datasets,respectively,and outperforms the benchmark model in several evaluation metrics.展开更多
Phenotypic prediction is a promising strategy for accelerating plant breeding.Data from multiple sources(called multi-view data)can provide complementary information to characterize a biological object from various as...Phenotypic prediction is a promising strategy for accelerating plant breeding.Data from multiple sources(called multi-view data)can provide complementary information to characterize a biological object from various aspects.By integrating multi-view information into phenotypic prediction,a multi-view best linear unbiased prediction(MVBLUP)method is proposed in this paper.To measure the importance of multiple data views,the differential evolution algorithm with an early stopping mechanism is used,by which we obtain a multi-view kinship matrix and then incorporate it into the BLUP model for phenotypic prediction.To further illustrate the characteristics of MVBLUP,we perform the empirical experiments on four multi-view datasets in different crops.Compared to the single-view method,the prediction accuracy of the MVBLUP method has improved by 0.038–0.201 on average.The results demonstrate that the MVBLUP is an effective integrative prediction method for multi-view data.展开更多
With the emphasis on user privacy and communication security, encrypted traffic has increased dramatically, which brings great challenges to traffic classification. The classification method of encrypted traffic based...With the emphasis on user privacy and communication security, encrypted traffic has increased dramatically, which brings great challenges to traffic classification. The classification method of encrypted traffic based on GNN can deal with encrypted traffic well. However, existing GNN-based approaches ignore the relationship between client or server packets. In this paper, we design a network traffic topology based on GCN, called Flow Mapping Graph (FMG). FMG establishes sequential edges between vertexes by the arrival order of packets and establishes jump-order edges between vertexes by connecting packets in different bursts with the same direction. It not only reflects the time characteristics of the packet but also strengthens the relationship between the client or server packets. According to FMG, a Traffic Mapping Classification model (TMC-GCN) is designed, which can automatically capture and learn the characteristics and structure information of the top vertex in FMG. The TMC-GCN model is used to classify the encrypted traffic. The encryption stream classification problem is transformed into a graph classification problem, which can effectively deal with data from different data sources and application scenarios. By comparing the performance of TMC-GCN with other classical models in four public datasets, including CICIOT2023, ISCXVPN2016, CICAAGM2017, and GraphDapp, the effectiveness of the FMG algorithm is verified. The experimental results show that the accuracy rate of the TMC-GCN model is 96.13%, the recall rate is 95.04%, and the F1 rate is 94.54%.展开更多
A graph G is H-free,if it contains no H as a subgraph.A graph G is said to be H-minor free,if it does not contain H as a minor.In 2010,Nikiforov asked that what the maximum spectral radius of an H-free graph of order ...A graph G is H-free,if it contains no H as a subgraph.A graph G is said to be H-minor free,if it does not contain H as a minor.In 2010,Nikiforov asked that what the maximum spectral radius of an H-free graph of order n is.In this paper,we consider some Brualdi-Solheid-Turan type problems on bipartite graphs.In 2015,Zhai,Lin and Gong in[Linear Algebra Appl.,2015,471:21-27]proved that if G is a bipartite graph with order n≥2k+2 and ρ(G)≥ρ(K_(k,n-k)),then G contains a C_(2k+2) unless G≌K_(k,n-k).First,we give a new and more simple proof for the above theorem.Second,we prove that if G is a bipartite graph with order n≥2k+2 and ρ(G)≥ρ(K_(k,n-k)),then G contains all T_(2k+3) unless G≌K_(k,n-k).Finally,we prove that among all outerplanar bipartite graphs on n≥308026 vertices,K_(1,n-1) attains the maximum spectral radius.展开更多
Spectrum-based fault localization (SBFL) generates a ranked list of suspicious elements by using the program execution spectrum, but the excessive number of elements ranked in parallel results in low localization accu...Spectrum-based fault localization (SBFL) generates a ranked list of suspicious elements by using the program execution spectrum, but the excessive number of elements ranked in parallel results in low localization accuracy. Most researchers consider intra-class dependencies to improve localization accuracy. However, some studies show that inter-class method call type faults account for more than 20%, which means such methods still have certain limitations. To solve the above problems, this paper proposes a two-phase software fault localization based on relational graph convolutional neural networks (Two-RGCNFL). Firstly, in Phase 1, the method call dependence graph (MCDG) of the program is constructed, the intra-class and inter-class dependencies in MCDG are extracted by using the relational graph convolutional neural network, and the classifier is used to identify the faulty methods. Then, the GraphSMOTE algorithm is improved to alleviate the impact of class imbalance on classification accuracy. Aiming at the problem of parallel ranking of element suspicious values in traditional SBFL technology, in Phase 2, Doc2Vec is used to learn static features, while spectrum information serves as dynamic features. A RankNet model based on siamese multi-layer perceptron is constructed to score and rank statements in the faulty method. This work conducts experiments on 5 real projects of Defects4J benchmark. Experimental results show that, compared with the traditional SBFL technique and two baseline methods, our approach improves the Top-1 accuracy by 262.86%, 29.59% and 53.01%, respectively, which verifies the effectiveness of Two-RGCNFL. Furthermore, this work verifies the importance of inter-class dependencies through ablation experiments.展开更多
The concept of matching energy was proposed by Gutman and Wagner firstly in 2012. Let G be a simple graph of order n and λ1, λ2, . . . , λn be the zeros of its matching polynomial. The matching energy of a graph G ...The concept of matching energy was proposed by Gutman and Wagner firstly in 2012. Let G be a simple graph of order n and λ1, λ2, . . . , λn be the zeros of its matching polynomial. The matching energy of a graph G is defined as ME(G) = Pni=1 |λi|. By the famous Coulson’s formula, matching energies can also be calculated by an improper integral depending on a parameter. A k-claw attaching graph Gu(k) refers to the graph obtained by attaching k pendent edges to the graph G at the vertex u, where u is called the root of Gu(k). In this paper, we use some theories of mathematical analysis to obtain a new technique to compare the matching energies of two k-claw attaching graphs Gu(k) and Hv(k) with the same order, that is, limk→∞[ME(Gu(k)) − ME(Hv(k))] = ME(G − u) − ME(H − v). By the technique, we finally determine unicyclic graphs of order n with the 9th to 13th minimal matching energies for all n ≥ 58.展开更多
The ability to accurately predict urban traffic flows is crucial for optimising city operations.Consequently,various methods for forecasting urban traffic have been developed,focusing on analysing historical data to u...The ability to accurately predict urban traffic flows is crucial for optimising city operations.Consequently,various methods for forecasting urban traffic have been developed,focusing on analysing historical data to understand complex mobility patterns.Deep learning techniques,such as graph neural networks(GNNs),are popular for their ability to capture spatio-temporal dependencies.However,these models often become overly complex due to the large number of hyper-parameters involved.In this study,we introduce Dynamic Multi-Graph Spatial-Temporal Graph Neural Ordinary Differential Equation Networks(DMST-GNODE),a framework based on ordinary differential equations(ODEs)that autonomously discovers effective spatial-temporal graph neural network(STGNN)architectures for traffic prediction tasks.The comparative analysis of DMST-GNODE and baseline models indicates that DMST-GNODE model demonstrates superior performance across multiple datasets,consistently achieving the lowest Root Mean Square Error(RMSE)and Mean Absolute Error(MAE)values,alongside the highest accuracy.On the BKK(Bangkok)dataset,it outperformed other models with an RMSE of 3.3165 and an accuracy of 0.9367 for a 20-min interval,maintaining this trend across 40 and 60 min.Similarly,on the PeMS08 dataset,DMST-GNODE achieved the best performance with an RMSE of 19.4863 and an accuracy of 0.9377 at 20 min,demonstrating its effectiveness over longer periods.The Los_Loop dataset results further emphasise this model’s advantage,with an RMSE of 3.3422 and an accuracy of 0.7643 at 20 min,consistently maintaining superiority across all time intervals.These numerical highlights indicate that DMST-GNODE not only outperforms baseline models but also achieves higher accuracy and lower errors across different time intervals and datasets.展开更多
Multi-view clustering is a critical research area in computer science aimed at effectively extracting meaningful patterns from complex,high-dimensional data that single-view methods cannot capture.Traditional fuzzy cl...Multi-view clustering is a critical research area in computer science aimed at effectively extracting meaningful patterns from complex,high-dimensional data that single-view methods cannot capture.Traditional fuzzy clustering techniques,such as Fuzzy C-Means(FCM),face significant challenges in handling uncertainty and the dependencies between different views.To overcome these limitations,we introduce a new multi-view fuzzy clustering approach that integrates picture fuzzy sets with a dual-anchor graph method for multi-view data,aiming to enhance clustering accuracy and robustness,termed Multi-view Picture Fuzzy Clustering(MPFC).In particular,the picture fuzzy set theory extends the capability to represent uncertainty by modeling three membership levels:membership degrees,neutral degrees,and refusal degrees.This allows for a more flexible representation of uncertain and conflicting data than traditional fuzzy models.Meanwhile,dual-anchor graphs exploit the similarity relationships between data points and integrate information across views.This combination improves stability,scalability,and robustness when handling noisy and heterogeneous data.Experimental results on several benchmark datasets demonstrate significant improvements in clustering accuracy and efficiency,outperforming traditional methods.Specifically,the MPFC algorithm demonstrates outstanding clustering performance on a variety of datasets,attaining a Purity(PUR)score of 0.6440 and an Accuracy(ACC)score of 0.6213 for the 3 Sources dataset,underscoring its robustness and efficiency.The proposed approach significantly contributes to fields such as pattern recognition,multi-view relational data analysis,and large-scale clustering problems.Future work will focus on extending the method for semi-supervised multi-view clustering,aiming to enhance adaptability,scalability,and performance in real-world applications.展开更多
Deep-time Earth research plays a pivotal role in deciphering the rates,patterns,and mechanisms of Earth's evolutionary processes throughout geological history,providing essential scientific foundations for climate...Deep-time Earth research plays a pivotal role in deciphering the rates,patterns,and mechanisms of Earth's evolutionary processes throughout geological history,providing essential scientific foundations for climate prediction,natural resource exploration,and sustainable planetary stewardship.To advance Deep-time Earth research in the era of big data and artificial intelligence,the International Union of Geological Sciences initiated the“Deeptime Digital Earth International Big Science Program”(DDE)in 2019.At the core of this ambitious program lies the development of geoscience knowledge graphs,serving as a transformative knowledge infrastructure that enables the integration,sharing,mining,and analysis of heterogeneous geoscience big data.The DDE knowledge graph initiative has made significant strides in three critical dimensions:(1)establishing a unified knowledge structure across geoscience disciplines that ensures consistent representation of geological entities and their interrelationships through standardized ontologies and semantic frameworks;(2)developing a robust and scalable software infrastructure capable of supporting both expert-driven and machine-assisted knowledge engineering for large-scale graph construction and management;(3)implementing a comprehensive three-tiered architecture encompassing basic,discipline-specific,and application-oriented knowledge graphs,spanning approximately 20 geoscience disciplines.Through its open knowledge framework and international collaborative network,this initiative has fostered multinational research collaborations,establishing a robust foundation for next-generation geoscience research while propelling the discipline toward FAIR(Findable,Accessible,Interoperable,Reusable)data practices in deep-time Earth systems research.展开更多
Accurate prediction of molecular properties is crucial for selecting compounds with ideal properties and reducing the costs and risks of trials.Traditional methods based on manually crafted features and graph-based me...Accurate prediction of molecular properties is crucial for selecting compounds with ideal properties and reducing the costs and risks of trials.Traditional methods based on manually crafted features and graph-based methods have shown promising results in molecular property prediction.However,traditional methods rely on expert knowledge and often fail to capture the complex structures and interactions within molecules.Similarly,graph-based methods typically overlook the chemical structure and function hidden in molecular motifs and struggle to effectively integrate global and local molecular information.To address these limitations,we propose a novel fingerprint-enhanced hierarchical graph neural network(FH-GNN)for molecular property prediction that simultaneously learns information from hierarchical molecular graphs and fingerprints.The FH-GNN captures diverse hierarchical chemical information by applying directed message-passing neural networks(D-MPNN)on a hierarchical molecular graph that integrates atomic-level,motif-level,and graph-level information along with their relationships.Addi-tionally,we used an adaptive attention mechanism to balance the importance of hierarchical graphs and fingerprint features,creating a comprehensive molecular embedding that integrated hierarchical mo-lecular structures with domain knowledge.Experiments on eight benchmark datasets from MoleculeNet showed that FH-GNN outperformed the baseline models in both classification and regression tasks for molecular property prediction,validating its capability to comprehensively capture molecular informa-tion.By integrating molecular structure and chemical knowledge,FH-GNN provides a powerful tool for the accurate prediction of molecular properties and aids in the discovery of potential drug candidates.展开更多
基金supported by the National Natural Science Foundation of China(Grant Nos.62120106008,61976077,61806065,62076085,and 91746209)the Fundamental Research Funds for the Central Universities(JZ2022HGTB0239).
文摘Knowledge graph completion(KGC)aims to fill in missing entities and relations within knowledge graphs(KGs)to address their incompleteness.Most existing KGC models suffer from knowledge coverage as they are designed to operate within a single KG.In contrast,Multilingual KGC(MKGC)leverages seed pairs from different language KGs to facilitate knowledge transfer and enhance the completion of the target KG.Previous studies on MKGC based on graph neural networks(GNNs)have primarily focused on using relationaware GNNs to capture the combined features of neighboring entities and relations.However,these studies still have some shortcomings,particularly in the context of MKGCs.First,each language’s specific semantics,structures,and expressions contribute to the increased heterogeneity of the KG.Therefore,the completion of MKGCs necessitates a thorough consideration of the heterogeneity of the KG and the effective integration of its heterogeneous features.Second,MKGCs typically have a large graph scale due to the need to store and manage information from multiple languages.However,current relation-aware GNNs often inherit complex GNN operations,resulting in unnecessary complexity.Therefore,it is necessary to simplify GNN operations.To address these limitations,we propose a Simplified Multi-view Graph Neural Network(SMGNN)for MKGC.SM-GNN incorporates two simplified multiview GNNs as components.One GNN is utilized for learning multi-view graph features to complete the KG.The other generates new alignment pairs,facilitating knowledge transfer between different views of the KG.We simplify the two multiview GNNs by retaining feature propagation while discarding linear transformation and nonlinear activation to reduce unnecessary complexity and effectively leverage graph contextual information.Extensive experiments demonstrate that our proposed model outperforms competing baselines.The code and dataset are available at the website of github.com/dbbice/SM-GNN.
基金supported by the National Natural Science Foundation of China(Grant No.:62101087)the China Postdoctoral Science Foundation(Grant No.:2021MD703942)+2 种基金the Chongqing Postdoctoral Research Project Special Funding,China(Grant No.:2021XM2016)the Science Foundation of Chongqing Municipal Commission of Education,China(Grant No.:KJQN202100642)the Chongqing Natural Science Foundation,China(Grant No.:cstc2021jcyj-msxmX0834).
文摘Drug repurposing offers a promising alternative to traditional drug development and significantly re-duces costs and timelines by identifying new therapeutic uses for existing drugs.However,the current approaches often rely on limited data sources and simplistic hypotheses,which restrict their ability to capture the multi-faceted nature of biological systems.This study introduces adaptive multi-view learning(AMVL),a novel methodology that integrates chemical-induced transcriptional profiles(CTPs),knowledge graph(KG)embeddings,and large language model(LLM)representations,to enhance drug repurposing predictions.AMVL incorporates an innovative similarity matrix expansion strategy and leverages multi-view learning(MVL),matrix factorization,and ensemble optimization techniques to integrate heterogeneous multi-source data.Comprehensive evaluations on benchmark datasets(Fdata-set,Cdataset,and Ydataset)and the large-scale iDrug dataset demonstrate that AMVL outperforms state-of-the-art(SOTA)methods,achieving superior accuracy in predicting drug-disease associations across multiple metrics.Literature-based validation further confirmed the model's predictive capabilities,with seven out of the top ten predictions corroborated by post-2011 evidence.To promote transparency and reproducibility,all data and codes used in this study were open-sourced,providing resources for pro-cessing CTPs,KG,and LLM-based similarity calculations,along with the complete AMVL algorithm and benchmarking procedures.By unifying diverse data modalities,AMVL offers a robust and scalable so-lution for accelerating drug discovery,fostering advancements in translational medicine and integrating multi-omics data.We aim to inspire further innovations in multi-source data integration and support the development of more precise and efficient strategies for advancing drug discovery and translational medicine.
基金supported by National Natural Science Foundation of China(No.61806006)Priority Academic Program Development of Jiangsu Higher Education Institutions。
文摘The existing multi-view subspace clustering algorithms based on tensor singular value decomposition(t-SVD)predominantly utilize tensor nuclear norm to explore the intra view correlation between views of the same samples,while neglecting the correlation among the samples within different views.Moreover,the tensor nuclear norm is not fully considered as a convex approximation of the tensor rank function.Treating different singular values equally may result in suboptimal tensor representation.A hypergraph regularized multi-view subspace clustering algorithm with dual tensor log-determinant(HRMSC-DTL)was proposed.The algorithm used subspace learning in each view to learn a specific set of affinity matrices,and introduced a non-convex tensor log-determinant function to replace the tensor nuclear norm to better improve global low-rankness.It also introduced hyper-Laplacian regularization to preserve the local geometric structure embedded in the high-dimensional space.Furthermore,it rotated the original tensor and incorporated a dual tensor mechanism to fully exploit the intra view correlation of the original tensor and the inter view correlation of the rotated tensor.At the same time,an alternating direction of multipliers method(ADMM)was also designed to solve non-convex optimization model.Experimental evaluations on seven widely used datasets,along with comparisons to several state-of-the-art algorithms,demonstrated the superiority and effectiveness of the HRMSC-DTL algorithm in terms of clustering performance.
文摘In the international shipping industry, digital intelligence transformation has become essential, with both governments and enterprises actively working to integrate diverse datasets. The domain of maritime and shipping is characterized by a vast array of document types, filled with complex, large-scale, and often chaotic knowledge and relationships. Effectively managing these documents is crucial for developing a Large Language Model (LLM) in the maritime domain, enabling practitioners to access and leverage valuable information. A Knowledge Graph (KG) offers a state-of-the-art solution for enhancing knowledge retrieval, providing more accurate responses and enabling context-aware reasoning. This paper presents a framework for utilizing maritime and shipping documents to construct a knowledge graph using GraphRAG, a hybrid tool combining graph-based retrieval and generation capabilities. The extraction of entities and relationships from these documents and the KG construction process are detailed. Furthermore, the KG is integrated with an LLM to develop a Q&A system, demonstrating that the system significantly improves answer accuracy compared to traditional LLMs. Additionally, the KG construction process is up to 50% faster than conventional LLM-based approaches, underscoring the efficiency of our method. This study provides a promising approach to digital intelligence in shipping, advancing knowledge accessibility and decision-making.
文摘The increasing popularity of the Internet and the widespread use of information technology have led to a rise in the number and sophistication of network attacks and security threats.Intrusion detection systems are crucial to network security,playing a pivotal role in safeguarding networks from potential threats.However,in the context of an evolving landscape of sophisticated and elusive attacks,existing intrusion detection methodologies often overlook critical aspects such as changes in network topology over time and interactions between hosts.To address these issues,this paper proposes a real-time network intrusion detection method based on graph neural networks.The proposedmethod leverages the advantages of graph neural networks and employs a straightforward graph construction method to represent network traffic as dynamic graph-structured data.Additionally,a graph convolution operation with a multi-head attention mechanism is utilized to enhance the model’s ability to capture the intricate relationships within the graph structure comprehensively.Furthermore,it uses an integrated graph neural network to address dynamic graphs’structural and topological changes at different time points and the challenges of edge embedding in intrusion detection data.The edge classification problem is effectively transformed into node classification by employing a line graph data representation,which facilitates fine-grained intrusion detection tasks on dynamic graph node feature representations.The efficacy of the proposed method is evaluated using two commonly used intrusion detection datasets,UNSW-NB15 and NF-ToN-IoT-v2,and results are compared with previous studies in this field.The experimental results demonstrate that our proposed method achieves 99.3%and 99.96%accuracy on the two datasets,respectively,and outperforms the benchmark model in several evaluation metrics.
基金supported by National Natural Science Foundation of China(32122066,32201855)STI2030—Major Projects(2023ZD04076).
文摘Phenotypic prediction is a promising strategy for accelerating plant breeding.Data from multiple sources(called multi-view data)can provide complementary information to characterize a biological object from various aspects.By integrating multi-view information into phenotypic prediction,a multi-view best linear unbiased prediction(MVBLUP)method is proposed in this paper.To measure the importance of multiple data views,the differential evolution algorithm with an early stopping mechanism is used,by which we obtain a multi-view kinship matrix and then incorporate it into the BLUP model for phenotypic prediction.To further illustrate the characteristics of MVBLUP,we perform the empirical experiments on four multi-view datasets in different crops.Compared to the single-view method,the prediction accuracy of the MVBLUP method has improved by 0.038–0.201 on average.The results demonstrate that the MVBLUP is an effective integrative prediction method for multi-view data.
基金supported by the National Key Research and Development Program of China No.2023YFA1009500.
文摘With the emphasis on user privacy and communication security, encrypted traffic has increased dramatically, which brings great challenges to traffic classification. The classification method of encrypted traffic based on GNN can deal with encrypted traffic well. However, existing GNN-based approaches ignore the relationship between client or server packets. In this paper, we design a network traffic topology based on GCN, called Flow Mapping Graph (FMG). FMG establishes sequential edges between vertexes by the arrival order of packets and establishes jump-order edges between vertexes by connecting packets in different bursts with the same direction. It not only reflects the time characteristics of the packet but also strengthens the relationship between the client or server packets. According to FMG, a Traffic Mapping Classification model (TMC-GCN) is designed, which can automatically capture and learn the characteristics and structure information of the top vertex in FMG. The TMC-GCN model is used to classify the encrypted traffic. The encryption stream classification problem is transformed into a graph classification problem, which can effectively deal with data from different data sources and application scenarios. By comparing the performance of TMC-GCN with other classical models in four public datasets, including CICIOT2023, ISCXVPN2016, CICAAGM2017, and GraphDapp, the effectiveness of the FMG algorithm is verified. The experimental results show that the accuracy rate of the TMC-GCN model is 96.13%, the recall rate is 95.04%, and the F1 rate is 94.54%.
基金Supported by NSFC(No.12271162)Natural Science Foundation of Shanghai(No.22ZR1416300).
文摘A graph G is H-free,if it contains no H as a subgraph.A graph G is said to be H-minor free,if it does not contain H as a minor.In 2010,Nikiforov asked that what the maximum spectral radius of an H-free graph of order n is.In this paper,we consider some Brualdi-Solheid-Turan type problems on bipartite graphs.In 2015,Zhai,Lin and Gong in[Linear Algebra Appl.,2015,471:21-27]proved that if G is a bipartite graph with order n≥2k+2 and ρ(G)≥ρ(K_(k,n-k)),then G contains a C_(2k+2) unless G≌K_(k,n-k).First,we give a new and more simple proof for the above theorem.Second,we prove that if G is a bipartite graph with order n≥2k+2 and ρ(G)≥ρ(K_(k,n-k)),then G contains all T_(2k+3) unless G≌K_(k,n-k).Finally,we prove that among all outerplanar bipartite graphs on n≥308026 vertices,K_(1,n-1) attains the maximum spectral radius.
基金funded by the Youth Fund of the National Natural Science Foundation of China(Grant No.42261070).
文摘Spectrum-based fault localization (SBFL) generates a ranked list of suspicious elements by using the program execution spectrum, but the excessive number of elements ranked in parallel results in low localization accuracy. Most researchers consider intra-class dependencies to improve localization accuracy. However, some studies show that inter-class method call type faults account for more than 20%, which means such methods still have certain limitations. To solve the above problems, this paper proposes a two-phase software fault localization based on relational graph convolutional neural networks (Two-RGCNFL). Firstly, in Phase 1, the method call dependence graph (MCDG) of the program is constructed, the intra-class and inter-class dependencies in MCDG are extracted by using the relational graph convolutional neural network, and the classifier is used to identify the faulty methods. Then, the GraphSMOTE algorithm is improved to alleviate the impact of class imbalance on classification accuracy. Aiming at the problem of parallel ranking of element suspicious values in traditional SBFL technology, in Phase 2, Doc2Vec is used to learn static features, while spectrum information serves as dynamic features. A RankNet model based on siamese multi-layer perceptron is constructed to score and rank statements in the faulty method. This work conducts experiments on 5 real projects of Defects4J benchmark. Experimental results show that, compared with the traditional SBFL technique and two baseline methods, our approach improves the Top-1 accuracy by 262.86%, 29.59% and 53.01%, respectively, which verifies the effectiveness of Two-RGCNFL. Furthermore, this work verifies the importance of inter-class dependencies through ablation experiments.
基金Supported by the National Natural Science Foundation of China(Nos.12271439,11871398)the National College Students Innovation and Entrepreneurship Training Program(No.201910699173)。
文摘The concept of matching energy was proposed by Gutman and Wagner firstly in 2012. Let G be a simple graph of order n and λ1, λ2, . . . , λn be the zeros of its matching polynomial. The matching energy of a graph G is defined as ME(G) = Pni=1 |λi|. By the famous Coulson’s formula, matching energies can also be calculated by an improper integral depending on a parameter. A k-claw attaching graph Gu(k) refers to the graph obtained by attaching k pendent edges to the graph G at the vertex u, where u is called the root of Gu(k). In this paper, we use some theories of mathematical analysis to obtain a new technique to compare the matching energies of two k-claw attaching graphs Gu(k) and Hv(k) with the same order, that is, limk→∞[ME(Gu(k)) − ME(Hv(k))] = ME(G − u) − ME(H − v). By the technique, we finally determine unicyclic graphs of order n with the 9th to 13th minimal matching energies for all n ≥ 58.
文摘The ability to accurately predict urban traffic flows is crucial for optimising city operations.Consequently,various methods for forecasting urban traffic have been developed,focusing on analysing historical data to understand complex mobility patterns.Deep learning techniques,such as graph neural networks(GNNs),are popular for their ability to capture spatio-temporal dependencies.However,these models often become overly complex due to the large number of hyper-parameters involved.In this study,we introduce Dynamic Multi-Graph Spatial-Temporal Graph Neural Ordinary Differential Equation Networks(DMST-GNODE),a framework based on ordinary differential equations(ODEs)that autonomously discovers effective spatial-temporal graph neural network(STGNN)architectures for traffic prediction tasks.The comparative analysis of DMST-GNODE and baseline models indicates that DMST-GNODE model demonstrates superior performance across multiple datasets,consistently achieving the lowest Root Mean Square Error(RMSE)and Mean Absolute Error(MAE)values,alongside the highest accuracy.On the BKK(Bangkok)dataset,it outperformed other models with an RMSE of 3.3165 and an accuracy of 0.9367 for a 20-min interval,maintaining this trend across 40 and 60 min.Similarly,on the PeMS08 dataset,DMST-GNODE achieved the best performance with an RMSE of 19.4863 and an accuracy of 0.9377 at 20 min,demonstrating its effectiveness over longer periods.The Los_Loop dataset results further emphasise this model’s advantage,with an RMSE of 3.3422 and an accuracy of 0.7643 at 20 min,consistently maintaining superiority across all time intervals.These numerical highlights indicate that DMST-GNODE not only outperforms baseline models but also achieves higher accuracy and lower errors across different time intervals and datasets.
基金funded by the Research Project:THTETN.05/24-25,VietnamAcademy of Science and Technology.
文摘Multi-view clustering is a critical research area in computer science aimed at effectively extracting meaningful patterns from complex,high-dimensional data that single-view methods cannot capture.Traditional fuzzy clustering techniques,such as Fuzzy C-Means(FCM),face significant challenges in handling uncertainty and the dependencies between different views.To overcome these limitations,we introduce a new multi-view fuzzy clustering approach that integrates picture fuzzy sets with a dual-anchor graph method for multi-view data,aiming to enhance clustering accuracy and robustness,termed Multi-view Picture Fuzzy Clustering(MPFC).In particular,the picture fuzzy set theory extends the capability to represent uncertainty by modeling three membership levels:membership degrees,neutral degrees,and refusal degrees.This allows for a more flexible representation of uncertain and conflicting data than traditional fuzzy models.Meanwhile,dual-anchor graphs exploit the similarity relationships between data points and integrate information across views.This combination improves stability,scalability,and robustness when handling noisy and heterogeneous data.Experimental results on several benchmark datasets demonstrate significant improvements in clustering accuracy and efficiency,outperforming traditional methods.Specifically,the MPFC algorithm demonstrates outstanding clustering performance on a variety of datasets,attaining a Purity(PUR)score of 0.6440 and an Accuracy(ACC)score of 0.6213 for the 3 Sources dataset,underscoring its robustness and efficiency.The proposed approach significantly contributes to fields such as pattern recognition,multi-view relational data analysis,and large-scale clustering problems.Future work will focus on extending the method for semi-supervised multi-view clustering,aiming to enhance adaptability,scalability,and performance in real-world applications.
基金Strategic Priority Research Program of the Chinese Academy of Sciences,No.XDB0740000National Key Research and Development Program of China,No.2022YFB3904200,No.2022YFF0711601+1 种基金Key Project of Innovation LREIS,No.PI009National Natural Science Foundation of China,No.42471503。
文摘Deep-time Earth research plays a pivotal role in deciphering the rates,patterns,and mechanisms of Earth's evolutionary processes throughout geological history,providing essential scientific foundations for climate prediction,natural resource exploration,and sustainable planetary stewardship.To advance Deep-time Earth research in the era of big data and artificial intelligence,the International Union of Geological Sciences initiated the“Deeptime Digital Earth International Big Science Program”(DDE)in 2019.At the core of this ambitious program lies the development of geoscience knowledge graphs,serving as a transformative knowledge infrastructure that enables the integration,sharing,mining,and analysis of heterogeneous geoscience big data.The DDE knowledge graph initiative has made significant strides in three critical dimensions:(1)establishing a unified knowledge structure across geoscience disciplines that ensures consistent representation of geological entities and their interrelationships through standardized ontologies and semantic frameworks;(2)developing a robust and scalable software infrastructure capable of supporting both expert-driven and machine-assisted knowledge engineering for large-scale graph construction and management;(3)implementing a comprehensive three-tiered architecture encompassing basic,discipline-specific,and application-oriented knowledge graphs,spanning approximately 20 geoscience disciplines.Through its open knowledge framework and international collaborative network,this initiative has fostered multinational research collaborations,establishing a robust foundation for next-generation geoscience research while propelling the discipline toward FAIR(Findable,Accessible,Interoperable,Reusable)data practices in deep-time Earth systems research.
基金supported by Macao Science and Technology Development Fund,Macao SAR,China(Grant No.:0043/2023/AFJ)the National Natural Science Foundation of China(Grant No.:22173038)Macao Polytechnic University,Macao SAR,China(Grant No.:RP/FCA-01/2022).
文摘Accurate prediction of molecular properties is crucial for selecting compounds with ideal properties and reducing the costs and risks of trials.Traditional methods based on manually crafted features and graph-based methods have shown promising results in molecular property prediction.However,traditional methods rely on expert knowledge and often fail to capture the complex structures and interactions within molecules.Similarly,graph-based methods typically overlook the chemical structure and function hidden in molecular motifs and struggle to effectively integrate global and local molecular information.To address these limitations,we propose a novel fingerprint-enhanced hierarchical graph neural network(FH-GNN)for molecular property prediction that simultaneously learns information from hierarchical molecular graphs and fingerprints.The FH-GNN captures diverse hierarchical chemical information by applying directed message-passing neural networks(D-MPNN)on a hierarchical molecular graph that integrates atomic-level,motif-level,and graph-level information along with their relationships.Addi-tionally,we used an adaptive attention mechanism to balance the importance of hierarchical graphs and fingerprint features,creating a comprehensive molecular embedding that integrated hierarchical mo-lecular structures with domain knowledge.Experiments on eight benchmark datasets from MoleculeNet showed that FH-GNN outperformed the baseline models in both classification and regression tasks for molecular property prediction,validating its capability to comprehensively capture molecular informa-tion.By integrating molecular structure and chemical knowledge,FH-GNN provides a powerful tool for the accurate prediction of molecular properties and aids in the discovery of potential drug candidates.