This article builds on a large language model and uses prompt engineering and agent-style reasoning to extract knowledge automatically, constructing a knowledge graph grounded in China's accounting standards. Accounting entities and their relations are extracted in the schema (pattern) layer, and the resulting data layer supports fine-tuning and optimization of the large model. The study finds that, with appropriate use of the language model, knowledge quintuples can be extracted effectively from massive financial data, completing the construction of the accounting knowledge graph.
Drug repurposing offers a promising alternative to traditional drug development and significantly reduces costs and timelines by identifying new therapeutic uses for existing drugs. However, current approaches often rely on limited data sources and simplistic hypotheses, which restrict their ability to capture the multi-faceted nature of biological systems. This study introduces adaptive multi-view learning (AMVL), a novel methodology that integrates chemical-induced transcriptional profiles (CTPs), knowledge graph (KG) embeddings, and large language model (LLM) representations to enhance drug repurposing predictions. AMVL incorporates an innovative similarity matrix expansion strategy and leverages multi-view learning (MVL), matrix factorization, and ensemble optimization techniques to integrate heterogeneous multi-source data. Comprehensive evaluations on benchmark datasets (Fdataset, Cdataset, and Ydataset) and the large-scale iDrug dataset demonstrate that AMVL outperforms state-of-the-art (SOTA) methods, achieving superior accuracy in predicting drug-disease associations across multiple metrics. Literature-based validation further confirmed the model's predictive capabilities, with seven of the top ten predictions corroborated by post-2011 evidence. To promote transparency and reproducibility, all data and code used in this study were open-sourced, providing resources for processing CTPs, KG, and LLM-based similarity calculations, along with the complete AMVL algorithm and benchmarking procedures. By unifying diverse data modalities, AMVL offers a robust and scalable solution for accelerating drug discovery, fostering advancements in translational medicine and integrating multi-omics data. We aim to inspire further innovations in multi-source data integration and support the development of more precise and efficient strategies for advancing drug discovery and translational medicine.
With the development of the Semantic Web, the number of ontologies is growing exponentially and the semantic relationships between ontologies are becoming increasingly complex. Understanding the true semantics of specific terms or concepts in an ontology is therefore crucial for the matching task. At present, the main challenges facing ontology matching based on representation learning are how to improve the embedding quality of ontology knowledge and how to integrate multiple features of an ontology efficiently. We therefore propose an Ontology Matching Method Based on the Gated Graph Attention Model (OM-GGAT). First, the semantic knowledge related to concepts in the ontology is encoded into vectors using the OWL2Vec* method, and the path information from the root node to each concept is embedded to better capture the meaning of the concept itself and the relationships between concepts. Second, the ontology is transformed into a graph structure according to its semantic relations. Then, when extracting the features of the ontology graph nodes, an attention mechanism assigns a different attention weight to each node adjacent to the central concept. Finally, gated networks are designed to fuse the semantic and structural embedding representations efficiently. To verify the effectiveness of the proposed method, comparative experiments on matching tasks were carried out on public datasets. The results show that the OM-GGAT model can effectively improve the efficiency of ontology matching.
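The gated fusion step can be illustrated with a minimal sketch. This is not the OM-GGAT implementation: the abstract does not specify the gate, so the per-dimension sigmoid gate below, the function name `gated_fusion`, and the weight parameters `w_sem`, `w_str`, and `bias` are all assumptions chosen for illustration (a real gated network would normally use full weight matrices learned during training).

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(semantic, structural, w_sem, w_str, bias):
    """Fuse two equal-length embeddings with a per-dimension sigmoid gate.

    Hypothetical simplification: each dimension has its own scalar weights
    instead of a full learned weight matrix.
    """
    fused = []
    for s, t, ws, wt, b in zip(semantic, structural, w_sem, w_str, bias):
        gate = sigmoid(ws * s + wt * t + b)  # how much of the semantic value to keep
        fused.append(gate * s + (1.0 - gate) * t)
    return fused


# With zero weights and bias the gate is 0.5, so the result is the plain average.
fused = gated_fusion([1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])
```

The gate lets the model lean on the semantic embedding for some dimensions and on the structural embedding for others, rather than fixing one mixing ratio for the whole vector.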
Wheat is a critical crop, extensively consumed worldwide, and enhancing its production is essential to meet escalating demand. Diseases such as stem rust, leaf rust, yellow rust, and tan spot significantly diminish wheat yield, making the early and precise identification of these diseases vital for effective disease management. With advancements in deep learning algorithms, researchers have proposed many methods for the automated detection of disease pathogens; however, accurately detecting multiple disease pathogens simultaneously remains a challenge. This challenge arises from the scarcity of RGB images for multiple diseases, class imbalance in existing public datasets, and the difficulty of extracting features that discriminate between multiple classes of disease pathogens. In this research, a novel method based on Transfer Generative Adversarial Networks is proposed for augmenting existing data, thereby overcoming the problems of class imbalance and data scarcity. This study proposes a customized Vision Transformer (ViT) architecture, in which the feature vector is obtained by concatenating features extracted from the custom ViT and Graph Neural Networks. This paper also proposes a Model-Agnostic Meta-Learning (MAML) based ensemble classifier for accurate classification. The proposed model, validated on public datasets for wheat disease pathogen classification, achieved a test accuracy of 99.20% and an F1-score of 97.95%. Compared with existing state-of-the-art methods, the proposed model performs better in terms of accuracy, F1-score, and the number of disease pathogens detected. In future, more diseases can be included for detection, along with other modalities such as pests and weeds.
The telecommunications industry is becoming increasingly aware of potential subscriber churn as a result of the growing popularity of smartphones in the mobile Internet era, the rapid development of telecommunications services, the implementation of the number portability policy, and intensifying competition among operators. At the same time, users' consumption preferences and choices are evolving. Because keeping existing customers is far less expensive than acquiring new ones, accurate churn prediction models are needed to predict churn tendency. However, conventional and learning-based algorithms can only go so far into a single subscriber's data; they cannot account for changes in a subscriber's subscription, and they ignore the coupling and correlation between features. Additionally, current churn prediction models have a high computational burden, a fuzzy weight distribution, and significant resource and economic costs. The network-based prediction algorithms currently in use primarily consider the private information that users share through text and pictures, ignoring the reference value supplied by other users with the same package. This work proposes a user churn prediction model based on a Graph Attention Convolutional Neural Network (GAT-CNN) to address these issues. The main contributions of this paper are as follows. First, we present a three-tiered hierarchical cloud-edge cooperative framework that increases the volume of user feature input by means of two aggregations at the device, edge, and cloud layers. Second, we extend the use of users' own data by introducing self-attention and graph convolution models to track the relative changes of both users and packages simultaneously. Lastly, we build an integrated offline-online system for churn prediction based on the strengths of the two models, and we experimentally validate the efficacy of cloud-edge collaborative training and inference. In summary, the GAT-CNN churn prediction model presented in this paper can effectively address the drawbacks of conventional algorithms and offer telecom operators crucial decision support in developing subscriber retention strategies and cutting operational expenses.
Accurately recommending candidate news to users is a basic challenge of personalized news recommendation systems. Traditional methods usually struggle to learn the complex semantic information in news texts, resulting in unsatisfactory recommendations. Moreover, these methods favor active users with rich historical behavior and cannot effectively solve the long-tail problem of inactive users. To address these issues, this research presents a novel general framework that incorporates Large Language Models (LLMs) and Knowledge Graphs (KGs) into traditional methods. To learn the contextual information of news text, we use LLMs’ powerful text understanding ability to generate news representations with rich semantic information; these representations are then used to enhance the news encoding in traditional methods. In addition, multi-hop relationships between news entities are mined and the structural information of news is encoded using the KG, alleviating the challenge of the long-tail distribution. Experimental results demonstrate that, compared with various traditional models, the framework significantly improves recommendation performance on evaluation indicators such as AUC, MRR, nDCG@5, and nDCG@10. The successful integration of LLMs and KGs in our framework establishes a feasible route to more accurate personalized news recommendation. Our code is available at https://github.com/Xuan-ZW/LKPNR.
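For readers unfamiliar with the ranking metrics named above, the sketch below computes a reciprocal rank and nDCG@k for one ranked list of click labels. It is an illustration under assumptions, not the evaluation code of the cited repository: the function names are hypothetical, labels are assumed binary (1 = clicked), and the reciprocal rank uses the first relevant item, whereas some news-recommendation benchmarks average over all clicked items.

```python
import math


def reciprocal_rank(labels):
    """Return 1/rank of the first relevant item, or 0.0 if none is relevant."""
    for rank, label in enumerate(labels, start=1):
        if label:
            return 1.0 / rank
    return 0.0


def dcg_at_k(labels, k):
    """Discounted cumulative gain over the top-k positions (gain = label)."""
    return sum(
        label / math.log2(rank + 1)
        for rank, label in enumerate(labels[:k], start=1)
    )


def ndcg_at_k(labels, k):
    """DCG normalised by the DCG of the ideal (label-sorted) ordering."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(labels, k) / ideal if ideal > 0 else 0.0


# Labels listed in the order the model ranked the candidates (1 = clicked).
ranked_labels = [0, 1, 0, 1]
rr = reciprocal_rank(ranked_labels)
ndcg5 = ndcg_at_k(ranked_labels, 5)
```

Averaging `rr` over all impressions gives MRR; nDCG@5 and nDCG@10 differ only in the cutoff `k`.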
The predictive model and design of a heavy-duty metal rubber shock absorber for the powertrains of heavy-load mining vehicles were investigated. The microstructural characteristics of the wire mesh were elucidated using fractal graphs. A numerical model based on a virtual fabrication technique was established to propose a design scheme for the wire mesh component. Four sets of wire mesh shock absorbers with various relative densities were prepared, and a predictive model based on these relative densities was established through mechanical testing. To further enhance the predictive accuracy, a variable transposition fitting method was proposed to refine the model. Residual analysis was employed to quantitatively validate the results against those obtained from an experimental control group. The results show that the improved model exhibits higher predictive accuracy than the original model, with a coefficient of determination (R²) of 0.9624. This study provides theoretical support for designing wire mesh shock absorbers with reduced testing requirements and enhanced design efficiency.
Since the beginning of the 21st century, advances in big data and artificial intelligence have driven a paradigm shift in the geosciences, moving the field from qualitative descriptions toward quantitative analysis, from observing phenomena to uncovering underlying mechanisms, from regional-scale investigations to global perspectives, and from experience-based inference toward data- and model-enabled intelligent prediction. AlphaEarth Foundations (AEF) is a next-generation geospatial intelligence platform that addresses these changes by introducing a unified 64-dimensional shared embedding space, enabling, for the first time, standardized representation and seamless integration of 12 distinct types of Earth observation data, including optical, radar, and lidar. This framework significantly improves data assimilation efficiency and resolves the persistent problem of “data silos” in geoscience research. AEF is helping redefine research methodologies and fostering breakthroughs, particularly in quantitative Earth system science. This paper systematically examines how AEF’s innovative architecture, featuring multi-source data fusion, high-dimensional feature representation learning, and a scalable computational framework, facilitates intelligent, precise, and real-time data-driven geoscientific research. Using case studies from resource and environmental applications, we demonstrate AEF’s broad potential and identify emerging innovation needs. Our findings show that AEF not only enhances the efficiency of solving traditional geoscientific problems but also stimulates novel research directions and methodological approaches.
With the continuous development of artificial intelligence and natural language processing technologies, traditional retrieval-augmented generation (RAG) techniques face numerous challenges in document answer precision and similarity measurement. This study, set against the backdrop of the shipping industry, combines top-down and bottom-up schema design strategies to achieve precise and flexible knowledge representation. The research adopts a semi-structured approach, innovatively constructing an adaptive schema generation mechanism based on reinforcement learning, which models the knowledge graph construction process as a Markov decision process. This method begins with general concepts, defining foundational industry concepts, and then delves into abstracting core concepts specific to the maritime domain through an adaptive pattern generation mechanism that dynamically adjusts the knowledge structure. Specifically, the study designs a four-layer knowledge construction framework, including the data layer, modeling layer, technology layer, and application layer. It draws on a mutual indexing strategy, integrating large language models and traditional information extraction techniques. By leveraging self-attention mechanisms and graph attention networks, it efficiently extracts semantic relationships. The introduction of logic-form-driven solvers and symbolic decomposition techniques for reasoning significantly enhances the model’s ability to understand complex semantic relationships. Additionally, the use of open information extraction and knowledge alignment techniques further improves the efficiency and accuracy of information retrieval. Experimental results demonstrate that the proposed method not only achieves significant performance improvements in knowledge graph retrieval within the shipping domain but also holds important theoretical innovation and practical application value.
The cost and strict input format requirements of GraphRAG make it less efficient for processing large documents. This paper proposes an alternative approach for constructing a knowledge graph (KG) from a PDF document with a focus on simplicity and cost-effectiveness. The process involves splitting the document into chunks, extracting concepts within each chunk using a large language model (LLM), and building relationships based on the proximity of concepts in the same chunk. Unlike traditional named entity recognition (NER), which identifies entities like “Shanghai”, the proposed method identifies concepts, such as “Convenient transportation in Shanghai”, which are found to be more meaningful for KG construction. Each edge in the KG represents a relationship between concepts occurring in the same text chunk. The process is computationally inexpensive, leveraging locally set up tools like Mistral 7B openorca instruct and Ollama for model inference, ensuring the entire graph generation process is cost-free. A method of assigning weights to relationships, grouping similar pairs, and summarizing multiple relationships into a single edge with associated weight and relation details is introduced. Additionally, node degrees and communities are calculated for node sizing and coloring. This approach offers a scalable, cost-effective solution for generating meaningful knowledge graphs from large documents, achieving results comparable to GraphRAG while maintaining accessibility for personal machines.
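The chunk-based construction described above can be sketched in a few lines. This is a minimal illustration rather than the paper's pipeline: the LLM extraction step is replaced by a hard-coded, hypothetical list of concepts per chunk, edge weight is assumed to be the number of chunks in which two concepts co-occur, and the function names are invented for this example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence_graph(chunk_concepts):
    """Return undirected edge weights for concepts that share a text chunk."""
    edge_weights = Counter()
    for concepts in chunk_concepts:
        # Deduplicate and sort so (a, b) and (b, a) collapse into one edge key.
        for a, b in combinations(sorted(set(concepts)), 2):
            edge_weights[(a, b)] += 1
    return edge_weights


def node_degrees(edge_weights):
    """Count distinct neighbours per concept, e.g. for sizing nodes in a plot."""
    neighbours = defaultdict(set)
    for a, b in edge_weights:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {node: len(adjacent) for node, adjacent in neighbours.items()}


# Hypothetical stand-in for LLM output: one list of concepts per chunk.
chunks = [
    ["convenient transportation in Shanghai", "metro network", "urban planning"],
    ["metro network", "urban planning"],
    ["urban planning", "green space"],
]
edges = build_cooccurrence_graph(chunks)
degrees = node_degrees(edges)
```

Repeated co-occurrences collapse into a single weighted edge, which mirrors the idea of summarizing multiple relationships between the same pair of concepts into one edge with an associated weight.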
Knowledge graphs (KGs) offer a structured, machine-readable format for organizing complex information. In heterogeneous catalysis, where data on catalytic materials, reaction conditions, mechanisms, and synthesis routes are dispersed across diverse sources, KGs provide a semantic framework that supports data integration under the FAIR (Findable, Accessible, Interoperable, and Reusable) principles. This review aims to survey recent developments in catalysis KGs, describe the main techniques for graph construction, and highlight how artificial intelligence, particularly large language models (LLMs), enhances graph generation and query. We conducted a systematic analysis of the literature, focusing on ontology-guided text mining pipelines, graph population methods, and maintenance strategies. Our review identifies key trends: ontology-based approaches enable the automated extraction of domain knowledge, LLM-driven retrieval-augmented generation supports natural-language queries, and scalable graph architectures range from a few thousand to over a million triples. We discuss state-of-the-art applications, such as catalyst recommendation systems and reaction mechanism discovery tools, and examine the major challenges, including data heterogeneity, ontology alignment, and long-term graph curation. We conclude that KGs, when combined with AI methods, hold significant promise for accelerating catalyst discovery and knowledge management, but progress depends on establishing community standards for ontology development and maintenance. This review provides a roadmap for researchers seeking to leverage KGs to advance heterogeneous catalysis research.
To address the underutilization of Chinese research materials in nonferrous metals, a method for constructing a domain of nonferrous metals knowledge graph (DNMKG) was established. Starting from a domain thesaurus, entities and relationships were mapped as resource description framework (RDF) triples to form the graph’s framework. Properties and related entities were extracted from open knowledge bases, enriching the graph. A large-scale, multi-source heterogeneous corpus of over 1×10⁹ words was compiled from recent literature to further expand DNMKG. Using the knowledge graph as prior knowledge, natural language processing techniques were applied to the corpus, generating word vectors. A novel entity evaluation algorithm was used to identify and extract real domain entities, which were added to DNMKG. A prototype system was developed to visualize the knowledge graph and support human-computer interaction. Results demonstrate that DNMKG can enhance knowledge discovery and improve research efficiency in the nonferrous metals field.
With the continuous advancement of information technology, corpora and knowledge graphs (KGs) have become indispensable tools in modern language learning. This study explores how the integration of corpora and KGs in integrated English teaching can enhance students’ abilities in vocabulary acquisition, grammar understanding, and discourse analysis. Through a comprehensive literature review, it elaborates on the theoretical foundations and practical value of these two technological tools in English instruction. The study designs a teaching model based on corpora and KGs and analyzes its specific applications in vocabulary, grammar, and discourse teaching within the Integrated English course. Additionally, the article discusses the challenges that may arise during implementation and proposes corresponding solutions. Finally, it envisions future research directions and application prospects.
In cross-domain named entity recognition, large language models face a scarcity of labeled data in specific domains, and the variation of entity information across domains introduces entity bias, which makes the models prone to spurious correlations when handling domain-specific entities. To solve this problem, this paper proposes a cross-domain named entity recognition method based on causal graph structure enhancement. A causal learning and intervention module captures cross-domain invariant causal structural representations between the feature representations of text sequences and annotation sequences. This improves the large language model's use of causal structural features in the target domain and thereby alleviates the entity bias triggered by spurious correlations. Meanwhile, a semantic feature fusion module combines the semantic information of the source and target domains. The results show improvements of 2.47% and 4.12% over the benchmark model in the political and medical domains, respectively, and excellent performance in small-sample scenarios, demonstrating the effectiveness of causal graph structure enhancement in improving the accuracy of cross-domain entity recognition and reducing spurious correlations.
Pedestrian trajectory prediction is pivotal and challenging in applications such as autonomous driving, social robotics, and intelligent surveillance systems. A pedestrian's trajectory is governed not only by individual intent but also by interactions with surrounding agents. These interactions are critical to trajectory prediction accuracy. While prior studies have employed Convolutional Neural Networks (CNNs) and Graph Convolutional Networks (GCNs) to model such interactions, these methods fail to distinguish varying influence levels among neighboring pedestrians. To address this, we propose a novel model based on a bidirectional graph attention network and spatio-temporal graphs to capture dynamic interactions. Specifically, we construct temporal and spatial graphs encoding the sequential evolution and spatial proximity among pedestrians. These features are then fused and processed by the Bidirectional Graph Attention Network (Bi-GAT), which models the bidirectional interactions between the target pedestrian and its neighbors. The model computes node attention weights (i.e., similarity scores) to differentially aggregate neighbor information, enabling fine-grained interaction representations. Extensive experiments conducted on two widely used pedestrian trajectory prediction benchmark datasets demonstrate that our approach outperforms existing state-of-the-art methods in terms of Average Displacement Error (ADE) and Final Displacement Error (FDE), highlighting its strong prediction accuracy and generalization capability.
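The two error metrics named above have simple standard definitions, sketched here for a single trajectory. The function name and the sample coordinates are hypothetical; benchmark evaluations typically average these values over many pedestrians and may take the best of several sampled predictions, which this sketch does not do.

```python
import math


def displacement_errors(predicted, ground_truth):
    """Return (ADE, FDE) for one trajectory given lists of (x, y) points.

    ADE is the mean Euclidean distance over all time steps;
    FDE is the Euclidean distance at the final time step.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("trajectories must be non-empty and equally long")
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    ade = sum(distances) / len(distances)
    fde = distances[-1]
    return ade, fde


# Toy example: the prediction drifts one unit further from the truth each step.
predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
ground_truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(predicted, ground_truth)
```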
Data curation is vital for selecting effective demonstration examples in graph-to-text generation. However, evaluating the quality of Knowledge Graphs (KGs) remains challenging. Prior research exhibits a narrow focus on structural statistics, such as the shortest path length, while the correctness of graphs in representing the associated text is rarely explored. To address this gap, we introduce a dual-perspective evaluation framework for KG-text data, based on the computation of structural adequacy and semantic alignment. From a structural perspective, we propose the Weighted Incremental Edge Method (WIEM) to quantify graph completeness by leveraging agreement between relation models to predict possible edges between entities. WIEM aims to find increments from models on “unseen links”, whose presence is inversely proportional to the structural adequacy of the original KG in representing the text. From a semantic perspective, we evaluate how well a KG aligns with the text in capturing the intended meaning. To do so, we instruct a large language model to convert KGs into natural language and measure the similarity between generated and reference texts. Based on these computations, we apply a Top-K union method, integrating the structural and semantic modules, to rank and select high-quality KGs. We evaluate our framework against various approaches for selecting few-shot examples in graph-to-text generation. Experiments on the Association for Computational Linguistics Abstract Graph Dataset (ACL-AGD) and the Automatic Content Extraction 05 (ACE05) dataset demonstrate the effectiveness of our approach in distinguishing KG-text data of different qualities, evidenced by the largest performance gap between top- and bottom-ranked examples. We also find that the top examples selected through our dual-perspective framework consistently yield better performance than those selected by traditional measures. These results highlight the importance of data curation in improving graph-to-text generation.
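One plausible reading of the Top-K union step is sketched below. The abstract does not define the procedure, so this is an assumption throughout: the function name `top_k_union` is hypothetical, each KG is assumed to carry one structural-adequacy score and one semantic-alignment score where higher is better, and the union simply keeps every KG that ranks in the top K under either score.

```python
def top_k_union(structural_scores, semantic_scores, k):
    """Return IDs ranked in the top k under either score (higher is better).

    Order: structural top-k first, then semantic top-k items not already chosen.
    """
    def top_k(scores):
        return sorted(scores, key=scores.get, reverse=True)[:k]

    selected = top_k(structural_scores)
    for item in top_k(semantic_scores):
        if item not in selected:
            selected.append(item)
    return selected


# Hypothetical scores for three KG-text pairs.
structural = {"g1": 0.9, "g2": 0.4, "g3": 0.7}
semantic = {"g1": 0.2, "g2": 0.8, "g3": 0.6}
selected = top_k_union(structural, semantic, k=2)
```

Taking a union rather than an intersection keeps a graph that is strong on one perspective even if it is only average on the other.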
Federated Graph Neural Networks (FedGNNs) have achieved significant success in representation learning for graph data, enabling collaborative training among multiple parties without sharing their raw graph data and solving the data isolation problem faced by centralized GNNs in data-sensitive scenarios. Despite the plethora of prior work on inference attacks against centralized GNNs, the vulnerability of FedGNNs to inference attacks has not yet been widely explored. It is still unclear whether the privacy leakage risks of centralized GNNs also carry over to FedGNNs. To bridge this gap, we present PIAFGNN, the first property inference attack (PIA) against FedGNNs. Compared with prior works on centralized GNNs, in PIAFGNN, the attacker can only obtain the global embedding gradient distributed by the central server. The attacker converts the task of stealing the target user’s local embeddings into a regression problem, using a regression model to generate the target graph node embeddings. By training shadow models and property classifiers, the attacker can infer the basic property information of interest within the target graph. Experiments on three benchmark graph datasets demonstrate that PIAFGNN achieves attack accuracy of over 70% in most cases, in some instances approaching the accuracy of inference attacks against centralized GNNs, which is much higher than the accuracy of random guessing. Furthermore, we observe that common defense mechanisms cannot mitigate our attack without affecting the model’s performance on the main classification task.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62101087), the China Postdoctoral Science Foundation (Grant No. 2021MD703942), the Chongqing Postdoctoral Research Project Special Funding, China (Grant No. 2021XM2016), the Science Foundation of Chongqing Municipal Commission of Education, China (Grant No. KJQN202100642), and the Chongqing Natural Science Foundation, China (Grant No. cstc2021jcyj-msxmX0834).
Funding: Supported by the National Natural Science Foundation of China (grant numbers 62267005 and 42365008) and the Guangxi Collaborative Innovation Center of Multi-Source Information Integration and Intelligent Processing.
Abstract: With the development of the Semantic Web, the number of ontologies is growing exponentially and the semantic relationships between ontologies are becoming increasingly complex. Understanding the true semantics of specific terms or concepts in an ontology is therefore crucial for the matching task. At present, the main challenges facing ontology matching based on representation learning are how to improve the embedding quality of ontology knowledge and how to integrate multiple ontology features efficiently. We therefore propose an Ontology Matching Method Based on the Gated Graph Attention Model (OM-GGAT). First, the semantic knowledge related to concepts in the ontology is encoded into vectors using the OWL2Vec* method, and the relevant path information from the root node to the concept is embedded to better capture the true meaning of the concept itself and the relationships between concepts. Second, the ontology is transformed into the corresponding graph structure according to its semantic relations. Then, when extracting the features of the ontology graph nodes, an attention mechanism assigns different attention weights to each node adjacent to the central concept. Finally, gated networks are designed to efficiently fuse the semantic and structural embedding representations. To verify the effectiveness of the proposed method, comparative experiments on matching tasks were carried out on public datasets. The results show that the OM-GGAT model can effectively improve the efficiency of ontology matching.
Funding: Researchers Supporting Project Number (RSPD2024R 553), King Saud University, Riyadh, Saudi Arabia.
Abstract: Wheat is a critical crop, extensively consumed worldwide, and enhancing its production is essential to meet escalating demand. Diseases such as stem rust, leaf rust, yellow rust, and tan spot significantly diminish wheat yield, making early and precise identification of these diseases vital for effective disease management. With advancements in deep learning algorithms, researchers have proposed many methods for the automated detection of disease pathogens; however, accurately detecting multiple disease pathogens simultaneously remains a challenge. This challenge arises from the scarcity of RGB images for multiple diseases, class imbalance in existing public datasets, and the difficulty of extracting features that discriminate between multiple classes of disease pathogens. In this research, a novel method based on Transfer Generative Adversarial Networks is proposed for augmenting existing data, thereby overcoming the problems of class imbalance and data scarcity. This study proposes a customized architecture of Vision Transformers (ViT), where the feature vector is obtained by concatenating features extracted from the custom ViT and Graph Neural Networks. This paper also proposes a Model Agnostic Meta Learning (MAML) based ensemble classifier for accurate classification. The proposed model, validated on public datasets for wheat disease pathogen classification, achieved a test accuracy of 99.20% and an F1-score of 97.95%. Compared with existing state-of-the-art methods, the proposed model performs better in terms of accuracy, F1-score, and the number of disease pathogens detected. In future work, more diseases can be included for detection, along with other modalities such as pests and weeds.
Funding: Supported by the National Key R&D Program of China (No. 2022YFB3104500), the Natural Science Foundation of Jiangsu Province (No. BK20222013), and the Scientific Research Foundation of Nanjing Institute of Technology (No. 3534113223036).
Abstract: The telecommunications industry is becoming increasingly aware of potential subscriber churn as a result of the growing popularity of smartphones in the mobile Internet era, the rapid development of telecommunications services, the implementation of the number portability policy, and intensifying competition among operators. At the same time, users' consumption preferences and choices are evolving. Because keeping existing customers is far less expensive than acquiring new ones, strong churn prediction models are needed to predict churn tendency accurately. However, conventional or learning-based algorithms can only go so far into a single subscriber's data; they cannot account for changes in a subscriber's subscription, and they ignore the coupling and correlation between features. Additionally, current churn prediction models have a high computational burden, a fuzzy weight distribution, and significant resource and economic costs. The network-based prediction algorithms currently in use primarily consider the private information shared between users through text and pictures, ignoring the reference value supplied by other users with the same package. This work proposes a user churn prediction model based on a Graph Attention Convolutional Neural Network (GAT-CNN) to address these issues. The main contributions of this paper are as follows. First, we present a three-tiered hierarchical cloud-edge cooperative framework that increases the volume of user feature input by means of two aggregations at the device, edge, and cloud layers. Second, we extend the use of users' own data by introducing self-attention and graph convolution models to track the relative changes of both users and packages simultaneously. Lastly, we build an integrated offline-online system for churn prediction based on the strengths of the two models, and we experimentally validate the efficacy of cloud-side collaborative training and inference. In summary, the churn prediction model based on the Graph Attention Convolutional Neural Network presented in this paper can effectively address the drawbacks of conventional algorithms and offer telecom operators crucial decision support in developing subscriber retention strategies and cutting operational expenses.
Funding: Supported by the National Key R&D Program of China (2022QY2000-02).
Abstract: Accurately recommending candidate news to users is a basic challenge of personalized news recommendation systems. Traditional methods usually struggle to learn the complex semantic information in news texts, resulting in unsatisfactory recommendation results. Moreover, these traditional methods favor active users with rich historical behaviors and cannot effectively solve the long-tail problem of inactive users. To address these issues, this research presents a novel general framework that integrates Large Language Models (LLMs) and Knowledge Graphs (KGs) into traditional methods. To learn the contextual information of news text, we use the LLMs' powerful text understanding ability to generate news representations with rich semantic information, and the generated representations are then used to enhance the news encoding in traditional methods. In addition, multi-hop relationships among news entities are mined and the structural information of news is encoded using the KG, thus alleviating the challenge of long-tail distribution. Experimental results demonstrate that, compared with various traditional models, the framework significantly improves recommendation performance on evaluation metrics such as AUC, MRR, nDCG@5, and nDCG@10. The successful integration of LLMs and KGs in our framework establishes a feasible route to more accurate personalized news recommendation. Our code is available at https://github.com/Xuan-ZW/LKPNR.
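The ranking metrics named in the abstract above can be illustrated with a small sketch. It uses the standard textbook definitions of MRR and nDCG@k with binary click labels; the function names and the toy labels are assumptions for illustration and are not taken from the LKPNR code.

```python
import math

def mrr(ranked_labels):
    """Mean of 1/rank over the relevant (label 1) items in one ranked list."""
    reciprocal = [1.0 / (i + 1) for i, rel in enumerate(ranked_labels) if rel]
    return sum(reciprocal) / len(reciprocal) if reciprocal else 0.0

def dcg_at_k(ranked_labels, k):
    """Discounted cumulative gain over the first k positions."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(ranked_labels[:k]))

def ndcg_at_k(ranked_labels, k):
    """DCG@k divided by the DCG@k of the ideal (label-sorted) ordering."""
    ideal = dcg_at_k(sorted(ranked_labels, reverse=True), k)
    return dcg_at_k(ranked_labels, k) / ideal if ideal > 0 else 0.0

# Toy impression: labels listed in the order the model ranked the candidates.
labels = [0, 1, 0, 0, 1]
score_mrr = mrr(labels)
score_ndcg5 = ndcg_at_k(labels, 5)
```

A perfect ranking places every clicked item first and yields 1.0 for both metrics, which is why they are convenient for comparing models across impressions of different lengths.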
Funding: National Natural Science Foundation of China (12262028), Program for Young Talents of Science and Technology in Universities of Inner Mongolia Autonomous Region (NJYT22085), and Inner Mongolia Autonomous Region Science and Technology Plan Project (2021GG0437).
Abstract: The predictive model and design of a heavy-duty metal rubber shock absorber for the powertrains of heavy-load mining vehicles were investigated. The microstructural characteristics of the wire mesh were elucidated using fractal graphs. A numerical model based on a virtual fabrication technique was established to propose a design scheme for the wire mesh component. Four sets of wire mesh shock absorbers with various relative densities were prepared, and a predictive model based on these relative densities was established through mechanical testing. To further enhance predictive accuracy, a variable transposition fitting method was proposed to refine the model. Residual analysis was employed to quantitatively validate the results against those obtained from an experimental control group. The results show that the improved model exhibits higher predictive accuracy than the original model, with a determination coefficient (R²) of 0.9624. This study provides theoretical support for designing wire mesh shock absorbers with reduced testing requirements and enhanced design efficiency.
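For readers unfamiliar with the determination coefficient reported above, the following sketch computes it from its usual definition, one minus the ratio of the residual sum of squares to the total sum of squares. The sample values are made up for illustration and are not the paper's measurements.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical stiffness readings and model predictions (illustrative only).
observed = [10.0, 12.0, 15.0, 19.0]
predicted = [10.5, 11.5, 15.5, 18.5]
fit_quality = r_squared(observed, predicted)
```

A value close to 1 means the residuals are small relative to the spread of the measurements, which is the sense in which the abstract reports improved predictive accuracy.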
Funding: National Natural Science Foundation of China Key Project (No. 42050103), Higher Education Disciplinary Innovation Program (No. B25052), the Guangdong Pearl River Talent Program Innovative and Entrepreneurial Team Project (No. 2021ZT09H399), the Ministry of Education's Frontiers Science Center for Deep-Time Digital Earth (DDE) (No. 2652023001), and Geological Survey Project of China Geological Survey (DD20240206201).
Abstract: Since the beginning of the 21st century, advances in big data and artificial intelligence have driven a paradigm shift in the geosciences, moving the field from qualitative descriptions toward quantitative analysis, from observing phenomena to uncovering underlying mechanisms, from regional-scale investigations to global perspectives, and from experience-based inference toward data- and model-enabled intelligent prediction. AlphaEarth Foundations (AEF) is a next-generation geospatial intelligence platform that addresses these changes by introducing a unified 64-dimensional shared embedding space, enabling, for the first time, standardized representation and seamless integration of 12 distinct types of Earth observation data, including optical, radar, and lidar. This framework significantly improves data assimilation efficiency and resolves the persistent problem of "data silos" in geoscience research. AEF is helping redefine research methodologies and fostering breakthroughs, particularly in quantitative Earth system science. This paper systematically examines how AEF's innovative architecture, which features multi-source data fusion, high-dimensional feature representation learning, and a scalable computational framework, facilitates intelligent, precise, and real-time data-driven geoscientific research. Using case studies from resource and environmental applications, we demonstrate AEF's broad potential and identify emerging innovation needs. Our findings show that AEF not only enhances the efficiency of solving traditional geoscientific problems but also stimulates novel research directions and methodological approaches.
Abstract: With the continuous development of artificial intelligence and natural language processing technologies, traditional retrieval-augmented generation (RAG) techniques face numerous challenges in document answer precision and similarity measurement. This study, set against the backdrop of the shipping industry, combines top-down and bottom-up schema design strategies to achieve precise and flexible knowledge representation. The research adopts a semi-structured approach, constructing an adaptive schema generation mechanism based on reinforcement learning, which models the knowledge graph construction process as a Markov decision process. This method begins with general concepts, defining foundational industry concepts, and then abstracts core concepts specific to the maritime domain through an adaptive schema generation mechanism that dynamically adjusts the knowledge structure. Specifically, the study designs a four-layer knowledge construction framework comprising the data layer, modeling layer, technology layer, and application layer. It draws on a mutual indexing strategy, integrating large language models and traditional information extraction techniques. By leveraging self-attention mechanisms and graph attention networks, it efficiently extracts semantic relationships. The introduction of logic-form-driven solvers and symbolic decomposition techniques for reasoning significantly enhances the model's ability to understand complex semantic relationships. Additionally, the use of open information extraction and knowledge alignment techniques further improves the efficiency and accuracy of information retrieval. Experimental results demonstrate that the proposed method not only achieves significant performance improvements in knowledge graph retrieval within the shipping domain but also holds important theoretical and practical application value.
Abstract: The cost and strict input format requirements of GraphRAG make it less efficient for processing large documents. This paper proposes an alternative approach for constructing a knowledge graph (KG) from a PDF document with a focus on simplicity and cost-effectiveness. The process involves splitting the document into chunks, extracting concepts within each chunk using a large language model (LLM), and building relationships based on the proximity of concepts in the same chunk. Unlike traditional named entity recognition (NER), which identifies entities like "Shanghai", the proposed method identifies concepts, such as "Convenient transportation in Shanghai", which are found to be more meaningful for KG construction. Each edge in the KG represents a relationship between concepts occurring in the same text chunk. The process is computationally inexpensive, leveraging locally set up tools like Mistral 7B openorca instruct and Ollama for model inference, ensuring the entire graph generation process is cost-free. A method of assigning weights to relationships, grouping similar pairs, and summarizing multiple relationships into a single edge with associated weight and relation details is introduced. Additionally, node degrees and communities are calculated for node sizing and coloring. This approach offers a scalable, cost-effective solution for generating meaningful knowledge graphs from large documents, achieving results comparable to GraphRAG while remaining accessible on personal machines.
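The chunk-based construction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the LLM concept-extraction step is replaced by hand-written concept lists, the chunk size is arbitrary, and edge weight is simply the number of chunks in which a pair of concepts co-occurs. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def split_into_chunks(text, chunk_size=200):
    """Split text into fixed-size character chunks (chunk_size is an assumed value)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def build_cooccurrence_edges(concepts_per_chunk):
    """Weight each unordered concept pair by the number of chunks it shares."""
    weights = Counter()
    for concepts in concepts_per_chunk:
        # sorted(set(...)) gives one canonical key per pair within a chunk.
        for a, b in combinations(sorted(set(concepts)), 2):
            weights[(a, b)] += 1
    return weights

def node_degrees(weights):
    """Count distinct neighbors per concept, e.g. for node sizing."""
    degrees = Counter()
    for a, b in weights:
        degrees[a] += 1
        degrees[b] += 1
    return degrees

# Stand-in for LLM output: one list of extracted concepts per chunk.
concepts_per_chunk = [
    ["convenient transportation in Shanghai", "metro network"],
    ["metro network", "urban planning", "convenient transportation in Shanghai"],
]
edges = build_cooccurrence_edges(concepts_per_chunk)
degrees = node_degrees(edges)
```

In a real pipeline the concept lists would come from prompting a locally hosted model on each chunk, and the weighted edge list could then be passed to a graph library for community detection and visualization.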
Funding: Support from the Full Bridge Fellowship for enabling the research stay at Virginia Tech. H. Xin acknowledges financial support from the US Department of Energy, Office of Basic Energy Sciences, under contract no. DE-SC0023323 and from the National Science Foundation through grant 2245402 from the CBET Catalysis and CDS&E programs.
Abstract: Knowledge graphs (KGs) offer a structured, machine-readable format for organizing complex information. In heterogeneous catalysis, where data on catalytic materials, reaction conditions, mechanisms, and synthesis routes are dispersed across diverse sources, KGs provide a semantic framework that supports data integration under the FAIR (Findable, Accessible, Interoperable, and Reusable) principles. This review aims to survey recent developments in catalysis KGs, describe the main techniques for graph construction, and highlight how artificial intelligence, particularly large language models (LLMs), enhances graph generation and query. We conducted a systematic analysis of the literature, focusing on ontology-guided text mining pipelines, graph population methods, and maintenance strategies. Our review identifies key trends: ontology-based approaches enable the automated extraction of domain knowledge, LLM-driven retrieval-augmented generation supports natural-language queries, and scalable graph architectures range from a few thousand to over a million triples. We discuss state-of-the-art applications, such as catalyst recommendation systems and reaction mechanism discovery tools, and examine the major challenges, including data heterogeneity, ontology alignment, and long-term graph curation. We conclude that KGs, when combined with AI methods, hold significant promise for accelerating catalyst discovery and knowledge management, but progress depends on establishing community standards for ontology development and maintenance. This review provides a roadmap for researchers seeking to leverage KGs to advance heterogeneous catalysis research.
Abstract: To address the underutilization of Chinese research materials in nonferrous metals, a method for constructing a domain knowledge graph of nonferrous metals (DNMKG) was established. Starting from a domain thesaurus, entities and relationships were mapped as resource description framework (RDF) triples to form the graph's framework. Properties and related entities were extracted from open knowledge bases, enriching the graph. A large-scale, multi-source heterogeneous corpus of over 1×10⁹ words was compiled from recent literature to further expand DNMKG. Using the knowledge graph as prior knowledge, natural language processing techniques were applied to the corpus, generating word vectors. A novel entity evaluation algorithm was used to identify and extract real domain entities, which were added to DNMKG. A prototype system was developed to visualize the knowledge graph and support human-computer interaction. Results demonstrate that DNMKG can enhance knowledge discovery and improve research efficiency in the nonferrous metals field.
Abstract: With the continuous advancement of information technology, corpora and knowledge graphs (KGs) have become indispensable tools in modern language learning. This study explores how the integration of corpora and KGs in integrated English teaching can enhance students' abilities in vocabulary acquisition, grammar understanding, and discourse analysis. Through a comprehensive literature review, it elaborates on the theoretical foundations and practical value of these two technological tools in English instruction. The study designs a teaching model based on corpora and KGs and analyzes its specific applications in vocabulary, grammar, and discourse teaching within the Integrated English course. Additionally, the article discusses the challenges that may arise during implementation and proposes corresponding solutions. Finally, it outlines future research directions and application prospects.
Funding: Supported by the National Natural Science Foundation of China Joint Fund for Enterprise Innovation Development (U23B2029), the National Natural Science Foundation of China (62076167, 61772020), the Key Scientific Research Project of Higher Education Institutions in Henan Province (24A520058, 24A520060, 23A520022), and the Postgraduate Education Reform and Quality Improvement Project of Henan Province (YJS2024AL053).
Abstract: In cross-domain named entity recognition, large language models face a scarcity of labeled data in specific domains. The entity bias that arises from variation in entity information between domains makes these models prone to spurious correlations when dealing with domain-specific entities. To solve this problem, this paper proposes a cross-domain named entity recognition method based on causal graph structure enhancement. By establishing a causal learning and intervention module, the method captures cross-domain invariant causal structural representations between the feature representations of text sequences and annotation sequences. This improves the utilization of causal structural features by large language models in the target domains and thus effectively alleviates the entity bias triggered by spurious correlations. Meanwhile, a semantic feature fusion module effectively combines the semantic information of the source and target domains. The results show improvements of 2.47% and 4.12% in the political and medical domains, respectively, compared with the benchmark model, as well as excellent performance in small-sample scenarios, which demonstrates the effectiveness of causal graph structure enhancement in improving the accuracy of cross-domain entity recognition and reducing spurious correlations.
Funding: Funded by the National Natural Science Foundation of China, grant number 624010; the Natural Science Foundation of Anhui Province, grant number 2408085QF202; the Anhui Future Technology Research Institute Industry Guidance Fund Project, grant number 2023cyyd04; and the Project of Research of Anhui Polytechnic University, grant number Xjky2022150.
Abstract: Pedestrian trajectory prediction is pivotal and challenging in applications such as autonomous driving, social robotics, and intelligent surveillance systems. A pedestrian's trajectory is governed not only by individual intent but also by interactions with surrounding agents. These interactions are critical to trajectory prediction accuracy. While prior studies have employed Convolutional Neural Networks (CNNs) and Graph Convolutional Networks (GCNs) to model such interactions, these methods fail to distinguish varying influence levels among neighboring pedestrians. To address this, we propose a novel model based on a bidirectional graph attention network and spatio-temporal graphs to capture dynamic interactions. Specifically, we construct temporal and spatial graphs encoding the sequential evolution and spatial proximity among pedestrians. These features are then fused and processed by the Bidirectional Graph Attention Network (Bi-GAT), which models the bidirectional interactions between the target pedestrian and its neighbors. The model computes node attention weights (i.e., similarity scores) to differentially aggregate neighbor information, enabling fine-grained interaction representations. Extensive experiments conducted on two widely used pedestrian trajectory prediction benchmark datasets demonstrate that our approach outperforms existing state-of-the-art methods in terms of Average Displacement Error (ADE) and Final Displacement Error (FDE), highlighting its strong prediction accuracy and generalization capability.
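The two evaluation metrics named above have simple standard definitions: ADE averages the Euclidean distance between predicted and ground-truth positions over all predicted time steps, and FDE takes that distance at the final step only. The sketch below computes both for a single trajectory; the function name and the toy coordinates are assumptions for illustration.

```python
import math

def displacement_errors(predicted, ground_truth):
    """Return (ADE, FDE) for one trajectory given as lists of (x, y) points."""
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    ade = sum(distances) / len(distances)  # mean error over all time steps
    fde = distances[-1]                    # error at the final time step
    return ade, fde

# Toy trajectory: the prediction drifts away from the true path over time.
predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
ground_truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(predicted, ground_truth)
```

Benchmark results are typically reported as the average of these per-trajectory values over all pedestrians in the test set.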
Abstract: Data curation is vital for selecting effective demonstration examples in graph-to-text generation. However, evaluating the quality of Knowledge Graphs (KGs) remains challenging. Prior research exhibits a narrow focus on structural statistics, such as the shortest path length, while the correctness of graphs in representing the associated text is rarely explored. To address this gap, we introduce a dual-perspective evaluation framework for KG-text data, based on the computation of structural adequacy and semantic alignment. From a structural perspective, we propose the Weighted Incremental Edge Method (WIEM) to quantify graph completeness by leveraging agreement between relation models to predict possible edges between entities. WIEM aims to find increments from models on "unseen links", whose presence is inversely proportional to the structural adequacy of the original KG in representing the text. From a semantic perspective, we evaluate how well a KG aligns with the text in capturing the intended meaning. To do so, we instruct a large language model to convert KGs into natural language and measure the similarity between generated and reference texts. Based on these computations, we apply a Top-K union method, integrating the structural and semantic modules, to rank and select high-quality KGs. We evaluate our framework against various approaches for selecting few-shot examples in graph-to-text generation. Experiments on the Association for Computational Linguistics Abstract Graph Dataset (ACL-AGD) and the Automatic Content Extraction 05 (ACE05) dataset demonstrate the effectiveness of our approach in distinguishing KG-text data of different qualities, evidenced by the largest performance gap between top- and bottom-ranked examples. We also find that the top examples selected through our dual-perspective framework consistently yield better performance than those selected by traditional measures. These results highlight the importance of data curation in improving graph-to-text generation.
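One plausible reading of the Top-K union step is sketched below: rank the candidate KGs separately by a structural score and by a semantic score, take the top K from each ranking, and keep the union. The abstract does not specify the exact procedure, so the scoring dictionaries, the function name, and the ordering of the result are all assumptions made for illustration.

```python
def top_k_union(structural_scores, semantic_scores, k):
    """Union of the k best items under each score, structural picks listed first."""
    top_structural = sorted(structural_scores, key=structural_scores.get, reverse=True)[:k]
    top_semantic = sorted(semantic_scores, key=semantic_scores.get, reverse=True)[:k]
    # Append semantic picks that the structural ranking did not already select.
    return top_structural + [kg for kg in top_semantic if kg not in top_structural]

# Hypothetical scores for three KG-text pairs (higher is better).
structural_scores = {"kg_a": 0.9, "kg_b": 0.5, "kg_c": 0.1}
semantic_scores = {"kg_a": 0.2, "kg_b": 0.3, "kg_c": 0.8}
selected = top_k_union(structural_scores, semantic_scores, k=1)
```

Taking a union rather than an intersection keeps examples that are strong under either perspective, at the cost of returning between K and 2K items.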
Funding: Supported by the National Natural Science Foundation of China (Nos. 62176122 and 62061146002).
Abstract: Federated Graph Neural Networks (FedGNNs) have achieved significant success in representation learning for graph data, enabling collaborative training among multiple parties without sharing their raw graph data and solving the data isolation problem faced by centralized GNNs in data-sensitive scenarios. Despite the plethora of prior work on inference attacks against centralized GNNs, the vulnerability of FedGNNs to inference attacks has not yet been widely explored. It is still unclear whether the privacy leakage risks of centralized GNNs also arise in FedGNNs. To bridge this gap, we present PIAFGNN, the first property inference attack (PIA) against FedGNNs. Compared with prior works on centralized GNNs, in PIAFGNN, the attacker can only obtain the global embedding gradient distributed by the central server. The attacker converts the task of stealing the target user's local embeddings into a regression problem, using a regression model to generate the target graph node embeddings. By training shadow models and property classifiers, the attacker can infer the basic property information of interest within the target graph. Experiments on three benchmark graph datasets demonstrate that PIAFGNN achieves attack accuracy of over 70% in most cases, in some instances even approaching the accuracy of inference attacks against centralized GNNs, which is much higher than the accuracy of random guessing. Furthermore, we observe that common defense mechanisms cannot mitigate our attack without affecting the model's performance on the main classification tasks.