Multi-modal knowledge graph completion (MMKGC) aims to complete missing entities or relations in multi-modal knowledge graphs, thereby discovering previously unknown triples. Due to the continuous growth of data and knowledge and the limitations of data sources, the visual knowledge within knowledge graphs is generally of low quality, and some entities lack a visual modality altogether. Nevertheless, previous MMKGC studies have focused primarily on facilitating modality interaction and fusion while neglecting the problems of low modality quality and missing modalities. As a result, mainstream MMKGC models only use pre-trained visual encoders to extract features and transfer the semantic information to joint embeddings through modal fusion, which inevitably suffers from problems such as error propagation and increased uncertainty. To address these problems, we propose a Multi-modal knowledge graph Completion model based on Super-resolution and Detailed Description Generation (MMCSD). Specifically, we leverage a pre-trained residual network to enhance the resolution and improve the quality of the visual modality. Moreover, we design multi-level visual semantic extraction and entity description generation to further extract entity semantics from structural triples and visual images. Meanwhile, we train a variational multi-modal auto-encoder and use a pre-trained multi-modal language model to complement the missing visual features. Experiments on FB15K-237 and DB13K show that MMCSD performs MMKGC effectively and achieves state-of-the-art performance.
In the international shipping industry, digital intelligence transformation has become essential, with both governments and enterprises actively working to integrate diverse datasets. The domain of maritime and shipping is characterized by a vast array of document types, filled with complex, large-scale, and often chaotic knowledge and relationships. Effectively managing these documents is crucial for developing a Large Language Model (LLM) in the maritime domain, enabling practitioners to access and leverage valuable information. A Knowledge Graph (KG) offers a state-of-the-art solution for enhancing knowledge retrieval, providing more accurate responses and enabling context-aware reasoning. This paper presents a framework for utilizing maritime and shipping documents to construct a knowledge graph using GraphRAG, a hybrid tool combining graph-based retrieval and generation capabilities. The extraction of entities and relationships from these documents and the KG construction process are detailed. Furthermore, the KG is integrated with an LLM to develop a Q&A system, demonstrating that the system significantly improves answer accuracy compared to traditional LLMs. Additionally, the KG construction process is up to 50% faster than conventional LLM-based approaches, underscoring the efficiency of our method. This study provides a promising approach to digital intelligence in shipping, advancing knowledge accessibility and decision-making.
Deep-time Earth research plays a pivotal role in deciphering the rates, patterns, and mechanisms of Earth's evolutionary processes throughout geological history, providing essential scientific foundations for climate prediction, natural resource exploration, and sustainable planetary stewardship. To advance Deep-time Earth research in the era of big data and artificial intelligence, the International Union of Geological Sciences initiated the “Deeptime Digital Earth International Big Science Program” (DDE) in 2019. At the core of this ambitious program lies the development of geoscience knowledge graphs, serving as a transformative knowledge infrastructure that enables the integration, sharing, mining, and analysis of heterogeneous geoscience big data. The DDE knowledge graph initiative has made significant strides in three critical dimensions: (1) establishing a unified knowledge structure across geoscience disciplines that ensures consistent representation of geological entities and their interrelationships through standardized ontologies and semantic frameworks; (2) developing a robust and scalable software infrastructure capable of supporting both expert-driven and machine-assisted knowledge engineering for large-scale graph construction and management; (3) implementing a comprehensive three-tiered architecture encompassing basic, discipline-specific, and application-oriented knowledge graphs, spanning approximately 20 geoscience disciplines. Through its open knowledge framework and international collaborative network, this initiative has fostered multinational research collaborations, establishing a robust foundation for next-generation geoscience research while propelling the discipline toward FAIR (Findable, Accessible, Interoperable, Reusable) data practices in deep-time Earth systems research.
With the continuous advancement of information technology, corpora and knowledge graphs (KGs) have become indispensable tools in modern language learning. This study explores how integrating corpora and KGs into Integrated English teaching can enhance students' abilities in vocabulary acquisition, grammar understanding, and discourse analysis. Through a comprehensive literature review, it elaborates on the theoretical foundations and practical value of these two technological tools in English instruction. The study designs a teaching model based on corpora and KGs and analyzes its specific applications in vocabulary, grammar, and discourse teaching within the Integrated English course. Additionally, the article discusses challenges that may arise during implementation and proposes corresponding solutions. Finally, it outlines future research directions and application prospects.
Drug repurposing offers a promising alternative to traditional drug development and significantly reduces costs and timelines by identifying new therapeutic uses for existing drugs. However, current approaches often rely on limited data sources and simplistic hypotheses, which restricts their ability to capture the multi-faceted nature of biological systems. This study introduces adaptive multi-view learning (AMVL), a novel methodology that integrates chemical-induced transcriptional profiles (CTPs), knowledge graph (KG) embeddings, and large language model (LLM) representations to enhance drug repurposing predictions. AMVL incorporates an innovative similarity matrix expansion strategy and leverages multi-view learning (MVL), matrix factorization, and ensemble optimization techniques to integrate heterogeneous multi-source data. Comprehensive evaluations on benchmark datasets (Fdataset, Cdataset, and Ydataset) and the large-scale iDrug dataset demonstrate that AMVL outperforms state-of-the-art (SOTA) methods, achieving superior accuracy in predicting drug-disease associations across multiple metrics. Literature-based validation further confirmed the model's predictive capabilities, with seven of the top ten predictions corroborated by post-2011 evidence. To promote transparency and reproducibility, all data and code used in this study have been open-sourced, providing resources for processing CTPs, KG, and LLM-based similarity calculations, along with the complete AMVL algorithm and benchmarking procedures. By unifying diverse data modalities, AMVL offers a robust and scalable solution for accelerating drug discovery, fostering advancements in translational medicine and the integration of multi-omics data. We aim to inspire further innovations in multi-source data integration and support the development of more precise and efficient strategies for advancing drug discovery and translational medicine.
Knowledge graphs (KGs) offer a structured, machine-readable format for organizing complex information. In heterogeneous catalysis, where data on catalytic materials, reaction conditions, mechanisms, and synthesis routes are dispersed across diverse sources, KGs provide a semantic framework that supports data integration under the FAIR (Findable, Accessible, Interoperable, and Reusable) principles. This review aims to survey recent developments in catalysis KGs, describe the main techniques for graph construction, and highlight how artificial intelligence, particularly large language models (LLMs), enhances graph generation and query. We conducted a systematic analysis of the literature, focusing on ontology-guided text mining pipelines, graph population methods, and maintenance strategies. Our review identifies key trends: ontology-based approaches enable the automated extraction of domain knowledge, LLM-driven retrieval-augmented generation supports natural-language queries, and scalable graph architectures range from a few thousand to over a million triples. We discuss state-of-the-art applications, such as catalyst recommendation systems and reaction mechanism discovery tools, and examine the major challenges, including data heterogeneity, ontology alignment, and long-term graph curation. We conclude that KGs, when combined with AI methods, hold significant promise for accelerating catalyst discovery and knowledge management, but progress depends on establishing community standards for ontology development and maintenance. This review provides a roadmap for researchers seeking to leverage KGs to advance heterogeneous catalysis research.
Based on a language model, this article uses prompt engineering and an agent-based reasoning method to extract knowledge automatically and, supported by China's accounting standards, to construct the corresponding knowledge graph. Accounting entities and their relations are extracted at the schema layer, and the resulting data layer is provided for fine-tuning and optimizing the large model. The study finds that, with appropriate use of the language model, five-tuples can be effectively extracted from massive financial data and used to complete the construction of an accounting knowledge graph.
To address the underutilization of Chinese research materials on nonferrous metals, a method for constructing a domain knowledge graph of nonferrous metals (DNMKG) was established. Starting from a domain thesaurus, entities and relationships were mapped to resource description framework (RDF) triples to form the graph's framework. Properties and related entities were then extracted from open knowledge bases to enrich the graph. A large-scale, multi-source heterogeneous corpus of over 1×10^9 words was compiled from recent literature to further expand DNMKG. Using the knowledge graph as prior knowledge, natural language processing techniques were applied to the corpus to generate word vectors. A novel entity evaluation algorithm was used to identify and extract genuine domain entities, which were added to DNMKG. A prototype system was developed to visualize the knowledge graph and support human-computer interaction. The results demonstrate that DNMKG can enhance knowledge discovery and improve research efficiency in the nonferrous metals field.
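The first construction step, mapping thesaurus terms and their links to RDF triples, can be illustrated with a short sketch that emits N-Triples lines. This is not the authors' pipeline: the namespace `http://example.org/dnmkg/`, the assumed thesaurus layout, and the choice of SKOS `prefLabel`/`broader`/`related` predicates are all illustrative assumptions.

```python
from urllib.parse import quote

# Hypothetical namespace for illustration only; the paper's IRIs are not given.
BASE = "http://example.org/dnmkg/"
SKOS = "http://www.w3.org/2004/02/skos/core#"


def iri(term: str) -> str:
    """Build an IRI for a thesaurus term, percent-encoding unsafe characters."""
    return f"<{BASE}{quote(term.replace(' ', '_'))}>"


def literal(text: str) -> str:
    """Quote a plain N-Triples literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def thesaurus_to_ntriples(thesaurus: dict[str, dict[str, list[str]]]) -> list[str]:
    """Map each term and its broader/related links to N-Triples lines.

    Assumed input layout: {term: {"broader": [...], "related": [...]}}.
    """
    lines = []
    for term, links in thesaurus.items():
        # Keep the original (possibly Chinese) term as a human-readable label.
        lines.append(f"{iri(term)} <{SKOS}prefLabel> {literal(term)} .")
        for rel in ("broader", "related"):
            for other in links.get(rel, []):
                lines.append(f"{iri(term)} <{SKOS}{rel}> {iri(other)} .")
    return lines
```

For example, `{"copper alloy": {"broader": ["nonferrous metal"], "related": ["brass"]}}` yields one label triple and two link triples; properties pulled from open knowledge bases would then be appended as further triples on the same IRIs.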
Existing wireless networks are flooded with video data transmissions, and the demand for high-speed, low-latency video services continues to surge. This brings challenges to networks in the form of congestion, as well as the need for more resources and more dedicated caching schemes. Recently, Multi-access Edge Computing (MEC)-enabled heterogeneous networks, which leverage edge caches for proximity delivery, have emerged as a promising solution to these problems. With limited resources, however, designing an effective edge caching scheme is critical to their success. We propose a novel Knowledge Graph (KG)-based Dueling Deep Q-Network (KG-DDQN) for cooperative caching in MEC-enabled heterogeneous networks. The KG-DDQN scheme leverages a KG to uncover relations between videos, providing valuable insights into user preferences for the caching scheme. Specifically, the KG guides the selection of related videos as caching candidates (i.e., actions in the DDQN), thus providing a rich reference for implementing a personalized caching scheme while also improving the decision efficiency of the DDQN. Extensive simulation results validate the convergence of KG-DDQN and show that it outperforms baselines in cache hit rate and service delay.
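The dueling architecture that KG-DDQN builds on splits each Q-value into a state value and per-action advantages, Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a'). The sketch below shows only that standard aggregation plus a KG-restricted greedy choice. The candidate rule (the requested video and its KG neighbours), the toy relation dict, and the hand-supplied V and A values are assumptions for illustration; in the paper these values would come from the trained network, and the exact candidate-selection logic is not given in the abstract.

```python
def dueling_q(value: float, advantages: dict[str, float]) -> dict[str, float]:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages.values()) / len(advantages)
    return {a: value + adv - mean_adv for a, adv in advantages.items()}


def kg_candidates(requested: str, kg: dict[str, set[str]]) -> set[str]:
    """Hypothetical candidate rule: the requested video plus its KG neighbours."""
    return {requested} | kg.get(requested, set())


def choose_cache_action(requested: str, kg: dict[str, set[str]],
                        value: float, advantages: dict[str, float]) -> str:
    """Greedy choice among KG-restricted candidates (exploration omitted)."""
    allowed = {a: advantages[a] for a in kg_candidates(requested, kg) if a in advantages}
    if not allowed:
        raise ValueError("no KG-related candidate has an advantage estimate")
    q = dueling_q(value, allowed)
    return max(q, key=q.get)
```

Restricting the action set this way is one plausible reading of "the KG guides the selection of related videos as caching candidates"; subtracting the mean advantage does not change the argmax, it only makes V and A identifiable during training.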
Accurate prediction of drug-target interactions (DTIs) plays a pivotal role in drug discovery, facilitating lead compound optimization, drug repurposing, and the elucidation of drug side effects. However, traditional DTI prediction methods are often limited by incomplete biological data and insufficient representation of protein features. In this study, we propose KG-CNNDTI, a novel knowledge graph-enhanced framework for DTI prediction that integrates heterogeneous biological information to improve model generalizability and predictive performance. The model uses protein embeddings derived from a biomedical knowledge graph via the Node2Vec algorithm, which are further enriched with contextualized sequence representations obtained from ProteinBERT. For compound representation, multiple molecular fingerprint schemes and the Uni-Mol pre-trained model were evaluated. The fused representations served as inputs to both classical machine learning models and a convolutional neural network-based predictor. Experimental evaluations across benchmark datasets demonstrated that KG-CNNDTI outperformed state-of-the-art methods, particularly in terms of Precision, Recall, F1-Score, and area under the precision-recall curve (AUPR). Ablation analysis highlighted the substantial contribution of knowledge graph-derived features. Moreover, KG-CNNDTI was employed for virtual screening of natural products against Alzheimer's disease, yielding 40 candidate compounds. Five of these were supported by literature evidence, and three of those five were further validated in in vitro assays.
As a new data management paradigm, knowledge graphs can integrate multiple data sources and enable quick responses, reasoning, and better predictions in drug discovery. Characterized by strong contagiousness and high morbidity and mortality, porcine reproductive and respiratory syndrome (PRRS) is a common infectious disease in the global swine industry that causes great economic losses. Traditional Chinese medicine (TCM) has the advantages of few adverse effects and a relatively low cost of application, so TCM is considered a possible treatment for PRRS given the current lack of safe and effective approaches. Here, we constructed a knowledge graph containing common biomedical data from humans and Sus scrofa, as well as information on thousands of TCMs. We then validated the effectiveness of the Sus scrofa knowledge graph with the t-SNE algorithm and selected the optimal model (TransR) from six typical models, namely TransE, TransR, DistMult, ComplEx, RESCAL, and RotatE, according to five indicators: MRR, MR, HITS@1, HITS@3, and HITS@10. Based on embedding vectors trained by the optimal model, anti-PRRSV TCMs were predicted along two paths, VHC-Herb and VHPC-Herb, and potential anti-PRRSV TCMs were identified by searching the HERB database for the pharmacological properties corresponding to PRRS symptoms. Finally, the capacity of Dan Shen (Salvia miltiorrhiza Bunge) to resist PRRSV infection was validated in a cell experiment, in which the inhibition rate of PRRSV exceeded 90% at Dan Shen extract concentrations of 0.004, 0.008, 0.016, and 0.032 mg/mL. In summary, this is the first report of a Sus scrofa knowledge graph that includes TCM information, and our study reflects the important application value of deep learning on graphs in the swine industry while providing accessible TCM resources for PRRS.
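For reference, the scoring function of TransR (the model selected above) and the ranking indicators used to compare models can be written compactly. This sketch uses standard textbook definitions: an L2 distance after projecting entities with a relation-specific matrix (the original TransR formulation squares the norm, which leaves rankings unchanged), and MR, MRR and HITS@k computed from the 1-based rank of each correct answer. Whether the paper used filtered ranks is not stated in the abstract, so the rank list is assumed to be given; no embeddings are trained here.

```python
import math


def transr_score(h, r, t, M_r):
    """TransR distance ||M_r h + r - M_r t||_2 (lower means more plausible).

    h, t: entity vectors of dimension d; r: relation vector of dimension k;
    M_r: k x d projection matrix given as a list of rows.
    """
    def project(v):
        return [sum(m * x for m, x in zip(row, v)) for row in M_r]

    hp, tp = project(h), project(t)
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(hp, r, tp)))


def ranking_metrics(ranks, ks=(1, 3, 10)):
    """MR, MRR and HITS@k from the 1-based rank of each correct answer."""
    n = len(ranks)
    out = {"MR": sum(ranks) / n, "MRR": sum(1.0 / r for r in ranks) / n}
    for k in ks:
        out[f"HITS@{k}"] = sum(1 for r in ranks if r <= k) / n
    return out
```

For ranks `[1, 2, 4, 20]` this gives MR = 6.75 and MRR = 0.45: lower MR is better, while higher MRR and HITS@k are better, which is why the indicators are usually reported together.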
Knowledge graphs convey precise semantic information that can be effectively interpreted by neural networks, and generating descriptive text from these graphs places significant emphasis on content consistency. However, knowledge graphs cannot provide additional linguistic features such as paragraph structure and modes of expression, which makes it challenging to ensure content coherence when generating text that spans multiple sentences. This lack of coherence can in turn compromise the overall consistency of the content within a paragraph. In this work, we generate scientific abstracts from knowledge graphs, with a focus on enhancing both content consistency and coherence. In particular, we construct the ACL Abstract Graph Dataset (ACL-AGD), which pairs knowledge graphs with text and incorporates sentence labels to guide text structure and diverse expressions. We then implement a Siamese network that complements and concretizes the entities and relations according to paragraph structure by performing two tasks: graph-to-text generation and entity alignment. Extensive experiments demonstrate that the logical paragraphs generated by our method exhibit entities with a uniform position distribution and appropriate frequency. In terms of content, our method accurately represents the information encoded in the knowledge graph, prevents the generation of irrelevant content, and produces coherent, non-redundant adjacent sentences, even with a shared knowledge graph.
After the design of an aerospace product is completed, a manufacturability assessment must be conducted on the features of its 3D model in terms of modeling quality and process design; otherwise, the cost of design changes increases. Because product manufacturing feature information and assessment knowledge are poorly structured and have low reusability in the current aerospace manufacturability assessment process, automated manufacturability assessment is difficult to realize. To address these issues, this paper establishes a domain ontology model for aerospace product manufacturability assessment. On this basis, a structured representation method for manufacturability assessment knowledge and a method for constructing the knowledge graph data layer are proposed. Using the semantic and association information expressed by the knowledge graph, a rule matching method based on subgraph matching is proposed to improve precision and recall. Finally, applications and experiments on a software platform verify the effectiveness of the proposed knowledge graph construction and rule matching methods.
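Rule matching by subgraph matching can be illustrated as a conjunctive pattern query over triples: a rule's condition is a small graph with variables, and it fires for every binding that embeds it in the product's feature graph. The sketch below is a generic backtracking matcher written for illustration. The triple vocabulary (`hasType`, `depthRatio`, `Hole`) and the example rule are hypothetical, and the paper's own matching algorithm and its precision/recall improvements are not reproduced.

```python
def match_rule(pattern, graph, binding=None):
    """Yield every variable binding under which all pattern edges exist in the graph.

    pattern: list of (subject, relation, object); names starting with '?' are variables.
    graph:   set of (subject, relation, object) triples describing one product.
    Distinct variables may bind to the same node (homomorphism-style matching).
    """
    binding = {} if binding is None else binding
    if not pattern:
        yield dict(binding)
        return
    (s, r, o), rest = pattern[0], pattern[1:]
    for gs, gr, go in graph:
        if gr != r:
            continue
        trial = dict(binding)
        consistent = True
        for term, node in ((s, gs), (o, go)):
            if term.startswith("?"):
                # A variable must keep the same node across all pattern edges.
                if trial.setdefault(term, node) != node:
                    consistent = False
                    break
            elif term != node:
                consistent = False
                break
        if consistent:
            yield from match_rule(rest, graph, trial)
```

With a rule such as `[("?f", "hasType", "Hole"), ("?f", "depthRatio", "High")]`, only features that satisfy both edges are returned, so the associated assessment message would be attached to exactly those features.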
In knowledge graph embedding, conventional approaches typically map entities and relations into continuous vector spaces. However, parameter efficiency becomes increasingly crucial for large-scale knowledge graphs that contain vast numbers of entities and relations. In particular, resource-intensive embeddings often increase computational costs and may limit scalability and adaptability in practical environments, such as low-resource settings or real-world applications. This paper explores an approach to knowledge graph representation learning that leverages small, reserved sets of entities and relations for parameter-efficient embedding. We introduce a hierarchical attention network designed to refine and maximize the representational quality of embeddings by selectively focusing on these reserved sets, thereby reducing model complexity. Empirical assessments show that our model achieves high performance on the benchmark dataset with fewer parameters and smaller embedding dimensions. Ablation studies further highlight the impact and contribution of each component of the proposed hierarchical attention structure.
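The basic operation behind attending to a small reserved set can be shown as one scaled dot-product attention step: an entity is expressed as a softmax-weighted mixture of a few reserved anchor embeddings rather than owning a full free vector. This is a generic, single-level sketch under that assumption; how the paper forms queries, chooses the reserved sets, and stacks attention levels hierarchically is not reproduced.

```python
import math


def attend(query, keys, values):
    """Scaled dot-product attention of one query over a small reserved set.

    query: list of d floats; keys: list of d-dim vectors; values: list of vectors
    (one per key). Returns the attention-weighted average of the values.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
```

Because only the reserved anchors (and whatever produces the queries) carry parameters, the parameter count grows with the size of the reserved set instead of with the total number of entities.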
AIM: To develop a traditional Chinese medicine (TCM) knowledge graph (KG) for diabetic retinopathy (DR) diagnosis and treatment by integrating literature and medical records, thereby enhancing the accessibility of TCM knowledge and providing innovative approaches for TCM inheritance and DR management. METHODS: First, a KG framework was established with a schema-layer design. Second, high-quality literature and electronic medical records served as data sources. Named entity recognition was performed using the ALBERT-BiLSTM-CRF model, and semantic relationships were curated by domain experts. Third, knowledge fusion was achieved mainly through an alias library. Subsequently, the data layer was mapped to the schema layer to refine the KG, and the knowledge was stored in Neo4j. Finally, exploratory work on intelligent question answering was conducted based on the constructed KG. RESULTS: A KG for TCM diagnosis and treatment was constructed in Neo4j, incorporating 6 types of labels, 5 types of relationships, 5 types of attributes, 822 nodes, and 1,318 relationship instances. This systematic KG supports logical reasoning and intelligent question answering. The question answering model achieved a precision of 95%, a recall of 95%, and a weighted F1-score of 95%. CONCLUSION: This study proposes a semi-automatic knowledge-mapping scheme to balance integration efficiency and accuracy. Clinical data-driven entity and relationship construction enables digital dialectical reasoning. Exploratory applications show the KG's potential in intelligent question answering, providing new insights for TCM health management.
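Knowledge fusion through an alias library amounts to mapping every surface form of an entity to one canonical node before the data are loaded into Neo4j, so that aliases do not create duplicate nodes. A minimal sketch follows; the normalization rule (whitespace removal plus lower-casing) and the example entries are assumptions for illustration, not the library used in the study.

```python
def normalize(name: str) -> str:
    """Assumed normalization: drop all whitespace and lower-case."""
    return "".join(name.split()).lower()


def build_alias_index(alias_library: dict[str, list[str]]) -> dict[str, str]:
    """Invert {canonical: [aliases]} into {normalized surface form: canonical}."""
    index = {}
    for canonical, aliases in alias_library.items():
        for name in [canonical, *aliases]:
            index[normalize(name)] = canonical
    return index


def fuse_entities(mentions: list[str], index: dict[str, str]) -> list[str]:
    """Replace each recognized mention with its canonical entity; keep unknowns as-is."""
    return [index.get(normalize(m), m) for m in mentions]
```

Mentions that are not in the library pass through unchanged, which leaves them for the expert-curation step rather than silently merging them.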
The cost and strict input format requirements of GraphRAG make it less efficient for processing large documents. This paper proposes an alternative approach for constructing a knowledge graph (KG) from a PDF document with a focus on simplicity and cost-effectiveness. The process involves splitting the document into chunks, extracting concepts within each chunk using a large language model (LLM), and building relationships based on the proximity of concepts in the same chunk. Unlike traditional named entity recognition (NER), which identifies entities like “Shanghai”, the proposed method identifies concepts such as “Convenient transportation in Shanghai”, which are found to be more meaningful for KG construction. Each edge in the KG represents a relationship between concepts occurring in the same text chunk. The process is computationally inexpensive: it relies on locally installed tools such as Mistral 7B openorca instruct and Ollama for model inference, so the entire graph generation process is cost-free. A method is introduced for assigning weights to relationships, grouping similar pairs, and summarizing multiple relationships into a single edge with an associated weight and relation details. Additionally, node degrees and communities are calculated for node sizing and coloring. This approach offers a scalable, cost-effective solution for generating meaningful knowledge graphs from large documents, achieving results comparable to GraphRAG while remaining accessible on personal machines.
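The proximity-based edge construction and weighting can be sketched in a few lines. Assumptions: the LLM has already returned one concept list per chunk, an edge's weight is simply the number of chunks in which the pair co-occurs, and a node's degree counts its distinct neighbours. The paper's exact weighting, pair-grouping, and relation-summarization steps may differ, and community detection is omitted.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_concept_graph(chunk_concepts):
    """Build weighted edges between concepts that co-occur in the same chunk.

    chunk_concepts: list of concept lists, one per text chunk.
    Returns (edge_weights, degree): weight = number of chunks sharing the pair,
    degree = number of distinct neighbours of each concept.
    """
    weights = Counter()
    for concepts in chunk_concepts:
        # Deduplicate within a chunk and sort so (a, b) and (b, a) merge into one edge.
        for a, b in combinations(sorted(set(concepts)), 2):
            weights[(a, b)] += 1
    degree = defaultdict(int)
    for a, b in weights:
        degree[a] += 1
        degree[b] += 1
    return dict(weights), dict(degree)
```

Merging repeated pairs into one weighted edge mirrors the idea of summarizing multiple relationships into a single edge, and the degree values can be used directly for node sizing.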
Data curation is vital for selecting effective demonstration examples in graph-to-text generation. However, evaluating the quality of Knowledge Graphs (KGs) remains challenging. Prior research focuses narrowly on structural statistics, such as the shortest path length, while the correctness of graphs in representing the associated text is rarely explored. To address this gap, we introduce a dual-perspective evaluation framework for KG-text data, based on the computation of structural adequacy and semantic alignment. From a structural perspective, we propose the Weighted Incremental Edge Method (WIEM) to quantify graph completeness by leveraging agreement between relation models to predict possible edges between entities. WIEM aims to find increments from models on “unseen links”, whose presence is inversely proportional to the structural adequacy of the original KG in representing the text. From a semantic perspective, we evaluate how well a KG aligns with the text in capturing the intended meaning. To do so, we instruct a large language model to convert KGs into natural language and measure the similarity between the generated and reference texts. Based on these computations, we apply a Top-K union method that integrates the structural and semantic modules to rank and select high-quality KGs. We evaluate our framework against various approaches for selecting few-shot examples in graph-to-text generation. Experiments on the Association for Computational Linguistics Abstract Graph Dataset (ACL-AGD) and the Automatic Content Extraction 05 (ACE05) dataset demonstrate the effectiveness of our approach in distinguishing KG-text data of different qualities, as evidenced by the largest performance gap between top- and bottom-ranked examples. We also find that the top examples selected through our dual-perspective framework consistently yield better performance than those selected by traditional measures. These results highlight the importance of data curation in improving graph-to-text generation.
With the continuous development of artificial intelligence and natural language processing technologies, traditional retrieval-augmented generation (RAG) techniques face numerous challenges in document answer precision and similarity measurement. This study, set against the backdrop of the shipping industry, combines top-down and bottom-up schema design strategies to achieve precise and flexible knowledge representation. The research adopts a semi-structured approach, innovatively constructing an adaptive schema generation mechanism based on reinforcement learning, which models the knowledge graph construction process as a Markov decision process. This method begins with general concepts, defining foundational industry concepts, and then abstracts core concepts specific to the maritime domain through the adaptive schema generation mechanism, which dynamically adjusts the knowledge structure. Specifically, the study designs a four-layer knowledge construction framework comprising the data layer, modeling layer, technology layer, and application layer. It draws on a mutual indexing strategy, integrating large language models and traditional information extraction techniques. By leveraging self-attention mechanisms and graph attention networks, it efficiently extracts semantic relationships. The introduction of logic-form-driven solvers and symbolic decomposition techniques for reasoning significantly enhances the model's ability to understand complex semantic relationships. Additionally, the use of open information extraction and knowledge alignment techniques further improves the efficiency and accuracy of information retrieval. Experimental results demonstrate that the proposed method not only achieves significant performance improvements in knowledge graph retrieval within the shipping domain but also holds important theoretical innovation and practical application value.
Knowledge graphs (KGs), which organize real-world knowledge in triples, often suffer from incompleteness. To address this, multi-hop knowledge graph reasoning (KGR) methods have been proposed for interpretable knowledge graph completion. The primary approaches to KGR fall broadly into two categories: reinforcement learning (RL)-based methods and sequence-to-sequence (seq2seq)-based methods. While each has its own distinct advantages, each also comes with inherent limitations. To leverage the strengths of both while addressing their weaknesses, we propose a cyclical training method that alternates, over several loops, between a seq2seq training phase and a policy-based RL training phase using a transformer architecture. Additionally, a multimodal data encoding (MDE) module is introduced to improve the representation of entities and relations in KGs. The MDE module treats entities and relations as distinct modalities, processing each with a dedicated network specialized for that modality. It then combines the representations of entities and relations in a dynamic, fine-grained manner using a gating mechanism. Experimental results on the knowledge graph completion task highlight the effectiveness of the proposed framework. Across five benchmark datasets, our framework achieves an average improvement of 1.7% in Hits@1 and an average increase of 0.8% in Mean Reciprocal Rank (MRR) compared with other strong baseline methods. Notably, the maximum improvement in Hits@1 exceeds 4%, further demonstrating the effectiveness of the proposed approach.
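A gating mechanism of the kind described for the MDE module can be illustrated with a common formulation: g = sigmoid(W[e; r] + b) and h = g * e + (1 - g) * r, applied element-wise. This is an assumption about the general shape of such a gate, not the MDE module's published equations; the modality-specific encoders are omitted, and W and b would normally be learned rather than supplied by hand.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(entity, relation, W, b):
    """Element-wise gated fusion of two same-size vectors.

    gate  = sigmoid(W @ [entity; relation] + b)
    fused = gate * entity + (1 - gate) * relation
    W has len(entity) rows and 2 * len(entity) columns; b has len(entity) entries.
    """
    x = entity + relation  # list concatenation gives [entity; relation]
    gate = [sigmoid(sum(w * v for w, v in zip(row, x)) + bi) for row, bi in zip(W, b)]
    return [g * e + (1 - g) * r for g, e, r in zip(gate, entity, relation)]
```

Because the gate is computed per dimension from both inputs, the fusion can lean on the entity representation in some dimensions and on the relation representation in others, which is one way to read "dynamic and fine-grained".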
Funding: Funded by Research Project, grant number BHQ090003000X03.
Abstract: Multi-modal knowledge graph completion (MMKGC) aims to complete missing entities or relations in multi-modal knowledge graphs, thereby discovering previously unknown triples. Because of the continuous growth of data and knowledge and the limitations of data sources, the visual knowledge in these knowledge graphs is generally of low quality, and some entities lack a visual modality altogether. Nevertheless, previous MMKGC studies have focused primarily on facilitating modality interaction and fusion while neglecting low modality quality and missing modalities. Consequently, mainstream MMKGC models simply use pre-trained visual encoders to extract features and transfer the semantic information to joint embeddings through modal fusion, which inevitably introduces problems such as error propagation and increased uncertainty. To address these problems, we propose a Multi-modal knowledge graph Completion model based on Super-resolution and Detailed Description Generation (MMCSD). Specifically, we leverage a pre-trained residual network to enhance resolution and improve the quality of the visual modality. Moreover, we design multi-level visual semantic extraction and entity description generation to further extract entity semantics from structural triples and visual images. Meanwhile, we train a variational multi-modal auto-encoder and use a pre-trained multi-modal language model to complete the missing visual features. We conducted experiments on FB15K-237 and DB13K, and the results show that MMCSD performs MMKGC effectively and achieves state-of-the-art performance.
Abstract: In the international shipping industry, digital intelligence transformation has become essential, with both governments and enterprises actively working to integrate diverse datasets. The maritime and shipping domain is characterized by a vast array of document types, filled with complex, large-scale, and often chaotic knowledge and relationships. Effectively managing these documents is crucial for developing a Large Language Model (LLM) for the maritime domain and for enabling practitioners to access and leverage valuable information. A Knowledge Graph (KG) offers a state-of-the-art solution for enhancing knowledge retrieval, providing more accurate responses and enabling context-aware reasoning. This paper presents a framework for constructing a knowledge graph from maritime and shipping documents using GraphRAG, a hybrid tool that combines graph-based retrieval and generation capabilities. The extraction of entities and relationships from these documents and the KG construction process are described in detail. Furthermore, the KG is integrated with an LLM to build a Q&A system, and the results show that the system significantly improves answer accuracy compared with traditional LLMs. Additionally, the KG construction process is up to 50% faster than conventional LLM-based approaches, underscoring the efficiency of the method. This study provides a promising approach to digital intelligence in shipping, advancing knowledge accessibility and decision-making.
Funding: Strategic Priority Research Program of the Chinese Academy of Sciences, No. XDB0740000; National Key Research and Development Program of China, No. 2022YFB3904200 and No. 2022YFF0711601; Key Project of Innovation LREIS, No. PI009; National Natural Science Foundation of China, No. 42471503.
Abstract: Deep-time Earth research plays a pivotal role in deciphering the rates, patterns, and mechanisms of Earth's evolutionary processes throughout geological history, providing essential scientific foundations for climate prediction, natural resource exploration, and sustainable planetary stewardship. To advance Deep-time Earth research in the era of big data and artificial intelligence, the International Union of Geological Sciences initiated the “Deeptime Digital Earth International Big Science Program” (DDE) in 2019. At the core of this ambitious program lies the development of geoscience knowledge graphs, which serve as a transformative knowledge infrastructure for integrating, sharing, mining, and analyzing heterogeneous geoscience big data. The DDE knowledge graph initiative has made significant strides in three critical dimensions: (1) establishing a unified knowledge structure across geoscience disciplines that ensures consistent representation of geological entities and their interrelationships through standardized ontologies and semantic frameworks; (2) developing a robust and scalable software infrastructure that supports both expert-driven and machine-assisted knowledge engineering for large-scale graph construction and management; and (3) implementing a comprehensive three-tiered architecture of basic, discipline-specific, and application-oriented knowledge graphs spanning approximately 20 geoscience disciplines. Through its open knowledge framework and international collaborative network, the initiative has fostered multinational research collaborations, establishing a robust foundation for next-generation geoscience research while moving the discipline toward FAIR (Findable, Accessible, Interoperable, Reusable) data practices in deep-time Earth systems research.
Abstract: With the continuous advancement of information technology, corpora and knowledge graphs (KGs) have become indispensable tools in modern language learning. This study explores how integrating corpora and KGs into Integrated English teaching can enhance students' abilities in vocabulary acquisition, grammar understanding, and discourse analysis. Through a comprehensive literature review, it elaborates on the theoretical foundations and practical value of these two technological tools in English instruction. The study designs a teaching model based on corpora and KGs and analyzes its specific applications to vocabulary, grammar, and discourse teaching in the Integrated English course. Additionally, the article discusses challenges that may arise during implementation and proposes corresponding solutions. Finally, it outlines future research directions and application prospects.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62101087), the China Postdoctoral Science Foundation (Grant No. 2021MD703942), the Chongqing Postdoctoral Research Project Special Funding, China (Grant No. 2021XM2016), the Science Foundation of Chongqing Municipal Commission of Education, China (Grant No. KJQN202100642), and the Chongqing Natural Science Foundation, China (Grant No. cstc2021jcyj-msxmX0834).
Abstract: Drug repurposing offers a promising alternative to traditional drug development, significantly reducing costs and timelines by identifying new therapeutic uses for existing drugs. However, current approaches often rely on limited data sources and simplistic hypotheses, which restricts their ability to capture the multi-faceted nature of biological systems. This study introduces adaptive multi-view learning (AMVL), a novel methodology that integrates chemical-induced transcriptional profiles (CTPs), knowledge graph (KG) embeddings, and large language model (LLM) representations to enhance drug repurposing predictions. AMVL incorporates an innovative similarity matrix expansion strategy and leverages multi-view learning (MVL), matrix factorization, and ensemble optimization techniques to integrate heterogeneous multi-source data. Comprehensive evaluations on benchmark datasets (Fdataset, Cdataset, and Ydataset) and the large-scale iDrug dataset demonstrate that AMVL outperforms state-of-the-art (SOTA) methods, achieving superior accuracy in predicting drug-disease associations across multiple metrics. Literature-based validation further confirmed the model's predictive capabilities, with seven of the top ten predictions corroborated by post-2011 evidence. To promote transparency and reproducibility, all data and code used in this study have been open-sourced, providing resources for processing CTPs, KG, and LLM-based similarity calculations, along with the complete AMVL algorithm and benchmarking procedures. By unifying diverse data modalities, AMVL offers a robust and scalable solution for accelerating drug discovery, fostering advances in translational medicine, and integrating multi-omics data. We aim to inspire further innovations in multi-source data integration and to support the development of more precise and efficient strategies for advancing drug discovery and translational medicine.
Funding: Support from the Full Bridge Fellowship enabled the research stay at Virginia Tech. H. Xin acknowledges financial support from the US Department of Energy, Office of Basic Energy Sciences, under contract no. DE-SC0023323, and from the National Science Foundation through grant 2245402 from the CBET Catalysis and CDS&E programs.
Abstract: Knowledge graphs (KGs) offer a structured, machine-readable format for organizing complex information. In heterogeneous catalysis, where data on catalytic materials, reaction conditions, mechanisms, and synthesis routes are dispersed across diverse sources, KGs provide a semantic framework that supports data integration under the FAIR (Findable, Accessible, Interoperable, and Reusable) principles. This review surveys recent developments in catalysis KGs, describes the main techniques for graph construction, and highlights how artificial intelligence, particularly large language models (LLMs), enhances graph generation and querying. We conducted a systematic analysis of the literature, focusing on ontology-guided text mining pipelines, graph population methods, and maintenance strategies. Our review identifies key trends: ontology-based approaches enable the automated extraction of domain knowledge, LLM-driven retrieval-augmented generation supports natural-language queries, and scalable graph architectures range from a few thousand to over a million triples. We discuss state-of-the-art applications, such as catalyst recommendation systems and reaction mechanism discovery tools, and examine the major challenges, including data heterogeneity, ontology alignment, and long-term graph curation. We conclude that KGs, when combined with AI methods, hold significant promise for accelerating catalyst discovery and knowledge management, but progress depends on establishing community standards for ontology development and maintenance. This review provides a roadmap for researchers seeking to leverage KGs to advance heterogeneous catalysis research.
Abstract: Based on a large language model, this article uses prompt engineering and an agent-based reasoning approach to extract knowledge automatically and, supported by the Chinese Accounting Standards, constructs the corresponding accounting knowledge graph. By extracting accounting entities and their relations at the schema layer, it provides a data layer for fine-tuning and optimizing the large model. The study finds that, with appropriate use of the language model, knowledge quintuples can be extracted effectively from massive financial data and a complete accounting knowledge graph can be constructed.
Abstract: To address the underutilization of Chinese research materials on nonferrous metals, a method was established for constructing a domain knowledge graph of nonferrous metals (DNMKG). Starting from a domain thesaurus, entities and relationships were mapped to resource description framework (RDF) triples to form the graph's framework. Properties and related entities were then extracted from open knowledge bases to enrich the graph. A large-scale, multi-source heterogeneous corpus of over 1×10^9 words was compiled from recent literature to further expand DNMKG. Using the knowledge graph as prior knowledge, natural language processing techniques were applied to the corpus to generate word vectors. A novel entity evaluation algorithm was used to identify and extract real domain entities, which were added to DNMKG. A prototype system was developed to visualize the knowledge graph and support human-computer interaction. The results demonstrate that DNMKG can enhance knowledge discovery and improve research efficiency in the nonferrous metals field.
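As a rough illustration of the first step (mapping thesaurus entries to RDF triples), the Python sketch below assumes each thesaurus term lists its broader and related terms and maps them onto SKOS predicates (skos:prefLabel, skos:broader, skos:related), then serializes the result as N-Triples. The SKOS vocabulary choice, the example.org namespace, and all function names are illustrative assumptions, not the DNMKG schema.

```python
from dataclasses import dataclass
from urllib.parse import quote

SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/dnmkg/"  # hypothetical namespace, not the paper's


@dataclass(frozen=True)
class Literal:
    """A plain string literal (as opposed to an IRI) in object position."""
    value: str


def term_iri(term: str) -> str:
    # Percent-encode the term so non-ASCII (e.g. Chinese) labels yield valid IRIs.
    return BASE + quote(term.replace(" ", "_"))


def thesaurus_to_triples(thesaurus: dict) -> list:
    """Map {term: {"broader": [...], "related": [...]}} to (s, p, o) triples."""
    triples = []
    for term, entry in thesaurus.items():
        subject = term_iri(term)
        triples.append((subject, SKOS + "prefLabel", Literal(term)))
        for broader in entry.get("broader", []):
            triples.append((subject, SKOS + "broader", term_iri(broader)))
        for related in entry.get("related", []):
            triples.append((subject, SKOS + "related", term_iri(related)))
    return triples


def to_ntriples(triples: list) -> str:
    """Serialize triples as N-Triples lines (minimal literal escaping only)."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, Literal):
            escaped = o.value.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


if __name__ == "__main__":
    toy = {"copper alloy": {"broader": ["alloy"], "related": ["brass"]}, "alloy": {}}
    print(to_ntriples(thesaurus_to_triples(toy)))
```

In a real pipeline an RDF library and a curated predicate set would replace the hand-rolled serializer; the point here is only that each thesaurus relation becomes one subject-predicate-object triple.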
Funding: Supported by the National Natural Science Foundation of China (Nos. 62201419, 62372357), the Natural Science Foundation of Chongqing (CSTB2023NSCQ-LMX0032), and the ISN State Key Laboratory.
Abstract: Existing wireless networks are flooded with video transmissions, and demand for high-speed, low-latency video services continues to surge. This causes network congestion and calls for more resources and more dedicated caching schemes. Recently, Multi-access Edge Computing (MEC)-enabled heterogeneous networks, which leverage edge caches for proximity delivery, have emerged as a promising solution to these problems. Given limited resources, however, designing an effective edge caching scheme is critical to their success. We propose a novel Knowledge Graph (KG)-based Dueling Deep Q-Network (KG-DDQN) for cooperative caching in MEC-enabled heterogeneous networks. The KG-DDQN scheme leverages a KG to uncover relations between videos, providing valuable insight into user preferences for the caching scheme. Specifically, the KG guides the selection of related videos as caching candidates (i.e., actions in the DDQN), providing a rich reference for a personalized caching scheme while also improving the decision efficiency of the DDQN. Extensive simulation results confirm the convergence of KG-DDQN and show that it outperforms the baselines in cache hit rate and service delay.
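The "dueling" part of a Dueling DQN splits the Q-value into a state value V(s) and per-action advantages A(s, a), recombined as Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a'). The minimal sketch below shows that aggregation plus one plausible way a KG could restrict the action space to the requested video and its KG neighbours. The adjacency format and candidate rule are hypothetical, and the network outputs are passed in as plain numbers rather than produced by a trained model.

```python
from __future__ import annotations


def dueling_q(value: float, advantages: list[float]) -> list[float]:
    """Combine a state value and per-action advantages (mean-subtracted)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + adv - mean_adv for adv in advantages]


def kg_candidates(kg: dict[int, set[int]], requested: int) -> list[int]:
    """Hypothetical candidate rule: the requested video plus its KG neighbours."""
    return sorted({requested} | kg.get(requested, set()))


def select_action(value: float, advantages: list[float], candidates: list[int]) -> int:
    """Greedy action over KG-derived candidates only (no exploration)."""
    q = dueling_q(value, advantages)
    return max(candidates, key=lambda video_id: q[video_id])


if __name__ == "__main__":
    kg = {0: {2}, 1: {0}}          # toy KG: video 0 is related to video 2
    advantages = [1.0, 2.0, 3.0]   # one advantage per cacheable video
    cands = kg_candidates(kg, requested=0)
    print(cands, select_action(1.0, advantages, cands))
```

Subtracting the mean advantage is the standard trick that keeps V and A identifiable; restricting the argmax to KG candidates is one reading of how KG guidance could shrink the decision space.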
Funding: Supported by the National Natural Science Foundation of China (Nos. 82173746 and U23A20530) and the Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism (Shanghai Municipal Education Commission).
Abstract: Accurate prediction of drug-target interactions (DTIs) plays a pivotal role in drug discovery, facilitating lead compound optimization, drug repurposing, and elucidation of drug side effects. However, traditional DTI prediction methods are often limited by incomplete biological data and insufficient representation of protein features. In this study, we propose KG-CNNDTI, a novel knowledge graph-enhanced framework for DTI prediction that integrates heterogeneous biological information to improve model generalizability and predictive performance. The model uses protein embeddings derived from a biomedical knowledge graph via the Node2Vec algorithm, further enriched with contextualized sequence representations obtained from ProteinBERT. For compound representation, multiple molecular fingerprint schemes and the Uni-Mol pre-trained model were evaluated. The fused representations served as inputs to both classical machine learning models and a convolutional neural network-based predictor. Experimental evaluations across benchmark datasets demonstrated that KG-CNNDTI outperformed state-of-the-art methods, particularly in Precision, Recall, F1-score, and area under the precision-recall curve (AUPR). Ablation analysis highlighted the substantial contribution of knowledge graph-derived features. Moreover, KG-CNNDTI was used for virtual screening of natural products against Alzheimer's disease, yielding 40 candidate compounds. Five of these were supported by literature evidence, three of which were further validated in in vitro assays.
Funding: Supported by the China Fundamental Research Funds for the Central Universities (Nos. 2662022XXYJ001, 2662022JC004, 2662023XXPY005).
Abstract: As a new data management paradigm, knowledge graphs can integrate multiple data sources and enable quick responses, reasoning, and better predictions in drug discovery. Characterized by high contagiousness and high morbidity and mortality, porcine reproductive and respiratory syndrome (PRRS) is a common infectious disease in the global swine industry that causes great economic losses. Traditional Chinese medicine (TCM) offers few adverse effects and a relatively low application cost, so it has been considered a possible treatment for PRRS given the current lack of safe and effective approaches. Here, we constructed a knowledge graph containing common biomedical data from humans and Sus scrofa as well as information on thousands of TCMs. We then validated the effectiveness of the Sus scrofa knowledge graph with the t-SNE algorithm and selected the optimal model (TransR) from six typical models, namely TransE, TransR, DistMult, ComplEx, RESCAL, and RotatE, according to five indicators: MRR, MR, HITS@1, HITS@3, and HITS@10. Based on embedding vectors trained by the optimal model, anti-PRRSV TCMs were predicted along two paths, VHC-Herb and VHPC-Herb, and potential anti-PRRSV TCMs were identified by searching the HERB database for pharmacological properties corresponding to PRRS symptoms. Finally, the capacity of Dan Shen (Salvia miltiorrhiza Bunge) to resist PRRSV infection was validated in a cell experiment, in which the inhibition rate of PRRSV exceeded 90% at Dan Shen extract concentrations of 0.004, 0.008, 0.016, and 0.032 mg/mL. In summary, this is the first report of a Sus scrofa knowledge graph that includes TCM information, and our study reflects the important application value of deep learning on graphs in the swine industry while providing accessible TCM resources for PRRS.
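The five indicators used to choose the embedding model are standard rank-based link-prediction metrics. A minimal sketch, assuming each test triple has already been scored so that the 1-based rank of the true entity can be read off; rank_of_true uses a simple "count strictly better candidates" rule with higher scores ranking first (for distance-based models such as TransE, the score is typically the negative distance). Tie handling and filtered ranking are left out as simplifying assumptions.

```python
from __future__ import annotations


def rank_of_true(scores: list[float], true_index: int) -> int:
    """1-based rank of the true candidate; higher scores rank first."""
    target = scores[true_index]
    return 1 + sum(1 for s in scores if s > target)


def rank_metrics(ranks: list[int], ks: tuple[int, ...] = (1, 3, 10)) -> dict[str, float]:
    """Mean rank (MR), mean reciprocal rank (MRR) and HITS@k from 1-based ranks."""
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,
        "MRR": sum(1.0 / r for r in ranks) / n,
    }
    for k in ks:
        metrics[f"HITS@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


if __name__ == "__main__":
    print(rank_metrics([1, 2, 4, 20]))
```

Lower MR is better, whereas higher MRR and HITS@k are better; MR is sensitive to a few very poor ranks, which is one reason the metrics are usually reported together.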
Abstract: Knowledge graphs convey precise semantic information that neural networks can interpret effectively, and generating descriptive text from these graphs places significant emphasis on content consistency. However, knowledge graphs do not provide additional linguistic features such as paragraph structure and expressive modes, which makes it challenging to ensure content coherence when generating text that spans multiple sentences. This lack of coherence can further compromise the overall consistency of content within a paragraph. In this work, we generate scientific abstracts from knowledge graphs, focusing on enhancing both content consistency and coherence. In particular, we construct the ACL Abstract Graph Dataset (ACL-AGD), which pairs knowledge graphs with text and incorporates sentence labels to guide text structure and diverse expressions. We then implement a Siamese network that complements and concretizes entities and relations according to paragraph structure by performing two tasks: graph-to-text generation and entity alignment. Extensive experiments demonstrate that the logical paragraphs generated by our method exhibit entities with a uniform positional distribution and appropriate frequency. In terms of content, our method accurately represents the information encoded in the knowledge graph, avoids generating irrelevant content, and produces coherent, non-redundant adjacent sentences, even when they share a knowledge graph.
Funding: Sponsored by the National Key Research and Development Program of the Ministry of Science and Technology of the People's Republic of China (Grant No. 2020YFB1711403).
Abstract: After the design of an aerospace product is completed, a manufacturability assessment must be conducted on the features of its 3D model in terms of modeling quality and process design; otherwise, the cost of design changes increases. Because product manufacturing feature information and assessment knowledge in the current aerospace manufacturability assessment process are poorly structured and have low reusability, automated manufacturability assessment is difficult to achieve. To address these issues, this paper establishes a domain ontology model for aerospace product manufacturability assessment. On this basis, it proposes a structured representation method for manufacturability assessment knowledge and a method for constructing the knowledge graph data layer. Using the semantic and association information expressed by the knowledge graph, a rule matching method based on subgraph matching is proposed to improve precision and recall. Finally, applications and experiments on a software platform verify the effectiveness of the proposed knowledge graph construction and rule matching methods.
Funding: Supported by the National Science and Technology Council (NSTC), Taiwan, under Grant Numbers 112-2622-E-029-009 and 112-2221-E-029-019.
Abstract: In knowledge graph embedding, conventional approaches typically map entities and relations into continuous vector spaces. However, parameter efficiency becomes increasingly crucial for large-scale knowledge graphs that contain vast numbers of entities and relations. In particular, resource-intensive embeddings often increase computational costs and may limit scalability and adaptability in practical environments, such as low-resource settings or real-world applications. This paper explores an approach to knowledge graph representation learning that leverages small, reserved sets of entities and relations for parameter-efficient embedding. We introduce a hierarchical attention network designed to refine and maximize the representational quality of embeddings by selectively focusing on these reserved sets, thereby reducing model complexity. Empirical assessments confirm that our model achieves high performance on the benchmark dataset with fewer parameters and smaller embedding dimensions. Ablation studies further highlight the impact and contribution of each component of the proposed hierarchical attention structure.
Funding: Supported by the Hunan Province Traditional Chinese Medicine Research Project (No. B2023043), the Hunan Provincial Department of Education Scientific Research Project (No. 22B0386), the Research Project of Hunan Provincial Health Commission (No. 20256982), and the Hunan University of Traditional Chinese Medicine Campus Level Research Fund Project (No. 2022XJZKC004).
Abstract: AIM: To develop a traditional Chinese medicine (TCM) knowledge graph (KG) for the diagnosis and treatment of diabetic retinopathy (DR) by integrating literature and medical records, thereby improving the accessibility of TCM knowledge and providing innovative approaches to TCM inheritance and DR management. METHODS: First, a KG framework was established with a schema-layer design. Second, high-quality literature and electronic medical records served as data sources. Named entity recognition was performed with the ALBERT-BiLSTM-CRF model, and semantic relationships were curated by domain experts. Third, knowledge fusion was achieved mainly through an alias library. The data layer was then mapped to the schema layer to refine the KG, and the knowledge was stored in Neo4j. Finally, exploratory work on intelligent question answering was conducted on the constructed KG. RESULTS: A KG for TCM diagnosis and treatment was constructed in Neo4j, comprising 6 types of labels, 5 types of relationships, 5 types of attributes, 822 nodes, and 1,318 relationship instances. This systematic KG supports logical reasoning and intelligent question answering. The question answering model achieved a precision of 95%, a recall of 95%, and a weighted F1-score of 95%. CONCLUSION: This study proposes a semi-automatic knowledge-mapping scheme that balances integration efficiency and accuracy. Clinical data-driven entity and relationship construction enables digital dialectical reasoning. Exploratory applications show the KG's potential for intelligent question answering, providing new insights for TCM health management.
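The question-answering results are reported as weighted scores. In the usual definition (for example, scikit-learn's average="weighted"), per-class precision, recall, and F1 are averaged with weights proportional to each class's true support. The dependency-free sketch below follows that definition; treating the QA evaluation as a per-question label-prediction task is an assumption, since the abstract does not specify the setup.

```python
from collections import Counter


def weighted_prf(y_true, y_pred):
    """Support-weighted precision, recall and F1 over all labels."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    n = len(y_true)
    w_prec = w_rec = w_f1 = 0.0
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        predicted = sum(1 for p in y_pred if p == c)
        actual = support[c]
        prec = tp / predicted if predicted else 0.0
        rec = tp / actual if actual else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = actual / n  # labels never seen in y_true get weight 0
        w_prec += weight * prec
        w_rec += weight * rec
        w_f1 += weight * f1
    return w_prec, w_rec, w_f1


if __name__ == "__main__":
    print(weighted_prf(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

One consequence of support weighting is that weighted recall always equals plain accuracy, because each class contributes tp_c / n; weighted precision and F1 do not have this property.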
Abstract: The cost and strict input-format requirements of GraphRAG make it inefficient for processing large documents. This paper proposes an alternative approach to constructing a knowledge graph (KG) from a PDF document, with a focus on simplicity and cost-effectiveness. The process involves splitting the document into chunks, extracting concepts within each chunk using a large language model (LLM), and building relationships based on the proximity of concepts in the same chunk. Unlike traditional named entity recognition (NER), which identifies entities such as “Shanghai”, the proposed method identifies concepts such as “Convenient transportation in Shanghai”, which were found to be more meaningful for KG construction. Each edge in the KG represents a relationship between concepts that occur in the same text chunk. The process is computationally inexpensive, relying on locally run tools such as Mistral 7B openorca instruct and Ollama for model inference, so the entire graph generation process incurs no cost. The paper also introduces a method for assigning weights to relationships, grouping similar pairs, and summarizing multiple relationships into a single edge with an associated weight and relation details. Additionally, node degrees and communities are calculated for node sizing and coloring. This approach offers a scalable, cost-effective solution for generating meaningful knowledge graphs from large documents, achieving results comparable to GraphRAG while remaining accessible on personal machines.
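The co-occurrence step can be sketched in a few lines of Python. The sketch assumes concepts have already been extracted per chunk (the LLM call is out of scope), uses the number of chunks in which an unordered concept pair co-occurs as the edge weight, keeps the supporting chunk ids as relation details, and derives node degrees for sizing. The data layout and the count-based weight are illustrative assumptions rather than the paper's exact weighting; community detection, which the paper uses for coloring, is omitted.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import combinations


def build_concept_graph(chunks: list[list[str]]):
    """Build weighted co-occurrence edges and node degrees.

    chunks: one list of extracted concepts per text chunk.
    Returns (edges, degree), where edges maps a sorted concept pair to
    {"weight": number of shared chunks, "chunks": [chunk ids]}.
    """
    edges = defaultdict(lambda: {"weight": 0, "chunks": []})
    for chunk_id, concepts in enumerate(chunks):
        # Sorting merges (a, b) and (b, a) into one undirected edge;
        # set() ignores repeated mentions of a concept within a chunk.
        for a, b in combinations(sorted(set(concepts)), 2):
            edges[(a, b)]["weight"] += 1
            edges[(a, b)]["chunks"].append(chunk_id)

    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(edges), dict(degree)


if __name__ == "__main__":
    chunks = [["Shanghai transport", "metro", "bus"], ["metro", "Shanghai transport"], ["bus"]]
    edges, degree = build_concept_graph(chunks)
    for pair, info in edges.items():
        print(pair, info)
    print(degree)
```

Merging all co-occurrences of a pair into one edge with a running weight is what lets many chunk-level relationships collapse into a single, summarized edge.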
Abstract: Data curation is vital for selecting effective demonstration examples in graph-to-text generation. However, evaluating the quality of Knowledge Graphs (KGs) remains challenging. Prior research focuses narrowly on structural statistics, such as shortest path length, while whether a graph correctly represents its associated text is rarely explored. To address this gap, we introduce a dual-perspective evaluation framework for KG-text data based on computing structural adequacy and semantic alignment. From a structural perspective, we propose the Weighted Incremental Edge Method (WIEM) to quantify graph completeness by leveraging agreement between relation models to predict possible edges between entities. WIEM aims to find increments from the models on “unseen links”, whose presence is inversely proportional to the structural adequacy of the original KG in representing the text. From a semantic perspective, we evaluate how well a KG aligns with the text in capturing the intended meaning. To do so, we instruct a large language model to convert KGs into natural language and measure the similarity between the generated and reference texts. Based on these computations, we apply a Top-K union method that integrates the structural and semantic modules to rank and select high-quality KGs. We evaluate our framework against various approaches for selecting few-shot examples in graph-to-text generation. Experiments on the Association for Computational Linguistics Abstract Graph Dataset (ACL-AGD) and the Automatic Content Extraction 05 (ACE05) dataset demonstrate the effectiveness of our approach in distinguishing KG-text data of different quality, as evidenced by the largest performance gap between top- and bottom-ranked examples. We also find that the top examples selected by our dual-perspective framework consistently yield better performance than those selected by traditional measures. These results highlight the importance of data curation for improving graph-to-text generation.
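The final selection step can be illustrated with a small sketch. It assumes each candidate KG-text example already has a structural score (for instance, oriented so that fewer WIEM-predicted unseen links gives a higher score) and a semantic similarity score, and it returns the union of the top-K examples under each ranking, de-duplicated in order of first appearance. The score inputs, ordering, and tie-breaking are assumptions; the abstract does not give the exact combination rule.

```python
from __future__ import annotations


def top_k(scores: dict[str, float], k: int) -> list[str]:
    """IDs of the k highest-scoring examples (Python's sort keeps ties stable)."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def top_k_union(structural: dict[str, float], semantic: dict[str, float], k: int) -> list[str]:
    """Union of the structural and semantic top-k lists, without duplicates."""
    selected: list[str] = []
    for example_id in top_k(structural, k) + top_k(semantic, k):
        if example_id not in selected:
            selected.append(example_id)
    return selected


if __name__ == "__main__":
    structural = {"g1": 0.9, "g2": 0.8, "g3": 0.1}
    semantic = {"g1": 0.2, "g2": 0.3, "g3": 0.95}
    print(top_k_union(structural, semantic, k=1))
```

A union (rather than an intersection) keeps examples that excel under either perspective, so a KG with perfect structure but weaker phrasing similarity is not discarded outright.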
Abstract: With the continuous development of artificial intelligence and natural language processing technologies, traditional retrieval-augmented generation (RAG) techniques face numerous challenges in document answer precision and similarity measurement. Set against the backdrop of the shipping industry, this study combines top-down and bottom-up schema design strategies to achieve precise and flexible knowledge representation. The research adopts a semi-structured approach and constructs an adaptive schema generation mechanism based on reinforcement learning, which models the knowledge graph construction process as a Markov decision process. The method begins with general concepts, defining foundational industry concepts, and then abstracts core concepts specific to the maritime domain through the adaptive schema generation mechanism, which dynamically adjusts the knowledge structure. Specifically, the study designs a four-layer knowledge construction framework consisting of a data layer, a modeling layer, a technology layer, and an application layer. It draws on a mutual indexing strategy that integrates large language models with traditional information extraction techniques, and it leverages self-attention mechanisms and graph attention networks to extract semantic relationships efficiently. Logic-form-driven solvers and symbolic decomposition techniques are introduced for reasoning, significantly enhancing the model's ability to understand complex semantic relationships. Additionally, open information extraction and knowledge alignment techniques further improve the efficiency and accuracy of information retrieval. Experimental results demonstrate that the proposed method not only achieves significant performance improvements in knowledge graph retrieval within the shipping domain but also offers theoretical innovation and practical application value.
Funding: Supported by the National Key Research and Development Program of China (No. 2023YFF0905400) and the National Natural Science Foundation of China (No. U2341229).
Abstract: Knowledge graphs (KGs), which organize real-world knowledge as triples, often suffer from incompleteness. To address this, multi-hop knowledge graph reasoning (KGR) methods have been proposed for interpretable knowledge graph completion. The primary approaches to KGR fall broadly into two categories: reinforcement learning (RL)-based methods and sequence-to-sequence (seq2seq)-based methods. While each has distinct advantages, each also has inherent limitations. To leverage the strengths of both while addressing their weaknesses, we propose a cyclical training method that alternates, over several loops, between a seq2seq training phase and a policy-based RL training phase using a transformer architecture. Additionally, a multimodal data encoding (MDE) module is introduced to improve the representation of entities and relations in KGs. The MDE module treats entities and relations as distinct modalities, processing each with a dedicated network specialized for that modality. It then combines the entity and relation representations in a dynamic, fine-grained manner using a gating mechanism. Experimental results on the knowledge graph completion task highlight the effectiveness of the proposed framework. Across five benchmark datasets, our framework achieves an average improvement of 1.7% in Hits@1 and an average increase of 0.8% in Mean Reciprocal Rank (MRR) over strong baseline methods. Notably, the maximum improvement in Hits@1 exceeds 4%, further demonstrating the effectiveness of the proposed approach.
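A common way to realize such a gate is an element-wise sigmoid computed from the concatenated entity and relation vectors, used to interpolate between them: g = sigmoid(W[e; r] + b) and h = g * e + (1 - g) * r. The dependency-free sketch below implements that generic formulation; it is an assumed gate, not the MDE module's exact design, and W and b would normally be learned rather than supplied by hand.

```python
from __future__ import annotations

import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(e: list[float], r: list[float],
                 W: list[list[float]], b: list[float]) -> list[float]:
    """Fuse entity vector e and relation vector r with an element-wise gate.

    W has shape d x 2d (applied to the concatenation [e; r]); b has length d.
    """
    d = len(e)
    if len(r) != d or len(W) != d or len(b) != d or any(len(row) != 2 * d for row in W):
        raise ValueError("shape mismatch")
    x = e + r  # list concatenation, i.e. [e; r]
    gates = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bias)
             for row, bias in zip(W, b)]
    # Each output dimension is a convex combination of e_i and r_i.
    return [g * ei + (1.0 - g) * ri for g, ei, ri in zip(gates, e, r)]


if __name__ == "__main__":
    zero_W = [[0.0] * 4 for _ in range(2)]
    print(gated_fusion([1.0, 2.0], [3.0, 0.0], zero_W, [0.0, 0.0]))
```

With all-zero weights every gate is 0.5 and the output is the plain average; learned weights let the model decide, per dimension and per input, how much entity versus relation information to keep, which is the "dynamic, fine-grained" behavior the abstract describes.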