Intelligent power grid engineering is challenging because of its complex structural hierarchy, with deeply nested associative relations between entities such as equipment, specifications, and business processes. Meanwhile, limited by fragmented data and loss of contextual information, the generated reports are prone to problems such as content redundancy and omission of critical information, failing to meet the demands of efficient decision-making and accurate management in modern power systems. To address these issues, this paper proposes a knowledge graph (KG)-enhanced framework to automatically generate electric power engineering reports. In the KG construction phase, a feature-fused entity recognition model named BERT-BiLSTM-CRF is adopted to improve the accuracy of entity recognition in scenarios involving power engineering professional terminology, thereby solving the problem of ambiguous entity boundaries in traditional models; then a BERT-attention relation extraction model is proposed to enhance the completeness of extracting complex hierarchical and implicit relations in power grid data. In the report generation phase, an improved Transformer architecture is adopted to accurately transform structured knowledge into natural language reports that comply with engineering specifications, addressing the issue of semantic inconsistency caused by the loss of structural information in existing models. Validation on real-world projects shows that the proposed framework significantly outperforms existing baseline models in entity recognition, confirming its superiority and applicability in practical engineering.
In the context of power generation companies, vast amounts of specialized data and expert knowledge have been accumulated. However, challenges such as data silos and fragmented knowledge hinder the effective utilization of this information. This study proposes a novel framework for intelligent Question-and-Answer (Q&A) systems based on Retrieval-Augmented Generation (RAG) to address these issues. The system efficiently acquires domain-specific knowledge by leveraging external databases, including Relational Databases (RDBs) and graph databases, without additional fine-tuning for Large Language Models (LLMs). Crucially, the framework integrates a Dynamic Knowledge Base Updating Mechanism (DKBUM) and a Weighted Context-Aware Similarity (WCAS) method to enhance retrieval accuracy and mitigate inherent limitations of LLMs, such as hallucinations and lack of specialization. Additionally, the proposed DKBUM dynamically adjusts knowledge weights within the database, ensuring that the most recent and relevant information is utilized, while WCAS refines the alignment between queries and knowledge items through enhanced context understanding. Experimental validation demonstrates that the system can generate timely, accurate, and context-sensitive responses, making it a robust solution for managing complex business logic in specialized industries.
Objective: This study aimed to develop a Nursing Retrieval-Augmented Generation (NurRAG) system based on large language models (LLMs) and to evaluate its accuracy and clinical applicability in nursing question answering. Methods: A multidisciplinary team consisting of nursing experts, artificial intelligence researchers, and information engineers collaboratively designed the NurRAG framework following the principles of retrieval-augmented generation. The system included four functional modules: 1) construction of a nursing knowledge base through document normalization, embedding, and vector indexing; 2) nursing question filtering using a supervised classifier; 3) semantic retrieval and re-ranking for evidence selection; and 4) evidence-conditioned language model generation to produce citation-based nursing answers. The system was securely deployed on hospital intranet servers using Docker containers. Performance evaluation was conducted with 1,000 expert-verified nursing question–answer pairs. Semantic fidelity was assessed using Recall-Oriented Understudy for Gisting Evaluation, Longest Common Subsequence (ROUGE-L), and clinical correctness was measured using Accuracy. Results: The NurRAG system achieved significant improvements in both semantic fidelity and answer accuracy compared with conventional large language models. For ChatGLM2-6B, ROUGE-L increased from (30.73±1.48)% to (64.27±0.27)%, and accuracy increased from (49.08±0.92)% to (75.83±0.35)%. For LLaMA2-7B, ROUGE-L increased from (28.76±0.89)% to (60.33±0.21)%, and accuracy increased from (43.27±0.83)% to (73.29±0.33)%. All differences were statistically significant (P < 0.001). A quantitative case analysis further demonstrated that NurRAG effectively reduced hallucinated outputs and generated evidence-based, guideline-concordant nursing responses. Conclusion: The NurRAG system integrates domain-specific retrieval with LLM generation to provide accurate, reliable, and traceable evidence-based nursing answers. The findings demonstrate the system’s feasibility and potential to improve the accuracy of clinical knowledge access, support evidence-based nursing decision-making, and promote the safe application of artificial intelligence in nursing practice.
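For readers unfamiliar with the ROUGE-L metric named in the NurRAG abstract, the sketch below shows how a sentence-level score can be derived from the longest common subsequence (LCS) of two token sequences. It is a minimal illustration only: the abstract does not say which ROUGE-L variant or tokenizer was used, so the F1 form (beta = 1), the whitespace tokenization, and the function names `lcs_length` and `rouge_l_f1` are all assumptions made here.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)          # DP row for the previous token of a
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Sentence-level ROUGE-L F1 over whitespace tokens (assumed variant)."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)        # share of candidate tokens in the LCS
    recall = lcs / len(ref)            # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)
```

Because the score depends only on in-order token overlap, it rewards answers that preserve the wording and ordering of the reference, which is presumably why the study pairs it with a separate accuracy measure for clinical correctness.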
The Amazon Web Services (AWS) CloudTrail auditing service provides detailed records of operational and security events, enabling cloud administrators to monitor user activity and manage compliance. Although signature-based threat detection methods have been enhanced with machine learning and Large Language Models (LLMs), these approaches remain limited in addressing emerging threats. This study evaluates a two-step Retrieval-Augmented Generation (RAG) approach using Gemini 2.5 Pro to enhance threat detection accuracy and contextual relevance. The RAG system integrates external cybersecurity knowledge sources, including the MITRE ATT&CK framework, AWS Threat Technique Catalogue, and threat reports, to overcome limitations of static pre-trained LLMs. We constructed an evaluation dataset of 200 unique CloudTrail events (122 malicious, 78 benign) using the Stratus Red Team adversary emulation framework, covering 9 MITRE ATT&CK techniques across 8 tactics. Events were sampled from 1724 total events using stratified sampling. Ground truth labels were created through systematic expert annotation with 90% inter-annotator agreement. The RAG-enabled model achieved an estimated 78% accuracy, 85% precision, and 79% F1-score, representing a 70.5% accuracy improvement and a 76.4% F1-score improvement over baseline Gemini 2.5 Pro (46% accuracy, 45% F1-score). Performance figures are based on evaluation results on the 200-event dataset. Cost-latency analysis revealed a processing time of 4.1 s and a cost of $0.00376 per event, comparable to commercial SIEM solutions while providing superior MITRE ATT&CK attribution. The findings demonstrate that RAG substantially enhances context-aware threat detection, providing actionable insights for cloud security operations.
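The improvement figures in the CloudTrail abstract read most naturally as relative gains over the baseline rather than percentage-point differences. The short sketch below works that arithmetic through with the rounded scores quoted in the abstract; the helper name `relative_improvement` is introduced here for illustration, and the explanation for the small gap to the reported values is an assumption.

```python
def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


# Rounded scores quoted in the abstract (percent).
accuracy_gain = relative_improvement(46, 78)   # roughly 69.6
f1_gain = relative_improvement(45, 79)         # roughly 75.6

print(f"accuracy: +{accuracy_gain:.1f}%  F1: +{f1_gain:.1f}%")
```

With the rounded inputs the gains come out near 69.6% and 75.6%, slightly below the reported 70.5% and 76.4%; a plausible reason, not confirmed by the abstract, is that the reported values were computed from unrounded scores.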
Knowledge graphs convey precise semantic information that can be effectively interpreted by neural networks, and generating descriptive text based on these graphs places significant emphasis on content consistency. However, knowledge graphs are inadequate for providing additional linguistic features such as paragraph structure and expressive modes, making it challenging to ensure content coherence in generating text that spans multiple sentences. This lack of coherence can further compromise the overall consistency of the content within a paragraph. In this work, we present the generation of scientific abstracts by leveraging knowledge graphs, with a focus on enhancing both content consistency and coherence. In particular, we construct the ACL Abstract Graph Dataset (ACL-AGD), which pairs knowledge graphs with text, incorporating sentence labels to guide text structure and diverse expressions. We then implement a Siamese network to complement and concretize the entities and relations based on paragraph structure by accomplishing two tasks: graph-to-text generation and entity alignment. Extensive experiments demonstrate that the logical paragraphs generated by our method exhibit entities with a uniform position distribution and appropriate frequency. In terms of content, our method accurately represents the information encoded in the knowledge graph, prevents the generation of irrelevant content, and achieves coherent and non-redundant adjacent sentences, even with a shared knowledge graph.
Data curation is vital for selecting effective demonstration examples in graph-to-text generation. However, evaluating the quality of Knowledge Graphs (KGs) remains challenging. Prior research exhibits a narrow focus on structural statistics, such as the shortest path length, while the correctness of graphs in representing the associated text is rarely explored. To address this gap, we introduce a dual-perspective evaluation framework for KG-text data, based on the computation of structural adequacy and semantic alignment. From a structural perspective, we propose the Weighted Incremental Edge Method (WIEM) to quantify graph completeness by leveraging agreement between relation models to predict possible edges between entities. WIEM aims to find increments from models on “unseen links”, whose presence is inversely proportional to the structural adequacy of the original KG in representing the text. From a semantic perspective, we evaluate how well a KG aligns with the text in capturing the intended meaning. To do so, we instruct a large language model to convert KGs into natural language and measure the similarity between generated and reference texts. Based on these computations, we apply a Top-K union method, integrating the structural and semantic modules, to rank and select high-quality KGs. We evaluate our framework against various approaches for selecting few-shot examples in graph-to-text generation. Experiments on the Association for Computational Linguistics Abstract Graph Dataset (ACL-AGD) and the Automatic Content Extraction 05 (ACE05) dataset demonstrate the effectiveness of our approach in distinguishing KG-text data of different qualities, evidenced by the largest performance gap between top- and bottom-ranked examples. We also find that the top examples selected through our dual-perspective framework consistently yield better performance than those selected by traditional measures. These results highlight the importance of data curation in improving graph-to-text generation.
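The Top-K union step mentioned in the abstract above can be pictured as ranking the candidates separately under each score and keeping every candidate that makes the top K of either ranking. The sketch below is one plausible reading, not the authors' implementation: the function name `top_k_union`, the dictionary inputs, and the convention that higher scores are better under both views (so a WIEM-style "unseen link" count would first need converting into an adequacy score) are all assumptions.

```python
def top_k_union(structural, semantic, k):
    """Return candidate ids ranked in the top k under either score.

    structural, semantic: dicts mapping a KG id to a score where
    higher means better (assumed convention).
    """
    def top_k(scores):
        ranked = sorted(scores, key=lambda kg_id: scores[kg_id], reverse=True)
        return ranked[:k]

    selected = []
    # Structural picks come first, then any new semantic picks.
    for kg_id in top_k(structural) + top_k(semantic):
        if kg_id not in selected:
            selected.append(kg_id)
    return selected


# Hypothetical scores for three KG-text pairs.
structural = {"g1": 0.9, "g2": 0.4, "g3": 0.7}
semantic = {"g1": 0.2, "g2": 0.8, "g3": 0.6}
print(top_k_union(structural, semantic, k=2))
```

Taking the union rather than the intersection keeps a candidate that is strong under only one perspective, so the selected set can contain between K and 2K items.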
As autonomous driving systems advance rapidly, there is a surge in demand for high-definition (HD) maps that provide accurate and dependable prior information on static environments around vehicles. As one of the main high-level elements in HD maps, the road lane centerline is essential for downstream tasks such as autonomous navigation and planning. Considering the complex topology and significant overlap concerns of road centerlines, previous studies have rarely examined the centerline HD mapping problem. Recent learning-based pipelines rely on heuristic post-processing of predictions to generate a structured centerline output without instance information. To ameliorate this situation, we propose a novel, end-to-end pipeline for generating vectorized road centerline graphs, termed CenterLineFormer. CenterLineFormer takes a single onboard camera image as input and predicts a directed graph representing the lane-layer map in the bird’s-eye view (BEV). We propose a strategy for better view transformation that uses a cross-attention mechanism to generate a dense BEV feature map. With our pipeline, we can describe the connection relationship between different centerlines and generate structured lane graphs for downstream modules such as planning and control. Qualitatively, our experiments show that our pipeline achieves superior performance over previous baselines on the nuScenes dataset. We also show that CenterLineFormer can generate accurate centerline graph topologies on night driving and complex traffic intersection scenes.
This article examines the implementation of a virtual health assistant powered by Retrieval-Augmented Generation (RAG) and GPT-4, aimed at enhancing clinical support through personalized, real-time interactions with patients. The system is hypothesized to improve healthcare accessibility, operational efficiency, and patient outcomes by automating routine tasks and delivering accurate health information. The assistant leverages natural language processing and real-time data retrieval models to respond to patient inquiries, schedule appointments, provide medication reminders, assist with symptom triage, and answer insurance-related questions. By integrating RAG-based virtual care, the system reduces the burden on healthcare specialists and helps mitigate healthcare disparities, particularly in rural areas where traditional care is limited. Although the initial scope of testing did not validate all potential benefits, the results demonstrated high patient satisfaction and strong response accuracy, both critical for systems of this nature. These findings underscore the transformative potential of AI-driven virtual health assistants in enhancing patient engagement, streamlining operational workflows, and improving healthcare accessibility, ultimately contributing to better outcomes and more cost-effective care delivery.
Generation-based linguistic steganography is a popular research area of information hiding. Generative text steganography based on conditional probability coding is a direction that researchers have recently paid attention to. However, in the course of our experiments, we found that hiding secret information in the text tends to destroy the statistical distribution characteristics of the original text. This indicates that the method suffers an obvious reduction in text quality as the embedding rate increases and that the topic of the generated texts is uncontrollable, so there is still room for improvement in concealment. In this paper, we propose a topic-controlled steganography method guided by graph-to-text generation. The proposed model can automatically generate steganographic texts carrying secret messages from knowledge graphs, and the topic of the generated texts is controllable. We also provide a graph path coding method with corresponding detailed algorithms for graph-to-text generation. Unlike traditional linguistic steganography methods, we encode the secret information during graph path coding rather than using conditional probability. We test our method in different aspects and compare it with other generative text steganographic methods. The experimental results show that the proposed model can effectively improve the quality of the generated text and significantly improve the concealment of steganographic text.
The limit equilibrium method (LEM) and the strength reduction method (SRM) are the most widely used methods for slope stability analysis. However, both have limitations in practical application. In the LEM, the constitutive model cannot be considered and many assumptions are needed between slices of soil or rock. The SRM requires iterative calculations and does not give the slip surface directly. A method for slope stability analysis based on graph theory has recently been developed to directly calculate the minimum safety factor and the potential critical slip surface from the stress results of numerical simulation. The method is based on the current stress state and can overcome the above disadvantages of the two traditional methods. The influences of edge generation and mesh geometry on the position of the slip surface and the safety factor of the slope are studied; a new method for edge generation is proposed, and a reasonable mesh size is suggested. The results of benchmark examples and a rock slope show the good accuracy and efficiency of the presented method.
Aiming at the problems of incomplete characterization of text relations, poor guidance of potential representations, and low quality of model generation in the field of controllable long text generation, this paper proposes a new GSPT-CVAE model (Graph Structured Processing, Single Vector, and Potential Attention Computing Transformer-Based Conditioned Variational Autoencoder model). The model obtains a more comprehensive representation of textual relations by graph-structured processing of the input text, and at the same time obtains a single vector representation by weighted merging of the vector sequences after graph-structured processing to get an effective potential representation. When the potential representation guides text generation, the model combines traditional embedding with potential attention calculation to give full play to the guiding role of the potential representation and to improve the controllability and effectiveness of text generation. The experimental results show that the model has excellent representation learning ability and can learn rich and useful textual relationship representations. The model also achieves satisfactory results in the effectiveness and controllability of text generation and can generate long texts that match the given constraints. The ROUGE-1 F1 score of this model is 0.243, the ROUGE-2 F1 score is 0.041, the ROUGE-L F1 score is 0.22, and the PPL-Word score is 34.303, which gives the GSPT-CVAE model a certain advantage over the baseline model. Meanwhile, this paper compares the model with state-of-the-art generative models including T5, GPT-4, and Llama2, and the experimental results show that the GSPT-CVAE model is competitive to a certain extent.
With the wider growth of web-based documents, the necessity of automatic document clustering and text summarization has increased. Document summarization, which involves extracting the essential content with appropriate information, removing unnecessary data, and presenting the data in a cohesive and coherent manner, is considered a most challenging task. In this research, a novel intelligent model for document clustering is designed with a graph model and fuzzy-based association rule generation (gFAR). Initially, the graph model is used to map the relationships among the (multi-source) data, followed by document clustering through association rule generation using the fuzzy concept. This method eliminates redundancy by mapping the relevant documents using the graph model, and it reduces time consumption and improves accuracy through fuzzy association rule generation. The framework is provided in an interpretable way for document clustering. It iteratively reduces the error rate during relationship mapping among the data (clusters) with the assistance of weighted document content. The model also represents the significance of data features with class discrimination, which helps in measuring the significance of the features during the data clustering process. The simulation is done in the MATLAB 2016b environment and evaluated with empirical standards such as Relative Risk Patterns (RRP), ROUGE score, and Discrimination Information Measure (DMI). The DailyMail and DUC 2004 datasets are used to obtain the empirical results. The proposed gFAR model gives a better trade-off when compared with various prevailing approaches.
To minimize wind turbine failures, fault diagnosis of wind turbines is becoming increasingly important. Deep learning methods excel at multivariate monitoring and data modeling, but they are often limited to Euclidean space and struggle to capture the complex coupling between wind turbine sensors. To address this problem, we convert SCADA data into graph data, where sensors act as nodes and their topological connections act as edges, to represent these complex relationships more efficiently. Specifically, a wind turbine anomaly identification method based on a deep graph convolutional neural network using a similarity graph generation strategy (SGG-DGCN) is proposed. First, a plurality of similarity graphs containing similarity information between nodes are generated by different distance metrics. Then, the generated similarity graphs are fused using the proposed similarity graph generation strategy. Finally, the fused similarity graphs are fed into the DGCN model for anomaly identification. To verify the effectiveness of the proposed SGG-DGCN model, we conducted a large number of experiments. The experimental results show that the proposed SGG-DGCN model has the highest accuracy compared with other models. In addition, the results of the ablation experiments also demonstrate that the proposed SGG strategy can effectively improve the accuracy of wind turbine (WT) anomaly identification.
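The three steps in the SGG-DGCN abstract (build one similarity graph per distance metric, fuse them, pass the result to a graph network) can be illustrated for the first two steps with a small sketch. The abstract does not name the metrics or the fusion rule, so everything specific here is an assumption: the two metrics (a Euclidean-distance similarity and cosine similarity), fusion by averaging followed by thresholding, and the names `similarity_graph` and `fuse_graphs`.

```python
import math


def euclidean_similarity(x, y):
    """Map Euclidean distance into (0, 1]; identical series give 1."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return 1.0 / (1.0 + d)


def cosine_similarity(x, y):
    """Cosine of the angle between two series; 0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def similarity_graph(series, metric):
    """Dense similarity matrix: one node per sensor series."""
    n = len(series)
    return [[metric(series[i], series[j]) for j in range(n)] for i in range(n)]


def fuse_graphs(graphs, threshold):
    """Average the similarity matrices and keep edges at or above threshold."""
    n = len(graphs[0])
    fused = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # no self-loops in this sketch
                mean = sum(g[i][j] for g in graphs) / len(graphs)
                fused[i][j] = 1 if mean >= threshold else 0
    return fused


# Hypothetical readings from three sensors over three time steps.
sensors = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.1], [9.0, -4.0, 0.0]]
graphs = [similarity_graph(sensors, m)
          for m in (euclidean_similarity, cosine_similarity)]
adjacency = fuse_graphs(graphs, threshold=0.8)
```

In this toy input the first two sensors track each other closely and end up connected, while the third stays isolated; the resulting adjacency matrix is the kind of structure a graph convolutional model would consume.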
Large language models (LLMs) excel in various natural language processing tasks and are increasingly applied in specialized fields like medicine. However, their deployment in the medical domain is challenged by limited domain-specific data and the tendency to generate inaccurate information, known as “hallucinations.” While domain-specific fine-tuning has improved open-source LLMs, they still underperform compared to proprietary models like ChatGPT and PaLM. To address this gap, retrieval-augmented generation (RAG) techniques have been explored to enhance LLMs by integrating external knowledge bases. Nevertheless, the success of RAG depends on the quality of retrieved documents, and its application within the medical field remains in the early stages. In this paper, we introduce the “Bailicai” framework as an exploratory approach to integrating RAG with LLMs in the medical field. The framework employs fine-tuning to improve the RAG process, where “falsely relevant” and “completely irrelevant” interference documents are intentionally included in the training data. This enables Bailicai to develop the ability to assess the quality of retrieved documents and selectively incorporate them. The framework is organized into four modules: (1) medical knowledge injection, (2) self-knowledge boundary identification, (3) directed acyclic graph task decomposition, and (4) retrieval-augmented generation. Through the synergy of these modules, Bailicai achieves superior performance on multiple medical benchmarks, outperforming existing large models in the medical domain, RAG-based methods, and proprietary models such as GPT-3.5. Furthermore, Bailicai effectively mitigates the hallucination problem common in LLMs applied to medical tasks and enhances the robustness of RAG when dealing with irrelevant or misleading documents, enabling more accurate information retrieval and integration.
Machine learning, particularly graph learning, is gaining increasing recognition for its transformative impact across various fields. One such promising application is in the realm of molecule design and discovery, notably within the pharmaceutical industry. Our survey offers a comprehensive overview of state-of-the-art methods in molecule design, particularly focusing on de novo drug design, which incorporates (deep) graph learning techniques. We categorize these methods into three distinct groups: i) all-at-once, ii) fragment-based, and iii) node-by-node. Additionally, we introduce some key public datasets and outline the commonly used evaluation metrics for both the generation and optimization of molecules. In the end, we discuss the existing challenges in this field and suggest potential directions for future research.
This paper presents techniques for verification and Test Generation (TG) of sequential machines (Finite State Machines, FSMs) based on state traversal of the State Transition Graph (STG). The problems of traversal, redundancy, and the transition fault model are identified. To achieve high fault coverage, collapsing testing is proposed. Further, heuristic knowledge for speeding up verification and TG is described.
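State traversal of an STG, as referred to in the abstract above, amounts to exploring the graph whose nodes are FSM states and whose edges are input-labelled transitions. The sketch below is a generic breadth-first traversal, not the paper's algorithm: the dictionary encoding of the transitions, the function name `traverse_stg`, and the example machine are assumptions made for illustration.

```python
from collections import deque


def traverse_stg(transitions, start):
    """Breadth-first traversal of a state transition graph.

    transitions: dict mapping (state, input_symbol) -> next_state.
    Returns a dict mapping each reachable state to a shortest input
    sequence that drives the machine there from `start`.
    """
    paths = {start: []}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        # Sorted so that ties between equally short paths break deterministically.
        for (src, symbol), dst in sorted(transitions.items()):
            if src == state and dst not in paths:
                paths[dst] = paths[state] + [symbol]
                queue.append(dst)
    return paths


# Hypothetical four-state machine; S3 has no incoming transition.
stg = {
    ("S0", "0"): "S0", ("S0", "1"): "S1",
    ("S1", "0"): "S2", ("S1", "1"): "S0",
    ("S2", "0"): "S2", ("S2", "1"): "S1",
    ("S3", "0"): "S0",
}
reachable = traverse_stg(stg, "S0")
```

The returned input sequences act as justification sequences for each state, and any state missing from the result (S3 here) is unreachable from the start state, so transitions leaving it cannot be exercised by a test that begins there.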
With the development of web services (WS), applications based on them, which are convenient and platform-independent, have become increasingly popular in recent years. However, how to identify, generate, and compose services has become an open issue. This paper proposes a method based on program slicing to realize the generation and composition of web services. It introduces a method for generating a WSDL file and a SOAP message from source code, as well as the theory of the function dependence graph (FDG). In addition, it gives a way to generate a proxy service for each service, which allows users to easily call a service. The experimental results show that our generation and composition methods for WS are feasible and flexible.
This paper presents modeling tools based on Boolean satisfiability (SAT) to solve problems of test generation for combinational circuits. It adds a layer to a generic SAT algorithm to maintain circuit-related information and value justification relations. It dovetails binary decision graph (BDD) and SAT techniques to improve the efficiency of automatic test pattern generation (ATPG). More specifically, it first exploits inexpensive reconvergent fanout analysis of the circuit to gather information on local signal correlation by using BDD learning, then uses the learned information to restrict and focus the overall search space of SAT-based ATPG. Its learning technique is effective and lightweight. The experimental results demonstrate the effectiveness of the approach.
Predicting molecular properties is essential for advancing drug discovery and design. Recently, Graph Neural Networks (GNNs) have gained prominence due to their ability to capture the complex structural and relational information inherent in molecular graphs. Despite their effectiveness, the “black-box” nature of GNNs remains a significant obstacle to their widespread adoption in chemistry, as it hinders interpretability and trust. In this context, several explanation methods based on factual reasoning have emerged. These methods aim to interpret the predictions made by GNNs by analyzing the key features contributing to the prediction. However, these approaches fail to answer a critical question: how to ensure that the structure-property mapping learned by GNNs is consistent with established domain knowledge. In this paper, we propose MMGCF, a novel counterfactual explanation framework designed specifically for GNN-based molecular property prediction. MMGCF constructs a hierarchical tree structure on molecular motifs, enabling the systematic generation of counterfactuals through motif perturbations. This framework identifies causally significant motifs and elucidates their impact on model predictions, offering insights into the relationship between structural modifications and predicted properties. Our method demonstrates its effectiveness through comprehensive quantitative and qualitative evaluations on four real-world molecular datasets.
As large language models (LLMs) continue to demonstrate their potential in handling complex tasks, their value in knowledge-intensive industrial scenarios is becoming increasingly evident. Fault diagnosis, a critical domain in the industrial sector, has long faced the dual challenges of managing vast amounts of experiential knowledge and improving human-machine collaboration efficiency. Traditional fault diagnosis systems, which are primarily based on expert systems, suffer from three major limitations: (1) ineffective organization of fault diagnosis knowledge, (2) lack of adaptability between static knowledge frameworks and dynamic engineering environments, and (3) difficulties in integrating expert knowledge with real-time data streams. These systemic shortcomings restrict the ability of conventional approaches to handle uncertainty. In this study, we proposed an intelligent computer numerical control (CNC) fault diagnosis system, integrating LLMs with a knowledge graph (KG). First, we constructed a comprehensive KG that consolidated multi-source data for structured representation. Second, we designed a retrieval-augmented generation (RAG) framework leveraging the KG to support multi-turn interactive fault diagnosis while incorporating real-time engineering data into the decision-making process. Finally, we introduced a learning mechanism to facilitate dynamic knowledge updates. The experimental results demonstrated that our system significantly improved fault diagnosis accuracy, outperforming engineers with two years of professional experience on our constructed benchmark datasets. By integrating LLMs and KG, our framework surpassed the limitations of traditional expert systems rooted in symbolic reasoning, offering a novel approach to addressing the cognitive paradox of unstructured knowledge modeling and dynamic environment adaptation in industrial settings.
Funding: Supported by State Grid Shanghai Economic Research Institute under Grant No. SGTYHT/23-JS-004.
Abstract: Due to the complex structural hierarchy, with deeply nested associative relations between entities such as equipment, specifications, and business processes, intelligent power grid engineering is challenging. Meanwhile, limited by fragmented data and the loss of contextual information, generated reports are prone to problems such as content redundancy and omission of critical information, failing to meet the demands of efficient decision-making and accurate management in modern power systems. To address these issues, this paper proposes a knowledge graph (KG)-enhanced framework to automatically generate electric power engineering reports. In the KG construction phase, a feature-fused entity recognition model named BERT-BiLSTM-CRF is adopted to improve the accuracy of entity recognition in scenarios involving power engineering professional terminology, thereby solving the problem of ambiguous entity boundaries in traditional models; then a BERT-attention relation extraction model is proposed to enhance the completeness of extracting complex hierarchical and implicit relations in power grid data. In the report generation phase, an improved Transformer architecture is adopted to accurately transform structured knowledge into natural language reports that comply with engineering specifications, addressing the issue of semantic inconsistency caused by the loss of structural information in existing models. Validation on real-world projects shows that the proposed framework significantly outperforms existing baseline models in entity recognition, confirming its superiority and applicability in practical engineering.
Abstract: In the context of power generation companies, vast amounts of specialized data and expert knowledge have been accumulated. However, challenges such as data silos and fragmented knowledge hinder the effective utilization of this information. This study proposes a novel framework for intelligent Question-and-Answer (Q&A) systems based on Retrieval-Augmented Generation (RAG) to address these issues. The system efficiently acquires domain-specific knowledge by leveraging external databases, including Relational Databases (RDBs) and graph databases, without additional fine-tuning for Large Language Models (LLMs). Crucially, the framework integrates a Dynamic Knowledge Base Updating Mechanism (DKBUM) and a Weighted Context-Aware Similarity (WCAS) method to enhance retrieval accuracy and mitigate inherent limitations of LLMs, such as hallucinations and lack of specialization. Additionally, the proposed DKBUM dynamically adjusts knowledge weights within the database, ensuring that the most recent and relevant information is utilized, while WCAS refines the alignment between queries and knowledge items through enhanced context understanding. Experimental validation demonstrates that the system can generate timely, accurate, and context-sensitive responses, making it a robust solution for managing complex business logic in specialized industries.
Funding: Supported by the Young and Middle-aged Research Fund Project of Shenzhen People's Hospital (Grant No. SYHL2024-N0010) and the Shenzhen Basic Research Program (General Program, Grant No. JCYJ20240813104409013).
Abstract: Objective: This study aimed to develop a Nursing Retrieval-Augmented Generation (NurRAG) system based on large language models (LLMs) and to evaluate its accuracy and clinical applicability in nursing question answering. Methods: A multidisciplinary team consisting of nursing experts, artificial intelligence researchers, and information engineers collaboratively designed the NurRAG framework following the principles of retrieval-augmented generation. The system included four functional modules: 1) construction of a nursing knowledge base through document normalization, embedding, and vector indexing; 2) nursing question filtering using a supervised classifier; 3) semantic retrieval and re-ranking for evidence selection; and 4) evidence-conditioned language model generation to produce citation-based nursing answers. The system was securely deployed on hospital intranet servers using Docker containers. Performance evaluation was conducted with 1,000 expert-verified nursing question–answer pairs. Semantic fidelity was assessed using Recall-Oriented Understudy for Gisting Evaluation–Longest Common Subsequence (ROUGE-L), and clinical correctness was measured using Accuracy. Results: The NurRAG system achieved significant improvements in both semantic fidelity and answer accuracy compared with conventional large language models. For ChatGLM2-6B, ROUGE-L increased from (30.73±1.48)% to (64.27±0.27)%, and accuracy increased from (49.08±0.92)% to (75.83±0.35)%. For LLaMA2-7B, ROUGE-L increased from (28.76±0.89)% to (60.33±0.21)%, and accuracy increased from (43.27±0.83)% to (73.29±0.33)%. All differences were statistically significant (P < 0.001). A quantitative case analysis further demonstrated that NurRAG effectively reduced hallucinated outputs and generated evidence-based, guideline-concordant nursing responses. Conclusion: The NurRAG system integrates domain-specific retrieval with LLM generation to provide accurate, reliable, and traceable evidence-based nursing answers. The findings demonstrate the system's feasibility and potential to improve the accuracy of clinical knowledge access, support evidence-based nursing decision-making, and promote the safe application of artificial intelligence in nursing practice.
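The ROUGE-L metric named in the abstract above scores a generated answer by the longest common subsequence (LCS) it shares with a reference answer. The following is a minimal sketch of that computation, not the evaluation code used in the study: lowercased whitespace tokenization, the function names, and the default `beta=1.0` (plain F1) are all assumptions made for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """LCS-based F-score between a candidate and a reference string.

    Tokenization by lowercased whitespace split is an assumption of this
    sketch; real evaluations often use a dedicated tokenizer.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)
```

Calling `rouge_l(generated_answer, expert_answer)` on each question–answer pair and averaging would give a corpus-level score on a 0 to 1 scale; the percentages reported in the abstract correspond to that scale multiplied by 100.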
Abstract: The Amazon Web Services (AWS) CloudTrail auditing service provides detailed records of operational and security events, enabling cloud administrators to monitor user activity and manage compliance. Although signature-based threat detection methods have been enhanced with machine learning and Large Language Models (LLMs), these approaches remain limited in addressing emerging threats. This study evaluates a two-step Retrieval-Augmented Generation (RAG) approach using Gemini 2.5 Pro to enhance threat detection accuracy and contextual relevance. The RAG system integrates external cybersecurity knowledge sources, including the MITRE ATT&CK framework, the AWS Threat Technique Catalogue, and threat reports, to overcome the limitations of static pre-trained LLMs. We constructed an evaluation dataset of 200 unique CloudTrail events (122 malicious, 78 benign) using the Stratus Red Team adversary emulation framework, covering 9 MITRE ATT&CK techniques across 8 tactics. Events were sampled from 1724 total events using stratified sampling. Ground truth labels were created through systematic expert annotation with 90% inter-annotator agreement. The RAG-enabled model achieved an estimated 78% accuracy, 85% precision, and 79% F1-score, representing a 70.5% accuracy improvement and a 76.4% F1-score improvement over baseline Gemini 2.5 Pro (46% accuracy, 45% F1-score). Performance figures are based on evaluation results on the 200-event dataset. Cost-latency analysis revealed a processing time of 4.1 s and a cost of $0.00376 per event, comparable to commercial SIEM solutions while providing superior MITRE ATT&CK attribution. The findings demonstrate that RAG substantially enhances context-aware threat detection, providing actionable insights for cloud security operations.
Abstract: Knowledge graphs convey precise semantic information that can be effectively interpreted by neural networks, and generating descriptive text based on these graphs places significant emphasis on content consistency. However, knowledge graphs are inadequate for providing additional linguistic features such as paragraph structure and expressive modes, making it challenging to ensure content coherence in generating text that spans multiple sentences. This lack of coherence can further compromise the overall consistency of the content within a paragraph. In this work, we present the generation of scientific abstracts by leveraging knowledge graphs, with a focus on enhancing both content consistency and coherence. In particular, we construct the ACL Abstract Graph Dataset (ACL-AGD), which pairs knowledge graphs with text, incorporating sentence labels to guide text structure and diverse expressions. We then implement a Siamese network to complement and concretize the entities and relations based on paragraph structure by accomplishing two tasks: graph-to-text generation and entity alignment. Extensive experiments demonstrate that the logical paragraphs generated by our method exhibit entities with a uniform position distribution and appropriate frequency. In terms of content, our method accurately represents the information encoded in the knowledge graph, prevents the generation of irrelevant content, and achieves coherent and non-redundant adjacent sentences, even with a shared knowledge graph.
Abstract: Data curation is vital for selecting effective demonstration examples in graph-to-text generation. However, evaluating the quality of Knowledge Graphs (KGs) remains challenging. Prior research exhibits a narrow focus on structural statistics, such as the shortest path length, while the correctness of graphs in representing the associated text is rarely explored. To address this gap, we introduce a dual-perspective evaluation framework for KG-text data, based on the computation of structural adequacy and semantic alignment. From a structural perspective, we propose the Weighted Incremental Edge Method (WIEM) to quantify graph completeness by leveraging agreement between relation models to predict possible edges between entities. WIEM aims to find increments from models on “unseen links”, whose presence is inversely proportional to the structural adequacy of the original KG in representing the text. From a semantic perspective, we evaluate how well a KG aligns with the text in capturing the intended meaning. To do so, we instruct a large language model to convert KGs into natural language and measure the similarity between generated and reference texts. Based on these computations, we apply a Top-K union method, integrating the structural and semantic modules, to rank and select high-quality KGs. We evaluate our framework against various approaches for selecting few-shot examples in graph-to-text generation. Experiments on the Association for Computational Linguistics Abstract Graph Dataset (ACL-AGD) and the Automatic Content Extraction 05 (ACE05) dataset demonstrate the effectiveness of our approach in distinguishing KG-text data of different qualities, evidenced by the largest performance gap between top- and bottom-ranked examples. We also find that the top examples selected through our dual-perspective framework consistently yield better performance than those selected by traditional measures. These results highlight the importance of data curation in improving graph-to-text generation.
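The abstract above mentions a Top-K union step that combines a structural score and a semantic score when selecting KGs, without giving its details. One plausible reading is sketched below under stated assumptions: each KG has one score per perspective, higher scores are better, and the selection is the union of the K best items from each ranking. The function name `top_k_union`, the dictionary inputs, and the tie-breaking rule are hypothetical and are not taken from the paper.

```python
def top_k_union(structural, semantic, k):
    """Union of the k best-scoring items under each of two score tables.

    `structural` and `semantic` map an item id to a score (higher is better,
    an assumption of this sketch). The result keeps first-seen order:
    structural picks first, then any new semantic picks.
    """
    def top_k(scores):
        # Sort by descending score; break ties by item id for determinism.
        return sorted(scores, key=lambda item: (-scores[item], item))[:k]

    selected = []
    for item in top_k(structural) + top_k(semantic):
        if item not in selected:
            selected.append(item)
    return selected
```

With `structural = {"g1": 0.9, "g2": 0.4, "g3": 0.7}` and `semantic = {"g1": 0.2, "g2": 0.8, "g3": 0.6}`, calling `top_k_union(structural, semantic, 2)` selects `g1` and `g3` from the structural ranking and adds `g2` from the semantic ranking. A union keeps any KG that is strong from either perspective, whereas an intersection would keep only KGs that are strong from both.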
Funding: The National Key Research and Development Program of China (No. 2018YFB1305005).
Abstract: As autonomous driving systems advance rapidly, there is a surge in demand for high-definition (HD) maps that provide accurate and dependable prior information on static environments around vehicles. As one of the main high-level elements in HD maps, the road lane centerline is essential for downstream tasks such as autonomous navigation and planning. Considering the complex topology and significant overlap concerns of road centerlines, previous studies have rarely examined the centerline HD map mapping problem. Recent learning-based pipelines rely on heuristic post-processing of predictions to generate a structured centerline output without instance information. To ameliorate this situation, we propose a novel, end-to-end road centerline vectorized graph generation pipeline, termed CenterLineFormer. CenterLineFormer takes a single onboard camera image as input and predicts a directed graph representing the lane-layer map in the bird's-eye view (BEV). We propose a strategy for better view transformation that uses a cross-attention mechanism to generate a dense BEV feature map. With our pipeline, we can describe the connection relationship between different centerlines and generate structured lane graphs for downstream modules such as planning and control. Qualitatively, our experiments show that our pipeline achieves superior performance against previous baselines on the nuScenes dataset. We also show that CenterLineFormer can generate accurate centerline graph topologies in night driving and complex traffic intersection scenes.
Abstract: This article examines the implementation of a virtual health assistant powered by Retrieval-Augmented Generation (RAG) and GPT-4, aimed at enhancing clinical support through personalized, real-time interactions with patients. The system is hypothesized to improve healthcare accessibility, operational efficiency, and patient outcomes by automating routine tasks and delivering accurate health information. The assistant leverages natural language processing and real-time data retrieval models to respond to patient inquiries, schedule appointments, provide medication reminders, assist with symptom triage, and answer insurance-related questions. By integrating RAG-based virtual care, the system reduces the burden on healthcare specialists and helps mitigate healthcare disparities, particularly in rural areas where traditional care is limited. Although the initial scope of testing did not validate all potential benefits, the results demonstrated high patient satisfaction and strong response accuracy, both critical for systems of this nature. These findings underscore the transformative potential of AI-driven virtual health assistants in enhancing patient engagement, streamlining operational workflows, and improving healthcare accessibility, ultimately contributing to better outcomes and more cost-effective care delivery.
Funding: Supported in part by the National Natural Science Foundation of China [62102136], the 2020 Opening Fund for Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering [2020SDSJ06], and the Construction Fund for Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering [2019ZYYD007].
Abstract: Generation-based linguistic steganography is a popular research area of information hiding. The text generative steganographic method based on conditional probability coding is a direction that researchers have recently paid attention to. However, in the course of our experiments, we found that hiding secret information in the text tends to destroy the statistical distribution characteristics of the original text, which indicates that this method suffers from an obvious reduction in text quality as the embedding rate increases, and that the topic of the generated texts is uncontrollable, so there is still room for improvement in concealment. In this paper, we propose a topic-controlled steganography method guided by graph-to-text generation. The proposed model can automatically generate steganographic texts carrying secret messages from knowledge graphs, and the topic of the generated texts is controllable. We also provide a graph path coding method with corresponding detailed algorithms for graph-to-text generation. Different from traditional linguistic steganography methods, we encode the secret information during graph path coding rather than using conditional probability. We test our method in different aspects and compare it with other text generative steganographic methods. The experimental results show that the model proposed in this paper can effectively improve the quality of the generated text and significantly improve the concealment of steganographic text.
Funding: Support of the National Natural Science Foundation of China (Grant No. 41130751), China Scholarship Council, Research Program for Western China Communication (Grant No. 2011ZB04), and China Central University Funding.
Abstract: The limit equilibrium method (LEM) and the strength reduction method (SRM) are the most widely used methods for slope stability analysis. However, both have limitations in practical application. In the LEM, the constitutive model cannot be considered and many assumptions are needed between slices of soil/rock. The SRM requires iterative calculations and does not give the slip surface directly. A method for slope stability analysis based on graph theory was recently developed to directly calculate the minimum safety factor and potential critical slip surface according to the stress results of numerical simulation. The method is based on the current stress state and can overcome the disadvantages of the two traditional methods mentioned above. The influences of edge generation and mesh geometry on the position of the slip surface and the safety factor of the slope are studied, in which a new method for edge generation is proposed and a reasonable mesh size is suggested. The results of benchmark examples and a rock slope show good accuracy and efficiency of the presented method.
Abstract: Aiming at the problems of incomplete characterization of text relations, poor guidance of potential representations, and low quality of model generation in the field of controllable long text generation, this paper proposes a new GSPT-CVAE model (Graph Structured Processing, Single Vector, and Potential Attention Computing Transformer-Based Conditioned Variational Autoencoder model). The model obtains a more comprehensive representation of textual relations by graph-structured processing of the input text, and at the same time obtains a single vector representation by weighted merging of the vector sequences after graph-structured processing to get an effective potential representation. In the process of potential representation guiding text generation, the model adopts a combination of traditional embedding and potential attention calculation to give full play to the guiding role of potential representation for generating text, to improve the controllability and effectiveness of text generation. The experimental results show that the model has excellent representation learning ability and can learn rich and useful textual relationship representations. The model also achieves satisfactory results in the effectiveness and controllability of text generation and can generate long texts that match the given constraints. The ROUGE-1 F1 score of this model is 0.243, the ROUGE-2 F1 score is 0.041, the ROUGE-L F1 score is 0.22, and the PPL-Word score is 34.303, which gives the GSPT-CVAE model a certain advantage over the baseline model. Meanwhile, this paper compares this model with state-of-the-art generative models such as T5, GPT-4, and Llama2, and the experimental results show that the GSPT-CVAE model is competitive.
Abstract: With the growth of web-based documents, the need for automatic document clustering and text summarization has increased. Document summarization, which involves extracting essential and appropriate information, removing unnecessary data, and presenting the result in a cohesive and coherent manner, is a particularly challenging task. In this research, a novel intelligent model for document clustering is designed with a graph model and fuzzy-based association rule generation (gFAR). Initially, the graph model is used to map the relationships among the (multi-source) data, followed by document clustering through the generation of association rules using the fuzzy concept. This method benefits redundancy elimination by mapping relevant documents using the graph model, and it reduces time consumption and improves accuracy through fuzzy association rule generation. The framework supports document clustering in an interpretable way. It iteratively reduces the error rate during relationship mapping among the data (clusters) with the assistance of weighted document content. The model also represents the significance of data features with class discrimination, which helps in measuring the significance of the features during the data clustering process. The simulation is done in the MATLAB 2016b environment and evaluated with empirical standards such as Relative Risk Patterns (RRP), ROUGE score, and Discrimination Information Measure (DMI). The DailyMail and DUC 2004 datasets are used to obtain the empirical results. The proposed gFAR model gives a better trade-off compared with various prevailing approaches.
Funding: Supported by the National Natural Science Foundation of China (Nos. U52305124, U62201399), the Zhejiang Natural Science Foundation of China (No. LQ23E050002), the Basic Scientific Research Project of Wenzhou City (Nos. G2022008, G2023028), the General Scientific Research Project of Educational Department of Zhejiang Province (Nos. Y202249008, Y202249041), China Postdoctoral Science Foundation (No. 2023M740988), Zhejiang Provincial Postdoctoral Science Foundation (No. ZJ2023122), and the Master's Innovation Foundation of Wenzhou University (No. 3162024004106).
Abstract: In order to minimize wind turbine failures, fault diagnosis of wind turbines is becoming increasingly important. Deep learning methods excel at multivariate monitoring and data modeling, but they are often limited to Euclidean space and struggle to capture the complex coupling between wind turbine sensors. To address this problem, we convert SCADA data into graph data, where sensors act as nodes and their topological connections act as edges, to represent these complex relationships more efficiently. Specifically, a wind turbine anomaly identification method based on a deep graph convolutional neural network using a similarity graph generation strategy (SGG-DGCN) is proposed. Firstly, a plurality of similarity graphs containing similarity information between nodes are generated by different distance metrics. Then, the generated similarity graphs are fused using the proposed similarity graph generation strategy. Finally, the fused similarity graphs are fed into the DGCN model for anomaly identification. To verify the effectiveness of the proposed SGG-DGCN model, we conducted a large number of experiments. The experimental results show that the proposed SGG-DGCN model has the highest accuracy compared with other models. In addition, the results of ablation experiments also demonstrate that the proposed SGG strategy can effectively improve the accuracy of wind turbine (WT) anomaly identification.
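The pipeline in the abstract above builds several sensor similarity graphs from different distance metrics and then fuses them. A minimal sketch of that idea follows; it is an illustration under assumptions rather than the SGG strategy itself, which the abstract does not specify. The two metrics (a Euclidean-distance similarity and cosine similarity), the fusion by averaging followed by thresholding, and all function names are hypothetical choices.

```python
import math


def euclidean_similarity(u, v):
    """Map Euclidean distance into (0, 1]; identical signals score 1.0."""
    return 1.0 / (1.0 + math.dist(u, v))


def cosine_similarity(u, v):
    """Cosine of the angle between two signals (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return dot / norm if norm else 0.0


def similarity_graph(signals, metric):
    """Dense weighted adjacency matrix; one node per sensor, no self-loops."""
    n = len(signals)
    return [[metric(signals[i], signals[j]) if i != j else 0.0
             for j in range(n)] for i in range(n)]


def fuse_graphs(graphs, threshold):
    """Average the edge weights across graphs, then keep edges >= threshold.

    Averaging plus thresholding is an assumed fusion rule for illustration.
    """
    n = len(graphs[0])
    fused = [[sum(g[i][j] for g in graphs) / len(graphs) for j in range(n)]
             for i in range(n)]
    return [[1 if fused[i][j] >= threshold else 0 for j in range(n)]
            for i in range(n)]
```

Given a list `signals` with one time series per SCADA sensor, `fuse_graphs([similarity_graph(signals, euclidean_similarity), similarity_graph(signals, cosine_similarity)], 0.8)` returns a binary adjacency matrix that could serve as the graph input to a graph convolutional model.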
Funding: Supported by the State Key Program of National Natural Science of China (No. 61533018), the National Natural Science Foundation of China (No. 61402220), the Philosophy and Social Science Foundation of Hunan Province (No. 16YBA323), the Natural Science Foundation of Hunan Province (Nos. 2020JJ4525, 2022JJ30495, and 2025JJ50384), the Scientific Research Fund of Hunan Provincial Education Department (Nos. 18B279, 19A439, and 22A0316), and the CCF-Zhipu AI Large Model Fund.
Abstract: Large language models (LLMs) excel in various natural language processing tasks and are increasingly applied in specialized fields like medicine. However, their deployment in the medical domain is challenged by limited domain-specific data and the tendency to generate inaccurate information, known as “hallucinations.” While domain-specific fine-tuning has improved open-source LLMs, they still underperform compared to proprietary models like ChatGPT and PaLM. To address this gap, retrieval-augmented generation (RAG) techniques have been explored to enhance LLMs by integrating external knowledge bases. Nevertheless, the success of RAG depends on the quality of retrieved documents, and its application within the medical field remains in the early stages. In this paper, we introduce the “Bailicai” framework as an exploratory approach to integrating RAG with LLMs in the medical field. The framework employs fine-tuning to improve the RAG process, where “falsely relevant” and “completely irrelevant” interference documents are intentionally included in the training data. This enables Bailicai to develop the ability to assess the quality of retrieved documents and selectively incorporate them. The framework is organized into four modules: (1) medical knowledge injection, (2) self-knowledge boundary identification, (3) directed acyclic graph task decomposition, and (4) retrieval-augmented generation. Through the synergy of these modules, Bailicai achieves superior performance on multiple medical benchmarks, outperforming existing large models in the medical domain, RAG-based methods, and proprietary models such as GPT-3.5. Furthermore, Bailicai effectively mitigates the hallucination problem common in LLMs applied to medical tasks and enhances the robustness of RAG when dealing with irrelevant or misleading documents, enabling more accurate information retrieval and integration.
Funding: Supported by the National Key Research and Development Program of China (2020AAA0107600) and the National Natural Science Foundation of China (62222607).
Abstract: Machine learning, particularly graph learning, is gaining increasing recognition for its transformative impact across various fields. One such promising application is in the realm of molecule design and discovery, notably within the pharmaceutical industry. Our survey offers a comprehensive overview of state-of-the-art methods in molecule design, particularly focusing on de novo drug design, which incorporates (deep) graph learning techniques. We categorize these methods into three distinct groups: i) all-at-once, ii) fragment-based, and iii) node-by-node. Additionally, we introduce some key public datasets and outline the commonly used evaluation metrics for both the generation and optimization of molecules. In the end, we discuss the existing challenges in this field and suggest potential directions for future research.
Funding: Supported by the National Natural Science Foundation of China (No. 69576038).
Abstract: This paper presents techniques for verification and Test Generation (TG) for sequential machines (Finite State Machines, FSMs) based on state traversal of the State Transition Graph (STG). The problems of traversing, redundancy, and the transition fault model are identified. In order to achieve high fault coverage, collapsing testing is proposed. Further, heuristic knowledge for speeding up verification and TG is described.
Abstract: With the development of web services (WS), applications based on web services, which are convenient and platform-independent, have become increasingly popular in recent years. However, how to identify, generate, and compose services has recently become an open issue. This paper proposes a method based on program slicing to realize the generation and composition of web services. It describes how to generate a WSDL file and a SOAP message from source code, as well as the theory of the function dependence graph (FDG). In addition, it gives a way to generate a proxy service for each service, which allows users to easily call a service. The experimental results show that our generation and composition methods for WS are feasible and flexible.
Funding: Supported by the Joint Research Fund for Overseas Chinese Young Scholars (No. 50128503) and the National Natural Science Foundation of China (No. 50390060).
Abstract: This paper presents modeling tools based on Boolean satisfiability (SAT) to solve problems of test generation for combinational circuits. It adds a layer to a generic SAT algorithm to maintain circuit-related information and value justification relations. It dovetails binary decision graphs (BDD) and SAT techniques to improve the efficiency of automatic test pattern generation (ATPG). More specifically, it first exploits inexpensive reconvergent fanout analysis of the circuit to gather information on the local signal correlation by using BDD learning, then uses the learned information to restrict and focus the overall search space of SAT-based ATPG. Its learning technique is effective and lightweight. The experimental results demonstrate the effectiveness of the approach.
Abstract: Predicting molecular properties is essential for advancing drug discovery and design. Recently, Graph Neural Networks (GNNs) have gained prominence due to their ability to capture the complex structural and relational information inherent in molecular graphs. Despite their effectiveness, the “black-box” nature of GNNs remains a significant obstacle to their widespread adoption in chemistry, as it hinders interpretability and trust. In this context, several explanation methods based on factual reasoning have emerged. These methods aim to interpret the predictions made by GNNs by analyzing the key features contributing to the prediction. However, these approaches fail to answer a critical question: how can we ensure that the structure-property mapping learned by GNNs is consistent with established domain knowledge? In this paper, we propose MMGCF, a novel counterfactual explanation framework designed specifically for GNN-based molecular property prediction. MMGCF constructs a hierarchical tree structure on molecular motifs, enabling the systematic generation of counterfactuals through motif perturbations. This framework identifies causally significant motifs and elucidates their impact on model predictions, offering insights into the relationship between structural modifications and predicted properties. Our method demonstrates its effectiveness through comprehensive quantitative and qualitative evaluations on four real-world molecular datasets.
Funding: Funded by the National Natural Science Foundation of China (72104224, L2424237, 71974107, L2224059, L2124002, and 91646102), the Beijing Natural Science Foundation (9232015), the Beijing Social Science Foundation (24GLC058), the Construction Project of China Knowledge Center for Engineering Sciences and Technology (CKCEST-2023-1-7), the MOE (Ministry of Education in China) Project of Humanities and Social Sciences (16JDGC011), the Tsinghua University Initiative Scientific Research Program (2019Z02CAU), and the Tsinghua University Project of Volvo-Supported Green Economy and Sustainable Development (20183910020).
Abstract: As large language models (LLMs) continue to demonstrate their potential in handling complex tasks, their value in knowledge-intensive industrial scenarios is becoming increasingly evident. Fault diagnosis, a critical domain in the industrial sector, has long faced the dual challenges of managing vast amounts of experiential knowledge and improving human-machine collaboration efficiency. Traditional fault diagnosis systems, which are primarily based on expert systems, suffer from three major limitations: (1) ineffective organization of fault diagnosis knowledge, (2) lack of adaptability between static knowledge frameworks and dynamic engineering environments, and (3) difficulties in integrating expert knowledge with real-time data streams. These systemic shortcomings restrict the ability of conventional approaches to handle uncertainty. In this study, we proposed an intelligent computer numerical control (CNC) fault diagnosis system, integrating LLMs with knowledge graphs (KGs). First, we constructed a comprehensive KG that consolidated multi-source data for structured representation. Second, we designed a retrieval-augmented generation (RAG) framework leveraging the KG to support multi-turn interactive fault diagnosis while incorporating real-time engineering data into the decision-making process. Finally, we introduced a learning mechanism to facilitate dynamic knowledge updates. The experimental results demonstrated that our system significantly improved fault diagnosis accuracy, outperforming engineers with two years of professional experience on our constructed benchmark datasets. By integrating LLMs and KGs, our framework surpassed the limitations of traditional expert systems rooted in symbolic reasoning, offering a novel approach to addressing the cognitive paradox of unstructured knowledge modeling and dynamic environment adaptation in industrial settings.