16 articles found
Retrieval-Augmented Large Language Model for AWS Cloud Threat Detection and Modelling: Cloudtrail Mitre ATT&CK Mapping
1
Authors: Goodness Adediran, Kenny Awuson-David, Yussuf Ahmed. Computers, Materials & Continua, 2026, Issue 5, pp. 2307-2331 (25 pages)
The Amazon Web Services (AWS) CloudTrail auditing service provides detailed records of operational and security events, enabling cloud administrators to monitor user activity and manage compliance. Although signature-based threat detection methods have been enhanced with machine learning and Large Language Models (LLMs), these approaches remain limited in addressing emerging threats. This study evaluates a two-step Retrieval-Augmented Generation (RAG) approach using Gemini 2.5 Pro to enhance threat detection accuracy and contextual relevance. The RAG system integrates external cybersecurity knowledge sources, including the MITRE ATT&CK framework, the AWS Threat Technique Catalogue, and threat reports, to overcome limitations of static pre-trained LLMs. We constructed an evaluation dataset of 200 unique CloudTrail events (122 malicious, 78 benign) using the Stratus Red Team adversary emulation framework, covering 9 MITRE ATT&CK techniques across 8 tactics. Events were sampled from 1724 total events using stratified sampling. Ground truth labels were created through systematic expert annotation with 90% inter-annotator agreement. The RAG-enabled model achieved an estimated 78% accuracy, 85% precision, and 79% F1-score, representing a 70.5% accuracy improvement and a 76.4% F1-score improvement over baseline Gemini 2.5 Pro (46% accuracy, 45% F1-score). Performance figures are based on evaluation results on the 200-event dataset. Cost-latency analysis revealed a processing time of 4.1 s and a cost of $0.00376 per event, comparable to commercial SIEM solutions while providing superior MITRE ATT&CK attribution. The findings demonstrate that RAG substantially enhances context-aware threat detection, providing actionable insights for cloud security operations.
Keywords: retrieval-augmented generation; Amazon web services; LLM; cloud service provider; threat detection; threat modelling; MITRE ATT&CK; RAG-enabled model; RAG-enabled LLM system
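The abstract above reports precision and F1-score but not recall, and it quotes relative improvements over the baseline. As a minimal sketch of the arithmetic, the snippet below solves the standard F1 definition, F1 = 2PR / (P + R), for recall and computes relative improvement from the rounded percentages. The function names are illustrative and not taken from the paper. The rounded inputs give roughly 69.6% and 75.6% rather than the reported 70.5% and 76.4%; one possible explanation, stated here only as an assumption, is that the authors computed them from unrounded values.

```python
def implied_recall(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, given precision P and F1."""
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("no valid recall exists for this precision/F1 pair")
    return f1 * precision / denominator


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, as a fraction of the baseline."""
    return (new - baseline) / baseline


# Rounded figures quoted in the abstract.
recall = implied_recall(precision=0.85, f1=0.79)
accuracy_gain = relative_improvement(new=0.78, baseline=0.46)
f1_gain = relative_improvement(new=0.79, baseline=0.45)

print(f"implied recall: {recall:.3f}")
print(f"accuracy gain:  {accuracy_gain:.1%}")
print(f"F1 gain:        {f1_gain:.1%}")
```

The implied recall of about 74% is only as precise as the rounded inputs allow.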
A Dynamic Knowledge Base Updating Mechanism-Based Retrieval-Augmented Generation Framework for Intelligent Question-and-Answer Systems (Cited by: 1)
2
Authors: Yu Li. Journal of Computer and Communications, 2025, Issue 1, pp. 41-58 (18 pages)
In the context of power generation companies, vast amounts of specialized data and expert knowledge have been accumulated. However, challenges such as data silos and fragmented knowledge hinder the effective utilization of this information. This study proposes a novel framework for intelligent Question-and-Answer (Q&A) systems based on Retrieval-Augmented Generation (RAG) to address these issues. The system efficiently acquires domain-specific knowledge by leveraging external databases, including Relational Databases (RDBs) and graph databases, without additional fine-tuning of Large Language Models (LLMs). Crucially, the framework integrates a Dynamic Knowledge Base Updating Mechanism (DKBUM) and a Weighted Context-Aware Similarity (WCAS) method to enhance retrieval accuracy and mitigate inherent limitations of LLMs, such as hallucinations and lack of specialization. The proposed DKBUM dynamically adjusts knowledge weights within the database, ensuring that the most recent and relevant information is utilized, while WCAS refines the alignment between queries and knowledge items through enhanced context understanding. Experimental validation demonstrates that the system can generate timely, accurate, and context-sensitive responses, making it a robust solution for managing complex business logic in specialized industries.
Keywords: Retrieval-Augmented Generation; Question-and-Answer; Large Language Models; Dynamic Knowledge Base Updating Mechanism; Weighted Context-Aware Similarity
Nursing Retrieval-Augmented Generation: Retrieval augmented generation for nursing question answering with large language models
3
Authors: Liping Xiong, Qiqiao Zeng, Weixiang Luo, Ronghui Liu. International Journal of Nursing Sciences, 2025, Issue 6, pp. 516-523, I0001 (9 pages)
Objective: This study aimed to develop a Nursing Retrieval-Augmented Generation (NurRAG) system based on large language models (LLMs) and to evaluate its accuracy and clinical applicability in nursing question answering. Methods: A multidisciplinary team consisting of nursing experts, artificial intelligence researchers, and information engineers collaboratively designed the NurRAG framework following the principles of retrieval-augmented generation. The system included four functional modules: 1) construction of a nursing knowledge base through document normalization, embedding, and vector indexing; 2) nursing question filtering using a supervised classifier; 3) semantic retrieval and re-ranking for evidence selection; and 4) evidence-conditioned language model generation to produce citation-based nursing answers. The system was securely deployed on hospital intranet servers using Docker containers. Performance evaluation was conducted with 1,000 expert-verified nursing question-answer pairs. Semantic fidelity was assessed using Recall-Oriented Understudy for Gisting Evaluation, Longest Common Subsequence (ROUGE-L), and clinical correctness was measured using accuracy. Results: The NurRAG system achieved significant improvements in both semantic fidelity and answer accuracy compared with conventional large language models. For ChatGLM2-6B, ROUGE-L increased from (30.73±1.48)% to (64.27±0.27)%, and accuracy increased from (49.08±0.92)% to (75.83±0.35)%. For LLaMA2-7B, ROUGE-L increased from (28.76±0.89)% to (60.33±0.21)%, and accuracy increased from (43.27±0.83)% to (73.29±0.33)%. All differences were statistically significant (P<0.001). A quantitative case analysis further demonstrated that NurRAG effectively reduced hallucinated outputs and generated evidence-based, guideline-concordant nursing responses. Conclusion: The NurRAG system integrates domain-specific retrieval with LLM generation to provide accurate, reliable, and traceable evidence-based nursing answers. The findings demonstrate the system's feasibility and potential to improve the accuracy of clinical knowledge access, support evidence-based nursing decision-making, and promote the safe application of artificial intelligence in nursing practice.
Keywords: Evidence-based nursing; Large language models; Nursing knowledge base; Question-answering system; Retrieval-augmented generation
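The NurRAG abstract uses ROUGE-L to measure semantic fidelity. As a rough illustration of what that metric computes, the sketch below implements the sentence-level ROUGE-L F-measure from the longest common subsequence (LCS) of two token lists. Lowercased whitespace tokenization and beta = 1 are assumptions made here for simplicity; the abstract does not state the tokenizer or weighting the authors used, and the example sentences are invented.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str, beta: float = 1.0) -> float:
    """Sentence-level ROUGE-L F-measure between a candidate and a reference."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


# Invented example: five of six tokens appear in the same order in both texts.
score = rouge_l(
    "reposition the patient every two hours",
    "turn the patient every two hours",
)
print(f"ROUGE-L: {score:.3f}")
```

Because the score depends only on in-order token overlap, a generated answer can score well while still being clinically wrong, which is presumably why the study pairs it with a separate accuracy measure.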
Improving Clinical Support through Retrieval-Augmented Generation Powered Virtual Health Assistants
4
Authors: Biju Baburajan Anandavally. Journal of Computer and Communications, 2024, Issue 11, pp. 86-94 (9 pages)
This article examines the implementation of a virtual health assistant powered by Retrieval-Augmented Generation (RAG) and GPT-4, aimed at enhancing clinical support through personalized, real-time interactions with patients. The system is hypothesized to improve healthcare accessibility, operational efficiency, and patient outcomes by automating routine tasks and delivering accurate health information. The assistant leverages natural language processing and real-time data retrieval models to respond to patient inquiries, schedule appointments, provide medication reminders, assist with symptom triage, and answer insurance-related questions. By integrating RAG-based virtual care, the system reduces the burden on healthcare specialists and helps mitigate healthcare disparities, particularly in rural areas where traditional care is limited. Although the initial scope of testing did not validate all potential benefits, the results demonstrated high patient satisfaction and strong response accuracy, both critical for systems of this nature. These findings underscore the transformative potential of AI-driven virtual health assistants in enhancing patient engagement, streamlining operational workflows, and improving healthcare accessibility, ultimately contributing to better outcomes and more cost-effective care delivery.
Keywords: Retrieval-Augmented Generation (RAG); GPT-4; Healthcare Assistants; Artificial Intelligence
Bailicai: A Domain-Optimized Retrieval-Augmented Generation Framework for Medical Applications
5
Authors: Long Cui, Yongbin Liu, Chunping Ouyang, Ying Yu, Jiangtao Zhang, Yaping Wan, Fei Yang. Big Data Mining and Analytics, 2026, Issue 2, pp. 376-392 (17 pages)
Large language models (LLMs) excel in various natural language processing tasks and are increasingly applied in specialized fields like medicine. However, their deployment in the medical domain is challenged by limited domain-specific data and the tendency to generate inaccurate information, known as "hallucinations." While domain-specific fine-tuning has improved open-source LLMs, they still underperform compared to proprietary models like ChatGPT and PaLM. To address this gap, retrieval-augmented generation (RAG) techniques have been explored to enhance LLMs by integrating external knowledge bases. Nevertheless, the success of RAG depends on the quality of retrieved documents, and its application within the medical field remains in the early stages. In this paper, we introduce the "Bailicai" framework as an exploratory approach to integrating RAG with LLMs in the medical field. The framework employs fine-tuning to improve the RAG process, where "falsely relevant" and "completely irrelevant" interference documents are intentionally included in the training data. This enables Bailicai to develop the ability to assess the quality of retrieved documents and selectively incorporate them. The framework is organized into four modules: (1) medical knowledge injection, (2) self-knowledge boundary identification, (3) directed acyclic graph task decomposition, and (4) retrieval-augmented generation. Through the synergy of these modules, Bailicai achieves superior performance on multiple medical benchmarks, outperforming existing large models in the medical domain, RAG-based methods, and proprietary models such as GPT-3.5. Furthermore, Bailicai effectively mitigates the hallucination problem common in LLMs applied to medical tasks and enhances the robustness of RAG when dealing with irrelevant or misleading documents, enabling more accurate information retrieval and integration.
Keywords: large language models (LLMs); retrieval-augmented generation (RAG); domain-specific language models
Transforming Healthcare with State-of-the-Art Medical-LLMs: A Comprehensive Evaluation of Current Advances Using Benchmarking Framework
6
Authors: Himadri Nath Saha, Dipanwita Chakraborty Bhattacharya, Sancharita Dutta, Arnab Bera, Srutorshi Basuray, Satyasaran Changdar, Saptarshi Banerjee, Jon Turdiev. Computers, Materials & Continua, 2026, Issue 2, pp. 234-289 (56 pages)
The emergence of Medical Large Language Models has significantly transformed healthcare. Medical Large Language Models (Med-LLMs) serve as transformative tools that enhance clinical practice through applications in decision support, documentation, and diagnostics. This evaluation examines the performance of leading Med-LLMs, including GPT-4Med, Med-PaLM, MEDITRON, PubMedGPT, and MedAlpaca, across diverse medical datasets. It provides graphical comparisons of their effectiveness in distinct healthcare domains. The study introduces a domain-specific categorization system that aligns these models with optimal applications in clinical decision-making, documentation, drug discovery, research, patient interaction, and public health. The paper addresses deployment challenges of Medical-LLMs, emphasizing trustworthiness and explainability as essential requirements for healthcare AI. It presents current evaluation techniques that improve model transparency in high-stakes medical contexts and analyzes regulatory frameworks using benchmarking datasets such as MedQA, MedMCQA, PubMedQA, and MIMIC. By identifying ongoing challenges in bias mitigation, reliability, and ethical compliance, this work serves as a resource for selecting appropriate Med-LLMs and outlines future directions in the field. This analysis offers a roadmap for developing Med-LLMs that balance technological innovation with the trust and transparency required for clinical integration, a perspective often overlooked in existing literature.
Keywords: Medical large language models (Med-LLM); AI in healthcare; natural language processing (NLP) in medicine; fine-tuning medical LLMs; retrieval-augmented generation (RAG) in medicine; multi-modal learning in healthcare; explainability and transparency in medical AI; FDA regulations for AI in medicine; evaluation and benchmarking of medical large language models
Key Technologies and Application Prospects of Railway Natural Language Large Model
7
Authors: SHI Tianyun, LI Xinqin, DAI Mingrui, SHI Weifeng, LI Guohua, DU Wenran, SHEN Meiying (Translated). Chinese Railways, 2024, Issue 2, pp. 11-20 (10 pages)
The emergence of large natural language models in artificial intelligence has opened new opportunities for deeply empowering the industry. Research on the key technologies and applications of a railway natural language large model is of great significance for promoting and coordinating the development of railway artificial intelligence. This paper proposes application scenarios for the railway natural language large model based on the application requirements of railway artificial intelligence; designs the overall architecture of the model on top of the railway artificial intelligence platform; studies the key technologies of natural language large models; builds a railway industry large model oriented to intelligent question answering; and verifies the model with actual data. Finally, the paper discusses prospects for the development and application of the railway natural language large model in railway traffic organization, railway operation safety, and passenger service.
Keywords: intelligent HSR; artificial intelligence; railway natural language large model; application scenarios; large model architecture; large model fine-tuning; retrieval-augmented generation; railway knowledge question-answering
Numerical Constraint-Aware Dense Retrieval with Two-Phase Contrastive Learning
8
Authors: Meng Wang, Yisong Wang, Feifan Wu. Big Data Mining and Analytics, 2026, Issue 2, pp. 341-359 (19 pages)
Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by integrating external knowledge, leading to significant improvements in both factual accuracy and task performance. However, existing dense retrievers face considerable challenges when handling numerical constraints, particularly in queries requiring precise filtering conditions. To systematically explore these issues, we introduce Numerical Constraint Question (NumConQ), a comprehensive multi-domain benchmark dataset that contains more than 6500 queries covering healthcare, finance, education, sports, and movies. Empirical analysis reveals that state-of-the-art dense retrievers achieve only 16.3% accuracy in numerical constraint satisfaction, significantly underperforming relative to their semantic matching capabilities. To address these limitations, we propose Numerical Constraint-aware Retriever (NC-Retriever), which features: (1) a two-phase contrastive learning framework that combines in-batch negative sampling with progressively introduced hard negatives, and (2) a hybrid numerical representation scheme for consistent tokenization. Extensive experiments show that NC-Retriever achieves a relative improvement of 65.84% in recall@10 and a 78.28% increase in precision@10 compared to current state-of-the-art methods. The code and benchmark dataset are available at https://github.com/Tongji-KGLLM/NumConQ.
Keywords: dense retriever; benchmark dataset; contrastive learning; numerical constraint query; Retrieval-Augmented Generation (RAG)
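The NC-Retriever abstract reports gains in recall@10 and precision@10 on queries with numerical constraints. A minimal sketch of how those two metrics are typically computed follows, using a toy query of the form "price below 100". The documents, prices, and ranking are invented for illustration and are not drawn from the NumConQ benchmark; k = 3 is used only to keep the toy example small.

```python
def precision_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int = 10) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def recall_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int = 10) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


# Hypothetical corpus: document id -> price. A document is relevant only if it
# satisfies the numerical constraint of the query (price < 100).
prices = {"d1": 50, "d2": 150, "d3": 80, "d4": 120, "d5": 95}
relevant = {doc_id for doc_id, price in prices.items() if price < 100}

# Hypothetical ranking from a retriever that matches on topic but ignores the number.
ranking = ["d2", "d1", "d4", "d3", "d5"]

print(precision_at_k(ranking, relevant, k=3))
print(recall_at_k(ranking, relevant, k=3))
```

In this toy ranking two of the top three documents violate the constraint, which is the kind of failure the benchmark is described as measuring.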
Semantic Feedback-Based RAG for Radiology Report Generation
9
Authors: Xing Jia, Yun Xiong, Songwen Pei, Yumeng Zhang, Cairong Yan, Zhijun Fang. Big Data Mining and Analytics, 2026, Issue 2, pp. 393-406 (14 pages)
Radiology report generation aims to produce textual reports automatically based on input images, a critical process that aids in accurate diagnoses and lightens the workload of radiologists. Following recent advances in Large Language Models (LLMs), several Retrieval-Augmented Generation (RAG) based report generation models have been proposed. Despite continuously improved performance, these report generation models often suffer from two main limitations: interference from irrelevant information, and lack of alignment between the input image and the generated report. In this study, we propose the Semantic feedback based RAG Radiology report generation model, namely RAGSemRad. RAGSemRad comprises two key components: the fine-grained semantic retrieval module and the semantic assessment module. The fine-grained semantic retrieval module is designed to retrieve adequate and relevant prompt information while ignoring irrelevant interference. This is achieved by clustering the data at the semantic level and leveraging the domain knowledge within a large pre-trained visual-language model, thus alleviating the issues of hallucination and data bias. Further, the semantic assessment module raises the upper bound of performance by improving the alignment between the input image and the generated report, utilizing supervision signals derived from paired image-label data. Experimental evaluations are conducted on two benchmarks, IU X-Ray and MIMIC-CXR, to assess the performance of RAGSemRad. The results demonstrate that RAGSemRad exhibits competitive performance compared to state-of-the-art methods, showcasing its potential to advance automatic radiology report generation.
Keywords: radiology report generation; Retrieval-Augmented Generation (RAG); semantic assessment
RADDI: A Retrieval Augmented Framework for Drug-Drug Interaction Prediction
10
Authors: Hengbo Zhang, Yingying Wang, Xinyu Gao, Yun Xiong. Big Data Mining and Analytics, 2026, Issue 2, pp. 360-375 (16 pages)
Drug-drug interactions (DDIs) can significantly impact drug efficacy and safety, potentially leading to severe adverse effects. Existing works on DDI event prediction have typically relied on labels of specific events for supervision, neglecting the importance of mining textual descriptions. This limits their ability to address two challenges: (1) the lack of observable data for new drugs, hindering meaningful feature extraction; (2) the highly imbalanced event distribution, which causes models to overfit to common categories and struggle with rare interactions. To address these challenges, we propose RADDI, a retrieval-augmented DDI prediction method. This approach improves prediction accuracy and adapts to the dynamic nature of new drug discovery. Specifically, to solve the first challenge, RADDI introduces a collaborative prediction strategy that integrates general knowledge transfer with specialized knowledge retrieval. This approach uses pretrained language models to generate embeddings for drug descriptions at a coarse level, enabling broad interaction classification. At a finer level, RADDI incorporates retrieval augmentation, using drug pair descriptions as retrieval keys and interaction categories as retrieval targets, thereby enhancing semantic understanding. For the second challenge, we design a class-aware probability distribution strategy to mitigate class imbalance. By leveraging the prior distribution of event categories, RADDI adjusts the retrieval sample weights and normalizes category probabilities, thereby improving the prediction accuracy for rare-class interactions while reducing over-reliance on high-frequency categories. Experiments on benchmark datasets demonstrate that RADDI excels in zero-shot DDI prediction scenarios, effectively balancing generalization to new drugs with high accuracy across various interaction categories.
Keywords: drug-drug interaction (DDI) prediction; retrieval-augmented; pretrained language model
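The RADDI abstract describes adjusting retrieval sample weights by the prior distribution of event categories and then normalizing category probabilities. The abstract does not give the formula, so the following is only a hypothetical sketch of one common way to do this kind of rebalancing: each retrieved neighbour votes for its label with weight similarity divided by the class prior, and the votes are normalized to sum to one. The function name, the inverse-prior weighting, and the example data are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict


def class_aware_probabilities(
    neighbors: list[tuple[str, float]],
    class_prior: dict[str, float],
) -> dict[str, float]:
    """Turn retrieved (label, similarity) pairs into class probabilities.

    Hypothetical weighting: each vote is divided by the prior frequency of its
    class, so neighbours from rare classes count for more than neighbours from
    frequent ones.
    """
    scores: dict[str, float] = defaultdict(float)
    for label, similarity in neighbors:
        scores[label] += similarity / class_prior[label]
    total = sum(scores.values())
    if total == 0:
        return {}
    return {label: score / total for label, score in scores.items()}


# Invented example: two neighbours from a frequent class, one from a rare class.
retrieved = [("common_event", 0.9), ("common_event", 0.8), ("rare_event", 0.7)]
prior = {"common_event": 0.9, "rare_event": 0.1}

probabilities = class_aware_probabilities(retrieved, prior)
print(probabilities)
```

With plain similarity voting the frequent class would win here; dividing by the prior shifts the decision to the rare class, which illustrates the trade-off the abstract describes between rare-class accuracy and reliance on high-frequency categories.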
From Simulated Interaction to Intelligent Generation: A Generative Multi-Agent Virtual Patient Framework for Nursing Education with RAG-Based Safety Guardrails
11
Authors: Haiying SUI. Medical Research, 2025, Issue 4, pp. 14-26 (13 pages)
In response to the growing mismatch between nursing workforce demand and constrained clinical teaching resources, this study proposes a Generative Multi-Agent Virtual Patient (GMVP) framework for high-fidelity nursing education. Grounded in situated learning, cognitive apprenticeship, and distributed cognition, GMVP employs a triadic agent architecture comprising narrative, physiological, and evaluator agents to reconstruct social interaction, physiological coherence, and formative assessment in virtual clinical environments. A design-based research methodology guides iterative development and classroom deployment aligned with outcome-based education standards. To address hallucination risks in high-stakes medical content, the system integrates retrieval-augmented generation with modular validation and physiological consistency checks. The framework supports scalable case generation, learning analytics, and equitable access to complex clinical training scenarios.
Keywords: Nursing education; Generative multi-agent virtual patient; Triadic agent architecture; Retrieval-augmented generation; Design-based research
Large language models driven reliable clinical decision-making: Framework and application
12
Authors: Jinhua Du, Xiaoying Li, Yuyang Liu, Tingyu Lv, Hui Liu, Hao Yin. Informatics and Health, 2025, Issue 2, pp. 170-178 (9 pages)
With the proliferation of data and the increased complexity of clinical decision-making in the medical field, powerful computational tools are needed to assist physicians in making precise and reliable decisions. While Large Language Models (LLMs) with billions of parameters have achieved notable results in a broad range of biomedical and healthcare applications, issues of reliability and stability still need to be addressed. To this end, we propose the MedRad framework, a system that combines LLMs, knowledge engineering, Chain of Thought (CoT) reasoning, Retrieval-Augmented Generation (RAG) techniques, and intelligent agents (Agents) to improve the reliability of clinical decision-making. Building on fine-tuned LLMs and existing studies in the biomedical and healthcare domain, we concentrate on how these techniques can be used to achieve highly reliable clinical decision-making in scenarios of varying complexity, such as medical knowledge QA and clinical diagnosis recommendations. Experimental results demonstrate that MedRad can provide high-quality decision paths in these scenarios and has the potential to extend to more biomedical and healthcare scenarios through its loosely coupled design.
Keywords: Large language models; Clinical decision-making; Chain of Thought; Retrieval-Augmented Generation; Intelligent agents
Exploring Gen-AI applications in building research and industry: A review
13
Authors: Hanlong Wan, Jian Zhang, Yan Chen, Weili Xu, Fan Feng. Building Simulation, 2025, Issue 6, pp. 1251-1273 (23 pages)
This paper investigates the transformative potential of Generative AI (Gen-AI) technologies, particularly large language models, within the building industry. By leveraging these advanced AI tools, the study explores their application across key areas such as automated compliance checking and building design assistance. The research highlights how Gen-AI can automate labor-intensive processes, significantly improving efficiency and reducing costs in building practices. The paper first discusses the two widely applied fundamental models, the Transformer and the Diffusion model, and summarizes current pathways for accessing Gen-AI models and the most common techniques for customizing them. It then explores applications for text generation, such as compliance checking, control support, data mining, and building simulation input file editing. Additionally, it examines image generation, including direct generation through diffusion models and indirect generation through language model-supported template creation based on existing Computer-Aided Design or other design tools with rendering. The paper concludes with a comprehensive analysis of the current capabilities of Gen-AI in the building industry, outlining future directions for research and development, with the goal of paving the way for smarter, more effective, and responsive design, construction, and operational practices.
Keywords: generative AI; large language models (LLMs); building industry; automatic compliance checking (ACC); building design; retrieval-augmented generation (RAG)
Federated reasoning LLMs: a survey
14
Authors: Shuyue WEI, Yongxin TONG, Zimu ZHOU, Yi XU, Jingkai GAO, Tongyu WEI, Tianran HE, Weifeng LV. Frontiers of Computer Science, 2025, Issue 12, pp. 139-161 (23 pages)
Reasoning has long been regarded as a distinctive hallmark of human cognition, and recent advances in the artificial intelligence community have increasingly focused on reasoning large language models (rLLMs). However, due to strict privacy regulations, domain-specific reasoning knowledge is often distributed across multiple data owners, limiting an rLLM's ability to fully leverage such valuable resources. In this context, federated learning (FL) has gained increasing attention in both academia and industry as a promising privacy-preserving paradigm for addressing the challenges of data-efficient training of rLLMs. In this paper, we conduct a comprehensive survey on federated rLLMs and propose a novel taxonomy based on training signals, including training signals derived from raw data, learned representations, and preference feedback. For each category, we highlight emerging trends in how FL is used to enhance the reasoning capabilities of rLLMs, considering model effectiveness, communication cost, and privacy preservation. Finally, we envision future research directions and challenges based on insights from existing studies.
Keywords: federated learning; reasoning LLMs; fine tuning; retrieval-augmented generation
A reliable knowledge processing framework for combustion science using foundation models (Cited by: 1)
15
Authors: Vansh Sharma, Venkat Raman. Energy and AI (EI), 2024, Issue 2, pp. 396-416 (21 pages)
This research explores the integration of large language models (LLMs) into scientific data assimilation, focusing on combustion science as a case study. Leveraging foundation models integrated with a Retrieval-Augmented Generation (RAG) framework, the study introduces an approach to process diverse combustion research data, spanning experimental studies, simulations, and literature. The multifaceted nature of combustion research emphasizes the critical role of knowledge processing in navigating and extracting valuable information from a vast and diverse pool of sources. The developed approach minimizes computational and economic expenses while optimizing data privacy and accuracy. It incorporates prompt engineering and offline open-source LLMs, offering user autonomy in selecting base models. The study provides a thorough examination of text segmentation strategies, conducts comparative studies between LLMs, and explores various optimized prompts to demonstrate the effectiveness of the framework. By incorporating an external vector database, the framework outperforms a conventional LLM in generating accurate responses and constructing robust arguments. Additionally, the study investigates optimized prompt templates for efficient extraction of scientific literature. Furthermore, we present a targeted scaling study to quantify the algorithmic performance of the framework as the number of prompt tokens increases. The research addresses concerns related to hallucinations and false research articles by introducing a custom workflow developed with a detection algorithm to filter out inaccuracies. Despite identified areas for improvement, the framework consistently delivers accurate domain-specific responses with minimal human oversight. The prompt-agnostic approach introduced holds promise for future improvements. The study underscores the significance of integrating LLMs and knowledge processing techniques in scientific research, providing a foundation for advancements in data assimilation and utilization.
Keywords: Large language models (LLM); Foundation models; Combustion; Knowledge processing; Retrieval-augmented generation (RAG)
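The combustion-science abstract mentions examining text segmentation strategies before documents are stored in a vector database. One widely used strategy is fixed-size chunking with overlap, sketched below. The abstract does not say which strategies the authors compared, so the word-based splitting, the function name, and the default sizes here are assumptions chosen for illustration.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks where consecutive chunks share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a chunk reaches the end, so no trailing fragment is emitted
        # that is fully contained in the previous chunk.
        if start + chunk_size >= len(words):
            break
    return chunks


# Invented example: ten placeholder words, chunks of four with one word shared.
sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_text(sample, chunk_size=4, overlap=1):
    print(chunk)
```

Overlap keeps a sentence that straddles a boundary retrievable from either side, at the cost of storing some words twice; the right sizes depend on the embedding model and corpus.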
PDGPT: A large language model for acquiring phase diagram information in magnesium alloys (Cited by: 2)
16
Authors: Zini Yan, Hongyu Liang, Jingya Wang, Hongbin Zhang, Alisson Kwiatkowski da Silva, Shiyu Liang, Ziyuan Rao, Xiaoqin Zeng. Materials Genome Engineering Advances, 2024, Issue 4, pp. 77-87 (11 pages)
Magnesium alloys, known for their lightweight advantages, are increasingly in demand across a range of applications, from aerospace to the automotive industry. With rising requirements for strength and corrosion resistance, the development of new magnesium alloy systems has become critical. Phase diagrams play a crucial role in guiding magnesium alloy design by providing key insights into phase stability, composition, and temperature ranges, enabling the optimization of alloy properties and processing conditions. However, accessing and interpreting phase diagram data with thermodynamic calculation software can be complex and time-consuming, often requiring intricate calculations and iterative refinement based on thermodynamic models. To address this challenge, we introduce PDGPT, a ChatGPT-based large language model designed to streamline the acquisition of magnesium alloy phase diagram information with high efficiency and accuracy. Enhanced by prompt engineering, supervised fine-tuning, and retrieval-augmented generation, PDGPT leverages the predictive and reasoning capabilities of large language models along with computational phase diagram data. By combining large language models with traditional phase diagram research tools, PDGPT not only improves the accessibility of critical phase diagram information but also sets the stage for future advancements in applying large language models to materials science.
Keywords: large language model; phase diagram prediction; prompt-engineering; retrieval-augmented generation; supervised fine-tuning