Funding: Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX24_2145).
Abstract: Objective: To improve the accuracy and professionalism of a question-answering (QA) model for traditional Chinese medicine (TCM) lung cancer by integrating large language models with structured knowledge graphs using the knowledge graph (KG) to text-enhanced retrieval-augmented generation (KG2TRAG) method. Methods: The TCM lung cancer model (TCMLCM) was constructed by fine-tuning Chat-GLM2-6B on the specialized datasets Tianchi TCM, HuangDi, and ShenNong-TCM-Dataset, as well as a TCM lung cancer KG. The KG2TRAG method was applied to enhance knowledge retrieval: it converts KG triples into natural language text via ChatGPT-aided linearization, leveraging large language models (LLMs) for context-aware reasoning. For a comprehensive comparison, MedicalGPT, HuatuoGPT, and BenTsao were selected as the baseline models. Performance was evaluated using bilingual evaluation understudy (BLEU), recall-oriented understudy for gisting evaluation (ROUGE), accuracy, and the domain-specific TCM-LCEval metrics, with TCM oncology experts validating answer accuracy, professionalism, and usability. Results: The TCMLCM model achieved the best performance across all metrics, including a BLEU score of 32.15%, a ROUGE-L of 59.08%, and an accuracy rate of 79.68%. Notably, in the TCM-specific TCM-LCEval assessment, its performance was 3%-12% higher than that of the baseline models. Expert evaluations highlighted superior performance in accuracy and professionalism. Conclusion: TCMLCM can provide an innovative solution for TCM lung cancer QA, demonstrating the feasibility of integrating structured KGs with LLMs. This work advances intelligent TCM healthcare tools and lays a foundation for future AI-driven applications in traditional medicine.
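The triple-to-text step described in this abstract can be illustrated with a minimal sketch. The paper performs the linearization with ChatGPT; the version below swaps in fixed sentence templates so that it runs offline. The relation names, templates, placeholder entities, and the functions `linearize` and `build_prompt` are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)

# Hypothetical templates, one per relation type; not taken from the paper.
TEMPLATES: Dict[str, str] = {
    "treats": "{head} is used to treat {tail}.",
    "has_symptom": "{head} commonly presents with {tail}.",
}


def linearize(triples: List[Triple]) -> str:
    """Turn KG triples into plain sentences, one sentence per triple."""
    sentences = []
    for head, relation, tail in triples:
        # Unknown relations fall back to "<head> <relation words> <tail>."
        template = TEMPLATES.get(relation, "{head} {relation} {tail}.")
        sentences.append(
            template.format(head=head, relation=relation.replace("_", " "), tail=tail)
        )
    return " ".join(sentences)


def build_prompt(question: str, triples: List[Triple]) -> str:
    """Prepend the linearized triples to the question as retrieved context."""
    return f"Context: {linearize(triples)}\nQuestion: {question}\nAnswer:"
```

The resulting prompt string would then be passed to the fine-tuned model. A template approach is easier to audit than an LLM-based one, but it yields less fluent text, which is presumably one reason the authors chose LLM-aided linearization.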
Funding: This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2022R1F1A1067008) and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2019R1A6A1A03032119).
Abstract: Question-answering (QA) models find answers to a given question. The need to find answers automatically is growing, because doing so over large-scale QA data sets is both important and challenging. In this paper, we deal with the QA pair matching approach in QA models, which finds the most relevant question and its recommended answer for a given question. Existing studies of this approach search either the entire dataset or the data within a category that the question writer specifies manually. In contrast, we aim to find the category to which the question belongs automatically, by employing a text classification model, and then to find the answer corresponding to the question within that category. The text classification model effectively reduces the search space for finding the answer to a given question. The proposed model therefore improves the accuracy of the QA matching model and significantly reduces the model inference time. Furthermore, to improve the performance of finding similar sentences in each category, we present an ensemble embedding model for sentences, which outperforms the individual embedding models. We evaluate the proposed QA matching model on real-world QA data sets. The accuracy of our final ensemble embedding model based on the text classification model is 81.18%, which outperforms the existing models by 9.81 to 14.16 percentage points. Moreover, in terms of inference speed, our model is 2.61 to 5.07 times faster than the existing models, owing to the reduction of the search space by the text classification model.
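The two ideas in this abstract, restricting the search to a predicted category and averaging several similarity signals, can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' model. A keyword-overlap rule stands in for the trained text classifier, and two bag-of-features vectors (words and character bigrams) stand in for the sentence embedding models. All function names and the sample data in the usage note are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple


def word_vec(text: str) -> Counter:
    """Bag-of-words vector (stand-in for one sentence embedding model)."""
    return Counter(text.lower().split())


def char_bigram_vec(text: str) -> Counter:
    """Character-bigram vector (stand-in for a second embedding model)."""
    s = text.lower().replace(" ", "")
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def ensemble_similarity(q1: str, q2: str) -> float:
    """Average the similarity scores of the two stand-in embeddings."""
    return 0.5 * cosine(word_vec(q1), word_vec(q2)) + 0.5 * cosine(
        char_bigram_vec(q1), char_bigram_vec(q2)
    )


def classify(question: str, keywords: Dict[str, Set[str]]) -> str:
    """Keyword-overlap rule standing in for the text classification model."""
    words = set(question.lower().split())
    return max(keywords, key=lambda c: len(words & keywords[c]))


def answer(
    question: str,
    qa_by_category: Dict[str, List[Tuple[str, str]]],
    keywords: Dict[str, Set[str]],
) -> Tuple[str, str]:
    """Search only the predicted category and return (category, best answer)."""
    category = classify(question, keywords)
    _, best_answer = max(
        qa_by_category[category],
        key=lambda qa: ensemble_similarity(question, qa[0]),
    )
    return category, best_answer
```

With a QA store split into, say, "billing" and "account" categories, a question about resetting a password is compared only against the "account" pairs. That restriction is where the reported speed-up would come from: each query is scored against one category rather than the whole dataset.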
Abstract: Traditional Chinese text retrieval systems return a ranked list of documents in response to a user's request. While a ranked list of documents may be an appropriate response for the user, frequently it is not. Usually it would be better for the system to provide the answer itself instead of requiring the user to search for the answer in a set of documents. Because Chinese text retrieval has been developed only recently, and because of various specific characteristics of the Chinese language, the approaches to it differ considerably from those proposed for Western languages. Thus, an architecture that augments existing search engines is developed to support Chinese natural language question answering. This paper describes a new approach to building a Chinese question-answering system: a general-purpose, fully automated Chinese question-answering system available on the web. In this approach, we represent Chinese text by its characteristics, convert the text into ERE (E: entity, R: relation) relation data lists, and then answer the question through an ERE relation model. The system performs quite well given the simplicity of the techniques being utilized. Experimental results show that question-answering accuracy can be greatly improved by analyzing more matching ERE relation data lists. Simple ERE relation data extraction techniques work well in our system, making it efficient to use with many backend retrieval engines.
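One way to picture answering a question through an ERE relation list is slot matching over entity-relation-entity triples. The sketch below is an assumption about the general idea only. The abstract does not describe the matching procedure, and the extraction of triples from Chinese text is left out entirely. The names `match`, `answer`, and the sample facts in the usage note are hypothetical.

```python
from typing import List, Optional, Tuple

ERE = Tuple[str, str, str]  # (entity, relation, entity)
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def match(facts: List[ERE], pattern: Pattern) -> List[ERE]:
    """Return every fact that agrees with the pattern; None matches anything."""
    return [
        fact
        for fact in facts
        if all(p is None or p == f for p, f in zip(pattern, fact))
    ]


def answer(facts: List[ERE], pattern: Pattern) -> List[str]:
    """Fill the unknown (None) slots of the pattern from the matching facts."""
    return [
        f
        for fact in match(facts, pattern)
        for p, f in zip(pattern, fact)
        if p is None
    ]
```

For example, with the facts `("Beijing", "capital_of", "China")` and `("Paris", "capital_of", "France")`, the question "What is the capital of China?" becomes the pattern `(None, "capital_of", "China")`, and `answer` returns the entity that fills the empty slot.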
Funding: Supported by the National Natural Science Foundation of China (Grant No. 60373095).
Abstract: Question-answering systems provide short answers using available information. This paper presents an implementation mechanism for a question-answering system based on concepts and statistics. The system identifies the question and its answer type, making different conceptual expansions for different questions. It applies the latent semantic indexing (LSI) method to retrieve relevant passages. It uses matching algorithms to find a match between questions and sentences stored in a database. It also extracts answers from a frequently asked questions (FAQ) database by finding matching or similar sentences. The use of LSI and FAQ has improved the answering ability of the system. The question-answering system, which has been introduced in Chinese universities, is a developed and proven system capable of precise results.
Funding: Supported by the Young and Middle-aged Research Fund Project of Shenzhen People's Hospital (Grant No. SYHL2024-N0010) and the Shenzhen Basic Research Program (General Program, Grant No. JCYJ20240813104409013).
Abstract: Objective: This study aimed to develop a Nursing Retrieval-Augmented Generation (NurRAG) system based on large language models (LLMs) and to evaluate its accuracy and clinical applicability in nursing question answering. Methods: A multidisciplinary team consisting of nursing experts, artificial intelligence researchers, and information engineers collaboratively designed the NurRAG framework following the principles of retrieval-augmented generation. The system included four functional modules: 1) construction of a nursing knowledge base through document normalization, embedding, and vector indexing; 2) nursing question filtering using a supervised classifier; 3) semantic retrieval and re-ranking for evidence selection; and 4) evidence-conditioned language model generation to produce citation-based nursing answers. The system was securely deployed on hospital intranet servers using Docker containers. Performance evaluation was conducted with 1,000 expert-verified nursing question-answer pairs. Semantic fidelity was assessed using Recall-Oriented Understudy for Gisting Evaluation, Longest Common Subsequence (ROUGE-L), and clinical correctness was measured using accuracy. Results: The NurRAG system achieved significant improvements in both semantic fidelity and answer accuracy compared with conventional large language models. For ChatGLM2-6B, ROUGE-L increased from (30.73±1.48)% to (64.27±0.27)%, and accuracy increased from (49.08±0.92)% to (75.83±0.35)%. For LLaMA2-7B, ROUGE-L increased from (28.76±0.89)% to (60.33±0.21)%, and accuracy increased from (43.27±0.83)% to (73.29±0.33)%. All differences were statistically significant (P<0.001). A quantitative case analysis further demonstrated that NurRAG effectively reduced hallucinated outputs and generated evidence-based, guideline-concordant nursing responses. Conclusion: The NurRAG system integrates domain-specific retrieval with LLM generation to provide accurate, reliable, and traceable evidence-based nursing answers. The findings demonstrate the system's feasibility and potential to improve the accuracy of clinical knowledge access, support evidence-based nursing decision-making, and promote the safe application of artificial intelligence in nursing practice.
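ROUGE-L, the semantic fidelity metric used in this evaluation, scores a generated answer by the longest common subsequence (LCS) it shares with the reference answer. The sketch below shows the standard sentence-level F-measure. It assumes whitespace tokenization and beta = 1, neither of which is stated in the abstract. Chinese text would need character-level or word-segmented tokens instead. The function names are illustrative.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference: str, candidate: str, beta: float = 1.0) -> float:
    """ROUGE-L F-measure between a reference and a candidate answer."""
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)
```

For the reference "the cat sat on the mat" and the candidate "the cat lay on the mat", the LCS has five tokens, so precision and recall are both 5/6 and the score is 5/6. Reported figures such as 64.27% would be averages of this per-answer score over the evaluation set, expressed as percentages.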
Abstract: The emergence of large natural language models in artificial intelligence has opened new prospects for deeply empowering industry. Research on the key technologies and applications of a railway natural language large model is of great significance for promoting and coordinating the development of railway artificial intelligence. This paper proposes application scenarios for a railway natural language large model according to the application requirements of railway artificial intelligence; designs the overall architecture of the model on the basis of the railway artificial intelligence platform; studies the key technologies of natural language large models; builds a railway industry large model oriented to intelligent question answering; and verifies the model with actual data. Finally, the paper looks ahead to the development and application of railway natural language large models in railway traffic organization, railway operation safety, and passenger service.
Funding: Project (No. 2002AA119050) supported by the National Hi-Tech Research and Development Program (863) of China.
Abstract: Automatic Chinese text summarization for dialogue style is a relatively new research area. In this paper, Latent Semantic Analysis (LSA) is first used to extract semantic knowledge from a given document; all question paragraphs are then identified; an automatic text segmentation approach analogous to TextTiling is exploited to improve the precision of correlating question paragraphs and answer paragraphs; and finally some "important" sentences are extracted from the generic content and the question-answer pairs to generate a complete summary. Experimental results showed that our approach is highly efficient and significantly improves the coherence of the summary without compromising informativeness.
Funding: Supported by the Agriculture Research System of China of MOF and MARA (Project No. CARS-23-D07) and the National Key Research and Development Program of China (Project No. 2022YFD1600602).
Abstract: This paper systematically explores the application potential of large language models (LLMs) in the field of agricultural intelligence, focusing on key technologies and practical pathways. The study centers on the adaptation of LLMs to agricultural knowledge, starting with foundational concepts such as architecture design, pre-training strategies, and fine-tuning techniques, to build a technical framework for knowledge integration in the agricultural domain. Using tools such as vector databases and knowledge graphs, the study enables the structured development of professional agricultural knowledge bases. Additionally, by combining multimodal learning and intelligent question-answering (Q&A) system design, it validates the application value of LLMs in agricultural knowledge services. The paper proposes targeted solutions to core challenges in domain adaptation, including knowledge acquisition and integration, logical reasoning, multimodal data processing, agent collaboration, and dynamic knowledge updating. The study further explores innovative applications of LLMs in scenarios such as precision crop management and market dynamics analysis, providing theoretical support and technical pathways for the development of agricultural intelligence. Through the technological innovation of large language models and their deep integration with the agricultural sector, the intelligence level of agricultural production, decision-making, and services can be effectively enhanced.
Abstract: The inherent heterogeneity and distribution of knowledge strongly prevent knowledge from being shared and reused among different agents and software entities, and a formal ontology has been viewed as a promising means to tackle this problem. In this paper, a domain-specific formal ontology of archaeology is presented. The ontology mainly consists of three parts: archaeological categories, their relationships, and axioms. The ontology not only captures the semantics of archaeological knowledge, but also provides archaeology with an explicit and formal specification of a shared conceptualization, thus making archaeological knowledge shareable and reusable across humans and machines in a structured fashion. Further, we propose a method to verify ontology correctness based on the individuals of categories. As applications of the ontology, we have developed an ontology-driven approach to knowledge acquisition from archaeological text and a question-answering system for archaeological knowledge.
Funding: Part of a curriculum reform project for English majors, sponsored by Beijing Foreign Studies University and the Beijing Municipal Educational Commission (BFSU05012, BFSU0103B03, BMEC Higher Education [2006] 27).
Abstract: Using a conversation analysis approach, the present study investigates the teacher-led question-answer sequences of one successful seminar course (Short Stories and Western Culture) within the curriculum reform for English majors at Beijing Foreign Studies University, aiming to uncover an effective way of integrating disciplinary learning with language skills development. The analysis shows that the teacher of the course, who perceives student participation as an indispensable ingredient of his class, often uses more divergent, opinion-seeking questions to initiate discussion and uses four types of expansion question in his turns to promote student participation, namely, probing questions (PQ), clue-giving questions (CQ), elaboration requests (ER), and agreement checks (AC). The study also generates an I-R-(E)-F-FC [Initiation-Response-(Evaluation)-Follow-up-Further Contribution] model, in which the teacher attempts to promote student participation and guide the construction of students' understanding.