Funding: Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX24_2145).
Abstract: Objective To improve the accuracy and professionalism of question-answering (QA) models for traditional Chinese medicine (TCM) lung cancer by integrating large language models with structured knowledge graphs using the knowledge graph (KG)-to-text-enhanced retrieval-augmented generation (KG2TRAG) method. Methods The TCM lung cancer model (TCMLCM) was constructed by fine-tuning ChatGLM2-6B on the specialized datasets Tianchi TCM, HuangDi, and ShenNong-TCM-Dataset, as well as a TCM lung cancer KG. The KG2TRAG method was applied to enhance knowledge retrieval: it converts KG triples into natural-language text via ChatGPT-aided linearization, leveraging large language models (LLMs) for context-aware reasoning. For a comprehensive comparison, MedicalGPT, HuatuoGPT, and BenTsao were selected as baseline models. Performance was evaluated using bilingual evaluation understudy (BLEU), recall-oriented understudy for gisting evaluation (ROUGE), accuracy, and the domain-specific TCM-LCEval metrics, with validation from TCM oncology experts assessing answer accuracy, professionalism, and usability. Results The TCMLCM model achieved the best performance across all metrics, including a BLEU score of 32.15%, ROUGE-L of 59.08%, and an accuracy rate of 79.68%. Notably, in the TCM-specific TCM-LCEval assessment, its performance was 3%–12% higher than that of the baseline models. Expert evaluations highlighted superior accuracy and professionalism. Conclusion TCMLCM provides an innovative solution for TCM lung cancer QA and demonstrates the feasibility of integrating structured KGs with LLMs. This work advances intelligent TCM healthcare tools and lays a foundation for future AI-driven applications in traditional medicine.
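As a rough illustration of the triple-linearization and retrieval-augmented prompting steps described in this abstract, the sketch below turns KG triples into short sentences and assembles them with the user question into one prompt; the template, function names, and example triples are illustrative assumptions, not details taken from the paper (which uses ChatGPT-aided linearization rather than a fixed template).

```python
# Minimal sketch: linearize KG triples into sentences, then build a RAG-style prompt.
# All names and the example triples are placeholders for illustration only.

def linearize_triple(head: str, relation: str, tail: str) -> str:
    """Turn one (head, relation, tail) triple into a plain-language sentence."""
    return f"{head} {relation} {tail}."

def build_rag_prompt(question: str, triples: list[tuple[str, str, str]]) -> str:
    """Assemble retrieved, linearized KG facts and the user question into one prompt."""
    facts = "\n".join(linearize_triple(*t) for t in triples)
    return (
        "Answer the question using only the facts below.\n"
        f"Facts:\n{facts}\n"
        f"Question: {question}\nAnswer:"
    )

if __name__ == "__main__":
    kg_hits = [("Astragalus membranaceus", "is indicated for", "qi deficiency"),
               ("Qi deficiency", "is a common pattern in", "lung cancer")]
    print(build_rag_prompt("Which herbs are used for qi deficiency in lung cancer?", kg_hits))
```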
Funding: This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2022R1F1A1067008), and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2019R1A6A1A03032119).
Abstract: Question-answering (QA) models find answers to a given question. Automatically finding answers is increasingly necessary, and doing so over large-scale QA datasets is both important and challenging. In this paper, we deal with the QA pair matching approach in QA models, which finds the most relevant question and its recommended answer for a given question. Existing studies of this approach operate either on the entire dataset or on datasets within a category that the question writer specifies manually. In contrast, we aim to automatically determine the category to which a question belongs by employing a text classification model, and then to find the answer corresponding to the question within that category. Because of the text classification model, we can effectively reduce the search space for finding answers to a given question. Therefore, the proposed model improves the accuracy of QA matching and significantly reduces inference time. Furthermore, to improve the performance of finding similar sentences within each category, we present an ensemble embedding model for sentences, which outperforms the individual embedding models. Using real-world QA datasets, we evaluate the performance of the proposed QA matching model. The accuracy of our final ensemble embedding model based on the text classification model is 81.18%, outperforming the existing models by 9.81–14.16 percentage points. Moreover, in terms of inference speed, our model is 2.61–5.07 times faster than the existing models, owing to the effective reduction of the search space by the text classification model.
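The classify-then-match pipeline in this abstract can be sketched as follows; the ensemble scheme (normalize and concatenate), the data layout, and all names are illustrative assumptions rather than the authors' exact design.

```python
# Minimal sketch of category-restricted QA matching with an ensemble sentence embedding.
import numpy as np

def ensemble_embed(text: str, encoders) -> np.ndarray:
    """Combine several sentence encoders by normalizing and concatenating their vectors."""
    vecs = [enc(text) for enc in encoders]          # each encoder: str -> 1-D np.ndarray
    return np.concatenate([v / np.linalg.norm(v) for v in vecs])

def match(query: str, classify, encoders, qa_bank):
    """Predict the query's category, then search only that category's QA pairs."""
    category = classify(query)                      # text classifier: str -> category label
    q_vec = ensemble_embed(query, encoders)
    candidates = qa_bank[category]                  # {category: [(question, answer, vector), ...]}
    # vectors in qa_bank are assumed to be precomputed with the same ensemble_embed
    best = max(candidates, key=lambda qa: float(q_vec @ qa[2]))
    return best[0], best[1]                         # (matched question, recommended answer)
```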
Funding: Jointly supported by the Wuhan International Science and Technology Cooperation Fund (Grant No. 201070934337), the 3rd Special Award of the China Postdoctoral Science Foundation (Grant No. 201003497), and the National Science Foundation of the USA (Grant No. NSF/IIS-1052773).
Abstract: This paper compares 12 representative Chinese and English online question-answering communities (Q&A communities) in terms of their basic functions, interaction modes, and customized services. An empirical comparative experiment was also conducted on them using 12 questions representing four question types, assigned evenly to three different subject fields, to examine the task performance of the 12 selected online Q&A communities. Our goal was to evaluate these online Q&A communities in terms of the quality and efficiency with which they answer the questions posed to them. It is hoped that this empirical research yields greater understanding of, and insight into, the working intricacies of these online Q&A communities, and hence into their possible further improvement.
Abstract: Traditional Chinese text retrieval systems return a ranked list of documents in response to a user's request. While a ranked list of documents may be an appropriate response for the user, frequently it is not; usually it would be better for the system to provide the answer itself rather than requiring the user to search for it within a set of documents. Because Chinese text retrieval has been developed only recently, and because of specific characteristics of the Chinese language, its retrieval approaches differ considerably from the studies and methods proposed for Western languages. An architecture that augments existing search engines is therefore developed to support Chinese natural-language question answering. This paper describes a new approach to building a Chinese question-answering system: a general-purpose, fully automated Chinese question-answering system available on the web. In this approach, we attempt to represent Chinese text by its characteristics, convert the text into ERE (E: entity, R: relation) relation data lists, and then answer questions through the ERE relation model. The system performs quite well given the simplicity of the techniques used. Experimental results show that question-answering accuracy can be greatly improved by analyzing more matching ERE relation data lists. Simple ERE relation data extraction techniques work well in our system, making it efficient to use with many backend retrieval engines.
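A toy sketch of the ERE-matching idea follows; the example records and the overlap-based scoring are illustrative only, not the paper's actual extraction or matching procedure.

```python
# Minimal sketch: answer a question by matching its ERE record against extracted records.
# Records and scoring are illustrative assumptions.

def best_match(question_ere, document_eres):
    """Return the document record sharing the most elements with the question record."""
    def overlap(a, b):
        return len(set(a) & set(b))
    return max(document_eres, key=lambda record: overlap(question_ere, record))

if __name__ == "__main__":
    question = ("李白", "出生地", "?")                       # (entity, relation, unknown entity)
    records = [("李白", "出生地", "碎叶城"), ("杜甫", "出生地", "巩县")]
    print(best_match(question, records))                    # -> ("李白", "出生地", "碎叶城")
```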
Funding: Supported by the National Natural Science Foundation of China (Grant No. 60373095).
Abstract: Question-answering systems provide short answers using the available information. This paper presents the implementation of a question-answering system based on concepts and statistics. The system determines the question type and focuses on the expected answer type, performing different conceptual expansions for different questions. It applies latent semantic indexing (LSI) to retrieve relevant passages and uses matching algorithms to find matches between questions and sentences stored in a database. It also extracts answers from a frequently asked questions (FAQ) database by finding matching or similar sentences. The answering ability of the system has been improved through the use of LSI and the FAQ database. The question-answering system, deployed in Chinese universities, is a developed and proven system capable of precise results.
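A minimal LSI-style passage retrieval sketch, assuming a TF-IDF plus truncated-SVD pipeline (scikit-learn) with an illustrative corpus and parameters; this shows the general technique rather than this paper's implementation.

```python
# Minimal sketch of LSI retrieval: TF-IDF, truncated SVD, then cosine similarity
# in the reduced space. Corpus, query, and dimensions are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

passages = [
    "latent semantic indexing maps passages into a low dimensional space",
    "frequently asked questions are matched against stored sentences",
    "the system determines the expected answer type of each question",
]
query = "how are FAQ entries matched"

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(passages)
lsi = TruncatedSVD(n_components=2)            # number of latent dimensions; tune for a real corpus
P = lsi.fit_transform(X)                      # passages in the latent space
q = lsi.transform(tfidf.transform([query]))   # query projected into the same space

scores = cosine_similarity(q, P)[0]
best = scores.argmax()                        # index of the most relevant passage
print(passages[best], scores[best])
```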
Abstract: Drilling top drive systems are structurally complex and exhibit diverse fault types, and existing fault-tree analysis methods and expert systems struggle to cope with complex, changeable field conditions. To address this, a knowledge graph-based fault diagnosis method for drilling top drives is proposed, exploiting the advantages of knowledge graphs in fusing structured and unstructured information, analyzing fault-mode associations, and transferring prior knowledge. Using the Transformer-based Bidirectional Encoder Representations from Transformers (BERT) model, two hybrid neural network models, BERT-BiLSTM-CRF and BERT-BiLSTM-Attention, were built to perform named entity recognition and relation extraction on top drive fault text, respectively. Through similarity computation, effective fusion of fault knowledge and intelligent question answering were achieved, yielding the final top drive fault diagnosis method. The results show that: (1) on the fault entity recognition task, the BERT-BiLSTM-CRF model achieves a precision of 95.49% and can effectively identify information entities in fault text; (2) on fault relation extraction, the BERT-BiLSTM-Attention model achieves a precision of 93.61%, correctly establishing the relation edges of the knowledge graph; (3) the developed question-answering system realizes an intelligent application of the knowledge graph, with answer accuracy above 90% across several question types, meeting field-use requirements. It is concluded that the knowledge graph-based fault diagnosis method can effectively exploit prior knowledge of the top drive, enable rapid fault localization and intelligent diagnosis, and has good application prospects.
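The BERT-BiLSTM-CRF tagger described above can be sketched roughly as below; the layer sizes, pretrained checkpoint, and the pytorch-crf package are assumptions, not the authors' exact configuration.

```python
# Minimal sketch of a BERT-BiLSTM-CRF sequence tagger for named entity recognition.
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF                      # pip install pytorch-crf

class BertBiLstmCrf(nn.Module):
    def __init__(self, num_tags: int, bert_name: str = "bert-base-chinese", lstm_dim: int = 256):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        self.lstm = nn.LSTM(self.bert.config.hidden_size, lstm_dim,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * lstm_dim, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        hidden = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        emissions = self.fc(self.lstm(hidden)[0])
        if tags is not None:                                           # training: NLL loss
            return -self.crf(emissions, tags, mask=attention_mask.bool())
        return self.crf.decode(emissions, mask=attention_mask.bool())  # inference: best tag paths
```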
Abstract: Large language models (LLMs) such as ChatGPT have shown great potential across many tasks. However, LLMs still suffer from hallucination and the forgetting of long-tail knowledge. To address these problems, existing methods incorporate external knowledge such as knowledge graphs to significantly enhance LLM generation, improving the accuracy and completeness of answers. These methods, however, face issues such as costly knowledge graph construction, semantic loss, and one-way knowledge flow. We therefore propose a bidirectional enhancement framework that not only uses the knowledge graph to improve LLM generation, but also uses the LLM's reasoning results to supplement the knowledge graph, forming a bidirectional flow of knowledge and, ultimately, a positive feedback loop between the knowledge graph and the LLM that continuously improves the system. In addition, by designing an Enhanced Knowledge Graph (EKG), we defer the relation extraction task to the retrieval stage, reducing the cost of knowledge graph construction, and we use vector retrieval to mitigate semantic loss. Based on this framework, we built the bidirectional enhancement system BEKO (Bidirectional Enhancement with a Knowledge Ocean), which achieves a clear performance improvement over traditional methods in relation reasoning applications, verifying the feasibility and effectiveness of the bidirectional enhancement framework. The BEKO system is deployed on the public website ko.zhonghuapu.com.
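One plausible reading of the bidirectional loop is sketched below; the retrieval, generation, and relation-extraction functions are passed in as placeholders and do not reflect the BEKO implementation.

```python
# Minimal sketch of a KG <-> LLM feedback cycle: the graph grounds the answer,
# and relations inferred from the answer flow back into the graph. Illustrative only.

def qa_round(question, retrieve, generate, extract_relations, graph):
    """Run one feedback cycle between the knowledge graph and the LLM."""
    facts = retrieve(question, graph)            # KG -> LLM: supporting facts for the question
    answer = generate(question, facts)           # LLM produces a grounded answer
    for triple in extract_relations(answer):     # LLM -> KG: new relations flow back
        graph.add(triple)                        # graph modeled here as, e.g., a set of triples
    return answer
```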
Abstract: In current retrieval-augmented generation (RAG) based domain question answering, the semantic gap between a user's query and the relevant knowledge in the knowledge base leads to poor answers. This paper proposes an alignment optimization method based on keyword extraction and hybrid retrieval. First, a large language model extracts keywords from the user query. Second, the extracted keywords are concatenated with the user query to form a combined query; the combined query and the original query are fed into a sparse retrieval model and a dense retrieval model, respectively, to recall relevant documents. The documents recalled by the two retrievers are then merged by union and reranked. Finally, the reranked relevant knowledge is passed through a text filter to extract key information, which is combined with the user query and input to a large language model to generate the answer returned to the user. Experimental results show that, compared with an alignment optimization method based on query rewriting, the proposed method improves the Recall-Oriented Understudy for Gisting Evaluation Longest Common Subsequence (ROUGE-L) metric by 9.9 and 2.3 percentage points, and the F1 metric by 4.1 and 1.7 percentage points, on a public traditional Chinese medicine QA dataset and the general-domain QA dataset CMRC2018, respectively. These results verify the effectiveness of the proposed method in improving the accuracy of domain question answering.
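The hybrid retrieval step can be sketched as below, assuming BM25 (rank-bm25) for the sparse side and a generic embedding function for the dense side; the union-then-rerank flow follows the description, but the packages, scoring details, and names are assumptions.

```python
# Minimal sketch of hybrid retrieval: the combined query (query + extracted keywords)
# goes to BM25, the raw query goes to a dense index, and the union of hits is reranked.
import numpy as np
from rank_bm25 import BM25Okapi               # pip install rank-bm25

def hybrid_retrieve(query, keywords, docs, embed, rerank, top_k=5):
    combined = query + " " + " ".join(keywords)                 # combined query for sparse search
    bm25 = BM25Okapi([d.split() for d in docs])
    sparse_ids = np.argsort(bm25.get_scores(combined.split()))[::-1][:top_k]

    doc_vecs = np.stack([embed(d) for d in docs])               # dense search on the raw query
    dense_ids = np.argsort(doc_vecs @ embed(query))[::-1][:top_k]

    candidates = {docs[i] for i in (*sparse_ids, *dense_ids)}   # union of the two hit lists
    return sorted(candidates, key=lambda d: rerank(query, d), reverse=True)
```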
Abstract: Large language models (LLMs) have strong natural language understanding and complex problem-solving abilities. This paper builds an LLM-based mineral question-answering system for efficient access to mineral knowledge. The system first collects mineral data from internet resources and, after cleaning, structures them into mineral documents and question-answer pairs. The mineral documents are format-converted and indexed into a mineral knowledge base used for retrieval-augmented generation, while the question-answer pairs are used to fine-tune the LLM. When the mineral knowledge base is used for retrieval-augmented generation, a two-stage recall-then-rerank retrieval scheme is adopted to obtain better generation results. Fine-tuning of the mineral LLM uses the mainstream Low-Rank Adaptation (LoRA) method, achieving performance comparable to full-parameter fine-tuning with far fewer trainable parameters and saving computing resources. Experimental results show that the retrieval-augmented-generation-based mineral question-answering system can obtain mineral knowledge quickly and with high accuracy.
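As one plausible way to implement the LoRA fine-tuning step described above, the following sketch attaches LoRA adapters with the Hugging Face peft library; the base checkpoint and hyperparameters are placeholders, not the paper's settings.

```python
# Minimal sketch of LoRA adapter setup: only the low-rank adapter weights are trained,
# which is what keeps the trainable-parameter count small. Values are illustrative.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model   # pip install peft

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-7B")   # placeholder base model
config = LoraConfig(
    r=8, lora_alpha=32, lora_dropout=0.05,      # rank of the low-rank update and its scaling
    target_modules=["q_proj", "v_proj"],        # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()              # reports the small fraction of trainable weights
```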