Objective To improve the accuracy and professionalism of question-answering(QA)model in traditional Chinese medicine(TCM)lung cancer by integrating large language models with structured knowledge graphs using the know...Objective To improve the accuracy and professionalism of question-answering(QA)model in traditional Chinese medicine(TCM)lung cancer by integrating large language models with structured knowledge graphs using the knowledge graph(KG)to text-enhanced retrievalaugmented generation(KG2TRAG)method.Methods The TCM lung cancer model(TCMLCM)was constructed by fine-tuning Chat-GLM2-6B on the specialized datasets Tianchi TCM,HuangDi,and ShenNong-TCM-Dataset,as well as a TCM lung cancer KG.The KG2TRAG method was applied to enhance the knowledge retrieval,which can convert KG triples into natural language text via ChatGPT-aided linearization,leveraging large language models(LLMs)for context-aware reasoning.For a comprehensive comparison,MedicalGPT,HuatuoGPT,and BenTsao were selected as the baseline models.Performance was evaluated using bilingual evaluation understudy(BLEU),recall-oriented understudy for gisting evaluation(ROUGE),accuracy,and the domain-specific TCM-LCEval metrics,with validation from TCM oncology experts assessing answer accuracy,professionalism,and usability.Results The TCMLCM model achieved the optimal performance across all metrics,including a BLEU score of 32.15%,ROUGE-L of 59.08%,and an accuracy rate of 79.68%.Notably,in the TCM-LCEval assessment specific to the field of TCM,its performance was 3%−12%higher than that of the baseline model.Expert evaluations highlighted superior performance in accuracy and professionalism.Conclusion TCMLCM can provide an innovative solution for TCM lung cancer QA,demonstrating the feasibility of integrating structured KGs with LLMs.This work advances intelligent TCM healthcare tools and lays a foundation for future AI-driven applications in traditional medicine.展开更多
Question-answering(QA)models find answers to a given question.The necessity of automatically finding answers is increasing because it is very important and challenging from the large-scale QA data sets.In this paper,w...Question-answering(QA)models find answers to a given question.The necessity of automatically finding answers is increasing because it is very important and challenging from the large-scale QA data sets.In this paper,we deal with the QA pair matching approach in QA models,which finds the most relevant question and its recommended answer for a given question.Existing studies for the approach performed on the entire dataset or datasets within a category that the question writer manually specifies.In contrast,we aim to automatically find the category to which the question belongs by employing the text classification model and to find the answer corresponding to the question within the category.Due to the text classification model,we can effectively reduce the search space for finding the answers to a given question.Therefore,the proposed model improves the accuracy of the QA matching model and significantly reduces the model inference time.Furthermore,to improve the performance of finding similar sentences in each category,we present an ensemble embedding model for sentences,improving the performance compared to the individual embedding models.Using real-world QA data sets,we evaluate the performance of the proposed QA matching model.As a result,the accuracy of our final ensemble embedding model based on the text classification model is 81.18%,which outperforms the existing models by 9.81%∼14.16%point.Moreover,in terms of the model inference speed,our model is faster than the existing models by 2.61∼5.07 times due to the effective reduction of search spaces by the text classification model.展开更多
This paper compares 12 representative Chinese and English online questionanswering communities(Q&A communities) based on their basic functions, interactive modes, and customized services. An empirical experiment f...This paper compares 12 representative Chinese and English online questionanswering communities(Q&A communities) based on their basic functions, interactive modes, and customized services. An empirical experiment from a comparative perspective was also conducted on them by using 12 questions representing for four types of questions,which are assigned evenly to three different subject fields so as to examine the task performance of these 12 selected online Q&A communities. Our goal was to evaluate those online Q&A communities in terms of their quality and efficiency for answering questions posed to them. It was hoped that our empirical research would yield greater understanding and insights to the working intricacy of these online Q&A communities and hence their possible further improvement.展开更多
Traditional Chinese text retrieval systems return a ranked list of documentsin response to a user''s request. While a ranked list of documents may be an appropriate response forthe user, frequently it is not. ...Traditional Chinese text retrieval systems return a ranked list of documentsin response to a user''s request. While a ranked list of documents may be an appropriate response forthe user, frequently it is not. Usually it would be better for the system to provide the answeritself instead of requiring the user to search for the answer in a set of documents. Since Chinesetext retrieval has just been developed lately, and due to various specific characteristics ofChinese language, the approaches to its retrieval are quite different from those studies andresearches proposed to deal with Western language. Thus, an architecture that augments existingsearch engines is developed to support Chinese natural language question answering. In this paper anew approach to building Chinese question-answering system is described, which is thegeneral-purpose, fully-automated Chinese quest ion-answering system available on the web. In theapproach, we attempt to represent Chinese text by its characteristics, and try to convert theChinese text into ERE (E: entity, R: relation) relation data lists, and then to answer the questionthrough ERE relation model. The system performs quite well giving the simplicity of the techniquesbeing utilized. Experimental results show that question-answering accuracy can be greatly improvedby analyzing more and more matching ERE relation data lists. Simple ERE relation data extractiontechniques work well in our system making it efficient to use with many backend retrieval engines.展开更多
Question-answering systems provide short answers with the use of available information.The implementation mechanism for a question answering system is presented in this paper and is based on concepts and statistics.Th...Question-answering systems provide short answers with the use of available information.The implementation mechanism for a question answering system is presented in this paper and is based on concepts and statistics.The system determines the question and focuses on the answer types,making different conceptual expansions for different questions.It applies the latent semantic indexing(LSI)method to retrieve relevant passages.It uses matching algorithms to find a match between questions and sentences stored in a database.It also extracts answers from a frequently asked questions(FAQ)database by finding matching or similar sentences.The answering ability of the system has been improved with the use of LSI and FAQ.The question-answering system introduced in Chinese universities is a developed and proven system capable of precise results.展开更多
大语言模型(LLMs,Large Language Models)具有极强的自然语言理解和复杂问题求解能力,本文基于大语言模型构建了矿物问答系统,以高效地获取矿物知识。该系统首先从互联网资源获取矿物数据,清洗后将矿物数据结构化为矿物文档和问答对;将...大语言模型(LLMs,Large Language Models)具有极强的自然语言理解和复杂问题求解能力,本文基于大语言模型构建了矿物问答系统,以高效地获取矿物知识。该系统首先从互联网资源获取矿物数据,清洗后将矿物数据结构化为矿物文档和问答对;将矿物文档经过格式转换和建立索引后转化为矿物知识库,用于检索增强大语言模型生成,问答对用于微调大语言模型。使用矿物知识库检索增强大语言模型生成时,采用先召回再精排的两级检索模式,以获得更好的大语言模型生成结果。矿物大语言模型微调采用了主流的低秩适配(Low-Rank Adaption,LoRA)方法,以较少的训练参数获得了与全参微调性能相当的效果,节省了计算资源。实验结果表明,基于检索增强生成的大语言模型的矿物问答系统能以较高的准确率快捷地获取矿物知识。展开更多
随着全球气候变化日益严重,企业碳排放分析成为国际关注的焦点,针对通用大语言模型(large language model,LLM)知识更新滞后,增强生成架构在处理复杂问题时缺乏专业性与准确性,以及大模型生成结果中幻觉率高的问题,通过构建专有知识库,...随着全球气候变化日益严重,企业碳排放分析成为国际关注的焦点,针对通用大语言模型(large language model,LLM)知识更新滞后,增强生成架构在处理复杂问题时缺乏专业性与准确性,以及大模型生成结果中幻觉率高的问题,通过构建专有知识库,开发了基于大语言模型的企业碳排放分析与知识问答系统。提出了一种多样化索引模块构建方法,构建高质量的知识与法规检索数据集。针对碳排放报告(政策)领域的知识问答任务,提出了自提示检索增强生成架构,集成意图识别、改进的结构化思维链、混合检索技术、高质量提示工程和Text2SQL系统,支持多维度分析企业可持续性报告,为企业碳排放报告(政策)提供了一种高效、精准的知识问答解决方案。通过多层分块机制、文档索引和幻觉识别功能,确保结果的准确性与可验证性,降低了LLM技术在系统中的幻觉率。通过对比实验,所提算法在各模块的协同下在检索增强生成实验中各指标表现优异,对于企业碳排放报告的关键信息抽取和报告评价,尤其是长文本处理具有明显的优势。展开更多
钻井顶部驱动装置结构复杂、故障类型多样,现有的故障树分析法和专家系统难以有效应对复杂多变的现场情况。为此,利用知识图谱在结构化与非结构化信息融合、故障模式关联分析以及先验知识传递方面的优势,提出了一种基于知识图谱的钻井...钻井顶部驱动装置结构复杂、故障类型多样,现有的故障树分析法和专家系统难以有效应对复杂多变的现场情况。为此,利用知识图谱在结构化与非结构化信息融合、故障模式关联分析以及先验知识传递方面的优势,提出了一种基于知识图谱的钻井顶部驱动装置故障诊断方法,利用以Transformer为基础的双向编码器模型(Bidirectional Encoder Representations from Transformers,BERT)构建了混合神经网络模型BERT-BiLSTM-CRF与BERT-BiLSTM-Attention,分别实现了顶驱故障文本数据的命名实体识别和关系抽取,并通过相似度计算,实现了故障知识的有效融合和智能问答,最终构建了顶部驱动装置故障诊断方法。研究结果表明:①在故障实体识别任务上,BERT-BiLSTM-CRF模型的精确度达到95.49%,能够有效识别故障文本中的信息实体;②在故障关系抽取上,BERT-BiLSTM-Attention模型的精确度达到93.61%,实现了知识图谱关系边的正确建立;③开发的问答系统实现了知识图谱的智能应用,其在多个不同类型问题上的回答准确率超过了90%,能够满足现场使用需求。结论认为,基于知识图谱的故障诊断方法能够有效利用顶部驱动装置的先验知识,实现故障的快速定位与智能诊断,具备良好的应用前景。展开更多
基金Postgraduate Research&Practice Innovation Program of Jiangsu Province(KYCX24_2145).
文摘Objective To improve the accuracy and professionalism of question-answering(QA)model in traditional Chinese medicine(TCM)lung cancer by integrating large language models with structured knowledge graphs using the knowledge graph(KG)to text-enhanced retrievalaugmented generation(KG2TRAG)method.Methods The TCM lung cancer model(TCMLCM)was constructed by fine-tuning Chat-GLM2-6B on the specialized datasets Tianchi TCM,HuangDi,and ShenNong-TCM-Dataset,as well as a TCM lung cancer KG.The KG2TRAG method was applied to enhance the knowledge retrieval,which can convert KG triples into natural language text via ChatGPT-aided linearization,leveraging large language models(LLMs)for context-aware reasoning.For a comprehensive comparison,MedicalGPT,HuatuoGPT,and BenTsao were selected as the baseline models.Performance was evaluated using bilingual evaluation understudy(BLEU),recall-oriented understudy for gisting evaluation(ROUGE),accuracy,and the domain-specific TCM-LCEval metrics,with validation from TCM oncology experts assessing answer accuracy,professionalism,and usability.Results The TCMLCM model achieved the optimal performance across all metrics,including a BLEU score of 32.15%,ROUGE-L of 59.08%,and an accuracy rate of 79.68%.Notably,in the TCM-LCEval assessment specific to the field of TCM,its performance was 3%−12%higher than that of the baseline model.Expert evaluations highlighted superior performance in accuracy and professionalism.Conclusion TCMLCM can provide an innovative solution for TCM lung cancer QA,demonstrating the feasibility of integrating structured KGs with LLMs.This work advances intelligent TCM healthcare tools and lays a foundation for future AI-driven applications in traditional medicine.
基金This work was supported by the National Research Foundation of Korea(NRF)grant funded by the Korea government(MSIT)(No.2022R1F1A1067008)by the Basic Science Research Program through the National Research Foundation of Korea(NRF)funded by the Ministry of Education(No.2019R1A6A1A03032119).
文摘Question-answering(QA)models find answers to a given question.The necessity of automatically finding answers is increasing because it is very important and challenging from the large-scale QA data sets.In this paper,we deal with the QA pair matching approach in QA models,which finds the most relevant question and its recommended answer for a given question.Existing studies for the approach performed on the entire dataset or datasets within a category that the question writer manually specifies.In contrast,we aim to automatically find the category to which the question belongs by employing the text classification model and to find the answer corresponding to the question within the category.Due to the text classification model,we can effectively reduce the search space for finding the answers to a given question.Therefore,the proposed model improves the accuracy of the QA matching model and significantly reduces the model inference time.Furthermore,to improve the performance of finding similar sentences in each category,we present an ensemble embedding model for sentences,improving the performance compared to the individual embedding models.Using real-world QA data sets,we evaluate the performance of the proposed QA matching model.As a result,the accuracy of our final ensemble embedding model based on the text classification model is 81.18%,which outperforms the existing models by 9.81%∼14.16%point.Moreover,in terms of the model inference speed,our model is faster than the existing models by 2.61∼5.07 times due to the effective reduction of search spaces by the text classification model.
基金jointly supported by Wuhan International Science and Technology Cooperation Fund(Grant No.201070934337)the 3rd Special Award of China Postdoctoral Science Foundation(Grant No.201003497)National Science Foundation of USA(Grant No.NSF/IIS-1052773)
文摘This paper compares 12 representative Chinese and English online questionanswering communities(Q&A communities) based on their basic functions, interactive modes, and customized services. An empirical experiment from a comparative perspective was also conducted on them by using 12 questions representing for four types of questions,which are assigned evenly to three different subject fields so as to examine the task performance of these 12 selected online Q&A communities. Our goal was to evaluate those online Q&A communities in terms of their quality and efficiency for answering questions posed to them. It was hoped that our empirical research would yield greater understanding and insights to the working intricacy of these online Q&A communities and hence their possible further improvement.
文摘Traditional Chinese text retrieval systems return a ranked list of documentsin response to a user''s request. While a ranked list of documents may be an appropriate response forthe user, frequently it is not. Usually it would be better for the system to provide the answeritself instead of requiring the user to search for the answer in a set of documents. Since Chinesetext retrieval has just been developed lately, and due to various specific characteristics ofChinese language, the approaches to its retrieval are quite different from those studies andresearches proposed to deal with Western language. Thus, an architecture that augments existingsearch engines is developed to support Chinese natural language question answering. In this paper anew approach to building Chinese question-answering system is described, which is thegeneral-purpose, fully-automated Chinese quest ion-answering system available on the web. In theapproach, we attempt to represent Chinese text by its characteristics, and try to convert theChinese text into ERE (E: entity, R: relation) relation data lists, and then to answer the questionthrough ERE relation model. The system performs quite well giving the simplicity of the techniquesbeing utilized. Experimental results show that question-answering accuracy can be greatly improvedby analyzing more and more matching ERE relation data lists. Simple ERE relation data extractiontechniques work well in our system making it efficient to use with many backend retrieval engines.
基金supported by the National Natural Science Foundation of China(Grant No.60373095).
文摘Question-answering systems provide short answers with the use of available information.The implementation mechanism for a question answering system is presented in this paper and is based on concepts and statistics.The system determines the question and focuses on the answer types,making different conceptual expansions for different questions.It applies the latent semantic indexing(LSI)method to retrieve relevant passages.It uses matching algorithms to find a match between questions and sentences stored in a database.It also extracts answers from a frequently asked questions(FAQ)database by finding matching or similar sentences.The answering ability of the system has been improved with the use of LSI and FAQ.The question-answering system introduced in Chinese universities is a developed and proven system capable of precise results.
文摘大语言模型(LLMs,Large Language Models)具有极强的自然语言理解和复杂问题求解能力,本文基于大语言模型构建了矿物问答系统,以高效地获取矿物知识。该系统首先从互联网资源获取矿物数据,清洗后将矿物数据结构化为矿物文档和问答对;将矿物文档经过格式转换和建立索引后转化为矿物知识库,用于检索增强大语言模型生成,问答对用于微调大语言模型。使用矿物知识库检索增强大语言模型生成时,采用先召回再精排的两级检索模式,以获得更好的大语言模型生成结果。矿物大语言模型微调采用了主流的低秩适配(Low-Rank Adaption,LoRA)方法,以较少的训练参数获得了与全参微调性能相当的效果,节省了计算资源。实验结果表明,基于检索增强生成的大语言模型的矿物问答系统能以较高的准确率快捷地获取矿物知识。
文摘随着全球气候变化日益严重,企业碳排放分析成为国际关注的焦点,针对通用大语言模型(large language model,LLM)知识更新滞后,增强生成架构在处理复杂问题时缺乏专业性与准确性,以及大模型生成结果中幻觉率高的问题,通过构建专有知识库,开发了基于大语言模型的企业碳排放分析与知识问答系统。提出了一种多样化索引模块构建方法,构建高质量的知识与法规检索数据集。针对碳排放报告(政策)领域的知识问答任务,提出了自提示检索增强生成架构,集成意图识别、改进的结构化思维链、混合检索技术、高质量提示工程和Text2SQL系统,支持多维度分析企业可持续性报告,为企业碳排放报告(政策)提供了一种高效、精准的知识问答解决方案。通过多层分块机制、文档索引和幻觉识别功能,确保结果的准确性与可验证性,降低了LLM技术在系统中的幻觉率。通过对比实验,所提算法在各模块的协同下在检索增强生成实验中各指标表现优异,对于企业碳排放报告的关键信息抽取和报告评价,尤其是长文本处理具有明显的优势。
文摘钻井顶部驱动装置结构复杂、故障类型多样,现有的故障树分析法和专家系统难以有效应对复杂多变的现场情况。为此,利用知识图谱在结构化与非结构化信息融合、故障模式关联分析以及先验知识传递方面的优势,提出了一种基于知识图谱的钻井顶部驱动装置故障诊断方法,利用以Transformer为基础的双向编码器模型(Bidirectional Encoder Representations from Transformers,BERT)构建了混合神经网络模型BERT-BiLSTM-CRF与BERT-BiLSTM-Attention,分别实现了顶驱故障文本数据的命名实体识别和关系抽取,并通过相似度计算,实现了故障知识的有效融合和智能问答,最终构建了顶部驱动装置故障诊断方法。研究结果表明:①在故障实体识别任务上,BERT-BiLSTM-CRF模型的精确度达到95.49%,能够有效识别故障文本中的信息实体;②在故障关系抽取上,BERT-BiLSTM-Attention模型的精确度达到93.61%,实现了知识图谱关系边的正确建立;③开发的问答系统实现了知识图谱的智能应用,其在多个不同类型问题上的回答准确率超过了90%,能够满足现场使用需求。结论认为,基于知识图谱的故障诊断方法能够有效利用顶部驱动装置的先验知识,实现故障的快速定位与智能诊断,具备良好的应用前景。