
Research on Intelligent Generation and Evidence-based of Literature Review Based on Large Language Model
Cited by: 3
Abstract
[Objective] This paper aims to automatically generate structured literature reviews with references, helping researchers quickly grasp the scientific knowledge of a given field.
[Methods] 70,000 papers were selected from the NSTL platform, move recognition was performed on their abstracts, and a corpus was built from the results. A dataset of 3,000 reviews was constructed by generating drafts with a large language model and revising them manually, and this dataset was used to fine-tune the GLM3-6B model. The corpus was converted into high-dimensional vectors, which were stored in an index and retrieved by vector search to implement an external knowledge base with LangChain. To compensate for the poor retrieval of proper nouns, BM25 retrieval was combined with the vector retrieval and the merged results were re-ranked to improve retrieval precision.
[Results] The review generation system, built on the fine-tuned model and the hybrid retrieval framework, improved the BLEU and ROUGE-L scores by 109.64% and 40.22% respectively, and improved the authenticity score in human evaluation by 62.17%.
[Limitations] Due to limited computational resources, the locally deployed model has a relatively small parameter scale, and its generation ability needs further improvement.
[Conclusions] Using retrieval-augmented generation to leverage the strengths of large language models not only produces high-quality literature reviews but also provides evidence-based traceability for the generated content, assisting researchers in intelligent reading.
Authors: Song Mengpeng (宋梦鹏), Bai Haiyan (白海燕); Institute of Scientific and Technical Information of China, Beijing 100038, China
Source: Data Analysis and Knowledge Discovery (《数据分析与知识发现》), PKU Core Journal, 2025, No. 6, pp. 21-34 (14 pages)
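Funding: One of the research outcomes of the Youth Project of the Innovation Research Fund of the Institute of Scientific and Technical Information of China (Project No. QN2024-15).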
Keywords: Large Language Model; Automatic Review; Retrieval-Augmented Generation
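The abstract says BM25 retrieval was mixed with vector retrieval and the results re-ranked, but it does not name the fusion or re-ranking method. The sketch below shows one common way to merge a lexical ranking with a dense-vector ranking: reciprocal rank fusion (RRF). This is a swapped-in technique for illustration, not necessarily the authors' method, and their re-ranker may differ. The embedding vectors are assumed to be precomputed inputs (here toy 2-d vectors instead of a real embedding model), and the whitespace tokenizer, the function names, and the parameters `k1=1.5`, `b=0.75`, `k=60` are all hypothetical choices.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase + whitespace split.
    # Chinese text would need a word segmenter instead.
    return text.lower().split()


class BM25:
    """Okapi BM25 over a small in-memory corpus of tokenized documents."""

    def __init__(self, docs: list[list[str]], k1: float = 1.5, b: float = 0.75):
        self.docs = docs
        self.k1, self.b = k1, b
        self.avgdl = sum(len(d) for d in docs) / len(docs)
        n = len(docs)
        df = Counter(term for d in docs for term in set(d))
        # Lucene-style IDF, which stays positive even for very common terms.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: list[str], idx: int) -> float:
        doc = self.docs[idx]
        tf = Counter(doc)
        # Length normalization: longer-than-average documents are penalized.
        norm = self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
        return sum(
            self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
            for t in query
            if t in tf
        )


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank(scores: list[float]) -> list[int]:
    # Document indices ordered from highest to lowest score.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def rrf(rankings: list[list[int]], k: int = 60) -> dict[int, float]:
    # Reciprocal rank fusion: each ranking adds 1 / (k + rank), ranks starting at 1.
    fused: dict[int, float] = {}
    for ranking in rankings:
        for pos, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + pos)
    return fused


def hybrid_search(
    query: str,
    docs: list[str],
    query_vec: list[float],
    doc_vecs: list[list[float]],
    top_k: int = 3,
    k: int = 60,
) -> list[int]:
    """Fuse a BM25 (lexical) ranking with a dense-vector ranking via RRF."""
    bm25 = BM25([tokenize(d) for d in docs])
    q = tokenize(query)
    lexical = rank([bm25.score(q, i) for i in range(len(docs))])
    dense = rank([cosine(query_vec, v) for v in doc_vecs])
    fused = rrf([lexical, dense], k)
    return sorted(fused, key=fused.get, reverse=True)[:top_k]


if __name__ == "__main__":
    docs = [
        "GLM3-6B fine-tuning for review generation",
        "BM25 ranking function for keyword retrieval",
        "dense vector retrieval with embeddings",
    ]
    # Toy 2-d "embeddings" standing in for a real embedding model's output.
    query_vec = [1.0, 0.0]
    doc_vecs = [[0.6, 0.8], [0.0, 1.0], [0.9, 0.1]]
    print(hybrid_search("BM25 retrieval", docs, query_vec, doc_vecs))
```

In this toy run the dense vectors rank the document containing the term "BM25" last, while BM25 ranks it first; after fusion it moves up to second place. That is the kind of proper-noun recall a hybrid design is meant to recover. RRF uses only rank positions, so the two retrievers' incompatible score scales never have to be normalized against each other.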
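The abstract reports ROUGE-L gains as part of its automatic evaluation. ROUGE-L scores a generated text against a reference using their longest common subsequence (LCS): with candidate length |C|, reference length |R|, precision P = LCS/|C|, recall R = LCS/|R|, and F = (1 + β²)PR / (R + β²P). The sketch below computes it under two assumptions that are ours, not the paper's: tokens are single characters (a common choice for Chinese text; the abstract does not state its tokenization), and β = 1, giving the balanced F-measure that many implementations report. The function names are hypothetical.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)  # extend the diagonal match
            else:
                curr.append(max(prev[j], curr[j - 1]))  # best of skipping x or y
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str, beta: float = 1.0) -> float:
    """ROUGE-L F-measure over character tokens (assumed tokenization)."""
    cand, ref = list(candidate), list(reference)
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


if __name__ == "__main__":
    # Hypothetical generated sentence vs. reference sentence.
    print(round(rouge_l("大模型生成综述", "大语言模型生成文献综述"), 4))  # 0.7778
```

Here all 7 candidate characters appear in order in the 11-character reference, so P = 1 and R = 7/11, and F = 7/9. Because LCS rewards in-order overlap without requiring contiguity, ROUGE-L is more tolerant of inserted words than n-gram metrics such as BLEU.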