Entity recognition of gas accident investigation reports based on a large language model
Abstract Sample scarcity significantly degrades the entity recognition accuracy of large language models (LLMs) on gas accident investigation reports. To address this, an LLM entity recognition method based on two-stage training is proposed. In the dataset construction stage, an LLM automatically generates a gas accident investigation report dataset according to a conversational instruction fine-tuning template, and the easy data augmentation (EDA) technique is used to expand the manually annotated key samples. In the model fine-tuning stage, low-rank adaptation (LoRA) is used to fine-tune the Phi3-mini-128k model: the first stage trains on the dataset annotated automatically by the LLM, and the second stage continues from that result using the augmented dataset. The results show that after the first-stage fine-tuning, the comprehensive evaluation index of entity recognition of the Phi3-mini-rq model improves by 11.01 percentage points. The second stage performs best when EDA-augmented data accounts for 50% of the total data, raising the comprehensive evaluation index by a further 2.49 percentage points. The results can provide effective technical support for the automated processing of accident reports in the gas field.
Authors WANG Mingda (王明达); ZHAO Baoxi (赵宝熙); WU Zhisheng (吴志生); LENG Gaoqiang (冷高强) (College of Mechanical and Electrical Engineering, China University of Petroleum, Qingdao, Shandong 266580, China)
Source Journal of Safety Science and Technology (《中国安全生产科学技术》), Peking University Core Journal, 2025, No. 2, pp. 139-145 (7 pages)
Funding National Natural Science Foundation of China (52075549).
Keywords gas accident investigation report; named entity recognition; large language model; instruction fine-tuning; data augmentation
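
The abstract reports that second-stage fine-tuning works best when EDA-augmented data makes up 50% of the total. The Python sketch below illustrates that mixing arithmetic together with two of the standard EDA operations (random swap and random deletion). It is not the authors' pipeline: the function names, the choice of operations, and the token-list input format are all assumptions made for illustration, and the sketch ignores entity annotations, which a real NER pipeline would have to keep consistent with the edited text.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with n_swaps random pairwise swaps."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, p, rng):
    """Drop each token with probability p, keeping at least one token."""
    tokens = list(tokens)
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def augmented_count(n_original, augmented_fraction):
    """How many augmented samples are needed so that they make up
    augmented_fraction of (original + augmented)."""
    if not 0 <= augmented_fraction < 1:
        raise ValueError("augmented_fraction must be in [0, 1)")
    return round(n_original * augmented_fraction / (1 - augmented_fraction))


def build_stage2_set(samples, augmented_fraction, seed=0):
    """Hypothetical helper: originals plus enough augmented copies to
    reach the requested augmented share of the final set."""
    rng = random.Random(seed)
    n_aug = augmented_count(len(samples), augmented_fraction)
    augmented = []
    for k in range(n_aug):
        base = samples[k % len(samples)]
        # Alternate between the two operations (an arbitrary choice here).
        if k % 2 == 0:
            augmented.append(random_swap(base, 1, rng))
        else:
            augmented.append(random_deletion(base, 0.1, rng))
    return list(samples) + augmented


if __name__ == "__main__":
    # Toy token lists standing in for manually annotated key samples.
    toy = [["gas", "pipeline", "leak"], ["valve", "failure"], ["explosion", "at", "station"]]
    mixed = build_stage2_set(toy, augmented_fraction=0.5)
    print(len(toy), "original +", len(mixed) - len(toy), "augmented")
```

With an augmented share of 50%, the number of augmented samples equals the number of originals; a share of 25% would need one augmented sample for every three originals.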
