
Construction of pre-training language model for coal mine safety hidden danger text
Abstract The large volume of unstructured text data accumulated by the various safety management information platforms in coal mines is currently not fully utilized. To mine the knowledge contained in coal mine safety hidden danger text, a pre-trained language model for the coal mine safety domain (CoalBERT) is proposed, based on two learning mechanisms: domain term masked language modeling (DP-MLM) and sentence order prediction (SOP). The model was trained on more than 1.1 million collected coal mine hidden danger inspection records together with a self-constructed dictionary of 1,328 domain terms, and comparative experiments were conducted on two tasks: coal mine safety hidden danger text classification and named entity recognition. The results show that, in the text classification experiment, the overall precision, recall, and F1 score of CoalBERT are higher than those of the Bidirectional Encoder Representations from Transformers (BERT) pre-trained model by 0.34%, 0.21%, and 0.27%, respectively; in the named entity recognition experiment, the precision and F1 score of CoalBERT are higher than those of BERT by 3.84% and 2.13%, respectively. CoalBERT effectively improves semantic understanding of coal mine safety hidden danger text and can serve as a basic reference for text mining tasks in the coal mine safety domain.
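The two pre-training mechanisms named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the greedy longest-match lookup, the 15% masking probability, and the SOP label convention (1 for original order, 0 for swapped) are all assumptions made for illustration. The sketch also assumes character-level tokens, as is common for Chinese BERT models, and omits the usual random-replacement and keep-original variants of masked language modeling.

```python
import random

MASK = "[MASK]"  # assumed mask token, following the BERT convention


def find_term_spans(text, terms):
    """Return (start, end) spans of dictionary terms, using greedy longest match."""
    spans = []
    max_len = max((len(t) for t in terms), default=0)
    i = 0
    while i < len(text):
        matched = 0
        # Try the longest candidate first so longer terms win over their prefixes.
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in terms:
                matched = n
                break
        if matched:
            spans.append((i, i + matched))
            i += matched
        else:
            i += 1
    return spans


def mask_domain_terms(text, terms, mask_prob=0.15, rng=None):
    """Mask whole domain terms: every character of a selected term is masked together.

    Returns (tokens, labels); labels hold the original character at masked
    positions and None elsewhere.
    """
    rng = rng or random.Random()
    tokens = list(text)
    labels = [None] * len(tokens)
    for start, end in find_term_spans(text, terms):
        if rng.random() < mask_prob:
            for k in range(start, end):
                labels[k] = tokens[k]
                tokens[k] = MASK
    return tokens, labels


def make_sop_example(first, second, rng=None):
    """Build one sentence order prediction example from two consecutive segments.

    Label 1 means the segments are in their original order, 0 means swapped
    (an assumed convention).
    """
    rng = rng or random.Random()
    if rng.random() < 0.5:
        return first, second, 1
    return second, first, 0
```

With a hypothetical dictionary entry such as 传感器 ("sensor"), `mask_domain_terms` masks all three characters of the term as one unit, so the model has to recover the whole term from context rather than one character from its neighbors.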
Authors LI Zequan; LIU Feixiang; ZHAO Jialiang; QI Hui; LI Jing (Hebei Key Laboratory of Mine Intelligent Unmanned Mining Technology, North China Institute of Science and Technology, Sanhe 065201, China; School of Mine Safety, North China Institute of Science and Technology, Sanhe 065201, China; School of Economics and Management, North China Institute of Science and Technology, Sanhe 065201, China; School of Energy and Mining, China University of Mining and Technology (Beijing), Beijing 100083, China)
Source Mining Safety & Environmental Protection (《矿业安全与环保》), Peking University Core Journal, 2025, No. 3, pp. 185-192 (8 pages)
Funding Fundamental Research Funds for the Central Universities (3142017107); Langfang Science and Technology Program Project (2023029061).
Keywords BERT model; coal mine safety hidden danger text; text classification; named entity recognition; pre-training model; task fine-tuning