Abstract
Extracting valuable landslide geohazard entities from the large volume of text describing landslide geohazards is the foundation for constructing a landslide geohazard knowledge graph. Working from unstructured text such as landslide geohazard investigation reports, this paper analyzes the linguistic characteristics of landslide geohazard descriptions in light of landslide mechanisms, formulates an annotation system and annotation specifications for landslide geohazard semantic information, and constructs a corpus for the landslide geohazard domain. Entity recognition experiments on this corpus show that the precision, recall, and F1-score of the named entity recognition model all exceed 90%, which verifies the applicability of the corpus and provides strong data support for subsequent research on landslide geological knowledge graphs.
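The abstract reports precision, recall, and F1-score for named entity recognition without stating how they were computed. A common convention, assumed here rather than taken from the paper, is entity-level exact matching over BIO tags. The sketch below illustrates that convention; the entity types `LOC` and `TRG` and the function names are hypothetical and do not come from the paper's annotation system.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed tagging scheme)."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes an entity that ends at the last token.
    for i, tag in enumerate(tags + ["O"]):
        inside = tag.startswith("I-") and label == tag[2:]
        if not inside:
            if label is not None:
                spans.add((label, start, i))
            if tag.startswith("B-"):
                start, label = i, tag[2:]
            else:
                # "O" or a stray "I-" tag: no entity is open.
                start, label = None, None
    return spans


def precision_recall_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Entity-level scores: a prediction counts only if type and boundaries match."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: one entity matches exactly, one is shifted by a token.
gold_tags = ["B-LOC", "I-LOC", "O", "B-TRG", "O"]
pred_tags = ["B-LOC", "I-LOC", "O", "O", "B-TRG"]
scores = precision_recall_f1(bio_to_spans(gold_tags), bio_to_spans(pred_tags))
print(scores)
```

Under exact matching, a predicted entity with the right type but a shifted boundary counts as both a false positive and a false negative, so this is a stricter measure than token-level accuracy.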
Authors
李秋荣
刘晓晓
王波
代文
崔雅婷
尚丹丹
刘元民
LI Qiurong; LIU Xiaoxiao; WANG Bo; DAI Wen; CUI Yating; SHANG Dandan; LIU Yuanmin (School of Remote Sensing & Geomatics Engineering, Nanjing University of Information Science & Technology, Nanjing 210044, China; Beijing Institute of Geological Disaster Prevention and Control, Beijing 100120, China; School of Geographical Sciences, Nanjing University of Information Science & Technology, Nanjing 210044, China; School of Spatial Informatics and Geomatics Engineering, Anhui University of Science & Technology, Huainan 232001, China; School of Geography, Nanjing Normal University, Nanjing 210023, China)
Source
《南京信息工程大学学报》
Peking University Core Journal (北大核心)
2025, No. 4, pp. 601-610 (10 pages)
Journal of Nanjing University of Information Science & Technology
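Funding
National Natural Science Foundation of China (42301478)
General Project of Philosophy and Social Science Research in Jiangsu Universities (2023SJYB0179)
Postgraduate Research & Practice Innovation Program of Jiangsu Province (SJCX24_0494)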
Keywords
annotation system
landslide hazards
text corpus
named entity recognition
annotation specifications