Abstract
Calculating the length of power-off sections is an important task in analyzing railway power supply failure data. Because failure data are mostly recorded as unstructured text, sorting the records and calculating section lengths by hand takes a great deal of time. This paper proposes an automated feature extraction method based on a finite-state machine that can process power supply failure information described in unstructured text. Using natural language processing word segmentation, regular expressions, and pattern matching, the method quickly locates keywords and mines the relationships among them, automatically extracts key features such as the start and end points of the failure section, and then queries the technical ledger of line equipment to calculate the length of the power-off section. Experimental results show that the method uses few resources, runs quickly, and achieves high extraction accuracy on the test samples, which can substantially improve work efficiency.
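The pipeline described in the abstract (locate keywords, track their relationships with a finite-state machine, then look up the end points in an equipment ledger) can be illustrated with a minimal sketch. It is not the authors' implementation: the state names, the English keyword set, the `station <name>` convention, and the `LEDGER_KM` kilometre posts are all hypothetical, and a regular expression stands in for the word segmentation step that real Chinese failure records would need.

```python
import re
from enum import Enum, auto


class State(Enum):
    IDLE = auto()        # waiting for a keyword that opens a section
    WANT_START = auto()  # opening keyword seen, waiting for the start point
    WANT_LINK = auto()   # start point recorded, waiting for a linking keyword
    WANT_END = auto()    # linking keyword seen, waiting for the end point


# Hypothetical token patterns; each alternative has exactly one named group,
# so match.lastgroup identifies the token type.
TOKEN_RE = re.compile(
    r"\b(?P<start_kw>from|between)\b"
    r"|\b(?P<end_kw>to|and)\b"
    r"|\bstation\s+(?P<station>\w+)",
    re.IGNORECASE,
)

# Hypothetical equipment ledger: station name -> kilometre post (km).
LEDGER_KM = {"A": 12.5, "B": 30.0, "C": 47.25}


def extract_section(text):
    """Return (start, end) station names, or None if no section is found."""
    state, start = State.IDLE, None
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        if state is State.IDLE and kind == "start_kw":
            state = State.WANT_START
        elif state is State.WANT_START and kind == "station":
            start, state = match.group("station"), State.WANT_LINK
        elif state is State.WANT_LINK and kind == "end_kw":
            state = State.WANT_END
        elif state is State.WANT_END and kind == "station":
            return start, match.group("station")
        # Any other token in any other state is ignored.
    return None


def section_length_km(text, ledger=LEDGER_KM):
    """Look up both end points in the ledger and return their distance in km."""
    section = extract_section(text)
    if section is None:
        return None
    start, end = section
    if start not in ledger or end not in ledger:
        return None
    return abs(ledger[end] - ledger[start])


record = "Power was lost from station A to station C due to a contact wire fault."
print(extract_section(record))
print(section_length_km(record))
```

Because transitions fire only on the token type expected in the current state, stray keywords such as the "to" in "due to" are ignored once the section has been captured. A production version would presumably need many more states and patterns to cover the variety of phrasing in real records.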
Authors
杨涛存
郭剑峰
杜文然
徐贵红
YANG Taocun; GUO Jianfeng; DU Wenran; XU Guihong (Institute of Computing Technology, China Academy of Railway Sciences Corporation Limited, Beijing 100081, China; Infrastructure Inspection Research Institute, China Academy of Railway Sciences Corporation Limited, Beijing 100081, China; Postgraduate Department, China Academy of Railway Sciences, Beijing 100081, China)
Source
《中国铁路》
2020, No. 8, pp. 7-12 (6 pages)
China Railway
Funding
Science and Technology Research and Development Program of China Railway Corporation (P2018Z001, J2019Z001).
Keywords
finite-state machine
power supply failure
information extraction
railway safety