Abstract
M&A and reorganization announcements are an important part of the information disclosure of listed companies; they are free-form long texts that follow certain formatting conventions. Based on the characteristics of the announcement text and borrowing the idea of dimensionality reduction, a joint information extraction scheme combining a rule-based method with a sequence labeling method is proposed. The rule-based method extracts a set of key sentences, narrowing document-level extraction down to sentence-level extraction. The sequence labeling method builds a sequence labeling model based on a bidirectional gated recurrent unit (BiGRU) network and an attention mechanism, carrying the extraction from sentence level to field level. Experimental results show that the scheme achieves an average F1 score of 0.92 on the M&A and reorganization announcement information extraction task, which supports its feasibility and practicality.
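The two-stage narrowing described in the abstract (document level to sentence level, then sentence level to field level) can be illustrated with a minimal sketch. The abstract does not give the paper's actual rules or tag set, so the trigger words, the BUYER and TARGET labels, and the function names below are all hypothetical. The per-character tags are supplied by hand here; in the paper they would come from the BiGRU and attention model, which this sketch does not implement.

```python
import re

# Hypothetical trigger words; the paper's actual rules are not stated in the abstract.
TRIGGERS = ("收购", "购买", "转让", "重组")


def split_sentences(text):
    """Split text after Chinese or Western sentence-ending punctuation."""
    parts = re.split(r"(?<=[。！？；!?;])", text)
    return [p.strip() for p in parts if p.strip()]


def key_sentences(text, triggers=TRIGGERS):
    """Stage 1 (rule-based): keep only sentences containing a trigger word."""
    return [s for s in split_sentences(text) if any(t in s for t in triggers)]


def decode_bio(chars, tags):
    """Stage 2 (post-processing): turn per-character BIO tags into fields.

    The tags are assumed to come from a sequence labeling model.
    """
    fields, label, buf = [], None, []
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            # A new field starts; flush any field still open.
            if label is not None:
                fields.append((label, "".join(buf)))
            label, buf = tag[2:], [ch]
        elif tag.startswith("I-") and label == tag[2:]:
            # Continue the currently open field.
            buf.append(ch)
        else:
            # "O" or an inconsistent tag closes the open field.
            if label is not None:
                fields.append((label, "".join(buf)))
            label, buf = None, []
    if label is not None:
        fields.append((label, "".join(buf)))
    return fields
```

For example, `key_sentences` would keep only the first sentence of "公司拟收购乙公司股权。本公告不构成投资建议。", and `decode_bio` would then map hand-supplied tags over that sentence's characters to labeled fields.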
Authors
黄胜
李胜
朱菁
HUANG Sheng; LI Sheng; ZHU Jing (College of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; Key Laboratory of Optical Communications and Networking, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; Data Center, Shenzhen Securities Information Limited Company, Shenzhen 518000, China)
Source
《计算机工程与设计》
PKU Core Journal
2020, No. 5, pp. 1420-1426 (7 pages)
Computer Engineering and Design
Funding
National Natural Science Foundation of China (61371096).
Keywords
information extraction
free-form long text
rule-based method
sequence labeling
bidirectional gated recurrent unit (BiGRU) network
attention mechanism