Abstract
With the growth of the information society, the demands placed on natural language processing in intelligent information processing keep rising, and named entity recognition has become a particularly important research topic. Based on an in-depth study and analysis of IT industry news texts, this paper summarizes the structural features and contextual constraints of company names. Company names are divided into two categories, full names and abbreviated names, and a separate tagging scheme is designed for each and used to annotate a training corpus. The paper then proposes a dual-model, two-scan recognition strategy based on conditional random fields (CRFs). The first scan applies the full-name recognition model and extracts company-name keywords. The second scan uses those keywords to improve word segmentation and part-of-speech (POS) tagging, and then applies the full-and-abbreviated-name recognition model to identify company names. Experimental results indicate that the method is effective.
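The two-scan control flow summarized in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the two trained CRF models are replaced by hypothetical rule-based stand-ins (`toy_full_tagger`, `toy_full_abbrev_tagger`), a "keyword" is assumed to be a full name with generic words such as 技术 and 有限公司 removed, and "improving segmentation" is assumed to mean merging adjacent tokens that together form a known keyword. All function names and the BIO tag set are invented for the example.

```python
from typing import List, Set

def extract_spans(tokens: List[str], tags: List[str]) -> List[List[str]]:
    """Group tokens tagged B-ORG / I-ORG into company-name spans."""
    spans, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "B-ORG":
            if current:
                spans.append(current)
            current = [tok]
        elif tag == "I-ORG" and current:
            current.append(tok)
        else:
            if current:
                spans.append(current)
            current = []
    if current:
        spans.append(current)
    return spans

def merge_keywords(tokens: List[str], keywords: Set[str], max_len: int = 4) -> List[str]:
    """Assumed re-segmentation step: glue adjacent tokens that form a keyword."""
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = "".join(tokens[i:i + n])
            if candidate in keywords:
                out.append(candidate)
                i += n
                break
        else:  # no multi-token keyword starts here; keep the token as-is
            out.append(tokens[i])
            i += 1
    return out

def two_pass(sentences, full_tagger, full_abbrev_tagger, generic_words):
    # Scan 1: full-name model only; collect keywords from recognized names.
    keywords: Set[str] = set()
    for tokens in sentences:
        for span in extract_spans(tokens, full_tagger(tokens)):
            core = "".join(t for t in span if t not in generic_words)
            if core:
                keywords.add(core)
    # Scan 2: re-segment with the keywords, then run the full+abbreviated model.
    results = []
    for tokens in sentences:
        merged = merge_keywords(tokens, keywords)
        tags = full_abbrev_tagger(merged, keywords)
        results.append(["".join(s) for s in extract_spans(merged, tags)])
    return results, keywords

# Hypothetical stand-ins for the two trained CRF models (toy rules only).
GENERIC = {"技术", "有限公司"}

def toy_full_tagger(tokens: List[str]) -> List[str]:
    """Toy rule: a name runs from sentence start through '有限公司'."""
    tags = ["O"] * len(tokens)
    if "有限公司" in tokens:
        end = tokens.index("有限公司")
        tags[0] = "B-ORG"
        for k in range(1, end + 1):
            tags[k] = "I-ORG"
    return tags

def toy_full_abbrev_tagger(tokens: List[str], keywords: Set[str]) -> List[str]:
    """Toy rule: full names as above, plus any lone keyword as an abbreviation."""
    tags = toy_full_tagger(tokens)
    for i, tok in enumerate(tokens):
        if tags[i] == "O" and tok in keywords:
            tags[i] = "B-ORG"
    return tags

sentences = [
    ["华", "为", "技术", "有限公司", "发布", "新", "产品"],  # full name, badly segmented
    ["华", "为", "表示", "将", "扩大", "投资"],              # abbreviated name only
]
results, keywords = two_pass(sentences, toy_full_tagger, toy_full_abbrev_tagger, GENERIC)
```

In this toy run the first scan finds the full name in the first sentence and derives the keyword 华为; the second scan uses it to merge 华 and 为 in both sentences, so the abbreviated mention in the second sentence becomes recognizable. In a real system the keywords would more plausibly enter the second model through updated segmentation and POS features rather than through a direct lookup.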
Source
《网络安全技术与应用》 (Network Security Technology & Application)
2014, No. 4, pp. 13-14 (2 pages)
Keywords
Named Entity Identification
Information Extraction
Company Name
Conditional Random Fields