
Method for Chinese Company Name Recognition Based on Conditional Random Fields (一种基于条件随机场的中文公司名识别方法) Cited by: 2
Abstract: As information technology develops, intelligent information processing places ever higher demands on natural language processing, and named entity recognition is one of its most important research topics. Based on an in-depth study and analysis of information-industry (IT) news texts, this paper summarizes the basic characteristics of company names, namely their structural features and contextual constraints. Company names are divided into two categories, full names and abbreviated names, and a separate tagging scheme is designed for each and used to annotate a training corpus. The paper then proposes a dual-model, two-scan recognition strategy based on conditional random fields (CRFs). The first scan applies a full-name recognition model and, at the same time, extracts company-name keywords. The second scan uses the keywords extracted in the first scan to improve the word segmentation and part-of-speech (POS) tagging results, and on that basis applies a model covering both full and abbreviated names to recognize company names. The experimental results show that this recognition method is effective.
Source: Network Security Technology & Application (《网络安全技术与应用》), 2014, No. 4, pp. 13-14 (2 pages)
Keywords: named entity recognition; information extraction; company name; conditional random fields
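The two-scan control flow described in the abstract can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the two CRF models are replaced by toy rule-based stand-ins, the B/I/O tag set, the suffix list, the rule "keyword = first token of a full name", and the example sentences are all assumptions made for the sketch.

```python
from typing import List, Set

# Hypothetical organisation suffixes used by the toy stand-in models.
SUFFIXES = {"公司", "集团"}


def spans(tokens: List[str], tags: List[str]) -> List[List[str]]:
    """Group tokens tagged B, I, I, ... into entity spans."""
    out, cur = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "B":
            if cur:
                out.append(cur)
            cur = [tok]
        elif tag == "I" and cur:
            cur.append(tok)
        else:
            if cur:
                out.append(cur)
            cur = []
    if cur:
        out.append(cur)
    return out


def toy_full_name_model(tokens: List[str]) -> List[str]:
    """Stand-in for the full-name CRF: tags '<word> <suffix>' pairs."""
    tags = ["O"] * len(tokens)
    for i in range(1, len(tokens)):
        if tokens[i] in SUFFIXES:
            tags[i - 1], tags[i] = "B", "I"
    return tags


def toy_full_abbr_model(tokens: List[str], keywords: Set[str]) -> List[str]:
    """Stand-in for the full+abbreviated-name CRF: also tags bare keywords."""
    tags = toy_full_name_model(tokens)
    for i, tok in enumerate(tokens):
        if tags[i] == "O" and tok in keywords:
            tags[i] = "B"
    return tags


def merge_keywords(tokens: List[str], keywords: Set[str]) -> List[str]:
    """Repair segmentation: re-join adjacent tokens that form a keyword."""
    out, i = [], 0
    while i < len(tokens):
        for j in range(len(tokens), i + 1, -1):  # longest match first
            if "".join(tokens[i:j]) in keywords:
                out.append("".join(tokens[i:j]))
                i = j
                break
        else:  # no multi-token keyword starts at position i
            out.append(tokens[i])
            i += 1
    return out


def two_scan(sentences: List[List[str]]) -> List[List[str]]:
    # Scan 1: recognise full names and harvest their keywords.
    keywords: Set[str] = set()
    for toks in sentences:
        for span in spans(toks, toy_full_name_model(toks)):
            keywords.add(span[0])  # assumption: keyword = first token
    # Scan 2: fix segmentation with the keywords, then tag full and
    # abbreviated names together.
    results = []
    for toks in sentences:
        toks = merge_keywords(toks, keywords)
        tags = toy_full_abbr_model(toks, keywords)
        results.append(["".join(s) for s in spans(toks, tags)])
    return results


# Hypothetical pre-segmented input; the second sentence is mis-segmented.
sentences = [["华为", "公司", "发布", "新品"], ["华", "为", "股价", "上涨"]]
print(two_scan(sentences))  # [['华为公司'], ['华为']]
```

In this toy run the first scan learns the keyword from the full name in the first sentence, which lets the second scan repair the segmentation of the second sentence and recognize the abbreviated name there. In a real system each stand-in would be a trained CRF tagger, and the keywords would feed the segmenter's and POS tagger's lexicon rather than a merge step.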

