Improving Named Entity Recognition Performance with Un-annotated Corpora (cited by: 3)

Named Entity Recognition with Un-annotated Data
Abstract: This paper describes a system that applies maximum entropy (ME) models to named entity recognition (NER), along with the model and the features it uses. The features comprise the morphological properties of each word and its context information, both of which are easy to obtain for text in almost any language. Using these features, we build a baseline recognizer with a maximum entropy classifier. On top of the baseline, we first construct a named entity dictionary (thesaurus) from web resources. We then use the dictionary together with the baseline system to extract named entities, and their context information, from un-annotated data as auxiliary training data. Finally, these data are added to the training set of the final recognizer. Experiments show that the auxiliary training data improve system performance to some extent.
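The pipeline in the abstract (baseline classifier, dictionary, harvesting auxiliary examples from un-annotated text, retraining) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not give the paper's feature templates, training algorithm, or the rule for combining the dictionary with the baseline. Here the classifier is a small multinomial logistic regression (one common form of a maximum entropy model) trained by stochastic gradient ascent, the toy data are invented, and the hypothetical `harvest` function keeps a token only when the baseline prediction agrees with the dictionary label.

```python
import math

def features(tokens, i):
    """Word-form and context features for token i (an illustrative set, not the paper's)."""
    w = tokens[i]
    prev_w = tokens[i - 1] if i > 0 else "<s>"
    next_w = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return [
        "w=" + w.lower(),
        "cap=" + str(w[:1].isupper()),
        "digit=" + str(any(c.isdigit() for c in w)),
        "suf2=" + w[-2:].lower(),
        "prev=" + prev_w.lower(),
        "next=" + next_w.lower(),
    ]

class MaxEnt:
    """Multinomial logistic regression over binary features."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.w = {}  # (feature, label) -> weight

    def probs(self, feats):
        scores = [sum(self.w.get((f, y), 0.0) for f in feats) for y in self.labels]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        return {y: e / z for y, e in zip(self.labels, exps)}

    def train(self, data, epochs=100, lr=0.2):
        # Stochastic gradient ascent on the conditional log-likelihood.
        for _ in range(epochs):
            for feats, gold in data:
                p = self.probs(feats)
                for y in self.labels:
                    grad = (1.0 if y == gold else 0.0) - p[y]
                    for f in feats:
                        self.w[(f, y)] = self.w.get((f, y), 0.0) + lr * grad

    def predict(self, feats):
        p = self.probs(feats)
        return max(p, key=p.get)

def to_examples(tokens, tags):
    return [(features(tokens, i), tags[i]) for i in range(len(tokens))]

def harvest(model, gazetteer, sentences):
    """Assumed selection rule: keep a token from un-annotated text only when
    the dictionary label and the baseline prediction agree."""
    aux = []
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            dict_label = gazetteer.get(tok.lower())
            if dict_label is None:
                continue
            feats = features(tokens, i)
            if model.predict(feats) == dict_label:
                aux.append((feats, dict_label))
    return aux

# Invented toy data, for illustration only.
LABELS = ["PER", "LOC", "O"]
labeled = [
    (["Alice", "visited", "Paris"], ["PER", "O", "LOC"]),
    (["Bob", "lives", "in", "London"], ["PER", "O", "O", "LOC"]),
    (["she", "visited", "Berlin"], ["O", "O", "LOC"]),
]
gazetteer = {"rome": "LOC", "carol": "PER"}  # stands in for the web-built dictionary
unlabeled = [["Carol", "visited", "Rome"], ["Dave", "lives", "in", "Rome"]]

base = [ex for toks, tags in labeled for ex in to_examples(toks, tags)]
model = MaxEnt(LABELS)
model.train(base)                         # baseline system
aux = harvest(model, gazetteer, unlabeled)
final = MaxEnt(LABELS)
final.train(base + aux)                   # retrain with auxiliary examples
```

Requiring agreement between the two sources is one conservative way to limit noise in the auxiliary data; other combinations, such as trusting the dictionary alone or thresholding the classifier's probability, are equally consistent with the abstract.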
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal, 2005, No. 2, pp. 7-11, 27 (6 pages)
Funding: National Natural Science Foundation of China (60103014); Key Research Project of the Shanghai Science and Technology Commission (035115028)
Keywords: computer application; Chinese information processing; named entity recognition; maximum entropy; un-annotated data