
Study on Multi-scale Nested Entity Mention Recognition (cited by: 4)
Abstract: Entity recognition plays an important role in many natural language processing applications. Most existing work focuses on named entity recognition (NER) and does not consider nesting among entities. In the context of the Automatic Content Extraction (ACE) evaluation, this paper studies the multi-level nested recognition of all types of entity mentions (named, nominal, and pronominal) in Chinese text, and assigns multiple attributes to every mention. We split nested entity recognition into two subtasks: nested entity boundary detection and multi-level information labeling. First, we propose an encoding method for hierarchical structure information that recasts multi-level nested boundary detection as a conventional sequence labeling problem, and we use a conditional random field (CRF) model that integrates multiple features, including abundant multi-level linguistic features, to make the statistical decision. Second, we treat multi-level information labeling as a classification problem and, from an implementation standpoint, design a parallel SVM classifier with two classification engines. This avoids building a separate classifier for each level of information and performs clearly better than a single classifier. Experiments on the standard ACE corpus show that the CRF-based multi-level boundary detection model reaches an accuracy of 71%, and that the two parallel classification engines, combined with a feature selection strategy, reach accuracies of 89.05% and 82.17%, respectively.
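The abstract does not spell out the hierarchical encoding itself, so the following is only a minimal sketch of one common way to turn nested boundaries into a flat sequence labeling problem; it is not the authors' scheme. It assumes that mentions are properly nested token spans, that each span is assigned a layer equal to the number of spans strictly containing it, and that each token's label is the concatenation of its per-layer B/I/O tags. The names `nesting_depths`, `encode`, and `decode` are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open token offsets [start, end)


def nesting_depths(spans: List[Span]) -> List[int]:
    """Depth of a span = number of other, different spans that contain it."""
    depths = []
    for s, e in spans:
        depth = sum(
            1 for s2, e2 in spans
            if s2 <= s and e <= e2 and (s2, e2) != (s, e)
        )
        depths.append(depth)
    return depths


def encode(num_tokens: int, spans: List[Span]) -> List[str]:
    """Map properly nested spans to one composite label per token.

    Assumption (not from the paper): one BIO tag per nesting layer,
    joined with '|', outermost layer first.
    """
    depths = nesting_depths(spans)
    num_layers = max(depths, default=0) + 1
    layers = [["O"] * num_tokens for _ in range(num_layers)]
    for (s, e), d in zip(spans, depths):
        layers[d][s] = "B"
        for t in range(s + 1, e):
            layers[d][t] = "I"
    return ["|".join(layer[t] for layer in layers) for t in range(num_tokens)]


def decode(labels: List[str]) -> List[Span]:
    """Recover the spans from composite labels (inverse of encode)."""
    if not labels:
        return []
    num_layers = len(labels[0].split("|"))
    sentinel = "|".join(["O"] * num_layers)  # closes any span still open
    spans = []
    for d in range(num_layers):
        start = None
        for t, label in enumerate(labels + [sentinel]):
            tag = label.split("|")[d]
            if tag != "I" and start is not None:
                spans.append((start, t))
                start = None
            if tag == "B":
                start = t
    return sorted(spans)


# "the president of the United States": the whole phrase is one mention,
# and "the United States" is a second mention nested inside it.
tokens = ["the", "president", "of", "the", "United", "States"]
mentions = [(0, 6), (3, 6)]
labels = encode(len(tokens), mentions)
```

With labels of this form, any standard sequence labeler (a CRF in the paper's setting) can be trained on one tag per token, and `decode` turns its predictions back into nested boundaries. The label set grows with the maximum nesting depth, which is one reason a real system might choose a more compact encoding.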
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal, 2007, Issue 2, pp. 14-21 (8 pages)
Funding: National Natural Science Foundation of China (60372016); Beijing Natural Science Foundation (4052027)
Keywords: artificial intelligence; natural language processing; nested entity mention recognition; conditional random fields; support vector machine


