Abstract
[Objective] This paper studies the extraction of Plant Growth and Development Stage Named Entities (PDSE) from text. [Context] A PDSE is essentially a kind of named entity. Named entity recognition has become one of the most valuable basic technologies in natural language processing and is widely used in many natural language processing systems. [Methods] The paper adopts a hybrid strategy that combines conditional random fields (CRF) with rules. It proposes and implements CRF feature templates, feature functions, and extraction rules tailored to the characteristics of PDSE, and tests extraction performance on papers indexed in the PubMed database. [Results] Experiments show that the proposed hybrid strategy achieves relatively high accuracy and recall. [Conclusions] This study offers a useful reference for text extraction in biology.
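The hybrid idea in the Methods statement can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual feature templates, lexicon, or rules, so everything here is an assumption made for illustration: the word lists `STAGE_WORDS` and `STAGE_MODIFIERS`, the feature set in `token_features`, the single regular-expression rule, and the choice to merge CRF output with rule matches by simple union. No CRF is trained here; `crf_spans` stands in for predictions from any CRF tagger.

```python
import re

# Hypothetical lexicon; the paper's real word lists are not given in the abstract.
STAGE_WORDS = {"stage", "phase", "period"}
STAGE_MODIFIERS = {"seedling", "flowering", "germination", "tillering", "heading"}


def token_features(tokens, i):
    """Return features for tokens[i], in the style of a CRF feature function.

    The feature set is an assumption chosen for illustration.
    """
    tok = tokens[i].lower()
    return {
        "lower": tok,
        "is_stage_word": tok in STAGE_WORDS,
        "is_modifier": tok in STAGE_MODIFIERS,
        "suffix3": tok[-3:],
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


# Hypothetical rule: a known modifier directly followed by a stage head word.
RULE = re.compile(
    r"\b(" + "|".join(sorted(STAGE_MODIFIERS)) + r")\s+("
    + "|".join(sorted(STAGE_WORDS)) + r")\b",
    re.IGNORECASE,
)


def rule_extract(text):
    """Return all substrings of text that match the hypothetical rule."""
    return [m.group(0) for m in RULE.finditer(text)]


def hybrid_extract(text, crf_spans=()):
    """Merge CRF-predicted spans (supplied by the caller) with rule matches.

    Union with case-insensitive de-duplication is an assumed merging policy.
    """
    seen, merged = set(), []
    for span in list(crf_spans) + rule_extract(text):
        key = span.lower()
        if key not in seen:
            seen.add(key)
            merged.append(span)
    return merged
```

In this sketch the rules act as a complement to the statistical tagger: a span missed by the CRF can still be recovered when it matches a rule, which is one plausible way a hybrid strategy could raise recall.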
Source
New Technology of Library and Information Service (《现代图书情报技术》)
CSSCI
Peking University Core Journal
2014, Issue 1, pp. 22-27 (6 pages)
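Funding
National Social Science Fund of China, "Research on Scientific Data Organization and Application for Knowledge Services" (Grant No. 13CTQ035)
Fundamental Research Funds for the Central Universities, "Research on Reference Gene Mining Technology for qRT-PCR Experiments" (Grant No. KYZ201159)
Nanjing Agricultural University SRT Program, "Identification of Plant Growth and Development Stages Based on a Hybrid Strategy" (Grant No. 1219A11). This paper is one of the research outcomes of the projects listed above.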