Funding: This work is supported by the National Natural Science Foundation of China (Grant No. 71971196).
Abstract: Named entity recognition (NER) is essential in many natural language processing (NLP) tasks such as information extraction and document classification. A construction document usually contains critical named entities, and an effective NER method can provide a solid foundation for downstream applications that improve construction management efficiency. This study presents an NER method for Chinese construction documents based on the conditional random field (CRF), comprising a corpus design pipeline and a CRF model. The corpus design pipeline identifies typical NER tasks in construction management, enables word-based tokenization, and controls annotation consistency with a newly designed annotation specification. The CRF model engineers nine transformation features and seven classes of state features, covering the impacts of word position, part-of-speech (POS), and word/character states within the context. The F1-measure on a labeled construction data set is 87.9%. Furthermore, as more domain knowledge features are infused, the marginal performance improvement from including POS information decreases, which points to a promising research direction: customizing POS tags to improve NLP performance with limited data.
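The kinds of state features the abstract names (word position, POS, and word/character states within the context) can be illustrated with a minimal sketch. This is not the authors' feature set: the feature names, the one-word context window, the POS tags, and the example sentence are all assumptions made for illustration, and the abstract does not enumerate the seven feature classes. The sketch only shows how a word-tokenized, POS-tagged sentence could be turned into per-token feature dictionaries of the form many CRF toolkits accept.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, POS tag); the tag set here is hypothetical


def state_features(sent: List[Token], i: int, window: int = 1) -> Dict[str, object]:
    """Build an illustrative feature dict for the token at index i."""
    word, pos = sent[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word": word,                    # word state
        "pos": pos,                      # part-of-speech
        "first_char": word[0],           # character states inside the word
        "last_char": word[-1],
        "is_first": i == 0,              # word position in the sentence
        "is_last": i == len(sent) - 1,
    }
    # Word and POS states of neighbours inside the context window.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        if 0 <= j < len(sent):
            ctx_word, ctx_pos = sent[j]
            feats[f"{offset:+d}:word"] = ctx_word
            feats[f"{offset:+d}:pos"] = ctx_pos
    return feats


def sentence_features(sent: List[Token]) -> List[Dict[str, object]]:
    """Feature dicts for every token in one sentence."""
    return [state_features(sent, i) for i in range(len(sent))]


# Hypothetical word-tokenized, POS-tagged construction sentence.
example: List[Token] = [
    ("项目", "n"), ("经理", "n"), ("审核", "v"), ("施工", "vn"), ("方案", "n"),
]
features = sentence_features(example)
```

Transition (label-to-label) features are normally learned by the CRF itself from the label sequences, so they do not appear in the per-token dictionaries above.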
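Abstract: Objective: To examine how China's electronic submission system is currently applied in the full life-cycle management of drug registration, analyze the challenges it faces, and propose recommendations for improving it. Methods: Using literature research and comparative research, the development history and implementation status of the domestic electronic submission system were reviewed and compared, and targeted recommendations for improvement were made in light of China's actual circumstances. Results and Conclusions: Electronic submission in China takes two forms, electronic common technical document (eCTD) submission and non-eCTD submission, and is currently in a transitional stage in which the two run in parallel. Implementation still faces problems such as inconsistent submission format standards across application types and insufficient capacity among provincial drug regulatory authorities to receive and review eCTD submissions. To address these problems, this paper proposes the following specific measures: ① deepen eCTD reform and expand its scope of implementation; ② strengthen the linkage between pre-marketing and post-marketing drug management; ③ accelerate the standardization of document submission across the full drug life cycle. These measures are intended to accelerate the move to fully electronic drug registration applications in China and thereby improve the efficiency and quality of review and approval.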