Abstract
Building large-scale, high-quality Chinese treebanks is important for the development of Chinese syntactic parsing. However, annotating Chinese complex sentences, which contain many words and deeply nested structures, remains time-consuming and labor-intensive, and the resulting annotation quality is low. This seriously slows the construction of Chinese treebanks and hinders progress in Chinese parsing. This paper therefore proposes an annotation method for Chinese complex sentences that combines bottom-up and top-down annotation: it cuts a complex sentence into blocks with relatively simple structure and analyzes those blocks. Experiments show that, compared with the traditional bottom-up annotation method, the proposed method allows faster proofreading and reduces both the overall difference rate and the per-stage difference rates by about 20%, indicating that it is effective and practical for annotating Chinese complex sentences.
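The abstract describes the method only at a high level: cut a complex sentence top-down into structurally simpler blocks, then analyze each block. The sketch below illustrates one possible reading of that workflow. Every detail in it is an assumption rather than the paper's procedure: the block boundaries (the hypothetical `BLOCK_DELIMITERS` set of Chinese comma, semicolon and colon), the function names `split_into_blocks` and `annotate_complex_sentence`, the `parse_block` callback standing in for a bottom-up annotator, and the `IP` root label borrowed from Penn Chinese Treebank conventions.

```python
import re
from typing import Callable, List

# Hypothetical block boundaries: Chinese comma, semicolon and colon.
# The abstract does not say how the paper chooses its cut points.
BLOCK_DELIMITERS = "，；："

# Match either "text up to and including a delimiter" or "trailing text with
# no delimiter", so every character of the sentence lands in exactly one block.
_BLOCK_PATTERN = re.compile(
    f"[^{BLOCK_DELIMITERS}]*[{BLOCK_DELIMITERS}]|[^{BLOCK_DELIMITERS}]+"
)


def split_into_blocks(sentence: str) -> List[str]:
    """Top-down step: cut a complex sentence into structurally simpler blocks."""
    return _BLOCK_PATTERN.findall(sentence)


def annotate_complex_sentence(
    sentence: str,
    parse_block: Callable[[str], str],
    root_label: str = "IP",
) -> str:
    """Annotate each block separately, then join the block trees under one root.

    `parse_block` is a hypothetical stand-in for a bottom-up annotator
    (human or parser) that returns a bracketed tree for a single block.
    """
    subtrees = [parse_block(block) for block in split_into_blocks(sentence)]
    return f"({root_label} " + " ".join(subtrees) + ")"


if __name__ == "__main__":
    sentence = "他昨天去了北京，参观了故宫；今天回到上海。"
    print(split_into_blocks(sentence))
    # ['他昨天去了北京，', '参观了故宫；', '今天回到上海。']
    print(annotate_complex_sentence(sentence, lambda b: f"(BLOCK {b})"))
    # (IP (BLOCK 他昨天去了北京，) (BLOCK 参观了故宫；) (BLOCK 今天回到上海。))
```

Keeping each delimiter attached to the block it closes means the blocks concatenate back to the original sentence, so no characters are lost when the block-level trees are merged. The flat root is the simplest possible top-down layer; a real annotation scheme would also record how the blocks relate to one another (for example coordination or subordination), which this sketch leaves out.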
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal
2016, Issue 4, pp. 716-721 (6 pages)
Journal of Chinese Computer Systems
Funding
Supported by the National Natural Science Foundation of China (61271304, 61373075)
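Supported by the Key Project of the Science and Technology Development Plan of the Beijing Municipal Education Commission, jointly the Class B Key Project of the Beijing Natural Science Foundation (KZ2013112307)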