Abstract
This paper proposes a basic method for automatically discovering the collocation patterns of conjunctions. A large-scale corpus is built and word-segmented with a Chinese automatic segmentation system. The conjunctions in it are then tagged automatically and proofread manually. Three key parameters of conjunction collocation are evaluated: collocation distance, collocation strength measured by mutual information (MI), and collocation strength measured by the Z-score. A threshold is set for each parameter, and patterns whose parameters exceed the thresholds are automatically taken as candidate collocation patterns. In experiments, the tagging accuracy was 88.75%, indicating that the method works well. Applying it uncovered many syntactic collocation patterns that had previously gone unnoticed. These findings support the construction of a high-quality knowledge base of conjunctions and are significant for the automatic syntactic and semantic analysis of compound sentences.
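The abstract names the three screening parameters but does not give their formulas. The following is a minimal Python sketch of the thresholding step, not the authors' implementation. It makes several assumptions. Counts are taken at the sentence level over already segmented and tagged text. MI is computed as log2(f(w1,w2) * N / (f(w1) * f(w2))). The Z-score takes the form (O - E) / sqrt(E), with E the co-occurrence count expected under independence. "Collocation distance" is read as a maximum token gap between the two conjunctions. The function name `candidate_patterns`, the default thresholds, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def candidate_patterns(sentences, conjunctions, max_distance=15, mi_min=1.0, z_min=2.0):
    """Return {(w1, w2): (mi, z)} for ordered conjunction pairs passing both thresholds.

    Hypothetical sketch: `sentences` is a list of token lists (already segmented),
    `conjunctions` is the set of tokens tagged as conjunctions.
    """
    n = len(sentences)
    single = Counter()  # number of sentences containing each conjunction
    pair = Counter()    # number of sentences where w1 precedes w2 within max_distance
    for tokens in sentences:
        positions = [(i, t) for i, t in enumerate(tokens) if t in conjunctions]
        single.update({t for _, t in positions})
        seen = set()
        # combinations() keeps list order, so i < j and a precedes b
        for (i, a), (j, b) in combinations(positions, 2):
            if a != b and j - i <= max_distance:
                seen.add((a, b))
        pair.update(seen)  # count each pair at most once per sentence

    result = {}
    for (a, b), observed in pair.items():
        expected = single[a] * single[b] / n   # co-occurrences expected under independence
        mi = math.log2(observed / expected)    # pointwise mutual information
        z = (observed - expected) / math.sqrt(expected)  # assumed Z-score form
        if mi >= mi_min and z >= z_min:
            result[(a, b)] = (round(mi, 3), round(z, 3))
    return result


# Toy corpus (hypothetical): 虽然...但是 "although...but", 因为...所以 "because...so"
TOY_SENTENCES = [
    ["虽然", "天气", "冷", "但是", "他", "来", "了"],
    ["虽然", "很", "累", "但是", "坚持"],
    ["因为", "下雨", "所以", "取消"],
    ["因为", "忙", "所以", "没", "来"],
    ["他", "来", "了"],
    ["但是", "不", "行"],
    ["天气", "好"],
    ["我们", "走"],
]
CONJUNCTIONS = {"虽然", "但是", "因为", "所以"}

if __name__ == "__main__":
    print(candidate_patterns(TOY_SENTENCES, CONJUNCTIONS, mi_min=1.0, z_min=1.0))
    # prints {('虽然', '但是'): (1.415, 1.443), ('因为', '所以'): (2.0, 2.121)}
```

With the default `z_min=2.0`, only 因为...所以 survives on this toy data. 但是 also occurs alone in one sentence, which lowers the Z-score of 虽然...但是. This sensitivity to the thresholds is why their choice matters in practice.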
Source
Application Research of Computers (计算机应用研究)
CSCD
Peking University Core Journal
2011, Issue 12, pp. 4426-4428, 4432 (4 pages)
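Funding
National Natural Science Foundation of China (60703008)
Open Research Fund of the State Key Laboratory (SKLSE04-018)
Major Project of the Key Research Base for Humanities and Social Sciences, Ministry of Education (10JJD740012)
Hubei Provincial Key Science and Technology Program (2007AA101C49)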
Keywords
corpus
conjunction
collocation pattern
automatic discovery