Abstract
This paper presents a multi-layer filtering method that combines a directed graph with statistical and rule-based techniques to resolve overlapping (intersection-type) ambiguity in Chinese word segmentation, substantially improving segmentation accuracy. In an open test on a Chinese corpus of 65,000 characters, the method segmented ambiguous phrases of the overlapping type with an accuracy of 98.43%, which indicates that it is effective at resolving this type of ambiguity.
Source
《计算机工程与应用》
CSCD
Peking University Core Journals
2007, Issue 11, pp. 175-177 (3 pages)
Computer Engineering and Applications
Funding
Important Direction Project of the Knowledge Innovation Program of the Chinese Academy of Sciences (No. KGCX2-SW-511).
Keywords
directed graph
statistical model
rule library
ambiguous phrase
Chinese word segmentation