Abstract
Based on an analysis of the difficulties of automatic Chinese word segmentation, this paper discusses the relation between word frequency and the combinational ability of words, and proposes a segmentation method that combines mechanical segmentation with semantic correction. The system comprises a table of absolute segmentation marks, a variable-length maximum matching method, a 2-3-1 priority rule set, and correction methods for intrinsic and combinational segmentation ambiguities. Examples describing the semantic correction rules are given at the end. The system is part of CETRAN.A and is implemented in C on a SUN 3-280 workstation.
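For readers unfamiliar with mechanical segmentation, the sketch below shows plain forward maximum matching in Python, with punctuation treated as hard boundaries. It is a generic illustration only, not the paper's variable-length method or its 2-3-1 priority rules, whose details the abstract does not give. The lexicon, the delimiter set, and the names `forward_max_match` and `segment` are all hypothetical, and equating "absolute segmentation marks" with punctuation is an assumption.

```python
# Hypothetical toy lexicon and delimiter set; neither is taken from the paper.
LEXICON = {"研究", "研究生", "生命", "起源", "的"}
HARD_MARKS = set(",。;:!?、")


def forward_max_match(text, lexicon, max_len=4):
    """Greedy forward maximum matching over one delimiter-free span."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until the lexicon matches;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def segment(text, lexicon=LEXICON, marks=HARD_MARKS):
    """Split at hard marks first, then run maximum matching on each span."""
    result, span = [], []
    for ch in text:
        if ch in marks:
            if span:
                result.extend(forward_max_match("".join(span), lexicon))
                span = []
            result.append(ch)  # keep the mark as its own token
        else:
            span.append(ch)
    if span:
        result.extend(forward_max_match("".join(span), lexicon))
    return result


print(segment("研究生命的起源。"))
# → ['研究生', '命', '的', '起源', '。']
```

The output shows why purely mechanical matching is not enough: the greedy match takes 研究生 and strands 命, whereas the intended reading is 研究 / 生命 / 的 / 起源. Ambiguities of this kind are what a later correction stage would have to resolve.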
Source
《中文信息学报》
CSCD
1990, No. 1, pp. 37-43 (7 pages)
Journal of Chinese Information Processing