Abstract
This paper first proves that when Zipf's law is used to describe the entire frequency distribution of Chinese characters, some cumulative fitted frequencies show obvious errors no matter how carefully the parameters a and c are selected. To overcome this shortcoming, it then presents a method that uses Zipf's law to describe only the tail of the distribution.
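The tail-only idea can be illustrated with a minimal sketch. It assumes the common two-parameter form f(r) = c / r^a, where r is the frequency rank, and it fits a and c by ordinary least squares in log-log space. Both the functional form and the fitting procedure are assumptions made for illustration, since the abstract does not state them; the function name `fit_zipf_tail` and the parameter `start_rank` are hypothetical.

```python
import math


def fit_zipf_tail(freqs, start_rank):
    """Fit f(r) = c / r**a to ranks >= start_rank (assumed Zipf form).

    freqs: relative frequencies ordered by rank, rank 1 first.
    Uses least squares on log f = log c - a * log r, which is an
    illustrative choice and not necessarily the paper's procedure.
    Returns the pair (a, c).
    """
    # Keep only tail points with positive frequency (log is undefined at 0).
    points = [
        (math.log(rank), math.log(freq))
        for rank, freq in enumerate(freqs, start=1)
        if rank >= start_rank and freq > 0
    ]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two tail points to fit a line")

    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # slope = -a and intercept = log c under the assumed form.
    return -slope, math.exp(intercept)
```

Calling `fit_zipf_tail(freqs, 1)` fits the whole distribution, while a larger `start_rank` restricts the fit to the tail, so the two parameter pairs can be compared on the same data.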
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
PKU Core (北大核心)
2000, No. 3, pp. 60-65 (6 pages)
Keywords
Computational language model
Distribution of Chinese character frequency
Zipf's law
Fitting frequency