Abstract
Word frequencies in Dream of the Red Chamber, Selected Works of Mao Tse-tung, and Selected Works of Deng Xiaoping are found to follow a Zipf distribution once the texts are segmented into words with the CSW word-segmentation software (freeware). This result is consistent with empirical studies of many other languages, including English, Spanish, French, Greek, and even the ancient language Meroitic. The debate in earlier work arose mainly because those studies analyzed Chinese characters and character n-grams rather than words.
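The rank-frequency check described in the abstract can be illustrated with a minimal sketch. It assumes the text has already been segmented into a list of word tokens (the CSW software itself is not reproduced here) and estimates the Zipf exponent with an ordinary least-squares fit on the log-log rank-frequency data. The function names `rank_frequency` and `zipf_exponent` are hypothetical, and the abstract does not state which fitting procedure the paper used, so the regression is an assumption.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return token frequencies sorted in descending order (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)


def zipf_exponent(freqs):
    """Estimate alpha in f(r) ~ C * r**(-alpha) by least squares on log-log data.

    `freqs` must be positive frequencies ordered by rank (rank 1 first).
    This simple regression is an illustrative assumption, not the paper's method.
    """
    if len(freqs) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is negative for a decaying distribution; alpha is its magnitude.
    return -sxy / sxx
```

Applying the same two functions to single characters instead of segmented words would give the character-frequency counterpart that the abstract contrasts with word frequency. An exponent close to 1 on the word-level data would correspond to the classic Zipf form.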
Source
Journal of Beijing Normal University (Natural Science)
CAS
CSCD
PKU Core (北大核心)
2009, No. 4, pp. 424-427 (4 pages)
Funding
National Natural Science Foundation of China (Grants 60534080, 70871013)
Keywords
Zipf distribution
Chinese language
word frequency
character frequency