Abstract
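This paper examines how to construct an order-0 statistical model of Chinese text and proposes an effective Chinese text compression algorithm. With this most elementary model alone, the coding efficiency for Chinese text already exceeds that of hybrid algorithms combining LZ and Huffman coding. Because the order-0 statistical model is the basis of all higher-order statistical models, the discussion is relevant to the compression of text in Chinese and other large-character-set languages (such as Japanese and Korean).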
This paper addresses the construction of a dynamic-alphabet order-0 model of Chinese text for arithmetic coding and presents a Chinese text compression algorithm. The model performs well: the algorithm driven by it compresses Chinese text more efficiently than algorithms that combine LZ and Huffman coding. Because the order-0 model is the foundation of order-n models, the discussion is relevant to text compression for any natural language with a large alphabet, such as Chinese, Japanese, and Korean.
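To make the idea of a dynamic-alphabet order-0 model concrete, here is a minimal Python sketch. It is not the paper's algorithm. It assumes one common scheme: symbol counts are adapted as the text is read, one count is reserved for an escape event, and a character seen for the first time is sent as the escape followed by a fixed-width raw code. The names `DynamicOrder0Model` and `ideal_code_length`, and the 16-bit raw code width, are illustrative assumptions. The sketch reports the ideal code length in bits that an arithmetic coder driven by such a model would approach, and does not implement the coder itself.

```python
import math
from collections import Counter


class DynamicOrder0Model:
    """Adaptive order-0 model whose alphabet grows as new characters appear.

    Assumption (not from the paper): one count is reserved for an escape
    event, and a novel character costs the escape plus `raw_bits` bits.
    """

    def __init__(self, raw_bits=16):
        self.counts = Counter()   # occurrences of each character seen so far
        self.total = 0            # number of characters seen so far
        self.raw_bits = raw_bits  # assumed fixed-width code for a new character

    def cost_bits(self, symbol):
        """Ideal cost in bits of coding `symbol` under the current counts."""
        denom = self.total + 1  # the +1 is the count reserved for the escape
        if symbol in self.counts:
            return -math.log2(self.counts[symbol] / denom)
        # Novel character: code the escape, then the raw character code.
        return -math.log2(1 / denom) + self.raw_bits

    def update(self, symbol):
        """Add `symbol` to the statistics, extending the alphabet if needed."""
        self.counts[symbol] += 1
        self.total += 1


def ideal_code_length(text, raw_bits=16):
    """Total ideal code length in bits for `text` under the adaptive model."""
    model = DynamicOrder0Model(raw_bits)
    bits = 0.0
    for ch in text:
        bits += model.cost_bits(ch)
        model.update(ch)
    return bits
```

Because the alphabet starts empty, the model never has to hold a table for every possible character up front, which is the practical appeal of a dynamic alphabet for a large character set. Calling `ideal_code_length(text) / (16 * len(text))` gives a rough ratio against a fixed two-byte encoding, under the assumptions stated above.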
Source
《中文信息学报》
CSCD
PKU Core Journals
2000, No. 1, pp. 39-47 (9 pages)
Journal of Chinese Information Processing
Keywords
Chinese text
Arithmetic coding
Statistical model
Order-0 model
Compression algorithm
Data compression
Chinese text compression
Arithmetic coding
Statistical model