Abstract
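Sorting written Tibetan words under the Chinese national standard and ISO coded character sets for Tibetan involves three concepts: construction order, classes of constitution, and character sequence. It is a distinctive kind of ordering, different from that of Chinese or English. This paper analyzes in detail the glyph shapes, structural forms, and traditional character order of Tibetan syllables, together with features such as syllable length and stack height, and builds a mathematical model for Tibetan sorting. Following the model, each class of Tibetan symbol is assigned a numeric value; an algorithm then determines each character's position step by step and identifies the character; finally, the words are sorted by the combination of values extracted from their characters. The model has been implemented on the Windows platform.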
GB16959-1997 and ISO/IEC 10646-1:1993 define coded character sets for Tibetan information processing, and there is an engineering need to apply them in all kinds of software and databases, where sorting is an important technology. Tibetan sorting in dictionary order involves construction order, classes of constitution, and character sequence, and a written Tibetan word has a highly complex, multi-level structure. The paper analyzes in detail the structures of words, the order of construction categories, and the sequence of characters in each structural position, as well as word length and the levels of vertical stacks, and then establishes a mathematical model for sorting. Based on this analysis, it assigns a distinct numeric value to every character that can occur in a word, then identifies each character in a word step by step with a dedicated algorithm and matches it against character-value lists. Finally, it combines the values extracted from a word's characters and compares the resulting combinations to order any words in the Tibetan language. This strategy has been implemented on the Windows 2000/NT operating systems.
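The value-assignment and comparison idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's model: the weight tables, the function names `sort_key` and `sort_syllables`, and the two-slot (root consonant, vowel sign) decomposition are assumptions made for illustration, and the sketch covers only four consonants and the four vowel signs, with no prefixes, stacks, or suffixes.

```python
# Hypothetical weight tables (not taken from the paper): a few consonants in
# traditional alphabet order and the four vowel signs in traditional order.
CONSONANT_WEIGHT = {
    "\u0f40": 1,  # ka
    "\u0f41": 2,  # kha
    "\u0f42": 3,  # ga
    "\u0f44": 4,  # nga
}
VOWEL_WEIGHT = {
    "\u0f72": 1,  # vowel sign i
    "\u0f74": 2,  # vowel sign u
    "\u0f7a": 3,  # vowel sign e
    "\u0f7c": 4,  # vowel sign o
}


def sort_key(syllable):
    """Return a (root, vowel) tuple of numeric values for one syllable.

    Assumption: the first known consonant is the root letter, and a missing
    vowel sign means the inherent vowel, which gets the value 0.
    """
    root = 0
    vowel = 0
    for ch in syllable:
        if ch in CONSONANT_WEIGHT and root == 0:
            root = CONSONANT_WEIGHT[ch]
        elif ch in VOWEL_WEIGHT:
            vowel = VOWEL_WEIGHT[ch]
    return (root, vowel)


def sort_syllables(syllables):
    """Sort syllables by comparing their combined numeric values."""
    return sorted(syllables, key=sort_key)


if __name__ == "__main__":
    words = ["\u0f42", "\u0f40\u0f74", "\u0f41", "\u0f40", "\u0f40\u0f72"]
    print(sort_syllables(words))
```

Comparing tuples of per-position values keeps the ordering rule separate from the character identification step, so a fuller position-detection routine could replace the simple loop in `sort_key` without changing how the final comparison works.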
Source
《计算机学报》
EI
CSCD
PKU Core Journals (北大核心)
2004, No. 4, pp. 524-529 (6 pages)
Chinese Journal of Computers
Funding
Supported by the National Natural Science Foundation of China (60173024)
Keywords
written Tibetan
construction order
classes of constitution
character sequence
sorting by computer
mathematical model