Abstract
By analyzing the internal word-formation rules of Tibetan numerals (number words) and their external boundary information, the authors propose a scheme for defining the basic components of Tibetan numerals. A best-path decision model determines the boundaries of the numeral components, a finite automaton then recognizes and translates basic numerals, and a template-matching algorithm finally handles complex numerals. The results show that the method reaches an F-score of 98.73% for numeral recognition and translation and improves the BLEU score by 2.64% on a Tibetan-Chinese machine translation test set.
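The finite-automaton step can be illustrated with a minimal sketch. This is not the authors' model: it assumes the numeral has already been segmented into components, writes those components in Wylie romanization, and accepts only a toy grammar of digit and unit words in strictly decreasing order of magnitude. It ignores the variant forms and linking particles that real Tibetan numerals use. The names DIGITS, UNITS, and translate_number are hypothetical.

```python
# Toy finite-state recognizer/translator over pre-segmented numeral components.
# Assumption: components are given in Wylie romanization; the inventory below
# is a simplified illustration, not the component set defined in the paper.
DIGITS = {"gcig": 1, "gnyis": 2, "gsum": 3, "bzhi": 4, "lnga": 5,
          "drug": 6, "bdun": 7, "brgyad": 8, "dgu": 9}
UNITS = {"bcu": 10, "brgya": 100, "stong": 1000}


def translate_number(components):
    """Return the integer value, or None if the automaton rejects the input."""
    if not components:
        return None
    total = 0
    pending = None    # digit read but not yet attached to a unit
    ceiling = 10000   # the next unit must be strictly smaller than this
    for token in components:
        if token in DIGITS:
            if pending is not None:
                return None  # reject: two digit words in a row
            pending = DIGITS[token]
        elif token in UNITS:
            unit = UNITS[token]
            if unit >= ceiling:
                return None  # reject: units must decrease in magnitude
            # A unit without a preceding digit counts as one of that unit.
            total += (pending if pending is not None else 1) * unit
            pending = None
            ceiling = unit
        else:
            return None  # reject: not a known numeral component
    if pending is not None:
        total += pending  # trailing digit fills the ones place
    return total


print(translate_number(["gsum", "brgya", "lnga", "bcu", "bdun"]))
```

The automaton's state is the pair (largest unit still allowed, whether a digit is pending), which is finite; the running total is the translation output. Rejected inputs return None, which is where a fuller system would fall back to other processing such as the template matching for complex numerals mentioned above.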
Source
《北京大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core (北大核心)
2013, No. 1, pp. 75-80 (6 pages)
Acta Scientiarum Naturalium Universitatis Pekinensis
Funding
Supported by the 863 Program (2011AA01A207)
Keywords
Tibetan
number basic component
automaton
number identification
number translation