Tibetan text normalization method (藏语文本标准化方法)
Abstract: Modern Tibetan text takes complex, varied, and often nonstandard written forms, which degrades the performance of speech synthesis systems. To address this, the paper proposes a Tibetan text normalization method that is easy to maintain and extend. First, the different forms that Tibetan marker symbols and non-Tibetan special symbols from other languages take in Tibetan text are analyzed in depth, and the special symbols are classified by their features. Second, based on the resulting categories, writing rules are established for converting 15 types of special symbols into Tibetan. Finally, with 13,490 sentences as experimental data, a Tibetan grapheme-to-phoneme conversion test is used to identify special symbols and check the validity of Tibetan syllables in the text, and sentences containing special symbols are normalized by rule matching. The experiments show that the omission rate of Tibetan phoneme transcription was as high as 4.69% before normalization and fell to 0.01% afterward, and that the accuracy of Tibetan text normalization reached 99%.
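
The rule-matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the single digit-reading rule, the function names `normalize` and `omission_rate`, and the character-level definition of the omission rate are all assumptions made for illustration. The paper measures omissions in phoneme transcription, whereas this sketch simply counts characters outside the Tibetan Unicode block as a rough proxy for symbols a grapheme-to-phoneme front end would skip.

```python
import re

# Characters outside the Tibetan Unicode block (U+0F00..U+0FFF), whitespace
# excluded, are treated here as "special symbols" (an assumption of this sketch).
NON_TIBETAN = re.compile(r"[^\u0F00-\u0FFF\s]")

TSHEG = "\u0F0B"  # Tibetan syllable delimiter

# Hypothetical rule: read ASCII digits one by one as Tibetan number words.
# The paper's 15 rule types are not listed in the abstract.
DIGIT_WORDS = {
    "0": "ཀླད་ཀོར", "1": "གཅིག", "2": "གཉིས", "3": "གསུམ", "4": "བཞི",
    "5": "ལྔ", "6": "དྲུག", "7": "བདུན", "8": "བརྒྱད", "9": "དགུ",
}


def read_digits(match):
    """Join the per-digit readings with the syllable delimiter."""
    return TSHEG.join(DIGIT_WORDS[d] for d in match.group())


# Ordered (pattern, replacement) table; new rules are added by appending here.
RULES = [
    (re.compile(r"[0-9]+"), read_digits),
]


def normalize(sentence):
    """Apply every rule in order and return the rewritten sentence."""
    for pattern, replacement in RULES:
        sentence = pattern.sub(replacement, sentence)
    return sentence


def omission_rate(sentences):
    """Share of non-whitespace characters that lie outside the Tibetan block."""
    total = sum(len(re.sub(r"\s", "", s)) for s in sentences)
    missed = sum(len(NON_TIBETAN.findall(s)) for s in sentences)
    return missed / total if total else 0.0


corpus = ["ལོ་2024་ལ།"]
print(omission_rate(corpus))
print(omission_rate([normalize(s) for s in corpus]))
```

Keeping the rules in one ordered table is one way to get the maintainability and extensibility the abstract mentions: supporting a new symbol type means appending a pattern and a replacement, without touching the matching loop.
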
Authors: LHAKPA Dondrub (拉巴顿珠); ZHAXI Duoji (扎西多吉); ZHU Jie (珠杰) (School of Information Science and Technology, Xizang University, Lhasa 850000, China; Xizang Informatization Collaborative Innovation Center Jointly Built by the Province and the Ministry, Lhasa 850000, China)
Source: Journal of Jilin University: Engineering and Technology Edition (《吉林大学学报(工学版)》), indexed in CSCD and the Peking University Core Journals list, 2024, No. 12, pp. 3577-3588 (12 pages)
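Funding: National Natural Science Foundation of China (62406256); Humanities and Social Sciences Research Project of the Ministry of Education (21YJCZH059); 2025 Natural Science Foundation of the Xizang Autonomous Region (ZRKX2025000068); Xizang University Research Project for In-Service Doctoral Candidates and Postdoctoral Researchers (zbds202326); Xizang University Cultivation Program (ZDQMJH20-09).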
Keywords: computer application technology; Tibetan text analysis; text normalization; text-to-speech (speech synthesis); special symbols; grapheme-to-phoneme conversion