Smoothing Algorithm for a Task-Adaptive Chinese N-gram Language Model (cited by: 9)
Abstract: Word-based Chinese N-gram models suffer from sparse statistics, and a change of application domain degrades the recognition performance of a model trained on the original domain. To address both problems, an N-gram smoothing algorithm that adapts to the application domain is proposed. Forward and backward 0-gram to 3-gram statistics are collected from corpora of two application domains. Drawing on the success of hidden Markov models (HMMs) in speech recognition, the weights are optimized with the Baum-Welch algorithm; each weight represents the statistical reliability of the corresponding model. Combining the forward and backward 3-gram models yields a smoothing algorithm with 5-gram grammatical constraints, which compensates for the sparseness of the statistical matrices. The statistics of the People's Daily (《人民日报》) corpus serve as the prior statistics, and the specialized corpus of 《计算机世界》 ("PC World") serves as the new domain for subsequent training, giving a 3-gram model adapted to the application domain. The experimental results show that the 5-gram smoothing obtained from the forward- and backward-constrained 3-gram models achieves stronger grammatical constraints at a small storage cost, and that the perplexity of the statistical model decreases considerably.
Source: Journal of Tsinghua University (Science and Technology) (《清华大学学报(自然科学版)》), 1999, No. 9, pp. 99-102 (4 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China; Ministry of Education Key Postdoctoral Research Fund
Keywords: domain adaptation; task adaptation; smoothing algorithm; Chinese speech recognition; N-gram language model
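
The abstract describes weighting 0-gram to 3-gram models by their statistical reliability, with the weights optimized by Baum-Welch, but it gives no formulas. The sketch below is therefore only one standard way to obtain such weights, not the paper's method: linear interpolation of uniform (0-gram), unigram, bigram, and trigram estimates, with the weights fitted by EM on held-out text (deleted interpolation, which is the Baum-Welch update for a simple mixture). All function names (`train`, `component_probs`, `held_out_events`, `estimate_weights`) and the toy token lists are hypothetical. The backward models, the 5-gram combination, and the domain adaptation step are not reproduced.

```python
import math
from collections import Counter

ORDER = 3  # highest n-gram order used here (trigram); an assumption of this sketch


def train(tokens):
    """Count n-grams of order 1..ORDER and how often each context is followed by a word."""
    grams, contexts = Counter(), Counter()
    for n in range(1, ORDER + 1):
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            grams[gram] += 1
            contexts[gram[:-1]] += 1  # the empty context () holds the token total
    return grams, contexts


def component_probs(grams, contexts, vocab_size, history, word):
    """Return [P0, P1, P2, P3] for `word` given the two preceding words."""
    probs = [1.0 / vocab_size]  # 0-gram: uniform over the vocabulary
    for n in range(1, ORDER + 1):
        ctx = tuple(history[len(history) - (n - 1):])  # last n-1 words of the history
        denom = contexts[ctx]
        probs.append(grams[ctx + (word,)] / denom if denom else 0.0)
    return probs


def held_out_events(grams, contexts, vocab_size, tokens):
    """Component probabilities for every held-out position that has a full history."""
    return [
        component_probs(grams, contexts, vocab_size, tuple(tokens[i - 2:i]), tokens[i])
        for i in range(ORDER - 1, len(tokens))
    ]


def log_likelihood(weights, events):
    """Held-out log-likelihood of the interpolated model."""
    return sum(math.log(sum(w * p for w, p in zip(weights, probs))) for probs in events)


def estimate_weights(events, iterations=20):
    """EM re-estimation of the interpolation weights, starting from uniform weights."""
    weights = [1.0 / (ORDER + 1)] * (ORDER + 1)
    for _ in range(iterations):
        expected = [0.0] * len(weights)
        for probs in events:
            mix = sum(w * p for w, p in zip(weights, probs))
            for k, p in enumerate(probs):
                expected[k] += weights[k] * p / mix  # posterior share of component k
        total = sum(expected)
        weights = [e / total for e in expected]
    return weights


# Toy data standing in for word-segmented text (placeholders, not from the paper).
train_tokens = "a b c a b d a b c a c d".split()
held_out = "a b c a b d a c".split()

grams, contexts = train(train_tokens)
events = held_out_events(grams, contexts, len(set(train_tokens)), held_out)
weights = estimate_weights(events)
print([round(w, 3) for w in weights])
```

The uniform 0-gram component keeps every interpolated probability above zero, so the held-out likelihood is always defined. In an adaptation setting, one plausible use would be to collect the counts on the prior corpus and re-estimate the weights on held-out text from the new domain, though the abstract does not state that this is how the authors proceed.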