A Speech Recognition Method Using Long Short-Term Memory Network in Low Resources (cited by: 20)
Abstract: To address the sharp drop in recognition accuracy that automatic speech recognition systems suffer when transcribed training data are scarce, a low-resource speech recognition method based on long short-term memory networks (LSTM-LRASR) is proposed. The method builds the acoustic model with a long short-term memory network and improves low-resource recognition performance in three respects: feature extraction, data augmentation, and model optimization. For feature extraction, language-independent, high-level robust features are extracted to reduce the acoustic model's dependence on training data. For data augmentation, speed perturbation is applied to the existing transcribed data and the untranscribed data are recognized automatically, so that additional transcribed data are obtained without manual labeling. For model optimization, sequence-discriminative training improves the model's ability to distinguish easily confused phonemes, and minimum Bayes-risk decoding is used to combine multiple systems and further improve recognition performance. Experiments on the OpenKWS16 evaluation data show that the word error rate of the low-resource speech recognition system built with the LSTM-LRASR method is 29.9% lower than that of the baseline system, and the actual term-weighted value over all query terms increases by 60.3%.
Authors: SHU Fan; QU Dan; ZHANG Wenlin; ZHOU Lili; GUO Wu (Institute of Information System Engineering, PLA Information Engineering University, Zhengzhou 450002, China; Institute of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China)
Source: Journal of Xi'an Jiaotong University (indexed in EI, CAS, CSCD, Peking University Core), 2017, No. 10, pp. 120-127 (8 pages)
Funding: National Natural Science Foundation of China (61673395, 61403415, 61302107); Natural Science Foundation of Henan Province (162300410331)
Keywords: speech recognition; low resource; long short-term memory; neural network
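
The abstract names speed perturbation of the transcribed data as one of its data augmentation steps but gives no implementation details. The following is only a minimal sketch of the general idea, not the authors' pipeline: a waveform is resampled by linear interpolation so that it plays faster or slower, and each perturbed copy reuses the original transcript. The function names `speed_perturb` and `augment` and the factors 0.9, 1.0 and 1.1 are assumptions chosen for illustration; practical systems normally use a band-limited resampler rather than linear interpolation.

```python
from typing import Dict, List, Sequence


def speed_perturb(samples: Sequence[float], factor: float) -> List[float]:
    """Return a copy of `samples` that plays `factor` times faster.

    factor > 1 shortens the signal, factor < 1 lengthens it.
    Linear interpolation is used purely to keep the sketch short.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    n_out = max(1, int(round(len(samples) / factor)))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        # Position in the original signal that output sample i reads from,
        # clamped so that we never read past the final sample.
        pos = min(i * factor, float(last))
        left = int(pos)
        right = min(left + 1, last)
        frac = pos - left
        out.append((1.0 - frac) * samples[left] + frac * samples[right])
    return out


def augment(samples: Sequence[float],
            factors: Sequence[float] = (0.9, 1.0, 1.1)) -> Dict[float, List[float]]:
    """Build one perturbed copy per speed factor (factors are hypothetical)."""
    return {f: speed_perturb(samples, f) for f in factors}
```

Under these assumptions, calling `augment(waveform)` on one utterance yields three waveforms that share a single transcript, which is how speed perturbation enlarges a small transcribed training set.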
