Mongolian Speech Recognition Based on TDNN-FSMN (基于TDNN-FSMN的蒙古语语音识别技术研究)

Cited by: 6
Abstract: To improve Mongolian speech recognition performance, this paper first applies a Time Delay Neural Network (TDNN) fused with a Feedforward Sequential Memory Network (FSMN) to the Mongolian speech recognition task, modeling long sequences of speech frames to exploit contextual information. It then studies how the lengths of the history (preceding-frame) and future (subsequent-frame) context in the FSMN memory block affect the model. Finally, it analyzes how the number of hidden layers and the number of nodes per hidden layer in the fused TDNN-FSMN structure affect acoustic model performance. Experimental results show that the fused TDNN-FSMN performs better than the Deep Neural Network (DNN), TDNN, and FSMN models, reducing the word error rate (WER) by 22.2% compared with the baseline DNN model.
Authors: WANG Yonghe (王勇和); BAO Feilong (飞龙); GAO Guanglai (高光来). College of Computer Science, Inner Mongolia University, Hohhot, Inner Mongolia 010021, China
Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list; 2018, No. 9, pp. 28-34 (7 pages)
Funding: National Natural Science Foundation of China (61563040, 61773224); Natural Science Foundation of Inner Mongolia (2016ZD06)
Keywords: Mongolian; speech recognition; Time Delay Neural Network; Feed-forward Sequential Memory Network
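
The abstract refers to the history and future context lengths of the FSMN memory block. As a rough illustration only, the sketch below shows a scalar FSMN-style memory block in which each output frame is a weighted sum of the current frame, a fixed number of preceding frames, and a fixed number of subsequent frames. The function name `fsmn_memory`, the tap weights, and the zero-padding at sequence edges are assumptions made for this example; this record does not give the paper's actual configuration.

```python
from typing import List, Sequence


def fsmn_memory(
    hidden: Sequence[Sequence[float]],
    lookback: Sequence[float],
    lookahead: Sequence[float],
) -> List[List[float]]:
    """Scalar FSMN-style memory block (illustrative, not the paper's code).

    m_t = sum_{i=0..N1} lookback[i] * h_{t-i}
        + sum_{j=1..N2} lookahead[j-1] * h_{t+j}

    len(lookback) - 1 is the history length N1 (lookback[0] weights the
    current frame); len(lookahead) is the future length N2. Frames that
    fall outside the sequence are treated as zero (an assumption).
    """
    num_frames = len(hidden)
    dim = len(hidden[0]) if num_frames else 0
    memory: List[List[float]] = []
    for t in range(num_frames):
        m = [0.0] * dim
        # Current frame (i = 0) and preceding frames (i >= 1).
        for i, a in enumerate(lookback):
            if t - i >= 0:
                for d in range(dim):
                    m[d] += a * hidden[t - i][d]
        # Subsequent frames (j >= 1).
        for j, c in enumerate(lookahead, start=1):
            if t + j < num_frames:
                for d in range(dim):
                    m[d] += c * hidden[t + j][d]
        memory.append(m)
    return memory


# Three 1-dimensional frames, one frame of history, one frame of future.
frames = [[1.0], [2.0], [3.0]]
result = fsmn_memory(frames, lookback=[1.0, 0.5], lookahead=[0.25])
# result == [[1.5], [3.25], [4.0]]
```

Lengthening `lookback` or `lookahead` widens the context each frame sees, which is the kind of setting the abstract says the authors varied.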