Funding: Supported by the Ministry of Trade, Industry & Energy (MOTIE, Korea) under the Industrial Technology Innovation Program (No. 10063424, 'Development of distant speech recognition and multi-task dialog processing technologies for in-door conversational robots').
Abstract: Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) have driven tremendous improvements over acoustic models based on the Gaussian Mixture Model (GMM). However, acoustic models built with the hybrid method require a forced-aligned Hidden Markov Model (HMM) state sequence obtained from a GMM-based acoustic model. Training therefore takes a long time, because both the GMM-based acoustic model and the deep learning-based acoustic model must be trained. To solve this problem, an acoustic model using the connectionist temporal classification (CTC) algorithm has been proposed. The CTC algorithm does not require the GMM-based acoustic model because it does not use the forced-aligned HMM state sequence. However, previous work on LSTM RNN-based acoustic models using CTC relied on small-scale training corpora. In this paper, an LSTM RNN-based acoustic model using CTC is trained on a large-scale training corpus and its performance is evaluated. The implemented acoustic model achieves a Word Error Rate (WER) of 6.18% on clean speech and 15.01% on noisy speech, which is similar to the performance of an acoustic model based on the hybrid method.
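The Word Error Rate reported above is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between the recognized text and the reference transcript, divided by the number of reference words. The following is a minimal sketch of that conventional definition, not the scoring tool used in the paper; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration; tokenization is plain whitespace
    splitting, which is an assumption and not taken from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For instance, a six-word reference whose hypothesis drops one word yields one deletion over six reference words, a WER of about 16.7%. Because insertions are counted, the value can exceed 100% when the hypothesis is much longer than the reference.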
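Abstract: Cognitive neuropsychology has provided a large body of evidence on the mechanisms of reading, holding that different reading disorders result from selective impairment of different processing pathways. In recent years, building on the connectionist triangle model, researchers have proposed the primary system hypothesis, which holds that reading disorders are caused by damage to primary cognitive systems (such as the visual, semantic, and phonological systems): surface dyslexia is a reading difficulty caused by damage to the semantic system, while phonological and deep dyslexia form a continuum of combined symptoms that arise when the phonological and semantic systems are damaged together. The theory holds that each primary system may serve as a processing component of several cognitive activities at once, so damage to one system affects every cognitive process related to it, which links reading disorders to other cognitive impairments. The paper explains in detail how the various acquired dyslexias arise under this unified account of primary-system damage.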
Abstract: In this paper, we summarize recent progress in deep learning based acoustic models and the motivation and insights behind the surveyed techniques. We first discuss models such as recurrent neural networks (RNNs) and convolutional neural networks (CNNs) that can effectively exploit variable-length contextual information, and their various combinations with other models. We then describe models that are optimized end-to-end, with emphasis on feature representations learned jointly with the rest of the system, the connectionist temporal classification (CTC) criterion, and the attention-based sequence-to-sequence translation model. We further illustrate robustness issues in speech recognition systems, and discuss acoustic model adaptation, speech enhancement and separation, and robust training strategies. We also cover modeling techniques that lead to more efficient decoding and discuss possible future directions in acoustic model research.