Abstract
In low-resource settings, speech recognition accuracy drops sharply because there is too little transcribed training data. To address this, a low-resource speech recognition method based on long short-term memory networks (LSTM-LRASR) is proposed. The method builds the acoustic model with an LSTM network and improves low-resource recognition performance in three areas: feature extraction, data augmentation, and model optimization. For feature extraction, language-independent, robust high-level features are extracted to reduce the acoustic model's dependence on training data. For data augmentation, speed perturbation is applied to the existing transcribed data, and untranscribed data are recognized automatically, so that additional transcribed data are obtained without manual annotation. For model optimization, sequence discriminative training improves the model's ability to distinguish easily confused phonemes, and minimum Bayes risk (MBR) decoding is used to combine multiple systems, further improving recognition performance. Experimental results on the OpenKWS16 evaluation data show that the low-resource speech recognition system built with the LSTM-LRASR method reduces the word error rate by 29.9% relative to the baseline system, and increases the actual term-weighted value (ATWV) over all query terms by 60.3%.
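The speed-perturbation step of the data-augmentation stage can be illustrated with a minimal pure-Python sketch. It resamples a waveform so that it plays faster or slower (changing both tempo and pitch, like the sox `speed` effect) and pairs every perturbed copy with the unchanged transcript. The perturbation factors 0.9/1.0/1.1 and the linear-interpolation resampling are assumptions for illustration, not details taken from the paper, and `speed_perturb` and `augment` are hypothetical helper names.

```python
import math
from typing import List, Sequence, Tuple


def speed_perturb(samples: Sequence[float], factor: float) -> List[float]:
    """Resample a waveform so it plays `factor` times faster.

    The output has about len(samples) / factor samples at the original
    sample rate, so duration and pitch both change. Linear interpolation
    is an assumption chosen for simplicity; real toolkits typically use
    band-limited resampling.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_in = len(samples)
    if n_in == 0:
        return []
    n_out = max(1, int(round(n_in / factor)))
    out: List[float] = []
    for i in range(n_out):
        pos = i * factor  # read position in the input signal
        left = int(math.floor(pos))
        if left >= n_in - 1:
            # Past the last interpolation interval: hold the final sample.
            out.append(float(samples[-1]))
            continue
        frac = pos - left
        out.append((1.0 - frac) * samples[left] + frac * samples[left + 1])
    return out


def augment(
    utterances: Sequence[Tuple[Sequence[float], str]],
    factors: Sequence[float] = (0.9, 1.0, 1.1),  # assumed factors
) -> List[Tuple[float, List[float], str]]:
    """Return (factor, perturbed waveform, transcript) triples.

    The transcript is reused unchanged for every copy, which is what lets
    speed perturbation enlarge the labeled set without new annotation.
    """
    return [
        (f, speed_perturb(wav, f), text)
        for wav, text in utterances
        for f in factors
    ]
```

With three factors, each transcribed utterance yields three training copies (one of them the original). In a full pipeline the perturbed copies would usually be re-aligned before acoustic-model training; that step is an assumption here and is not shown.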
Authors
SHU Fan (舒帆); QU Dan (屈丹); ZHANG Wenlin (张文林); ZHOU Lili (周利莉); GUO Wu (郭武)
Institute of Information System Engineering, PLA Information Engineering University, Zhengzhou 450002, China; Institute of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China
Source
Journal of Xi'an Jiaotong University (《西安交通大学学报》)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
2017, No. 10, pp. 120-127 (8 pages)
Funding
National Natural Science Foundation of China (61673395, 61403415, 61302107)
Natural Science Foundation of Henan Province (162300410331)
Keywords
speech recognition
low resource
long short-term memory
neural network