
End-to-end speech recognition with improved Transformer decoder (cited by: 1)
Abstract In sequence-to-sequence tasks, the Transformer architecture spreads attention across the entire input, which lets it learn long-term dependencies. In speech recognition, however, the text output and the speech input are monotonically aligned. To address the Transformer decoder's difficulty in capturing the local features needed for monotonic alignment, an improved Transformer decoder is proposed. The two attention mechanisms of the Transformer decoder are split into two separate modules, and cross-attention is then used to capture local features more efficiently. Experimental results on the open-source Mandarin Chinese AISHELL-1 dataset show that, when paired with an encoder that can capture local features, this decoder achieves better recognition performance than the Transformer decoder. Specifically, with a Conformer encoder, the Character Error Rate (CER) is relatively reduced by 16.19% and convergence is faster. With Connectionist Temporal Classification (CTC) as an auxiliary decoding method, the CER is reduced by 5.08%, giving a final CER of 4.67%.
Authors HU Hengbo; NIU Tong; HE Zhenhua (Research and Development Department, Zhengzhou Xinda Institute of Advanced Technology, Zhengzhou Henan 450001, China; College of Information System Engineering, Information Engineering University, Zhengzhou Henan 450001, China)
Source Journal of Computer Applications (《计算机应用》), PKU Core Journal, 2025, Issue S1, pp. 95-100 (6 pages)
Funding National Natural Science Foundation of China (62171470).
Keywords cross-attention; Transformer decoder; Conformer encoder; speech recognition; local feature
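
The CER metric and the relative reductions quoted in the abstract can be made concrete with a minimal sketch. It assumes the usual definition of CER (character-level edit distance divided by the reference length) and assumes the reported reductions are relative to a baseline CER, as the wording "relatively reduced" suggests. The function names `edit_distance`, `cer`, and `relative_reduction` are hypothetical and are not taken from the paper or from any scoring toolkit; the sample strings and numbers are made up for illustration and are not results from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference characters
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character Error Rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline, improved):
    """Relative reduction of an error rate with respect to a baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Illustrative values only; not taken from the paper.
    reference = "今天天气很好"
    hypothesis = "今天天汽很好"  # one substituted character
    print(f"CER: {cer(reference, hypothesis):.4f}")
    print(f"Relative reduction: {relative_reduction(5.0, 4.0):.2%}")
```

Under the relative reading, a reduction is measured against the baseline error rate rather than in absolute percentage points, so the same relative figure corresponds to a smaller absolute change when the baseline CER is already low.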