
Continuous speech recognition and pronunciation correction algorithm based on DTW and Transformer
Abstract: Traditional dynamic time warping (DTW) algorithms have two weaknesses. They are inefficient on large-scale speech data, and they are not robust enough for speaker-independent recognition. Transformer models, for their part, align short speech segments in time with poor accuracy. To address these problems, this paper proposes a continuous speech recognition and pronunciation correction algorithm that fuses DTW and Transformer. DTW performs precise temporal alignment of short-term speech frames. The Transformer's multi-head attention mechanism captures global dependencies across long speech sequences. Together they form a two-layer "local alignment-global modeling" processing architecture. Experiments were run on the public TIMIT speech dataset and on a self-built language-learning pronunciation dataset. In continuous speech recognition, the proposed algorithm's word error rate (WER) is 18.9% lower than that of the traditional DTW algorithm and 5.7% lower than that of a standalone Transformer model. For pronunciation correction, the phoneme error detection rate reaches 95.3% and the real-time response latency stays within 280 ms. This can meet the requirements of applications such as language education and intelligent evaluation.
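The abstract assigns frame-level temporal alignment to DTW. Below is a minimal sketch of what that alignment step computes in its classic textbook form. It is not the author's implementation. Several choices are assumptions made for illustration: the Euclidean frame distance, the three-move recurrence, and the names `frame_distance` and `dtw`. Each frame stands in for an acoustic feature vector such as an MFCC frame.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames (assumed local cost)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(ref: List[Frame], hyp: List[Frame]) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic DTW: minimal cumulative cost and the warping path (ref_idx, hyp_idx)."""
    n, m = len(ref), len(hyp)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning ref[:i] with hyp[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(ref[i - 1], hyp[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # reference advances, hypothesis frame repeats
                cost[i][j - 1],      # hypothesis advances, reference frame repeats
                cost[i - 1][j - 1],  # both advance (diagonal match)
            )
    # Backtrack from the end, always moving to the cheapest predecessor.
    i, j = n, m
    path: List[Tuple[int, int]] = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path
```

For example, take a reference `[[0], [1], [2]]` and a slower learner utterance `[[0], [0], [1], [2]]`. Aligning them gives a total cost of 0.0. The path maps both of the learner's first two frames onto the first reference frame, which is how DTW absorbs differences in speaking rate.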
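The abstract credits the Transformer's multi-head attention with modeling long-range dependencies. In the standard Transformer design, each head computes scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. A multi-head layer runs several such heads on learned projections of the input and concatenates their outputs.

The single-head sketch below is an illustration only. It omits the learned projections, masking and batching. The names `softmax` and `scaled_dot_product_attention` are hypothetical and not taken from the paper.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are time steps, columns are feature dimensions


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """One attention head: softmax(Q K^T / sqrt(d_k)) V, computed row by row."""
    d_k = len(k[0])
    out: Matrix = []
    for q_row in q:
        # Similarity of this query time step to every key time step.
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
        weights = softmax(scores)
        # Weighted sum of value rows: every output step can draw on every input step.
        out.append([sum(w * v_row[c] for w, v_row in zip(weights, v)) for c in range(len(v[0]))])
    return out
```

Every output row mixes information from all input time steps. This is the global view the abstract contrasts with DTW's local, frame-by-frame alignment.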
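WER is the headline metric in the abstract. It is conventionally defined as (S + D + I) / N:

- S, D and I are the word substitutions, deletions and insertions needed to turn the recognized transcript into the reference.
- N is the number of words in the reference.

Below is a minimal sketch that computes WER with a word-level Levenshtein distance. It assumes whitespace tokenization, and the function name `word_error_rate` is hypothetical, not taken from the paper.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance (whitespace tokens assumed)."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For instance, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` returns 1/6: one deletion out of six reference words.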
Author: PAN Guimei (潘桂妹), Zhanjiang University of Science and Technology (湛江科技学院), Zhanjiang 524094, China
Source: Modern Electronics Technique (《现代电子技术》), Peking University core journal, 2025, No. 24, pp. 61-66 (6 pages)
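Funding: Guangdong Provincial Department of Education project (粤教高函[2023]4号-1097); China Association for Non-Government Education 2025 Planning Project, Youth Project (CANQN250851); Zhanjiang Philosophy and Social Sciences 2025 Planning Project (ZJ25YB47).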
Keywords: continuous speech recognition; pronunciation correction; dynamic time warping; Transformer; temporal alignment; attention mechanism
References: 18