Efficient decoding self-attention for end-to-end speech synthesis
Authors: Wei ZHAO, Li XU. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2022, Issue 7: 1127-1138 (12 pages).
Abstract: Self-attention has been innovatively applied to text-to-speech (TTS) because of its parallel structure and superior strength in modeling sequential data. However, when used in end-to-end speech synthesis with an autoregressive decoding scheme, its inference speed becomes relatively low due to the quadratic complexity in sequence length. This problem becomes particularly severe on devices without graphics processing units (GPUs). To alleviate this dilemma, we propose an efficient decoding self-attention (EDSA) module as an alternative. Combined with a dynamic programming decoding procedure, TTS model inference can be effectively accelerated to have linear computational complexity. We conduct studies on Mandarin and English datasets and find that our proposed model with EDSA achieves 720% and 50% higher inference speed on the central processing unit (CPU) and GPU respectively, with almost the same performance. Thus, this method may ease the deployment of such models when GPU resources are limited. In addition, our model may perform better than the baseline Transformer TTS on out-of-domain utterances.
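The abstract does not spell out how the EDSA module and its dynamic programming procedure reach linear complexity, so the sketch below is only a generic illustration of one common way to make autoregressive attention linear in sequence length: restricting each decoding step to a fixed-size window of cached keys and values, so per-step cost is constant rather than growing with the step index. The function local_attention_step, the window parameter, and all shapes are hypothetical and are not taken from the paper.

```python
import numpy as np

def local_attention_step(q_t, k_cache, v_cache, window=64):
    """One autoregressive decoding step with a fixed attention window.

    Attending only to the last `window` cached keys/values makes each
    step O(window) instead of O(t), so decoding T steps costs
    O(T * window) -- linear in T. (Hypothetical illustration; the
    paper's EDSA formulation may differ.)

    q_t:      (d,)   query for the current step
    k_cache:  (t, d) keys for all steps so far
    v_cache:  (t, d) values for all steps so far
    """
    k = k_cache[-window:]                      # bounded local context
    v = v_cache[-window:]
    scores = k @ q_t / np.sqrt(q_t.shape[0])   # scaled dot-product scores
    weights = np.exp(scores - scores.max())    # numerically stable softmax
    weights /= weights.sum()
    return weights @ v                         # (d,) attended context

# Toy decoding loop: caches grow, but per-step work stays bounded.
d, T = 8, 100
rng = np.random.default_rng(0)
k_cache = np.empty((0, d))
v_cache = np.empty((0, d))
for t in range(T):
    q = rng.standard_normal(d)
    k_cache = np.vstack([k_cache, rng.standard_normal((1, d))])
    v_cache = np.vstack([v_cache, rng.standard_normal((1, d))])
    ctx = local_attention_step(q, k_cache, v_cache)
```

Note that plain key/value caching alone only reduces per-step cost from quadratic to linear (quadratic overall); it is the bounded attention span in this sketch that yields the linear total cost analogous to what the abstract claims for EDSA.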
Keywords: Efficient decoding; End-to-end; Self-attention; Speech synthesis