Journal articles — 2 results found
1. Label-Synchronous Decoding Algorithm and Its Application in Speech Recognition (cited: 10)
Authors: Zhehuai CHEN, Wenlu ZHENG, Yongbin YOU, Yanmin QIAN, Kai YU. Chinese Journal of Computers (《计算机学报》), EI / CSCD / PKU Core, 2019, No. 7, pp. 1511-1523 (13 pages).
A distinguishing feature of sequence labeling tasks such as automatic speech recognition (ASR) is the modeling of temporal dependencies between adjacent frames. The mainstream sequence models for this temporal modeling are the hidden Markov model (HMM) and connectionist temporal classification (CTC). For these models, the dominant inference method is frame-level Viterbi beam search, whose high complexity limits the wide deployment of speech recognition. Advances in deep learning have made stronger context and history modeling possible. By introducing a blank unit, end-to-end systems can directly predict the posterior probability of a label given the features. This paper systematically proposes a family of methods that, through efficient blank structures and post-processing, turn the search and decoding process from frame-synchronous into label-synchronous. These general methods are validated on both HMM and CTC models. Results on the Switchboard dataset show a 2-4x speedup with no loss of accuracy. The paper also studies the influence of the search space, hypothesis pruning, transition models, and frame-rate reduction on the speedup ratio, obtaining consistent acceleration in all cases.
Keywords: automatic speech recognition; hidden Markov model; connectionist temporal classification; frame-synchronous decoding; label-synchronous decoding; variable frame rate; pruning
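The decoding idea summarized in the abstract — using the blank unit's posterior to skip frames, so that beam search advances once per emitted label rather than once per frame — can be sketched in a few lines. This is an illustrative simplification and not the paper's algorithm: the function name, the blank-posterior threshold, and the greedy per-frame label pick are all assumptions.

```python
import numpy as np

def label_sync_hypotheses(log_probs, blank=0, threshold=np.log(0.7)):
    """Collapse frame-level posteriors into label-level search steps.

    log_probs: (T, V) array of per-frame log posteriors over the
    vocabulary, where index `blank` is the blank unit. Frames whose
    blank posterior exceeds `threshold` trigger no label expansion,
    so the search advances only on label-emitting frames.
    """
    steps = []
    for t, frame in enumerate(log_probs):
        if frame[blank] > threshold:
            continue  # blank-dominated frame: skip, no search expansion
        steps.append((t, int(np.argmax(frame))))  # (frame index, best label)
    return steps
```

With four frames of which two are blank-dominated, only two label-level steps remain — this is the source of the 2-4x reduction in search effort reported on Switchboard.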
2. Binary neural networks for speech recognition (cited: 1)
Authors: Yan-min QIAN, Xu XIANG. Frontiers of Information Technology & Electronic Engineering, SCIE / EI / CSCD, 2019, No. 5, pp. 701-715 (15 pages).
Recently, deep neural networks (DNNs) have significantly outperformed Gaussian mixture models in acoustic modeling for speech recognition. However, the substantial increase in computational load during the inference stage makes deep models difficult to deploy directly on low-power embedded devices. To alleviate this issue, structured sparsity and low-precision fixed-point quantization have been applied widely. In this work, binary neural networks for speech recognition are developed to reduce the computational cost during the inference stage. A fast implementation of binary matrix multiplication is introduced. On modern central processing unit (CPU) and graphics processing unit (GPU) architectures, a 5-7 times speedup compared with full-precision floating-point matrix multiplication can be achieved in real applications. Several kinds of binary neural networks and related model optimization algorithms are developed for large-vocabulary continuous speech recognition acoustic modeling. In addition, to improve the accuracy of binary models, knowledge distillation from the normal full-precision floating-point model to the compressed binary model is explored. Experiments on the standard Switchboard speech recognition task show that the proposed binary neural networks deliver a 3-4 times speedup over the normal full-precision deep models. With knowledge distillation from the normal floating-point models, the binary DNNs and binary convolutional neural networks (CNNs) can restrict the word error rate (WER) degradation to within 15.0% relative to the normal full-precision floating-point DNNs and CNNs, respectively. In particular, for the binary CNN with binarization applied only to the convolutional layers, the WER degradation is very small and almost negligible with the proposed approach.
Keywords: speech recognition; binary neural networks; binary matrix multiplication; knowledge distillation; population count
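The binary matrix multiplication behind the reported 5-7x speedup is typically realized with bitwise operations and population count: for two {-1, +1} vectors of length n, the dot product equals n - 2*popcount(a XOR b). A minimal NumPy sketch of this trick follows; the sign binarization, byte packing, and function names are illustrative assumptions (a production kernel would use SIMD popcount instructions rather than bit unpacking).

```python
import numpy as np

def binarize(x):
    # Sign binarization: +1 for non-negative entries, -1 otherwise
    return np.where(x >= 0, 1, -1).astype(np.int8)

def pack_bits(b):
    # Map {-1, +1} -> {0, 1} and pack 8 values per byte
    return np.packbits((b > 0).astype(np.uint8), axis=-1)

def binary_matmul(a_bits, b_bits, n):
    """Dot products of packed binary rows of a with rows of b.

    For {-1, +1} vectors of length n: dot = n - 2 * popcount(a XOR b),
    since XOR marks the positions where the two vectors disagree.
    """
    diff = np.bitwise_xor(a_bits[:, None, :], b_bits[None, :, :])
    popcnt = np.unpackbits(diff, axis=-1)[..., :n].sum(axis=-1, dtype=np.int64)
    return n - 2 * popcnt
```

Because each byte of packed weights covers 8 multiply-accumulates, the arithmetic cost per output drops roughly by the bit width, which is where the speedup over full-precision floating-point matrix multiplication comes from.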