Abstract: In this paper, we summarize recent progress in deep-learning-based acoustic models and the motivation and insights behind the surveyed techniques. We first discuss models such as recurrent neural networks (RNNs) and convolutional neural networks (CNNs) that can effectively exploit variable-length contextual information, and their various combinations with other models. We then describe models that are optimized end-to-end, with emphasis on feature representations learned jointly with the rest of the system, the connectionist temporal classification (CTC) criterion, and the attention-based sequence-to-sequence translation model. We further illustrate robustness issues in speech recognition systems, and discuss acoustic model adaptation, speech enhancement and separation, and robust training strategies. We also cover modeling techniques that lead to more efficient decoding and discuss possible future directions in acoustic model research.
Funding: Supported by the National Natural Science Foundation of China (No. 60872105); the Program for Science & Technology Innovative Research Team of Qing Lan Project in Higher Educational Institutions of Jiangsu; and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).
Abstract: This paper presents an improved voice conversion method, based on Gaussian mixture models (GMMs), that changes the time-scale of speech. The Speech Transformation and Representation using Adaptive Interpolation of weiGHTed spectrum (STRAIGHT) model is adopted to extract the spectrum features, and the GMMs are trained to generate the conversion function. The spectrum features of the source speech are then converted by the conversion function. The time-scale of speech is changed by extracting the converted features and adding them to the spectrum. The converted voice was evaluated by subjective and objective measurements. The results confirm that the transformed speech not only approximates the characteristics of the target speaker but is also more natural and more intelligible.
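The abstract does not spell out the form of the conversion function. As an illustration only, the sketch below assumes the widely used joint-density GMM mapping, F(x) = Σ_m p(m|x) [μ_y,m + Σ_yx,m Σ_xx,m⁻¹ (x − μ_x,m)], which may differ from what the authors implemented. The function name `gmm_convert` and all parameter names are hypothetical, the GMM parameters are taken as already trained, and STRAIGHT feature extraction and the time-scale modification are not shown.

```python
import numpy as np


def gmm_convert(x, weights, mu_x, mu_y, sigma_xx, sigma_yx):
    """Map one source feature vector x into the target feature space.

    Assumed form (joint-density GMM regression, not confirmed by the abstract):
        F(x) = sum_m p(m|x) * (mu_y[m] + sigma_yx[m] @ inv(sigma_xx[m]) @ (x - mu_x[m]))
    """
    n_mix = len(weights)

    # Log of weight * Gaussian density for each component. The (2*pi)^(-d/2)
    # factor is the same for every component, so it cancels when normalizing.
    log_post = np.empty(n_mix)
    for m in range(n_mix):
        diff = x - mu_x[m]
        _, logdet = np.linalg.slogdet(sigma_xx[m])
        maha = diff @ np.linalg.solve(sigma_xx[m], diff)
        log_post[m] = np.log(weights[m]) - 0.5 * (logdet + maha)

    # Normalize in a numerically stable way to get the posteriors p(m|x).
    post = np.exp(log_post - log_post.max())
    post /= post.sum()

    # Posterior-weighted sum of the per-component linear regressions.
    y = np.zeros(mu_y.shape[1])
    for m in range(n_mix):
        y += post[m] * (mu_y[m] + sigma_yx[m] @ np.linalg.solve(sigma_xx[m], x - mu_x[m]))
    return y
```

In a full system this mapping would be applied frame by frame to the spectral features of the source speech, with the parameters estimated from time-aligned source and target frames.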
Funding: Supported by the Fundamental Research Funds for the Central Universities (2016JX06) and the National Natural Science Foundation of China (61472369).
Abstract: With the popularity of adaptive multi-rate wideband (AMR-WB) audio in mobile communication, many AMR-WB-based techniques have been proposed to transmit covert messages, such as using a similar compression architecture to embed secret information during the compression process. However, if a sender does not have the original waveform audio format (WAV) audio, that architecture cannot be used. In this paper, a new covert message method, which takes effect after WAV audio is compressed into AMR-WB speech, is proposed. This method takes advantage of algebraic codebook search. Aiming at improving speed and reducing search space, it does not perform algebraic codebook search using the optimal search algorithm, and it does not reach the positions of non-zero pulses via depth-first tree search that characterizes the energy of audio. According to the features of the search methods and the codebook index construction, every track in each subframe is analyzed to find the proper positions for embedding secret information. Experimental results show that the proposed method has satisfactory capacity and simplicity regardless of the compression process.