This paper studies two kinds of methods for pitch predictor in speech compressing coding, i.e., open-loop and closed-loop structures. Some of simplified approaches for solving pitch predictor equation are suggested, a...This paper studies two kinds of methods for pitch predictor in speech compressing coding, i.e., open-loop and closed-loop structures. Some of simplified approaches for solving pitch predictor equation are suggested, and the performances are compared under several conditions. The computer simulation results are shown.展开更多
The LPC “Linear Predictive Coding” algorithm is a widely used technique for voice coder. In this paper we present different implementations of the LPC algorithm used in the majority of voice decoding standard. The w...The LPC “Linear Predictive Coding” algorithm is a widely used technique for voice coder. In this paper we present different implementations of the LPC algorithm used in the majority of voice decoding standard. The windowing/autocorrelation bloc is implemented by three different versions on an FPGA Spartan 3. Allowing the possibility to integrate a Microblaze processor core a first solution consists of a pure software implementation of the LPC using this core RISC processor. Second solution is a pure hardware architecture implemented using VHDL based methodology starting from description until integration. Finally, the autocorrelation core is then proposed to be implemented using hardware/software (HW/SW) architecture with the existing processor. Each architecture performances are compared for different data lengths.展开更多
This paper presents a real-time implementation of 4.2Kb/s CELP speech coding on single DSP chip. An algorithm reducing search complexity for adaptive codebook is suggested; the solving method that the parameters are c...This paper presents a real-time implementation of 4.2Kb/s CELP speech coding on single DSP chip. An algorithm reducing search complexity for adaptive codebook is suggested; the solving method that the parameters are changed into LSP parameters is discussed. The realtime implementation process of this coding on a commercial development board with a single TMS320C30 is described.展开更多
In recent years, the accuracy of speech recognition (SR) has been one of the most active areas of research. Despite that SR systems are working reasonably well in quiet conditions, they still suffer severe performance...In recent years, the accuracy of speech recognition (SR) has been one of the most active areas of research. Despite that SR systems are working reasonably well in quiet conditions, they still suffer severe performance degradation in noisy conditions or distorted channels. It is necessary to search for more robust feature extraction methods to gain better performance in adverse conditions. This paper investigates the performance of conventional and new hybrid speech feature extraction algorithms of Mel Frequency Cepstrum Coefficient (MFCC), Linear Prediction Coding Coefficient (LPCC), perceptual linear production (PLP), and RASTA-PLP in noisy conditions through using multivariate Hidden Markov Model (HMM) classifier. The behavior of the proposal system is evaluated using TIDIGIT human voice dataset corpora, recorded from 208 different adult speakers in both training and testing process. The theoretical basis for speech processing and classifier procedures were presented, and the recognition results were obtained based on word recognition rate.展开更多
The performance of linear prediction analysis of speech deteriorates rapidly under noisy environments. To tackle this issue, an improved noise-robust sparse linear prediction algorithm is proposed. First, the linear p...The performance of linear prediction analysis of speech deteriorates rapidly under noisy environments. To tackle this issue, an improved noise-robust sparse linear prediction algorithm is proposed. First, the linear prediction residual of speech is modeled as Student-t distribution, and the additive noise is incorporated explicitly to increase the robustness, thus a probabilistic model for sparse linear prediction of speech is built, Furthermore, variational Bayesian inference is utilized to approximate the intractable posterior distributions of the model parameters, and then the optimal linear prediction parameters are estimated robustly. The experimental results demonstrate the advantage of the developed algorithm in terms of several different metrics compared with the traditional algorithm and the l1 norm minimization based sparse linear prediction algorithm proposed in recent years. Finally it draws to a conclusion that the proposed algorithm is more robust to noise and is able to increase the speech quality in applications.展开更多
A kind of Web voice browser based on improved synchronous linear predictive coding (ISLPC) and Text-toSpeech (TTS) algorithm and Internet application was proposed. The paper analyzes the features of TTS system wit...A kind of Web voice browser based on improved synchronous linear predictive coding (ISLPC) and Text-toSpeech (TTS) algorithm and Internet application was proposed. The paper analyzes the features of TTS system with ISLPC speech synthesis and discusses the design and implementation of ISLPC TTS-based Web voice browser. The browser integrates Web technology, Chinese information processing, artificial intelligence and the key technology of Chinese ISLPC speech synthesis. It's a visual and audible web browser that can improve information precision for network users. The evaluation results show that ISLPC-based TTS model has a better performance than other browsers in voice quality and capability of identifying Chinese characters.展开更多
This paper presented an approach to hide secret speech information in code excited linear prediction (CELP)-based speech coding scheme by adopting the analysis-by-synthesis (ABS)-based algorithm of speech information ...This paper presented an approach to hide secret speech information in code excited linear prediction (CELP)-based speech coding scheme by adopting the analysis-by-synthesis (ABS)-based algorithm of speech information hiding and extracting for the purpose of secure speech communication. The secret speech is coded in 2.4 Kb/s mixed excitation linear prediction (MELP), which is embedded in CELP type public speech. The ABS algorithm adopts speech synthesizer in speech coder. Speech embedding and coding are synchronous, i.e. a fusion of speech information data of public and secret. The experiment of embedding 2.4 Kb/s MELP secret speech in G.728 scheme coded public speech transmitted via public switched telephone network (PSTN) shows that the proposed approach satisfies the requirements of information hiding, meets the secure communication speech quality constraints, and achieves high hiding capacity of average 3.2 Kb/s with an excellent speech quality and complicating speakers’ recognition.展开更多
In this paper,we present a comparison of Khasi speech representations with four different spectral features and novel extension towards the development of Khasi speech corpora.These four features include linear predic...In this paper,we present a comparison of Khasi speech representations with four different spectral features and novel extension towards the development of Khasi speech corpora.These four features include linear predictive coding(LPC),linear prediction cepstrum coefficient(LPCC),perceptual linear prediction(PLP),and Mel frequency cepstral coefficient(MFCC).The 10-hour speech data were used for training and 3-hour data for testing.For each spectral feature,different hidden Markov model(HMM)based recognizers with variations in HMM states and different Gaussian mixture models(GMMs)were built.The performance was evaluated by using the word error rate(WER).The experimental results show that MFCC provides a better representation for Khasi speech compared with the other three spectral features.展开更多
Although CELP coding has provided good quality synthetic speech at medium and low bit rates,the computation of an exhaustive search for stochastic codebook is extremely complex. This paper studies the exhaustive searc...Although CELP coding has provided good quality synthetic speech at medium and low bit rates,the computation of an exhaustive search for stochastic codebook is extremely complex. This paper studies the exhaustive search procedure for determining the optimum excitation,and develops an effective search method by using improved populating codebook as excitation source. The computational cost of CELP coder was reduced to 1/26 that of a conventional full-gaussian codebook search.展开更多
文摘This paper studies two kinds of methods for pitch predictor in speech compressing coding, i.e., open-loop and closed-loop structures. Some of simplified approaches for solving pitch predictor equation are suggested, and the performances are compared under several conditions. The computer simulation results are shown.
文摘The LPC “Linear Predictive Coding” algorithm is a widely used technique for voice coder. In this paper we present different implementations of the LPC algorithm used in the majority of voice decoding standard. The windowing/autocorrelation bloc is implemented by three different versions on an FPGA Spartan 3. Allowing the possibility to integrate a Microblaze processor core a first solution consists of a pure software implementation of the LPC using this core RISC processor. Second solution is a pure hardware architecture implemented using VHDL based methodology starting from description until integration. Finally, the autocorrelation core is then proposed to be implemented using hardware/software (HW/SW) architecture with the existing processor. Each architecture performances are compared for different data lengths.
文摘This paper presents a real-time implementation of 4.2Kb/s CELP speech coding on single DSP chip. An algorithm reducing search complexity for adaptive codebook is suggested; the solving method that the parameters are changed into LSP parameters is discussed. The realtime implementation process of this coding on a commercial development board with a single TMS320C30 is described.
文摘In recent years, the accuracy of speech recognition (SR) has been one of the most active areas of research. Despite that SR systems are working reasonably well in quiet conditions, they still suffer severe performance degradation in noisy conditions or distorted channels. It is necessary to search for more robust feature extraction methods to gain better performance in adverse conditions. This paper investigates the performance of conventional and new hybrid speech feature extraction algorithms of Mel Frequency Cepstrum Coefficient (MFCC), Linear Prediction Coding Coefficient (LPCC), perceptual linear production (PLP), and RASTA-PLP in noisy conditions through using multivariate Hidden Markov Model (HMM) classifier. The behavior of the proposal system is evaluated using TIDIGIT human voice dataset corpora, recorded from 208 different adult speakers in both training and testing process. The theoretical basis for speech processing and classifier procedures were presented, and the recognition results were obtained based on word recognition rate.
基金supported by the Natural Science Foundation of Jiangsu Province(BK2012510,BK20140074)the National Postdoctoral Foundation of China(20090461424)
文摘The performance of linear prediction analysis of speech deteriorates rapidly under noisy environments. To tackle this issue, an improved noise-robust sparse linear prediction algorithm is proposed. First, the linear prediction residual of speech is modeled as Student-t distribution, and the additive noise is incorporated explicitly to increase the robustness, thus a probabilistic model for sparse linear prediction of speech is built, Furthermore, variational Bayesian inference is utilized to approximate the intractable posterior distributions of the model parameters, and then the optimal linear prediction parameters are estimated robustly. The experimental results demonstrate the advantage of the developed algorithm in terms of several different metrics compared with the traditional algorithm and the l1 norm minimization based sparse linear prediction algorithm proposed in recent years. Finally it draws to a conclusion that the proposed algorithm is more robust to noise and is able to increase the speech quality in applications.
基金Supported by the National High-Technology Re-search and Development Program(2005AA122210) the National Out-standing Youth Foundation (60325104)
文摘A kind of Web voice browser based on improved synchronous linear predictive coding (ISLPC) and Text-toSpeech (TTS) algorithm and Internet application was proposed. The paper analyzes the features of TTS system with ISLPC speech synthesis and discusses the design and implementation of ISLPC TTS-based Web voice browser. The browser integrates Web technology, Chinese information processing, artificial intelligence and the key technology of Chinese ISLPC speech synthesis. It's a visual and audible web browser that can improve information precision for network users. The evaluation results show that ISLPC-based TTS model has a better performance than other browsers in voice quality and capability of identifying Chinese characters.
文摘This paper presented an approach to hide secret speech information in code excited linear prediction (CELP)-based speech coding scheme by adopting the analysis-by-synthesis (ABS)-based algorithm of speech information hiding and extracting for the purpose of secure speech communication. The secret speech is coded in 2.4 Kb/s mixed excitation linear prediction (MELP), which is embedded in CELP type public speech. The ABS algorithm adopts speech synthesizer in speech coder. Speech embedding and coding are synchronous, i.e. a fusion of speech information data of public and secret. The experiment of embedding 2.4 Kb/s MELP secret speech in G.728 scheme coded public speech transmitted via public switched telephone network (PSTN) shows that the proposed approach satisfies the requirements of information hiding, meets the secure communication speech quality constraints, and achieves high hiding capacity of average 3.2 Kb/s with an excellent speech quality and complicating speakers’ recognition.
基金supported by the Visvesvaraya Ph.D.Scheme for Electronics and IT students launched by the Ministry of Electronics and Information Technology(MeiTY),Government of India under Grant No.PhD-MLA/4(95)/2015-2016.
文摘In this paper,we present a comparison of Khasi speech representations with four different spectral features and novel extension towards the development of Khasi speech corpora.These four features include linear predictive coding(LPC),linear prediction cepstrum coefficient(LPCC),perceptual linear prediction(PLP),and Mel frequency cepstral coefficient(MFCC).The 10-hour speech data were used for training and 3-hour data for testing.For each spectral feature,different hidden Markov model(HMM)based recognizers with variations in HMM states and different Gaussian mixture models(GMMs)were built.The performance was evaluated by using the word error rate(WER).The experimental results show that MFCC provides a better representation for Khasi speech compared with the other three spectral features.
文摘Although CELP coding has provided good quality synthetic speech at medium and low bit rates,the computation of an exhaustive search for stochastic codebook is extremely complex. This paper studies the exhaustive search procedure for determining the optimum excitation,and develops an effective search method by using improved populating codebook as excitation source. The computational cost of CELP coder was reduced to 1/26 that of a conventional full-gaussian codebook search.