5 articles found
Comparison of Different Implementations of MFCC (cited by: 20)
1
Authors: 郑方, 张国亮, 宋战江. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2001, Issue 6, pp. 582-589 (8 pages)
The performance of the Mel-Frequency Cepstrum Coefficients (MFCC) may be affected by (1) the number of filters, (2) the shape of filters, (3) the way in which filters are spaced, and (4) the way in which the power spectrum is warped. In this paper, several comparison experiments are done to find the best implementation. The traditional MFCC calculation excludes the 0th coefficient for the reason that it is regarded as somewhat unreliable. According to the analysis and experiments, the authors find that it can be regarded as the generalized frequency band energy (FBE) and is hence useful, which results in the FBE-MFCC. The authors also propose a better analysis, namely the auto-regressive analysis, on the frame energy, which outperforms its 1st and/or 2nd order differential derivatives. Experiments with the '863' Speech Database show that, compared with the traditional MFCC with its corresponding auto-regressive analysis coefficients, the FBE-MFCC and the frame energy with their corresponding auto-regressive analysis coefficients form the best combination, reducing the Chinese syllable error rate (CSER) by about 10%, while the FBE-MFCC with the corresponding auto-regressive analysis coefficients reduces CSER by 2.5%. Comparison experiments are also done with a quite casual Chinese speech database, named the Chinese Annotated Spontaneous Speech (CASS) corpus. The FBE-MFCC can reduce the error rate by about 2.9% on average.
Keywords: MFCC; frequency band energy; auto-regressive analysis; generalized initial/final
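The filter-bank factors listed in the abstract above (number, shape, and spacing of filters) and the role of the 0th coefficient can be illustrated with a minimal sketch. It assumes the common textbook formulation: triangular filters equally spaced on the mel scale, followed by a DCT-II of the log band energies. The function names and the 2595/700 mel formula are illustrative choices, not details taken from the paper, and the paper's own FBE definition may differ.

```python
import math

def hz_to_mel(f):
    # Common mel-scale approximation (an assumption, not taken from the paper).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # num_filters + 2 edge frequencies: each filter spans three consecutive edges.
    edges = [mel_to_hz(mel_max * i / (num_filters + 1))
             for i in range(num_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for j in range(1, num_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc_from_power_spectrum(power, bank, num_ceps):
    """Log filter-bank energies followed by a DCT-II; index 0 is c0."""
    log_e = [math.log(max(sum(w * p for w, p in zip(row, power)), 1e-12))
             for row in bank]
    n = len(log_e)
    return [sum(e * math.cos(math.pi * k * (j + 0.5) / n)
                for j, e in enumerate(log_e))
            for k in range(num_ceps)]
```

In this formulation the 0th coefficient is simply the sum of the log band energies, which is one way to read the abstract's description of it as a generalized frequency band energy.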
Improving the Syllable-Synchronous Network Search Algorithm for Word Decoding in Continuous Chinese Speech Recognition (cited by: 2)
2
Authors: 郑方, 武健, 宋战江. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2000, Issue 5, pp. 461-471 (11 pages)
The previously proposed syllable-synchronous network search (SSNS) algorithm plays a very important role in word decoding for continuous Chinese speech recognition and achieves satisfying performance. Several related key factors that may affect the overall word decoding effect are carefully studied in this paper, including the perfecting of the vocabulary, the big-discount Turing re-estimating of the N-Gram probabilities, and the managing of the searching path buffers. Based on these discussions, corresponding approaches to improving the SSNS algorithm are proposed. Compared with the previous version of the SSNS algorithm, the new version decreases the Chinese character error rate (CCER) in word decoding by 42.1% across a database consisting of a large number of testing sentences (syllable strings).
Keywords: large-vocabulary continuous Chinese speech recognition; word decoding; syllable-synchronous network search; word segmentation
A Method to Build a Super Small but Practically Accurate Language Model for Handheld Devices (cited by: 2)
3
Authors: 吴根清, 郑方. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2003, Issue 6, pp. 747-755 (9 pages)
In this paper, an important question, whether a small language model can be practically accurate enough, is raised. Afterwards, the purpose of a language model, the problems that a language model faces, and the factors that affect the performance of a language model are analyzed. Finally, a novel method for language model compression is proposed, which makes the large language model usable for applications in handheld devices, such as mobiles, smart phones, personal digital assistants (PDAs), and handheld personal computers (HPCs). The proposed language model compression method includes three aspects. First, the language model parameters are analyzed and a criterion based on the importance measure of n-grams is used to determine which n-grams should be kept and which removed. Second, a piecewise linear warping method is proposed to compress the uni-gram count values in the full language model. Third, a rank-based quantization method is adopted to quantize the bi-gram probability values. Experiments show that by using this compression method the language model can be reduced dramatically to only about 1M bytes with almost no decrease in performance. This provides good evidence that a language model compressed by means of a well-designed compression technique is practically accurate enough, and it makes the language model usable in handheld devices.
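The abstract above names rank-based quantization of bi-gram probabilities without describing the procedure. The sketch below shows one plausible reading, given as an assumption rather than the authors' method: values are sorted, the ranks are split into 2^bits equally populated buckets, each value is stored as its bucket index, and each bucket is represented by the mean of its members. The names `rank_quantize` and `dequantize` are hypothetical.

```python
def rank_quantize(values, bits):
    """Assign each value a code in [0, 2**bits) according to its rank.

    Returns (codes, codebook): codes[i] is the bucket of values[i], and
    codebook[c] is the mean of all values that fell into bucket c.
    """
    levels = 2 ** bits
    # Indices of the values, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    codes = [0] * len(values)
    sums = [0.0] * levels
    counts = [0] * levels
    for rank, i in enumerate(order):
        code = rank * levels // len(values)  # equal-population buckets
        codes[i] = code
        sums[code] += values[i]
        counts[code] += 1
    # Empty buckets (possible when there are fewer values than levels) get 0.0.
    codebook = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return codes, codebook

def dequantize(codes, codebook):
    """Recover approximate values from their bucket codes."""
    return [codebook[c] for c in codes]
```

With this scheme each probability costs only `bits` bits plus a shared codebook of 2^bits entries, which is the kind of saving that makes a model of roughly 1M bytes conceivable.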
Mandarin Pronunciation Modeling Based on CASS Corpus (cited by: 1)
4
Authors: 郑方, Pascale Fung. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2002, Issue 3, pp. 249-263 (15 pages)
Pronunciation variability is an important issue that must be faced when developing practical automatic spontaneous speech recognition systems. In this paper, the factors that may affect the recognition performance are analyzed, including those specific to the Chinese language. By studying the INITIAL/FINAL (IF) characteristics of the Chinese language and developing the Bayesian equation, the concepts of generalized INITIAL/FINAL (GIF) and generalized syllable (GS), the GIF modeling and the IF-GIF modeling, as well as the context-dependent pronunciation weighting, are proposed based on a well phonetically transcribed seed database. By using these methods, the Chinese syllable error rate (SER) is reduced by 6.3% and 4.2% compared with the GIF modeling and IF modeling respectively when the language model, such as syllable or word N-gram, is not used. The effectiveness of these methods is also proved when more data without the phonetic transcription are used to refine the acoustic model using the proposed iterative forced-alignment based transcribing (IFABT) method, achieving a 5.7% SER reduction.
Speech Detection in Non-Stationary Noise Based on the 1/f Process
5
Authors: 王帆, 郑方. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2002, Issue 1, pp. 83-89 (7 pages)
In this paper, an effective and robust active speech detection method is proposed based on the 1/f process technique for signals under non-stationary noisy environments. The Gaussian 1/f process, a mathematical model for statistically self-similar random processes based on fractals, is selected to model the speech and the background noise. An optimal Bayesian two-class classifier is developed to discriminate them by their 1/f wavelet coefficients with Karhunen-Loeve-type properties. Multiple templates are trained for the speech signal, and the parameters of the background noise can be dynamically adapted at runtime to model the variation of both the speech and the noise. In our experiments, a 10-minute long speech with different types of noises ranging from 20 dB to 5 dB is tested using this new detection method. A high performance with over 90% detection accuracy is achieved when the average SNR is about 10 dB.
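The two-class Bayesian decision described in the abstract above can be sketched under simplifying assumptions. Here the wavelet coefficients are treated as zero-mean, independent Gaussians with one variance per scale, which is what a Karhunen-Loeve-type property would approximately justify. The per-scale variances are supplied by the caller, and the paper's template training and runtime noise adaptation are not modeled. All function and parameter names are hypothetical.

```python
import math

def log_likelihood(coeffs_by_scale, var_by_scale):
    """Log-likelihood of zero-mean, independent Gaussian coefficients.

    coeffs_by_scale[m] holds the wavelet coefficients at scale m and
    var_by_scale[m] is the assumed variance of the class at that scale.
    """
    total = 0.0
    for coeffs, var in zip(coeffs_by_scale, var_by_scale):
        for c in coeffs:
            total += -0.5 * (math.log(2.0 * math.pi * var) + c * c / var)
    return total

def classify(coeffs_by_scale, speech_var, noise_var, prior_speech=0.5):
    """Maximum a posteriori decision between 'speech' and 'noise'."""
    score_speech = (log_likelihood(coeffs_by_scale, speech_var)
                    + math.log(prior_speech))
    score_noise = (log_likelihood(coeffs_by_scale, noise_var)
                   + math.log(1.0 - prior_speech))
    return "speech" if score_speech > score_noise else "noise"
```

For example, with assumed variances `speech_var=[4.0, 8.0]` and `noise_var=[0.25, 0.5]`, a frame with large coefficients such as `[[2.0, -1.5], [3.0]]` is labeled speech, while a low-energy frame such as `[[0.1, -0.2], [0.3]]` is labeled noise.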