Abstract
The relative contribution of each MFCC cepstral component to speech recognition and to speaker recognition is studied by adding and removing feature components, using a DTW recognizer in various noise environments. On a standard English digit corpus with the Euclidean distance, the experiments show that cepstral components C2 to C16 contain the most useful speaker information, while C0 and C1 usually carry speaker information that is harmful: including them as features lowers the speaker recognition rate. Components C1 to C12 are found to contain the most useful speech information. In both tasks, additive noise reduces the relative importance of the low-order MFCC terms faster than that of the middle- and high-order terms, and the size of this reduction depends on the speech SNR. Channel distortion likewise degrades the low-order terms more than the middle- and high-order terms in both tasks.
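The abstract evaluates MFCC components by adding and removing them from the feature vector and scoring a DTW template recognizer under a Euclidean distance. Below is a minimal sketch of that kind of setup, assuming classic DTW with unit-weight steps (1,0), (0,1), (1,1) and nearest-template classification; the paper's exact step pattern, normalization, and template selection are not stated here. The function names `select_components`, `dtw_distance`, `classify`, and `recognition_rate`, and the toy numbers, are hypothetical and are not the paper's data. Components are selected by an inclusive index range, so `lo=2, hi=16` keeps C2 through C16.

```python
import math


def select_components(frames, lo, hi):
    """Keep cepstral components C_lo..C_hi (inclusive) of every MFCC frame."""
    return [list(f[lo:hi + 1]) for f in frames]


def dtw_distance(a, b):
    """Classic DTW: Euclidean local distance, steps (1,0), (0,1), (1,1), unit weights."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def classify(utterance, templates, lo, hi):
    """Return the label of the template nearest under DTW on components C_lo..C_hi."""
    x = select_components(utterance, lo, hi)
    best_label, best_d = None, math.inf
    for label, tmpl in templates.items():
        d = dtw_distance(x, select_components(tmpl, lo, hi))
        if d < best_d:
            best_label, best_d = label, d
    return best_label


def recognition_rate(test_set, templates, lo, hi):
    """Fraction of (label, utterance) pairs recognized correctly using C_lo..C_hi."""
    correct = sum(classify(utt, templates, lo, hi) == label for label, utt in test_set)
    return correct / len(test_set)


# Toy data (hypothetical): 3-dimensional "MFCC" frames [C0, C1, C2].
templates = {
    "a": [[10.0, 1.0, 0.0], [10.0, 2.0, 0.0]],
    "b": [[0.0, 5.0, 5.0], [0.0, 6.0, 6.0]],
}
# A "b" utterance whose C0 is offset by +10, e.g. from a level mismatch.
test_utt = [[10.0, 5.0, 5.0], [10.0, 6.0, 6.0]]

print(classify(test_utt, templates, 0, 2))  # with C0    -> a
print(classify(test_utt, templates, 1, 2))  # without C0 -> b
```

In this toy case the test utterance matches template "b" except for a constant offset in C0. For a standard DCT-II cepstrum a uniform gain change shifts only C0, and here that offset dominates the Euclidean distance when C0 is kept, flipping the decision. This is a contrived illustration of why removing a component can raise the recognition rate, not a reproduction of the reported experiments.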
Source
Journal of Peking University (Natural Science Edition)
CAS
CSCD
PKU Core Journal
2001, No. 3, pp. 371-378 (8 pages)
Acta Scientiarum Naturalium Universitatis Pekinensis
Funding
National Natural Science Foundation of China (69635050)
Beijing Natural Science Foundation (4002012)
Program for Supporting Key Teachers in Higher Education Institutions