Automatic speaker recognition (ASR) systems belong to the field of human-machine interaction, and scientists have been using feature extraction and feature matching methods to analyze and synthesize speech signals. One of the most commonly used methods for feature extraction is Mel Frequency Cepstral Coefficients (MFCCs). Recent research shows that MFCCs process voice signals with high accuracy. MFCCs represent a sequence of voice signal-specific features. This experimental analysis aims to distinguish Turkish speakers by extracting MFCCs from speech recordings. Since human perception of sound is not linear, after the filterbank step of the MFCC method we converted the obtained log filterbanks into decibel (dB) feature-based spectrograms without applying the Discrete Cosine Transform (DCT). A new dataset was created by converting each spectrogram into a 2-D array. Several learning algorithms were implemented with a 10-fold cross-validation method to identify the speaker. The highest accuracy of 90.2% was achieved using a Multi-layer Perceptron (MLP) with the tanh activation function. The most important output of this study is the inclusion of human voice as a new feature set.
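The feature step described in this abstract (mel filterbank energies converted to dB, with the DCT skipped) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the 16 kHz sample rate, 25 ms frames, 10 ms hop, 512-point FFT, 26 filters, and the function names are all assumptions that do not appear in the abstract.

```python
import numpy as np

def hz_to_mel(hz):
    # Common mel-scale formula (assumed; the abstract does not specify one).
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank

def log_mel_db(signal, sample_rate=16000, frame_len=400, hop=160, n_fft=512, n_filters=26):
    """Return a 2-D array (frames x filters) of filterbank energies in dB, with no DCT."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Build overlapping frames and apply a Hamming window to each.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of every frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # Mel filterbank energies, floored to avoid log(0).
    energies = np.maximum(power @ mel_filterbank(n_filters, n_fft, sample_rate).T, 1e-10)
    # A full MFCC pipeline would apply a DCT here; this sketch stops at dB values.
    return 10.0 * np.log10(energies)
```

Calling `log_mel_db` on a one-second recording at the assumed settings yields a 98 x 26 array, which could then be flattened or fed directly to a classifier such as an MLP.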
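To accurately identify abnormal operating conditions of gas insulated switchgear (GIS) equipment, a voiceprint recognition algorithm is proposed that combines a weighted Mel frequency cepstral coefficient and one-class support vector machine (MFCC-OCSVM) with a Bayesian-optimized bidirectional gated recurrent unit (BiGRU). First, MFCCs weighted by the F-statistic are used to extract weighted features from the voiceprint data, highlighting important features and reducing the influence of noise. OCSVM is then applied to the weighted features to detect anomalies and remove outliers, improving data quality. To address sample imbalance, the synthetic minority over-sampling technique (SMOTE) is used to balance the voiceprint samples. Finally, a BiGRU model based on Bayesian optimization performs the voiceprint recognition. Taking a gas insulated switchgear (GIS) installation as an example, sound samples of the operating mechanism were collected under 20 classes of operating conditions, and the proposed algorithm was compared with several classical classification models. The results show that the proposed algorithm achieves a highest average recognition accuracy of 92.8%, which is 30.1%, 14.7%, and 11.5% higher than adaptive boosting, naive Bayes, and linear discriminant analysis, respectively. Ablation experiments further evaluate and verify the practical effect and performance impact of each stage of the proposed algorithm on voiceprint recognition. The results can provide an efficient technical route for voiceprint recognition of abnormal GIS operating conditions.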
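The abstract does not say exactly how the F-statistic is turned into MFCC weights. One plausible reading, sketched below under that assumption, computes a one-way ANOVA F-statistic per coefficient across the condition classes and scales each coefficient by its normalized F value. The function names `anova_f` and `weight_features` and the sum-to-one normalization are hypothetical choices for illustration only.

```python
import numpy as np

def anova_f(features, labels):
    """One-way ANOVA F-statistic for each feature column."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n, k = features.shape[0], classes.size
    grand_mean = features.mean(axis=0)
    between = np.zeros(features.shape[1])
    within = np.zeros(features.shape[1])
    for c in classes:
        group = features[labels == c]
        mean_c = group.mean(axis=0)
        # Between-class scatter: class size times squared offset of the class mean.
        between += group.shape[0] * (mean_c - grand_mean) ** 2
        # Within-class scatter: squared deviations from the class mean.
        within += ((group - mean_c) ** 2).sum(axis=0)
    # Ratio of mean squares; the floor guards against zero within-class variance.
    return (between / (k - 1)) / np.maximum(within / (n - k), 1e-12)

def weight_features(features, labels):
    """Scale each column by its normalized F value (assumed weighting scheme)."""
    f = anova_f(features, labels)
    weights = f / f.sum()
    return np.asarray(features, dtype=float) * weights, weights
```

Coefficients that separate the classes well receive larger weights, while coefficients dominated by within-class variation (for example, noise) are shrunk toward zero before the later outlier-removal and classification stages.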
Funding: This work was supported by the GRRC program of Gyeonggi province [GRRC-Gachon2020(B04), Development of AI-based Healthcare Devices].