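Abstract: To address the localization of mobile robots in indoor environments, this paper adopts a BatSLAM model based on GFCC (Gammatone Frequency Cepstrum Coefficient) feature extraction. Exponential compression models the nonlinear characteristics of the auditory system; Hanning windowing reduces the edge effects present in the echo signal; the discrete cosine transform applies lossy data compression to the cochleagram, improving its resistance to interference; and raised half-sine cepstral liftering improves the robustness of the cochleagram. GFCC feature extraction can thereby effectively improve the precision and accuracy of indoor localization. Experiments show that the BatSLAM model with GFCC feature extraction, by improving the interference resistance and robustness of the cochleagram, effectively reduces localization error and thus improves the localization precision and accuracy of the mobile robot.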
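The processing chain named in this abstract (windowing, compression, DCT, liftering) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the gammatone filterbank is omitted, the compression exponent of 1/3 and the lifter length of 22 are assumed values, and the lifter uses the common form 1 + (L/2)·sin(πi/L), which may differ from the one in the paper. All function names are hypothetical.

```python
import math

def hann_window(n):
    """Symmetric Hann (Hanning) window of length n, n >= 2."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def apply_window(frame):
    """Taper a frame of samples to reduce edge effects."""
    return [s * w for s, w in zip(frame, hann_window(len(frame)))]

def power_compress(energies, exponent=1 / 3):
    """Power-law compression of non-negative channel energies (assumed exponent)."""
    return [e ** exponent for e in energies]

def dct_ii(x):
    """Unnormalized DCT-II; keeping only the first outputs is the lossy step."""
    n = len(x)
    return [sum(x[j] * math.cos(math.pi * (j + 0.5) * k / n) for j in range(n))
            for k in range(n)]

def half_sine_lifter(num_coeffs, lift=22):
    """Raised half-sine lifter weights (assumed form and length)."""
    return [1 + (lift / 2) * math.sin(math.pi * i / lift) for i in range(num_coeffs)]

def gfcc_frame(channel_energies, num_coeffs=13):
    """Compress, decorrelate and lifter one frame of filterbank energies."""
    cepstrum = dct_ii(power_compress(channel_energies))[:num_coeffs]
    return [c * w for c, w in zip(cepstrum, half_sine_lifter(num_coeffs))]
```

Here `channel_energies` stands for the per-channel outputs of a gammatone filterbank computed on a frame that has already been passed through `apply_window`.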
Abstract: This research focuses on a highly effective and largely untapped feature, gammatone frequency cepstral coefficients (GFCC), for the detection of COVID-19 using a nature-inspired meta-heuristic algorithm that combines deer hunting optimization with an artificial neural network (DHO-ANN). Noisy crowdsourced cough datasets were collected from the public domain. The work claims that GFCC yields better COVID-19 detection results than the widely used Mel-frequency cepstral coefficients on noisy crowdsourced speech corpora. The performance of the proposed algorithm in detecting COVID-19 is validated using statistical measures: F1 score, confusion matrix, specificity, and sensitivity. The proposed algorithm using GFCC is also found to perform well in detecting COVID-19 from the noisy crowdsourced cough dataset COUGHVID. Moreover, the proposed algorithm and the chosen feature parameters improve the detection of COVID-19 by 5% compared with existing methods.
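For readers unfamiliar with the validation measures listed here, the sketch below derives sensitivity, specificity, precision and F1 score from the four cells of a binary confusion matrix. The function name and the counts in the usage line are hypothetical and are not taken from the paper; the function assumes every denominator is non-zero.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard binary classification measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),        # true positive rate (recall)
        "specificity": tn / (tn + fp),        # true negative rate
        "precision": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),    # harmonic mean of precision and recall
    }

# Illustrative counts only: 50 positive and 50 negative test samples.
metrics = binary_metrics(tp=40, fp=5, fn=10, tn=45)
```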
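Abstract: To improve language identification accuracy at low signal-to-noise ratios (SNR), a new feature extraction and fusion method is introduced. Voiced-segment detection is added at the front end, and gammatone frequency cepstrum coefficient (GFCC) features are extracted based on a model of human auditory perception. The features are compressed and denoised by principal component analysis and fused with the Teager energy operator cepstral parameters of each voiced segment, and language identification is verified with a Gaussian mixture model-universal background model. Experimental results show that at SNRs of -5 to 0 dB the fused feature set improves the recognition rates for five languages by 23.7% to 34.0% relative to the method based on log Mel-scale filter bank energy features; recognition rates also improve markedly at other SNR levels.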
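The Teager energy operator underlying the cepstral parameters mentioned here has a simple discrete form, ψ[x(n)] = x(n)² − x(n−1)·x(n+1). The sketch below shows only that operator; how the paper turns its output into cepstral parameters is not described in the abstract, and the function name and test tone are hypothetical.

```python
import math

def teager_energy(x):
    """Discrete Teager energy operator; the result is two samples shorter than x."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# For a pure tone A*cos(omega*n) the operator returns the constant A^2 * sin(omega)^2.
tone = [2.0 * math.cos(0.3 * n) for n in range(50)]
energy = teager_energy(tone)
expected = 4.0 * math.sin(0.3) ** 2
```

Because the output depends on both amplitude and frequency, the operator responds to changes in either, which is one reason it is used alongside spectral features in noisy conditions.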
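Abstract: Accurate recognition of horn sounds is key to deploying vehicle horn-honking capture systems. To overcome the inadequacy of a single feature in representing horn sounds and to improve recognition accuracy, this paper fuses Mel frequency cepstrum coefficients (MFCC) with gammatone frequency cepstrum coefficients (GFCC) to obtain the M-GFCC feature, and performs classification with support vector machines (SVM) and back propagation (BP) neural networks. Experimental results show that, compared with the MFCC feature alone, the effective rate of horn sound recognition improves by 10.4% with the BP neural network and by 4.4% with SVM; compared with the GFCC feature alone, it improves by 6.6% with the BP neural network and by 4.2% with SVM, demonstrating that the fused feature improves horn sound recognition accuracy.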
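Abstract: To address the random initialization of basis-function centers and radii in traditional radial basis function (RBF) neural networks for speech recognition, and drawing on the hierarchical processing by which the human brain perceives speech, an unsupervised pre-training scheme that initializes network parameters from a large amount of unlabeled data is proposed in place of traditional random initialization. A deep autoencoder network serves as the acoustic model, and the noise robustness of speaker-independent, small-vocabulary isolated-word recognition is analyzed under Mel frequency cepstrum coefficient (MFCC) features and gammatone auditory filter frequency cepstrum coefficient (GFCC) features. Experimental results show that, under MFCC features, the deep autoencoder network is more noise-robust than the RBF neural network; compared with the classic MFCC features, GFCC features give a relative improvement of 1.87% in average recognition rate under the deep autoencoder network.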
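Abstract: To address the shortage of speech samples from the person under examination in forensic speaker recognition, a new alternative method for modeling within-speaker variability is proposed, together with a corresponding variance control algorithm. Multiple same-speaker score models of the recognition system are built from a reference database recorded under the same conditions. These replace the score models that would otherwise be obtained by comparing multiple non-contemporaneous speech samples from the person under examination, yielding a statistical model that reflects within-speaker variability. Within the current likelihood-ratio framework for evaluating the strength of forensic evidence, the effectiveness of the method is verified using MFCC (Mel Frequency Cepstral Coefficients) and GFCC (Gammatone Frequency Cepstral Coefficients) features, which are also fused at the feature level and the decision level. Experimental results show that the method achieves a high recognition rate and stability in both clean and noisy speech environments, and that feature-level fusion further improves the performance of the recognition system.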
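In a likelihood-ratio framework of this kind, a comparison score is evaluated under a same-speaker score model and a different-speaker score model, and the ratio of the two densities expresses the strength of evidence. The sketch below assumes both score models are single Gaussians, which the abstract does not state; the function names and the parameter values in the usage lines are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def likelihood_ratio(score, same_mean, same_std, diff_mean, diff_std):
    """p(score | same speaker) / p(score | different speakers)."""
    return gaussian_pdf(score, same_mean, same_std) / gaussian_pdf(score, diff_mean, diff_std)

# Illustrative score models: same-speaker scores ~ N(2, 1), different-speaker ~ N(0, 1).
lr = likelihood_ratio(2.0, same_mean=2.0, same_std=1.0, diff_mean=0.0, diff_std=1.0)
log10_lr = math.log10(lr)  # values above 0 support the same-speaker hypothesis
```

The spread of the same-speaker model is where within-speaker variability enters: a wider same-speaker distribution pulls the ratio toward 1 for a given score.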
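Abstract: To address the poor noise robustness of traditional speech endpoint detection methods and their unsatisfactory detection of speech segments, a multi-feature fusion method for speech endpoint detection is proposed. First, sub-band spectral entropy features and projection features based on Mel frequency cepstral coefficients (MFCC) are extracted from the noisy speech signal, and GFCC0, the first coefficient of the gammatone frequency cepstral coefficients, is applied to the endpoint detection task. Next, the three types of features are fused by adaptive weighting to obtain a fused feature suited to endpoint detection. Finally, fuzzy C-means clustering adaptively estimates the thresholds, and the double-threshold method produces the endpoint detection result. Compared with existing traditional methods, the proposed method achieves better endpoint detection results in all seven noise environments and improves detection accuracy; in the volvo noise environment in particular, accuracy reaches 94.5% or higher.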
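The final double-threshold step can be sketched as follows: a frame must exceed the high threshold to start a segment, and the segment is then extended in both directions while the feature stays above the low threshold. This is a generic version of the technique, not the paper's code; the thresholds are passed in directly instead of being estimated by fuzzy C-means, and the function name and sample values are hypothetical.

```python
def double_threshold_segments(feature, high, low):
    """Return (start, end) frame indices, end inclusive, of detected segments."""
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    segments = []
    n = len(feature)
    i = 0
    while i < n:
        if feature[i] > high:
            start = i
            # Extend backwards over frames that are above the low threshold.
            while start > 0 and feature[start - 1] > low:
                start -= 1
            end = i
            # Extend forwards in the same way.
            while end + 1 < n and feature[end + 1] > low:
                end += 1
            segments.append((start, end))
            i = end + 1
        else:
            i += 1
    return segments

# Illustrative fused-feature values per frame.
fused = [0.1, 0.3, 0.9, 0.8, 0.3, 0.1, 0.3, 0.1, 0.7, 0.2]
segments = double_threshold_segments(fused, high=0.6, low=0.25)
```

Frame 6 in this example rises above the low threshold but never reaches the high one, so it is not reported as speech.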
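Abstract: Because traditional methods of diagnosing dysarthria are time-consuming and costly, a method for automatic computer recognition of dysarthric speech is proposed. Gammatone frequency cepstrum coefficients (GFCC) are combined with commonly used acoustic features to form a combined acoustic feature set, a differential evolution algorithm performs feature selection, and a logistic regression classifier recognizes dysarthric speech. The Torgo dysarthric speech database is divided into three subsets (non-words, short words, and restricted sentences); 24-dimensional GFCC and 37 dimensions of commonly used acoustic features are extracted to form the combined feature set, which is then classified using the differential evolution algorithm and the logistic regression classifier. Experiments show that the differential evolution algorithm effectively selects features with better discriminative power and thereby significantly improves the dysarthria recognition rate. On the non-word subset, accuracy reaches 98.18%, with a recall of 98.3% and a precision of 98.3%.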
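Feature selection by differential evolution can be sketched as below, assuming the common DE/rand/1/bin scheme with each real-valued vector thresholded at 0.5 to give a keep/drop mask. The abstract does not say which variant or encoding the authors use. The fitness function here is a toy stand-in for a classifier's validation accuracy, and all names and parameter values are hypothetical.

```python
import random

def to_mask(vector, threshold=0.5):
    """Interpret a real-valued vector as a keep/drop decision per feature."""
    return [v > threshold for v in vector]

def de_feature_selection(fitness, num_features, pop_size=20, generations=40,
                         scale=0.5, crossover=0.9, seed=0):
    """Maximize fitness(mask); returns (best_mask, best_score, best-score history)."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(num_features)] for _ in range(pop_size)]
    scores = [fitness(to_mask(ind)) for ind in population]
    history = [max(scores)]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(num_features)  # at least one mutated dimension
            trial = []
            for d in range(num_features):
                if d == forced or rng.random() < crossover:
                    value = population[a][d] + scale * (population[b][d] - population[c][d])
                    trial.append(min(1.0, max(0.0, value)))
                else:
                    trial.append(population[i][d])
            trial_score = fitness(to_mask(trial))
            if trial_score >= scores[i]:  # greedy selection, updated in place
                population[i], scores[i] = trial, trial_score
        history.append(max(scores))
    best = max(range(pop_size), key=scores.__getitem__)
    return to_mask(population[best]), scores[best], history

def toy_fitness(mask):
    """Toy objective: reward features 0, 2 and 5, penalize every other kept feature."""
    informative = {0, 2, 5}
    hits = sum(1 for i, keep in enumerate(mask) if keep and i in informative)
    extras = sum(1 for i, keep in enumerate(mask) if keep and i not in informative)
    return hits - 0.5 * extras

best_mask, best_score, history = de_feature_selection(toy_fitness, num_features=8)
```

In a setting like the one described, `toy_fitness` would be replaced by a function that trains the logistic regression classifier on the selected columns and returns its validation accuracy.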
Funding: Supported by the Natural Science Foundation of Liaoning Province (Nos. 2019-ZD-0168 and 2020-KF-12-11), the Major Training Program of Criminal Investigation Police University of China (No. 3242019010), the Key Research and Development Projects of the Ministry of Science and Technology (No. 2017YFC0821005), and the Second Batch of New Engineering Research and Practice Projects (No. E-AQGABQ20202710).
Abstract: The problem of disguised voice recognition based on deep belief networks is studied. A hybrid feature extraction algorithm based on formants, gammatone frequency cepstrum coefficients (GFCC), and their difference coefficients is proposed to extract more discriminative speaker features from the original voice data. With the hybrid features as the model input, a disguised voice library is constructed, and a disguised voice recognition model based on a deep belief network is proposed. A dropout strategy is introduced to prevent overfitting, which effectively addresses the problems of traditional Gaussian mixture models, such as insufficient modeling ability and low discrimination. Experimental results show that the proposed disguised voice recognition method fits the feature distribution better and significantly improves classification performance and recognition rate.
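Assuming the coefficients derived from GFCC in this abstract are the usual difference (delta) coefficients, they can be computed with the standard regression formula d_t = Σ n·(c_{t+n} − c_{t−n}) / (2·Σ n²), summed over n = 1..N. The sketch below replicates the first and last frames at the edges; the window width of 2 and the function name are assumptions and are not taken from the paper.

```python
def delta_coefficients(frames, width=2):
    """Delta features for a list of per-frame coefficient vectors."""
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    deltas = []
    for t in range(num_frames):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, num_frames - 1)][d]  # clamp at last frame
                behind = frames[max(t - n, 0)][d]              # clamp at first frame
                acc += n * (ahead - behind)
            row.append(acc / denom)
        deltas.append(row)
    return deltas

# A linear ramp has a constant slope, so interior deltas equal that slope.
ramp = [[float(t), 2.0 * t] for t in range(6)]
ramp_deltas = delta_coefficients(ramp)
```

Applying the same function to its own output gives second-order (delta-delta) coefficients, which are often appended to the static and delta features.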