
Spoof Speech Detection with Channel-temporal Attention and Depthwise Separable Convolutions
Abstract Automatic speaker verification (ASV) systems face significant spoofing threats from increasingly realistic deepfake speech. Existing anti-spoofing models based on convolutional neural networks (CNNs) fall short in capturing global features and in generalizing to unseen types of speech forgery. To improve anti-spoofing detection, this paper proposes CT-DSCNet, a network that combines a channel-temporal attention mechanism with depthwise separable convolutions. Building on RawNet2, the model adds a channel-temporal attention module that strengthens attention to important speech features and reduces interference from irrelevant regions, and it uses depthwise separable convolutional residual blocks to improve computational efficiency and real-time performance. Experiments were conducted on the ASVspoof2019 LA, ASVspoof2021 DF, and FMFCC-A datasets. On the ASVspoof2019 LA test set, CT-DSCNet achieves an equal error rate (EER) of 1.53%, a 70.58% relative reduction from the baseline model. The model also generalizes well compared with other models: on the FMFCC-A evaluation set, its EER is 25.35% lower than that of the model before the improvements. These results validate the method's effectiveness in improving spoofed-speech detection performance and cross-dataset adaptability.
Authors FENG Jia-qi (冯嘉琪); WANG Hua-peng (王华朋); LIU Tian-ci (刘天赐) (College of Public Security Information Technology and Intelligence, Criminal Investigation Police University of China, Shenyang 110854, China)
Source Science Technology and Engineering (《科学技术与工程》), Peking University Core Journal, 2025, No. 22, pp. 9427-9435 (9 pages)
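Funding National Key Research and Development Program of China (2017YFC0821000); Key Laboratory of Forensic Science, Ministry of Justice (Academy of Forensic Science, KF202117); Graduate Innovation Ability Enhancement Project of Criminal Investigation Police University of China (2024YCZD05).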
Keywords deepfake speech; attention mechanism; depthwise separable convolution; speech anti-spoofing
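
The abstract reports its results as an equal error rate (EER) and as a relative reduction in EER. The following is a minimal sketch of how these two quantities are commonly computed, not the authors' evaluation code. It assumes that a higher score means "more likely bona fide"; the function names `compute_eer` and `relative_reduction` and the toy scores are hypothetical and are not taken from the paper.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Assumption: a trial is accepted as bona fide when score >= threshold.
    Returns (eer, threshold) at the point where the false acceptance rate
    and the false rejection rate are closest to each other.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection: bona fide speech scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed speech scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


# Toy scores for illustration only (not data from the paper).
toy_bonafide = [0.9, 0.8, 0.7, 0.4]
toy_spoof = [0.6, 0.3, 0.2, 0.1]
toy_eer, toy_threshold = compute_eer(toy_bonafide, toy_spoof)

# Toy EER values in percent, again for illustration only.
toy_reduction = relative_reduction(10.0, 4.0)
```

With a finite set of trials the two error rates rarely cross exactly, so this sketch averages them at the closest threshold; evaluation toolkits may interpolate instead and can give slightly different values.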