Abstract
With the rapid development of deepfake technology, producing fake speech has become increasingly easy. This brings convenience to people's lives, but it also poses growing threats and risks. Fake speech detection is usually divided into two parts: speech signal feature extraction and classifier training. This article summarizes the commonly used speech features and notes that, as the technology advances, features that were traditionally used less often can also achieve good detection results. It compares the technical principles, strengths, and weaknesses of back-end classification models, and finds that deep learning methods have a clear advantage and that ensembles of multiple models can improve generalization. It then introduces the development of commonly used fake speech datasets, and discusses the main problems facing fake speech detection and the key directions for future research.
Authors
ZHANG Wenjun; WEI Xia; WANG Yong; LING Yuhao (Shaanxi Branch of National Computer Emergency Network Response Technical Team and Coordination Center, Xi’an 710075, China; Department of Information Engineering, Xi’an Mingde Institute of Technology, Xi’an 710124, China; Information Industry Wireless Communication Product Quality Supervision and Inspection Center, Xi’an 710061, China)
Source
Electronic Design Engineering (《电子设计工程》), 2025, No. 14, pp. 18-23, 28 (7 pages in total)
Funding
Shaanxi Provincial Social Science Foundation (2020M014)
Xi’an Mingde Institute of Technology Research Fund Project (2022XY01L01)
Keywords
Deepfake
speech detection
deep learning
machine learning
feature extraction