Abstract: Recent advances in artificial intelligence and the availability of large-scale benchmarks have made deepfake video generation and manipulation easier. Therefore, developing reliable and robust deepfake video detection mechanisms is paramount. This research introduces a novel real-time deepfake video detection framework that analyzes gaze and blink patterns, addressing the spatial-temporal challenges unique to gaze and blink anomalies using the TimeSformer and hybrid Transformer-CNN models. The TimeSformer architecture leverages spatial-temporal attention mechanisms to capture fine-grained blinking intervals and gaze direction anomalies. Compared to state-of-the-art convolutional models such as MesoNet and EfficientNet, which primarily focus on global facial features, our approach emphasizes localized eye-region analysis, significantly enhancing detection accuracy. We evaluate our framework on four standard datasets: FaceForensics, CelebDF-V2, DFDC, and FakeAVCeleb. The results reveal high accuracy, with the TimeSformer model achieving 97.5%, 96.3%, 95.8%, and 97.1%, and the hybrid Transformer-CNN model achieving 92.8%, 91.5%, 90.9%, and 93.2% on the FaceForensics, CelebDF-V2, DFDC, and FakeAVCeleb datasets, respectively, demonstrating robustness in distinguishing manipulated from authentic videos. Our research provides a robust, state-of-the-art framework for real-time deepfake video detection. This study contributes to video forensics by presenting scalable and accurate solutions for real-world applications.
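To make the spatial-temporal attention idea above concrete, the sketch below shows a divided space-time attention block of the kind TimeSformer popularizes, applied to tokens that could come from eye-region crops. This is an illustrative PyTorch sketch, not the authors' implementation; the dimensions, head count, and token layout are assumptions.

```python
# Illustrative sketch (not the paper's code): divided space-time attention,
# i.e., temporal attention across frames followed by spatial attention within
# each frame. Shapes and hyper-parameters are hypothetical.
import torch
import torch.nn as nn

class DividedSpaceTimeBlock(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (batch, frames, patches, dim), e.g., patch tokens of eye-region crops
        b, t, p, d = x.shape
        # Temporal attention: each spatial patch attends across frames
        xt = x.permute(0, 2, 1, 3).reshape(b * p, t, d)
        nt = self.norm1(xt)
        xt = xt + self.temporal_attn(nt, nt, nt)[0]
        x = xt.reshape(b, p, t, d).permute(0, 2, 1, 3)
        # Spatial attention: patches within each frame attend to each other
        xs = x.reshape(b * t, p, d)
        ns = self.norm2(xs)
        xs = xs + self.spatial_attn(ns, ns, ns)[0]
        return xs.reshape(b, t, p, d)

tokens = torch.randn(2, 8, 16, 256)           # 2 clips, 8 frames, 16 patches each
print(DividedSpaceTimeBlock()(tokens).shape)  # torch.Size([2, 8, 16, 256])
```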
Funding: Supported by the National Natural Science Foundation of China (Nos. 62477026, 62177029, and 61807020), the Humanities and Social Sciences Research Program of the Ministry of Education of China (No. 23YJAZH047), and the Startup Foundation for Introducing Talent of Nanjing University of Posts and Communications under Grant NY222034.
Abstract: As deepfake technology continues to evolve, the distinction between real and fake content becomes increasingly blurred. Most existing deepfake video detection methods rely on single-frame facial image features, which limits their ability to capture temporal differences between frames. Current methods also exhibit limited generalization, struggling to detect content generated by unknown forgery algorithms. Moreover, the diversity and complexity of forgery techniques introduced by Artificial Intelligence Generated Content (AIGC) present significant challenges for traditional detection frameworks, which must balance high detection accuracy with robust performance. To address these challenges, we propose a novel deepfake detection framework that combines a two-stream convolutional network with a Vision Transformer (ViT) module to enhance spatio-temporal feature representation. The ViT model extracts spatial features from the forged video, while the 3D convolutional network captures temporal features. The 3D convolution enables cross-frame feature extraction, allowing the model to detect subtle facial changes between frames. The confidence scores from the ViT and 3D convolution sub-models are fused at the decision layer, enabling the model to handle unknown forgery techniques effectively. Focusing on deepfake videos and GAN-generated images, the proposed approach is evaluated on two widely used public face forgery datasets. Compared to existing state-of-the-art methods, it achieves higher detection accuracy and better generalization, offering a robust solution for deepfake detection in real-world scenarios.
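The decision-layer fusion described above can be sketched very compactly. The following PyTorch snippet fuses the softmax confidence scores of a spatial branch and a temporal 3D-convolution branch; the tiny sub-networks, the 768-dimensional ViT feature, and the fixed fusion weight are illustrative assumptions rather than the paper's architecture.

```python
# Minimal sketch of decision-level score fusion between a spatial (ViT-style)
# branch and a temporal (3D-convolution) branch. Placeholder sub-networks only.
import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    def __init__(self, alpha=0.5):
        super().__init__()
        self.alpha = alpha  # weight given to the spatial stream (assumed fixed here)
        # Placeholder spatial stream: pooled ViT feature -> real/fake logits
        self.spatial_head = nn.Linear(768, 2)
        # Placeholder temporal stream: a tiny 3D CNN over the whole clip
        self.temporal_net = nn.Sequential(
            nn.Conv3d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(8, 2))

    def forward(self, vit_feature, clip):
        # vit_feature: (B, 768); clip: (B, 3, T, H, W)
        p_spatial = torch.softmax(self.spatial_head(vit_feature), dim=-1)
        p_temporal = torch.softmax(self.temporal_net(clip), dim=-1)
        # Fuse the two confidence distributions at the decision layer
        return self.alpha * p_spatial + (1 - self.alpha) * p_temporal

model = TwoStreamFusion()
scores = model(torch.randn(4, 768), torch.randn(4, 3, 16, 64, 64))
print(scores.shape)  # torch.Size([4, 2]): fused real/fake confidence per clip
```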
Funding: Supported by the 2023 Open Project of the Key Laboratory of the Ministry of Public Security for Artificial Intelligence Security (RGZNAQ-2304) and the Fundamental Research Funds for the Central Universities of PPSUC (2023JKF01ZK08).
Abstract: With the rapid development of deepfake technology, the realism of various types of fake synthetic content is increasing rapidly, posing potential security threats to people's daily lives and social stability. Currently, most algorithms treat deepfake detection as a binary classification problem: global features are first extracted using a backbone network and then fed into a binary classifier to discriminate real from fake. However, the differences between real and fake samples are often subtle and local, and such global-feature-based detection algorithms are suboptimal in efficiency and accuracy. To enhance the extraction of forgery details in deepfake samples, we propose a multi-branch deepfake detection algorithm based on fine-grained features, approaching the task from the perspective of fine-grained classification. First, to address the critical problem of locating discriminative feature regions in fine-grained classification tasks, we investigate a method for locating multiple distinct discriminative regions and design a lightweight feature localization module that obtains crucial feature representations by augmenting the most significant parts of the feature map. Second, exploiting information complementarity, we introduce a correlation-guided fusion module to enhance the discriminative feature information of the different branches. Finally, we use a global attention module in the multi-branch model to improve cross-dimensional interaction between spatial-domain and channel-domain information and to increase the weights of crucial feature regions and feature channels. We conduct extensive ablation and comparative experiments. The results show that the algorithm outperforms representative detection algorithms from recent years in detection accuracy and effectiveness on the FaceForensics++ and Celeb-DF-v2 datasets.
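As a rough illustration of "augmenting the most significant parts of the feature map", the sketch below locates the strongest-responding region of a backbone feature map and amplifies the features inside it. It is a simplified stand-in for the paper's feature localization module; the window size, gain, and the use of channel-mean energy are assumptions.

```python
# Hedged sketch: find the peak-response region of a feature map and boost it
# so that a branch concentrates on that local area. Not the paper's module.
import torch
import torch.nn.functional as F

def amplify_peak_region(feat, window=7, gain=2.0):
    # feat: (B, C, H, W) backbone feature map
    energy = feat.mean(dim=1, keepdim=True)              # (B, 1, H, W) activation energy
    # Smooth the energy map so the peak reflects a region, not a single pixel
    energy = F.avg_pool2d(energy, kernel_size=window, stride=1, padding=window // 2)
    b, _, h, w = energy.shape
    flat_idx = energy.view(b, -1).argmax(dim=1)          # strongest location per sample
    mask = torch.zeros_like(energy).view(b, -1)
    mask[torch.arange(b), flat_idx] = 1.0
    mask = mask.view(b, 1, h, w)
    # Dilate the single-pixel mask into a window around the peak
    mask = F.max_pool2d(mask, kernel_size=window, stride=1, padding=window // 2)
    return feat * (1.0 + (gain - 1.0) * mask)            # amplify features inside the region

boosted = amplify_peak_region(torch.randn(2, 64, 28, 28))
print(boosted.shape)  # torch.Size([2, 64, 28, 28])
```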
Abstract: Deepfakes have emerged as a stubborn challenge in today's digital world. Here, the authors introduce a new deepfake detection method based on the Xception architecture. The model is tested exhaustively on millions of frames and diverse video clips, with accuracy levels as high as 99.65% reported. The main reasons for this efficacy are the superior feature extraction capabilities and stable training mechanisms, such as early stopping, that characterize the Xception model. The methodology also employs advanced data preprocessing steps, using state-of-the-art techniques to ensure consistent performance. Given the ever-rising threat of fake media, this research places great emphasis on stringent testing to keep the spread of manipulated content at bay. It also argues for better explanation methods that justify the model's decisions, building trust and reliability. Ensemble models, being more accurate, are studied and examined to establish the possibility of combining various detection frameworks that could together produce superior results. Further, the study underlines the need for real-time detection tools that remain effective across different social media sites and digital environments. Ethics, privacy protection, and public awareness are important considerations in the fight against the proliferation of deepfakes. By contributing to advancements in detection technology, this work strengthens the safety and integrity of the cyber world with a robust defense against ever-evolving deepfake threats. Overall, the findings represent a crucial step toward ensuring information authenticity and societal trust in the digital world.
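The early-stopping mechanism credited above for stable training can be written as a short validation-driven loop. The snippet below is a generic sketch, not the authors' code; `model`, `train_loader`, and `val_loader` are assumed to exist (for example, an Xception backbone with a two-class head), and the patience value is an illustrative choice.

```python
# Illustrative early-stopping training loop: stop when validation accuracy
# has not improved for `patience` epochs and restore the best checkpoint.
import copy
import torch

def train_with_early_stopping(model, train_loader, val_loader, epochs=50, patience=5):
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.CrossEntropyLoss()
    best_acc, best_state, stale = -1.0, None, 0
    for epoch in range(epochs):
        model.train()
        for frames, labels in train_loader:
            opt.zero_grad()
            loss_fn(model(frames), labels).backward()
            opt.step()
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for frames, labels in val_loader:
                correct += (model(frames).argmax(1) == labels).sum().item()
                total += labels.numel()
        acc = correct / max(total, 1)
        if acc > best_acc:                    # keep the best checkpoint seen so far
            best_acc, best_state, stale = acc, copy.deepcopy(model.state_dict()), 0
        else:
            stale += 1
            if stale >= patience:             # validation stopped improving
                break
    model.load_state_dict(best_state)
    return model, best_acc
```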
Funding: Project supported by the Liaoning Provincial Education Department Science Project (No. JYTMS20231039), the Liaoning Provincial Education Science Planning Project (No. JG22CB252), and the National Natural Science Foundation of China (Nos. 61976109 and 61601214).
Abstract: Deepfakes pose significant threats to various fields, including politics, journalism, and entertainment. Although many defense methods against deepfakes have been proposed, based on either passive detection or proactive defense, few achieve both. To address this issue, we propose a full-defense framework (FDF) based on cross-domain feature fusion and separable watermarks (SepMark) that achieves copyright protection and deepfake detection by combining the ideas of passive detection and proactive defense. The proactive defense module consists of one encoder and two separable decoders: the encoder embeds a watermark into the protected face, and the two decoders separately extract two watermarks with different robustness. The robust watermark can reliably trace the trusted marked face, while the semi-robust watermark is sensitive to malicious distortions, disappearing after a deepfake or watermark removal attack. The passive detection module fuses spatial- and frequency-domain features to further differentiate between deepfake content and watermark removal attacks in the absence of watermarks. The proposed cross-domain feature fusion substitutes the "secondary" channels of the spatial-domain features with the "primary" channels of the frequency-domain features; the "primary" channels of the spatial-domain features are then used to replace the "secondary" channels of the frequency-domain features. Extensive experiments demonstrate that our approach not only offers proactive defense mechanisms using the extracted watermarks, i.e., source tracing and copyright protection, but also achieves passive detection when no watermarks are present, further differentiating between deepfake content and watermark removal attacks and thereby offering a full-defense approach.
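The channel-swapping fusion described above can be illustrated with a short sketch: the "secondary" channels of the spatial features are replaced by the "primary" channels of the frequency features, and vice versa. How channels are ranked as primary or secondary is an assumption here (mean absolute activation); the paper's actual criterion may differ.

```python
# Hedged sketch of cross-domain channel exchange between spatial-domain and
# frequency-domain feature maps. Channel ranking by mean |activation| is an
# illustrative assumption, not the paper's definition of "primary" channels.
import torch

def cross_domain_swap(spatial, freq):
    # spatial, freq: (B, C, H, W) feature maps from the two branches (C even)
    b, c, _, _ = spatial.shape
    k = c // 2
    # Rank channels by average absolute response; top-k are treated as "primary"
    s_rank = spatial.abs().mean(dim=(0, 2, 3)).argsort(descending=True)
    f_rank = freq.abs().mean(dim=(0, 2, 3)).argsort(descending=True)
    s_primary, s_secondary = s_rank[:k], s_rank[k:]
    f_primary, f_secondary = f_rank[:k], f_rank[k:]
    fused_spatial, fused_freq = spatial.clone(), freq.clone()
    fused_spatial[:, s_secondary] = freq[:, f_primary]   # freq primary -> spatial secondary
    fused_freq[:, f_secondary] = spatial[:, s_primary]   # spatial primary -> freq secondary
    return fused_spatial, fused_freq

a, b = cross_domain_swap(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
print(a.shape, b.shape)  # torch.Size([2, 64, 32, 32]) for both outputs
```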
Funding: Funded by the National Natural Science Foundation of China (Grant No. 62376212) and the Shaanxi Science Foundation of China (Grant No. 2022GY-087), and supported by the Open Fund of the Intelligent Control Laboratory.
Abstract: The majority of current deepfake detection methods are constrained to identifying one or two specific types of counterfeit images, which limits their ability to keep pace with the rapid advancement of deepfake technology. Therefore, in this study, we propose a novel algorithm, the StereoMixture Density Network (SMNDNet), which can detect multiple types of deepfake face manipulation within a single network framework. SMNDNet is an end-to-end CNN-based network specially designed for detecting various manipulation types of deepfake face images. First, we design a Subtle Distinguishable Feature Enhancement Module to emphasize the differentiation between authentic and forged features. Second, we introduce a Multi-Scale Forged Region Adaptive Module that dynamically adapts to extract forged features from images of varying synthesis scales. Third, we integrate a Nonlinear Expression Capability Enhancement Module to augment the model's capacity for capturing intricate nonlinear patterns across various types of deepfakes. Collectively, these modules enable the model to efficiently extract forgery features from diverse manipulation types, ensuring satisfactory performance in multi-type deepfake detection. Experiments show that the proposed method outperforms alternative approaches in detection accuracy and AUC across all four types of deepfake images. It also demonstrates strong generalization in cross-dataset and cross-type detection, along with robust performance against post-processing manipulations.
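The abstract does not detail how the multi-scale module works, so the sketch below shows only a generic way to adapt to forged regions of varying scales: parallel dilated convolutions with different receptive fields, weighted by a learned gate. It is explicitly not the paper's Multi-Scale Forged Region Adaptive Module, just an illustration of the multi-scale idea it names.

```python
# Generic multi-scale illustration: parallel dilated convolutions whose outputs
# are combined with adaptive per-branch weights. Not the paper's module.
import torch
import torch.nn as nn

class MultiScaleBranch(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d) for d in (1, 2, 4)])
        self.gate = nn.Sequential(                      # adaptive weight per branch
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, 3), nn.Softmax(dim=-1))

    def forward(self, x):
        weights = self.gate(x)                                       # (B, 3)
        outs = torch.stack([branch(x) for branch in self.branches], dim=1)  # (B, 3, C, H, W)
        return (weights[:, :, None, None, None] * outs).sum(dim=1)  # (B, C, H, W)

y = MultiScaleBranch()(torch.randn(2, 64, 32, 32))
print(y.shape)  # torch.Size([2, 64, 32, 32])
```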
Funding: Supported by the Fundamental Research Funds for the Central Universities under Grant 2020JKF101 and the Research Funds of Sugon under Grant 2022KY001.
Abstract: The rapid development of deepfake technology has led to the spread of forged audio and video across network platforms, presenting risks for numerous countries, societies, and individuals, and posing a serious threat to cyberspace security. To address the insufficient extraction of spatial features and the neglect of temporal features in deepfake video detection, we propose a detection method based on an improved CapsNet and temporal-spatial features (iCapsNet-TSF). First, the dynamic routing algorithm of CapsNet is improved through weight initialization and updating. Then, an optical flow algorithm is used to extract inter-frame temporal features of the videos, forming a dataset of temporal-spatial features. Finally, the iCapsNet model is employed to fully learn the temporal-spatial features of facial videos, and the results are fused. Experimental results show that the detection accuracy of iCapsNet-TSF reaches 94.07%, 98.83%, and 98.50% on the Celeb-DF, FaceSwap, and Deepfakes datasets, respectively, outperforming most existing mainstream algorithms. The iCapsNet-TSF method combines the capsule network with the optical flow algorithm, providing a novel strategy for deepfake detection that is of great significance for preventing deepfake attacks and preserving cyberspace security.
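The optical-flow step above (dense inter-frame motion as a temporal feature) can be reproduced with standard OpenCV calls. The snippet below is a sketch of that preprocessing stage only, not the paper's pipeline; the file path, frame cap, and Farneback parameters are illustrative choices.

```python
# Sketch: dense Farneback optical flow between consecutive frames of a face
# video, stacked as a simple temporal feature tensor.
import cv2
import numpy as np

def extract_flow_features(video_path, max_frames=64):
    cap = cv2.VideoCapture(video_path)
    flows, prev_gray = [], None
    while len(flows) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev_gray is not None:
            # Dense optical flow between consecutive frames: (H, W, 2) motion field
            flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)
            flows.append(flow)
        prev_gray = gray
    cap.release()
    return np.stack(flows) if flows else np.empty((0,))  # (T-1, H, W, 2)

flow_stack = extract_flow_features("face_clip.mp4")  # hypothetical input file
print(flow_stack.shape)
```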
Abstract: This paper provides a comprehensive bibliometric exposition of deepfake research, exploring the intersection of artificial intelligence and deepfakes as well as international collaborations, prominent researchers, organizations, institutions, publications, and key themes. We performed a search on the Web of Science (WoS) database focusing on artificial intelligence and deepfakes, filtering the results across 21 research areas and yielding 1412 articles. Using the VOSviewer visualization tool, we analyzed this WoS data through keyword co-occurrence graphs, emphasizing four prominent research themes. Going beyond existing bibliometric papers on deepfakes, this paper identifies and discusses some of the most highly cited papers within these themes: deepfake detection, feature extraction, face recognition, and forensics. The discussion highlights key challenges and advancements in deepfake research. Furthermore, the paper discusses pressing issues surrounding deepfakes such as security, regulation, and datasets. We also provide an analysis of another exhaustive search, on the Scopus database, focusing solely on deepfakes (while not excluding AI), which reveals deep learning as the predominant keyword and underscores AI's central role in deepfake research. This comprehensive analysis, encompassing over 500 keywords from 8790 articles, uncovered a wide range of methods, implications, applications, concerns, requirements, challenges, models, tools, datasets, and modalities related to deepfakes. Finally, recommendations for policymakers, researchers, and other stakeholders are provided.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 62472092, 62172089, and 62106045, the Natural Science Foundation of Jiangsu Province of China under Grant No. BK20241751, the Jiangsu Provincial Key Laboratory of Computer Networking Technology, the Jiangsu Provincial Key Laboratory of Network and Information Security under Grant No. BM2003201, the Key Laboratory of Computer Network and Information Integration of the Ministry of Education of China under Grant No. 93K-9, and the Nanjing Purple Mountain Laboratories. We appreciate the Big Data Computing Center of Southeast University for providing facility support for the numerical calculations.
Abstract: Current defenses against deepfake attacks are mostly passive methods that detect specific defects of deepfake algorithms and lack generalization ability. Meanwhile, existing active defense methods focus only on defending against face attribute manipulation, and enormous challenges remain in establishing an active and sustainable defense mechanism for face swap detection. We therefore propose a novel training framework called FSD-GAN (Face Swap Detection based on Generative Adversarial Network) that is immune to the evolution of face swap attacks. Specifically, FSD-GAN contains three modules: a data processing module, an attack module that generates fake faces used only in training, and a defense module that consists of a fingerprint generator and a fingerprint discriminator. We embed the latent noise fingerprints produced by the fingerprint generator into face images, making them imperceptible to attackers both visually and statistically. Once an attacker uses these protected faces to perform face swap attacks, the fingerprints are transferred from the training data (protected faces) to the generative models (real-world face swap models) and persist in the generated results (swapped faces). Our discriminator can easily detect the latent noise fingerprints embedded in face images, converting the problem of face swap detection into verifying whether fingerprints exist in swapped face images. Moreover, we alternately train the attack and defense modules under an adversarial framework, making the defense module more robust. We illustrate the effectiveness and robustness of FSD-GAN through extensive experiments, demonstrating that it can handle various face images, mainstream face swap models, and JPEG compression at different quality levels.
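To make the fingerprinting idea tangible, the sketch below pairs a generator that adds a low-amplitude noise pattern to protected faces with a discriminator that checks whether the pattern is present. The tiny networks, the amplitude bound, and the tensor shapes are illustrative assumptions; the paper's generator/discriminator design is not reproduced here.

```python
# Hedged sketch of a noise-fingerprint generator and a fingerprint detector.
import torch
import torch.nn as nn

class FingerprintGenerator(nn.Module):
    def __init__(self, eps=2.0 / 255.0):
        super().__init__()
        self.eps = eps                        # keeps the perturbation visually imperceptible
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1), nn.Tanh())

    def forward(self, faces):                 # faces in [0, 1], shape (B, 3, H, W)
        return (faces + self.eps * self.net(faces)).clamp(0, 1)

class FingerprintDiscriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1))

    def forward(self, faces):                 # logit: fingerprint present or not
        return self.net(faces)

faces = torch.rand(2, 3, 128, 128)
marked = FingerprintGenerator()(faces)
print(FingerprintDiscriminator()(marked).shape)  # torch.Size([2, 1])
```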
Funding: Supported by the National Natural Science Foundation of China (Nos. 62072251 and U22B2062) and the Priority Academic Program Development of Jiangsu Higher Education Institutions fund.
Abstract: In recent years, with the rapid development of deepfake technology, a large number of deepfake videos have emerged on the Internet, posing a huge threat to national politics, social stability, and personal privacy. Although many existing deepfake detection methods exhibit excellent performance on known manipulations, their detection capabilities weaken when faced with unknown manipulations. To obtain better generalization ability, this paper analyzes global and local inter-frame dynamic inconsistencies from the perspectives of the spatial and frequency domains and proposes a Local region Frequency Guided Dynamic Inconsistency Network (LFGDIN). The network comprises two parts: a Global SpatioTemporal Network (GSTN) and a Local Region Frequency Guided Module (LRFGM). The GSTN is responsible for capturing the dynamic information of the entire face, while the LRFGM focuses on extracting the frequency dynamic information of the eyes and mouth. The LRFGM guides the GSTN to concentrate on dynamic inconsistencies in significant local regions through local region alignment, thereby improving the model's detection performance. Experiments on three public datasets (FF++, DFDC, and Celeb-DF) show that, compared with many recent advanced methods, the proposed method achieves better detection results when detecting deepfake videos with unknown manipulation types.
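A minimal way to picture "frequency dynamic information of the eyes and mouth" is to compute per-frame spectra of a local crop and measure how they change between frames. The sketch below does exactly that under stated assumptions: the crop coordinates are placeholders (a real pipeline would use facial landmarks), and the inter-frame spectral difference is only one simple dynamic-inconsistency signal, not LFGDIN's module.

```python
# Sketch: per-frame 2-D FFT magnitude of a facial-region crop, plus its
# frame-to-frame change as a crude frequency-dynamics signal.
import torch

def region_frequency_dynamics(frames, box=(20, 60, 16, 112)):
    # frames: (T, H, W) grayscale face frames; box = (y0, y1, x0, x1) region crop
    y0, y1, x0, x1 = box
    crops = frames[:, y0:y1, x0:x1]                             # (T, h, w)
    spectra = torch.log1p(torch.fft.fft2(crops).abs())          # per-frame magnitude spectrum
    # Mean absolute change of the spectrum between consecutive frames
    return (spectra[1:] - spectra[:-1]).abs().mean(dim=(1, 2))  # (T-1,)

clip = torch.rand(16, 128, 128)
print(region_frequency_dynamics(clip).shape)  # torch.Size([15])
```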