Traditional approaches to detecting high-quality forged videos typically have low recognition accuracy and are prone to misclassification. This paper addresses the challenge of detecting high-quality deepfake videos by improving the accuracy of Artificial Intelligence Generated Content (AIGC) video authenticity detection with a multimodal information fusion approach. First, a high-quality multimodal video dataset is collected and normalized, including resolution correction and frame rate unification. Next, feature extraction techniques are used to obtain features from the visual, audio, and text modalities. Subsequently, these features are fused into a multimodal feature matrix using a multilayer perceptron and attention mechanisms. Finally, the matrix is fed into a multimodal information fusion layer to construct and train a deep learning model. Experimental results show that the multimodal fusion model achieves an accuracy of 93.8% in video authenticity detection, a significant improvement over unimodal models, confirming the model's stronger performance and robustness in AIGC video authenticity detection.
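The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the architecture, so the scaled dot-product attention over modalities, the single hidden layer, the feature dimension, and all names (`attention_fuse`, `mlp_head`, `query`, `DIM`, `HIDDEN`) are assumptions made for illustration. The weights are random and untrained, so the output only shows how data flows from per-modality features to an authenticity score.

```python
import math
import random


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(features, query):
    """Weight each modality by scaled dot-product attention against a
    (hypothetical) learned query vector, then sum the weighted features."""
    names = list(features)
    dim = len(query)
    scores = [dot(features[n], query) / math.sqrt(dim) for n in names]
    weights = softmax(scores)
    fused = [
        sum(w * features[n][i] for w, n in zip(weights, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, weights))


def mlp_head(x, w1, b1, w2, b2):
    """One hidden ReLU layer followed by a sigmoid output in (0, 1)."""
    hidden = [max(0.0, dot(row, x) + b) for row, b in zip(w1, b1)]
    logit = dot(w2, hidden) + b2
    return 1.0 / (1.0 + math.exp(-logit))


# Assumed sizes; the abstract does not give any dimensions.
DIM, HIDDEN = 8, 4
rng = random.Random(0)


def rand_vec(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


# Stand-ins for features already extracted from each modality.
features = {
    "visual": rand_vec(DIM),
    "audio": rand_vec(DIM),
    "text": rand_vec(DIM),
}
query = rand_vec(DIM)                       # would be learned in training
w1 = [rand_vec(DIM) for _ in range(HIDDEN)]  # untrained MLP weights
b1 = [0.0] * HIDDEN
w2 = rand_vec(HIDDEN)
b2 = 0.0

fused, weights = attention_fuse(features, query)
p_fake = mlp_head(fused, w1, b1, w2, b2)
print("modality weights:", weights)
print("score (untrained):", p_fake)
```

In a trained model the query vector and MLP weights would be learned jointly, and the attention weights would indicate how much each modality contributes to the decision for a given video.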