Abstract: Deepfake technology can replace people’s faces in videos or pictures to show them saying or doing things they never said or did. Deepfake media are often used to extort, defame, and manipulate public opinion. Despite these risks, current deepfake detection methods generalize poorly and behave inconsistently on unknown videos, i.e., videos on which they have not been trained. The purpose of this study is to develop a generalizable deepfake detection model by training convolutional neural networks (CNNs) to classify human facial features in videos. The study formulated the research question: “How effectively does the developed model provide reliable generalizations?” A CNN model was trained to distinguish between real and fake videos using the facial features of human subjects in videos. The model was trained, validated, and tested using the FaceForensics++ dataset, which contains more than 500,000 frames, and subsets of the DFDC dataset, totaling more than 22,000 videos. The study demonstrated high generalizability, as the accuracy on the unknown dataset was only marginally (about 1%) lower than that on the known dataset. The findings indicate that detection systems can be more generalizable, lighter, and faster by focusing on just a small region (the human face) of an entire video.
Abstract: This paper presents a novel technique for embedding a digital watermark into video frames based on motion vectors and the discrete wavelet transform (DWT). In the proposed scheme, the binary image watermark is divided into blocks, and each watermark block is embedded several times at different locations in each selected video frame. A block-based motion estimation algorithm selects the video frame blocks whose motion vectors have the greatest magnitude. The DWT is applied to the selected frame blocks, and the watermark block is then hidden in these blocks by modifying the coefficients of the horizontal sub-bands (HL). Adding the watermark at different locations in the same video frame makes the scheme more robust against different types of attacks. The method was tested on different types of videos. The average peak signal-to-noise ratio (PSNR) and the normalized correlation (NC) are used to measure the performance of the proposed method. Experimental results show that the proposed algorithm does not affect the visual quality of video frames and that the scheme is robust against a variety of attacks.
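The pipeline described in this abstract (pick high-motion blocks, transform them, modify HL coefficients, then score with PSNR and NC) can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the wavelet or the embedding rule, so the sketch assumes a one-level Haar transform, an additive rule with a hypothetical strength `alpha`, and non-blind extraction that compares against the original block. All function names are made up for the example, "HL" is taken here to mean high-pass along rows and low-pass along columns (naming conventions vary), and NC uses one common definition.

```python
import math

def top_motion_blocks(vectors, k):
    """Indices of the k blocks whose motion vectors (dx, dy) have the largest magnitude."""
    order = sorted(range(len(vectors)), key=lambda i: math.hypot(*vectors[i]), reverse=True)
    return order[:k]

def haar_dwt2(block):
    """One-level 2D Haar DWT of a block with even height and width."""
    h, w = len(block) // 2, len(block[0]) // 2
    ll, hl, lh, hh = ([[0.0] * w for _ in range(h)] for _ in range(4))
    for i in range(h):
        for j in range(w):
            a, b = block[2 * i][2 * j], block[2 * i][2 * j + 1]
            c, d = block[2 * i + 1][2 * j], block[2 * i + 1][2 * j + 1]
            ll[i][j] = (a + b + c + d) / 2
            hl[i][j] = (a - b + c - d) / 2  # assumed "HL": high-pass along rows
            lh[i][j] = (a + b - c - d) / 2
            hh[i][j] = (a - b - c + d) / 2
    return ll, hl, lh, hh

def haar_idwt2(ll, hl, lh, hh):
    """Inverse of haar_dwt2."""
    h, w = len(ll), len(ll[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, x, y, z = ll[i][j], hl[i][j], lh[i][j], hh[i][j]
            out[2 * i][2 * j] = (s + x + y + z) / 2
            out[2 * i][2 * j + 1] = (s - x + y - z) / 2
            out[2 * i + 1][2 * j] = (s + x - y - z) / 2
            out[2 * i + 1][2 * j + 1] = (s - x - y + z) / 2
    return out

def embed(block, bits, alpha=4.0):
    """Assumed additive rule: shift each HL coefficient by +alpha (bit 1) or -alpha (bit 0)."""
    ll, hl, lh, hh = haar_dwt2(block)
    marked = [[c + alpha * (2 * b - 1) for c, b in zip(cr, br)] for cr, br in zip(hl, bits)]
    return haar_idwt2(ll, marked, lh, hh)  # no rounding or clipping to 0..255 in this sketch

def extract(marked_block, original_block):
    """Non-blind extraction: a bit is 1 where the HL coefficient increased."""
    hl_m, hl_o = haar_dwt2(marked_block)[1], haar_dwt2(original_block)[1]
    return [[1 if m > o else 0 for m, o in zip(mr, orow)] for mr, orow in zip(hl_m, hl_o)]

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized blocks."""
    n = len(ref) * len(ref[0])
    mse = sum((r - t) ** 2 for rr, tr in zip(ref, test) for r, t in zip(rr, tr)) / n
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)

def normalized_correlation(w, w_hat):
    """NC = sum(w * w_hat) / sqrt(sum(w^2) * sum(w_hat^2))."""
    a = [v for row in w for v in row]
    b = [v for row in w_hat for v in row]
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / den if den else 0.0

# Toy data: three block motion vectors, one 4x4 luminance block, one 2x2 watermark block.
selected = top_motion_blocks([(0, 1), (3, 4), (1, 1)], k=2)
frame_block = [[52, 55, 61, 66], [70, 61, 64, 73], [63, 59, 55, 90], [67, 61, 68, 104]]
watermark_block = [[1, 0], [0, 1]]
marked = embed(frame_block, watermark_block)
recovered = extract(marked, frame_block)
print(selected, recovered == watermark_block)
print(psnr(frame_block, marked), normalized_correlation(watermark_block, recovered))
```

In this toy setting a larger `alpha` makes the mark easier to recover after distortion but lowers PSNR, which is the robustness versus visual quality trade-off that the two metrics are meant to capture.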
Funding: Supported by an Incheon National University Research Grant in 2017.
Abstract: Video prediction is the problem of generating future frames by exploiting the spatiotemporal correlation in a past frame sequence. It is one of the crucial problems in computer vision and has many real-world applications, mainly focused on predicting future scenarios to avoid undesirable outcomes. However, modeling future image content and objects is challenging due to the dynamic evolution and complexity of the scene, such as occlusions, camera movements, delay, and illumination. Direct frame synthesis and optical-flow estimation are the common approaches, but researchers have mainly used only one of them for video prediction. Both methods have limitations: direct frame synthesis usually yields blurry predictions due to complex pixel distributions in the scene, and optical-flow estimation usually produces artifacts due to large object displacements or obstructions in the clip. In this paper, we construct a deep neural network, the Frame Prediction Network (FPNet-OF), with multiple-branch inputs (optical flow and original frame) to predict the future video frame by adaptively fusing the future object motion with the future frame generator. The key idea is to jointly optimize direct RGB frame synthesis and dense optical flow estimation to generate a superior video prediction network. Using various real-world datasets, we experimentally verify that our proposed framework can produce high-quality video frames compared to other state-of-the-art frameworks.
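One way to picture the fusion of a flow-based prediction with a directly synthesized frame is a per-pixel blend. The sketch below is an assumption made for illustration, not the FPNet-OF architecture: the abstract does not say how the fusion is computed, so the blend weights (`mask`) are set by hand here, whereas a learned model would predict them. The function names, the nearest-neighbour backward warp, and the flow convention are all hypothetical choices.

```python
def warp_backward(frame, flow):
    """Nearest-neighbour backward warp of a grayscale frame.

    Assumed convention: flow[y][x] = (u, v) is the motion that brought content
    to pixel (x, y), so the output samples the source at (x - u, y - v).
    Coordinates are clamped at the frame borders.
    """
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            u, v = flow[y][x]
            sx = min(max(int(round(x - u)), 0), w - 1)
            sy = min(max(int(round(y - v)), 0), h - 1)
            out[y][x] = frame[sy][sx]
    return out

def fuse(warped, synthesized, mask):
    """Per-pixel blend: mask = 1 trusts the flow branch, mask = 0 the synthesis branch."""
    return [
        [m * a + (1 - m) * b for a, b, m in zip(wr, sr, mr)]
        for wr, sr, mr in zip(warped, synthesized, mask)
    ]

# Toy 1x4 frame whose content moves one pixel to the right.
last_frame = [[10.0, 20.0, 30.0, 40.0]]
flow = [[(1.0, 0.0)] * 4]
synthesized = [[12.0, 12.0, 22.0, 28.0]]  # stand-in for a directly generated frame
mask = [[0.0, 1.0, 1.0, 0.5]]             # hand-set; low where the warp is unreliable (left border)

warped = warp_backward(last_frame, flow)
predicted = fuse(warped, synthesized, mask)
print(warped, predicted)
```

The leftmost pixel shows why a blend can help: the warp has no valid source there and simply repeats the border value, so the mask hands that pixel to the synthesized frame, while the interior pixels keep the sharper warped values.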
Funding: Supported by the Research Grants Council of the Hong Kong Special Administrative Region under the RGC General Research Fund (Project No. CUHK 14201017), the Shenzhen Science and Technology Program (No. JCYJ20180507182410327), and the Science and Technology Plan Project of Guangzhou (No. 201704020141).
Abstract: Frame interpolation, a popular and challenging task in video research, aims to increase the frame rate of video. Most existing methods employ a fixed motion model, e.g., linear, quadratic, or cubic, to estimate the intermediate warping field. However, such fixed motion models cannot represent well the complicated non-linear motions found in the real world or in rendered animations. Instead, we present an adaptive flow prediction module to better approximate the complex motions in video. Furthermore, interpolating just one intermediate frame between consecutive input frames may be insufficient for complicated non-linear motions. To enable multi-frame interpolation, we introduce time as a control variable when interpolating frames between the original ones in our generic adaptive flow prediction module. Qualitative and quantitative experimental results show that our method produces high-quality results and outperforms existing state-of-the-art methods on popular public datasets.
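To make the contrast between fixed motion models concrete, the sketch below evaluates the displacement of a single pixel at an intermediate time t under a linear model and under a quadratic (constant-acceleration) model. It illustrates only the fixed models the abstract mentions, not the paper's adaptive flow prediction module. The function names are hypothetical, and the quadratic formula follows from assuming x(t) = v·t + a·t²/2, so that the flow to the next frame is v + a/2 and the flow to the previous frame is −v + a/2.

```python
def linear_flow(f01, t):
    """Linear model: constant velocity, so the displacement at time t is t * f01."""
    return tuple(t * c for c in f01)

def quadratic_flow(f01, f0m1, t):
    """Quadratic model from two flows around frame 0.

    With x(t) = v*t + a*t**2/2:  f01 = v + a/2  and  f0m1 = -v + a/2,
    hence a = f01 + f0m1 and v = (f01 - f0m1) / 2.
    """
    return tuple(0.5 * (p + q) * t * t + 0.5 * (p - q) * t for p, q in zip(f01, f0m1))

# Toy motion with velocity v = (2, 0) and acceleration a = (4, 0):
f01 = (4.0, 0.0)   # displacement from frame 0 to frame 1  (v + a/2)
f0m1 = (0.0, 0.0)  # displacement from frame 0 to frame -1 (-v + a/2)

# Time acts as the control variable: several intermediate frames from the same inputs.
for t in (0.25, 0.5, 0.75):
    print(t, linear_flow(f01, t), quadratic_flow(f01, f0m1, t))
```

At t = 0.5 the true displacement of this toy motion is 2·0.5 + 4·0.5²/2 = 1.5, which the quadratic model reproduces while the linear model gives 2.0. Motion that is neither constant-velocity nor constant-acceleration would defeat both, which is the limitation of fixed models that the abstract points to.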