With expeditious advancements in AI-driven facial manipulation techniques,particularly deepfake technology,there is growing concern over its potential misuse.Deepfakes pose a significant threat to society,partic-ularl...With expeditious advancements in AI-driven facial manipulation techniques,particularly deepfake technology,there is growing concern over its potential misuse.Deepfakes pose a significant threat to society,partic-ularly by infringing on individuals’privacy.Amid significant endeavors to fabricate systems for identifying deepfake fabrications,existing methodologies often face hurdles in adjusting to innovative forgery techniques and demonstrate increased vulnerability to image and video clarity variations,thereby hindering their broad applicability to images and videos produced by unfamiliar technologies.In this manuscript,we endorse resilient training tactics to amplify generalization capabilities.In adversarial training,models are trained using deliberately crafted samples to deceive classification systems,thereby significantly enhancing their generalization ability.In response to this challenge,we propose an innovative hybrid adversarial training framework integrating Virtual Adversarial Training(VAT)with Two-Generated Blurred Adversarial Training.This combined framework bolsters the model’s resilience in detecting deepfakes made using unfamiliar deep learning technologies.Through such adversarial training,models are prompted to acquire more versatile attributes.Through experimental studies,we demonstrate that our model achieves higher accuracy than existing models.展开更多
Self-supervised monocular depth estimation has emerged as a major research focus in recent years,primarily due to the elimination of ground-truth depth dependence.However,the prevailing architectures in this domain su...Self-supervised monocular depth estimation has emerged as a major research focus in recent years,primarily due to the elimination of ground-truth depth dependence.However,the prevailing architectures in this domain suffer from inherent limitations:existing pose network branches infer camera ego-motion exclusively under static-scene and Lambertian-surface assumptions.These assumptions are often violated in real-world scenarios due to dynamic objects,non-Lambertian reflectance,and unstructured background elements,leading to pervasive artifacts such as depth discontinuities(“holes”),structural collapse,and ambiguous reconstruction.To address these challenges,we propose a novel framework that integrates scene dynamic pose estimation into the conventional self-supervised depth network,enhancing its ability to model complex scene dynamics.Our contributions are threefold:(1)a pixel-wise dynamic pose estimation module that jointly resolves the pose transformations of moving objects and localized scene perturbations;(2)a physically-informed loss function that couples dynamic pose and depth predictions,designed to mitigate depth errors arising from high-speed distant objects and geometrically inconsistent motion profiles;(3)an efficient SE(3)transformation parameterization that streamlines network complexity and temporal pre-processing.Extensive experiments on the KITTI and NYU-V2 benchmarks show that our framework achieves state-of-the-art performance in both quantitative metrics and qualitative visual fidelity,significantly improving the robustness and generalization of monocular depth estimation under dynamic conditions.展开更多
针对在仅具有三原色(red-green-blue,RGB)摄像头的通用消费设备上部署基于深度学习的人脸反欺诈(face anti-spoofing,FAS)算法时存在的挑战问题,提出一种高效且轻量的RGB单帧FAS(efficient and lightweight RGB frame-level face anti-s...针对在仅具有三原色(red-green-blue,RGB)摄像头的通用消费设备上部署基于深度学习的人脸反欺诈(face anti-spoofing,FAS)算法时存在的挑战问题,提出一种高效且轻量的RGB单帧FAS(efficient and lightweight RGB frame-level face anti-spoofing,EL-FAS)模型。探索一种新的全局空间自注意力机制捕获全局上下文信息的依赖关系,以提高模型泛化能力并在受限条件下实现高检测性能;设计一种等通道像素级二元监督方法,强制模型从不同的像素中学习共享特征;采用Bottleneck模块搭建骨干网络以减少模型参数。试验结果表明,EL-FAS模型在OULU-NPU数据集的大多数协议上平均分类错误率R_(ACE)最低,取得较好的人脸欺诈检测效果,在SiW数据集和跨数据集测试中也取得较好的性能,并且模型轻量,参数只有1.34×10^(6)个。展开更多
目的 由于不同伪造类型样本的数据分布差距较大,现有人脸伪造检测方法的准确度不够高,而且泛化性能差。为此,本文引入“图像块归属纯净性”和“残差图估计可靠性”的概念,提出了基于图像块比较和残差图估计的人脸伪造检测方法。方法 除...目的 由于不同伪造类型样本的数据分布差距较大,现有人脸伪造检测方法的准确度不够高,而且泛化性能差。为此,本文引入“图像块归属纯净性”和“残差图估计可靠性”的概念,提出了基于图像块比较和残差图估计的人脸伪造检测方法。方法 除了骨干网络,本文的人脸伪造检测神经网络主要由纯净图像块比较模块和可靠残差图估计模块两部分组成。为了避免在同时包含人脸和背景像素的图像块上提取的混杂特征对于图像块比较的干扰,纯净图像块比较模块中选择只包含人脸像素的纯净人脸图像块和只包含背景像素的纯净背景图像块,通过比较两种图像块纯净特征之间的差异来检测伪造图像,图像块的纯净性保障了特征提取的纯净性,从而提高了特征比较的鲁棒性。考虑到靠近伪造边缘的像素比远离伪造边缘的像素具有较高的残差估计准确度,本文在可靠残差图估计模块中根据像素到伪造边缘的距离设计了一个距离场加权的残差损失来引导网络的训练过程,使网络重点关注输入图像与对应真实图像在伪造边缘附近的差异,对于可靠信息的关注进一步增强了伪造检测的鲁棒性。结果在FF++(FaceForensics++)数据集上的测试结果显示:与对比算法中性能最好的F2Trans-B相比,本文方法的准确率和AUC(area under the ROC curve)指标分别提高了2.49%和3.31%,在FS(FaceSwap)与F2F(Face2Face)两种伪造数据上的准确率指标分别提高了6.01%和3.99%。在泛化性能方面,与11种已有方法在交叉数据集上的测试结果显示:本文方法与其中性能最好的方法相比,在CDF(Celeb-DF)数据集上的视频AUC指标和图像AUC指标分别提高了1.85%和1.03%。结论 与对比方法相比,由于提高了特征信息的纯净性和可靠性,本文提出的人脸图像伪造检测模型的泛化能力和准确率优于对比方法。展开更多
台标识别是典型的细微目标识别问题,针对台标区域小、信息量低,且镂空、半透明台标极易受到画面背景影响的难题,提出一个基于端到端全卷积网络的像素级台标识别网络——PNET.首先构建一个像素级标注的台标数据集,通过视频抽帧和图像预...台标识别是典型的细微目标识别问题,针对台标区域小、信息量低,且镂空、半透明台标极易受到画面背景影响的难题,提出一个基于端到端全卷积网络的像素级台标识别网络——PNET.首先构建一个像素级标注的台标数据集,通过视频抽帧和图像预处理获得台标图像集,并提出一种逐图像的像素级半自动标注方法获得二值标签图像集;然后提出一个像素级台标识别网络,在典型分类网络AlexNet,VGG的基础上,通过微调,将分类网络在分类任务中学习到的网络参数转换为像素级台标识别网络在台标分割任务中的所需的网络参数;最后引入跨层架构,融合来自网络深层的全局信息和浅层的局部信息.实验结果表明PNET实现了准确的像素级分割,准确率高达98.3%,在NVIDIA Tesla K80上单幅图像识别时间不超过1.5 s.展开更多
基金supported by King Saud University,Riyadh,Saudi Arabia,through the Researchers Supporting Project under Grant RSP2025R493。
文摘With expeditious advancements in AI-driven facial manipulation techniques,particularly deepfake technology,there is growing concern over its potential misuse.Deepfakes pose a significant threat to society,partic-ularly by infringing on individuals’privacy.Amid significant endeavors to fabricate systems for identifying deepfake fabrications,existing methodologies often face hurdles in adjusting to innovative forgery techniques and demonstrate increased vulnerability to image and video clarity variations,thereby hindering their broad applicability to images and videos produced by unfamiliar technologies.In this manuscript,we endorse resilient training tactics to amplify generalization capabilities.In adversarial training,models are trained using deliberately crafted samples to deceive classification systems,thereby significantly enhancing their generalization ability.In response to this challenge,we propose an innovative hybrid adversarial training framework integrating Virtual Adversarial Training(VAT)with Two-Generated Blurred Adversarial Training.This combined framework bolsters the model’s resilience in detecting deepfakes made using unfamiliar deep learning technologies.Through such adversarial training,models are prompted to acquire more versatile attributes.Through experimental studies,we demonstrate that our model achieves higher accuracy than existing models.
基金supported in part by the National Natural Science Foundation of China under Grants 62071345。
文摘Self-supervised monocular depth estimation has emerged as a major research focus in recent years,primarily due to the elimination of ground-truth depth dependence.However,the prevailing architectures in this domain suffer from inherent limitations:existing pose network branches infer camera ego-motion exclusively under static-scene and Lambertian-surface assumptions.These assumptions are often violated in real-world scenarios due to dynamic objects,non-Lambertian reflectance,and unstructured background elements,leading to pervasive artifacts such as depth discontinuities(“holes”),structural collapse,and ambiguous reconstruction.To address these challenges,we propose a novel framework that integrates scene dynamic pose estimation into the conventional self-supervised depth network,enhancing its ability to model complex scene dynamics.Our contributions are threefold:(1)a pixel-wise dynamic pose estimation module that jointly resolves the pose transformations of moving objects and localized scene perturbations;(2)a physically-informed loss function that couples dynamic pose and depth predictions,designed to mitigate depth errors arising from high-speed distant objects and geometrically inconsistent motion profiles;(3)an efficient SE(3)transformation parameterization that streamlines network complexity and temporal pre-processing.Extensive experiments on the KITTI and NYU-V2 benchmarks show that our framework achieves state-of-the-art performance in both quantitative metrics and qualitative visual fidelity,significantly improving the robustness and generalization of monocular depth estimation under dynamic conditions.
文摘针对在仅具有三原色(red-green-blue,RGB)摄像头的通用消费设备上部署基于深度学习的人脸反欺诈(face anti-spoofing,FAS)算法时存在的挑战问题,提出一种高效且轻量的RGB单帧FAS(efficient and lightweight RGB frame-level face anti-spoofing,EL-FAS)模型。探索一种新的全局空间自注意力机制捕获全局上下文信息的依赖关系,以提高模型泛化能力并在受限条件下实现高检测性能;设计一种等通道像素级二元监督方法,强制模型从不同的像素中学习共享特征;采用Bottleneck模块搭建骨干网络以减少模型参数。试验结果表明,EL-FAS模型在OULU-NPU数据集的大多数协议上平均分类错误率R_(ACE)最低,取得较好的人脸欺诈检测效果,在SiW数据集和跨数据集测试中也取得较好的性能,并且模型轻量,参数只有1.34×10^(6)个。
文摘目的 由于不同伪造类型样本的数据分布差距较大,现有人脸伪造检测方法的准确度不够高,而且泛化性能差。为此,本文引入“图像块归属纯净性”和“残差图估计可靠性”的概念,提出了基于图像块比较和残差图估计的人脸伪造检测方法。方法 除了骨干网络,本文的人脸伪造检测神经网络主要由纯净图像块比较模块和可靠残差图估计模块两部分组成。为了避免在同时包含人脸和背景像素的图像块上提取的混杂特征对于图像块比较的干扰,纯净图像块比较模块中选择只包含人脸像素的纯净人脸图像块和只包含背景像素的纯净背景图像块,通过比较两种图像块纯净特征之间的差异来检测伪造图像,图像块的纯净性保障了特征提取的纯净性,从而提高了特征比较的鲁棒性。考虑到靠近伪造边缘的像素比远离伪造边缘的像素具有较高的残差估计准确度,本文在可靠残差图估计模块中根据像素到伪造边缘的距离设计了一个距离场加权的残差损失来引导网络的训练过程,使网络重点关注输入图像与对应真实图像在伪造边缘附近的差异,对于可靠信息的关注进一步增强了伪造检测的鲁棒性。结果在FF++(FaceForensics++)数据集上的测试结果显示:与对比算法中性能最好的F2Trans-B相比,本文方法的准确率和AUC(area under the ROC curve)指标分别提高了2.49%和3.31%,在FS(FaceSwap)与F2F(Face2Face)两种伪造数据上的准确率指标分别提高了6.01%和3.99%。在泛化性能方面,与11种已有方法在交叉数据集上的测试结果显示:本文方法与其中性能最好的方法相比,在CDF(Celeb-DF)数据集上的视频AUC指标和图像AUC指标分别提高了1.85%和1.03%。结论 与对比方法相比,由于提高了特征信息的纯净性和可靠性,本文提出的人脸图像伪造检测模型的泛化能力和准确率优于对比方法。
文摘台标识别是典型的细微目标识别问题,针对台标区域小、信息量低,且镂空、半透明台标极易受到画面背景影响的难题,提出一个基于端到端全卷积网络的像素级台标识别网络——PNET.首先构建一个像素级标注的台标数据集,通过视频抽帧和图像预处理获得台标图像集,并提出一种逐图像的像素级半自动标注方法获得二值标签图像集;然后提出一个像素级台标识别网络,在典型分类网络AlexNet,VGG的基础上,通过微调,将分类网络在分类任务中学习到的网络参数转换为像素级台标识别网络在台标分割任务中的所需的网络参数;最后引入跨层架构,融合来自网络深层的全局信息和浅层的局部信息.实验结果表明PNET实现了准确的像素级分割,准确率高达98.3%,在NVIDIA Tesla K80上单幅图像识别时间不超过1.5 s.