Video colorization is a challenging and highly ill-posed problem. Although recent years have witnessed remarkable progress in single-image colorization, there has been relatively little research effort on video colorization, and existing methods often suffer from severe flickering artifacts (temporal inconsistency) or unsatisfactory colorization. We address this problem from a new perspective, by jointly considering colorization and temporal consistency in a unified framework. Specifically, we propose a novel temporally consistent video colorization (TCVC) framework. TCVC effectively propagates frame-level deep features in a bidirectional way to enhance the temporal consistency of colorization. Furthermore, TCVC introduces a self-regularization learning (SRL) scheme to minimize the differences between predictions obtained using different time steps. SRL does not require any ground-truth color videos for training and can further improve temporal consistency. Experiments demonstrate that our method not only produces visually pleasing colorized video, but also achieves clearly better temporal consistency than state-of-the-art methods. A video demo is provided at https://www.youtube.com/watch?v=c7dczMs-olE, and code is available at https://github.com/lyh-18/TCVC-Temporally-Consistent-Video-Colorization.
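The abstract describes its two mechanisms only in words. Below is a minimal, hypothetical PyTorch sketch of what bidirectional frame-level feature propagation and an SRL-style consistency loss can look like; the module, the loss, and the assumption that the network maps grayscale clips of any length to per-frame ab-channel predictions are all illustrative, not the authors' implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BiPropagation(nn.Module):
        # Toy bidirectional propagation: each frame's features are fused
        # with a recurrent state from the past (forward pass) and then
        # from the future (backward pass), so every frame sees temporal
        # context from both directions.
        def __init__(self, ch=64):
            super().__init__()
            self.fwd = nn.Conv2d(2 * ch, ch, 3, padding=1)
            self.bwd = nn.Conv2d(2 * ch, ch, 3, padding=1)

        def forward(self, feats):  # feats: list of T tensors, each (B, ch, H, W)
            state = torch.zeros_like(feats[0])
            fwd_out = []
            for f in feats:  # past -> future
                state = torch.relu(self.fwd(torch.cat([f, state], dim=1)))
                fwd_out.append(state)
            state = torch.zeros_like(feats[0])
            out = [None] * len(feats)
            for t in reversed(range(len(feats))):  # future -> past
                state = torch.relu(self.bwd(torch.cat([fwd_out[t], state], dim=1)))
                out[t] = state
            return out

    def srl_loss(net, gray_clip):
        # SRL-style self-regularization: the same frame is colorized twice,
        # once with long-range propagation (full clip) and once with a
        # short sub-clip, and the two predictions are pulled together.
        # No ground-truth color frames are involved.
        ab_full = net(gray_clip)        # assumed (T, 2, H, W) ab predictions
        ab_short = net(gray_clip[-2:])  # same last frame, fewer time steps
        return F.l1_loss(ab_full[-1], ab_short[-1])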
Objective: Fast text detection in industrial scenes can raise production efficiency and lower costs, but annotating data is time-consuming and labor-intensive, so little labeled information is available. To address the low pseudo-label quality and the large domain gap that current methods exhibit when applied to industrial data, this paper proposes a domain-adaptive industrial scene text detection method that combines text self-training with adversarial learning. Method: First, to counter low pseudo-label quality, a teacher-student framework is adopted for text self-training; the teacher and student models use data augmentation and mutual learning to mitigate domain shift and improve the quality of the pseudo-labels. Second, to reduce the domain gap, image-level and instance-level adversarial learning modules are proposed to align the feature distributions of the source and target domains so that the network learns domain-invariant features. Finally, consistency regularization is applied between the two adversarial learning modules to further narrow the domain gap and strengthen the model's domain adaptation ability. Results: Experiments show that the proposed method reaches a precision, recall, and F1 score of 96.2%, 95.0%, and 95.6% on an industrial nameplate dataset, improvements of 10%, 15.3%, and 12.8% over the baseline model. It also performs well on the ICDAR15 and MSRA-TD500 datasets, raising the F1 score by 0.9% and 3.1% over current state-of-the-art methods. In addition, when the method is applied to the EAST (efficient and accurate scene text detector) model, the nameplate-dataset metrics improve by 5%, 11.8%, and 9.5%, respectively. Conclusion: The proposed method successfully narrows the gap between source-domain and target-domain data, markedly improves the model's generalization ability, transfers well to other detectors, and adds no computational cost at inference.
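The Method paragraph names standard building blocks without giving code. The sketch below shows common, self-contained realizations of two of them, an EMA-updated teacher for pseudo-label self-training and a gradient-reversal discriminator for image-level adversarial feature alignment; all names and the discriminator interface are assumptions for illustration, not the paper's implementation.

    import torch
    import torch.nn.functional as F

    class GradReverse(torch.autograd.Function):
        # Gradient reversal: identity in the forward pass, negated (scaled)
        # gradient in the backward pass, so the backbone is trained to fool
        # the domain discriminator and thereby learns domain-invariant
        # features.
        @staticmethod
        def forward(ctx, x, lam):
            ctx.lam = lam
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad):
            return -ctx.lam * grad, None

    def image_level_adv_loss(disc, feat, is_target, lam=1.0):
        # disc: a small convolutional domain classifier over backbone
        # features (assumed); is_target: 0 for source images, 1 for target.
        logits = disc(GradReverse.apply(feat, lam))
        labels = torch.full_like(logits, float(is_target))
        return F.binary_cross_entropy_with_logits(logits, labels)

    @torch.no_grad()
    def ema_update(teacher, student, momentum=0.999):
        # The teacher tracks an exponential moving average of the student,
        # which keeps the pseudo-labels it produces on unlabeled target
        # images stable while the student is trained on them.
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(momentum).add_(ps, alpha=1.0 - momentum)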
Funding: Supported by grants from the National Natural Science Foundation of China (61906184), the Joint Lab of CAS–HK, and the Shanghai Committee of Science and Technology, China (20DZ1100800, 21DZ1100100).