Abstract
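In recent years, image instance harmonization has developed rapidly as an important branch of image generation. However, ensuring that a foreground instance stands in a semantically plausible relationship to the elements of the background image, and that the composited image is harmonious and consistent, remains a difficult problem for current research. In addition, high costs and equipment requirements make it difficult to collect large-scale harmonization training data. To address these problems, this paper proposes an image harmonization method based on a large-scale pretrained diffusion model. Built on the pretrained Stable Diffusion 2.0 model, the method adopts the natural-language-guided image inpainting task, so that the model can generate a harmonized image that meets the semantic requirements, conditioned on a natural language description and the image containing the region to be filled. The method takes the high-frequency and low-frequency information of the instance image as separate control conditions and fine-tunes the pretrained model, so that the generated result preserves the key content of the instance image as far as possible and yields a harmonized composite image. Experimental results show that the method performs well at generating instance shadows and adjusting lighting, effectively improving both the semantic and the visual harmonization quality of the image.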
Recent advancements in image generation have led to significant progress in image instance harmonization. However, maintaining semantic consistency between foreground and background elements and achieving visually plausible combinations remain challenging. Additionally, the scarcity of large-scale harmonization datasets limits the development of effective methods. To address these challenges, the paper proposes a novel image harmonization approach based on a large-scale pretrained diffusion model. Leveraging the capabilities of Stable Diffusion 2.0, the paper formulates image harmonization as a text-guided image inpainting task. Given a natural language description and a specified target region, the proposed model can generate harmonized images that blend seamlessly with the background. To further enhance the quality of the generated images, the paper incorporates high-frequency and low-frequency information from the foreground instance as control conditions, ensuring that the essential features of the instance are preserved. Experimental results demonstrate that the proposed approach significantly improves image harmonization quality, especially in generating realistic shadows and adjusting lighting effects.
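The abstract does not say how the high-frequency and low-frequency information of the instance image is extracted. As a minimal sketch of one common choice, and purely an assumption here, the snippet below takes a box-blurred copy of a grayscale image as the low-frequency band and the residual as the high-frequency band. The function names `box_blur` and `split_frequencies` and the `radius` parameter are hypothetical and do not come from the paper.

```python
# Illustrative sketch only: the paper's actual frequency decomposition is not
# described in the abstract. A box blur stands in for the low-pass filter.

def box_blur(img, radius=1):
    """Low-pass filter: mean over a (2*radius+1)^2 window, truncated at the image borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def split_frequencies(img, radius=1):
    """Return (low, high) such that low + high reproduces img elementwise."""
    low = box_blur(img, radius)
    high = [
        [img[y][x] - low[y][x] for x in range(len(img[0]))]
        for y in range(len(img))
    ]
    return low, high


if __name__ == "__main__":
    # A tiny checkerboard-like patch: strong local contrast, so the
    # high-frequency band carries most of the structure.
    patch = [
        [0.0, 10.0, 0.0],
        [10.0, 0.0, 10.0],
        [0.0, 10.0, 0.0],
    ]
    low, high = split_frequencies(patch)
    print("low :", low)
    print("high:", high)
```

In a setup like the one the abstract describes, the two bands would be fed to the model as separate control conditions; the smooth band mostly carries color and illumination, while the residual carries edges and texture.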
Authors
刘鹏举
石宇鹏
张宏志
姜峰
左旺孟
LIU Pengju; SHI Yupeng; ZHANG Hongzhi; JIANG Feng; ZUO Wangmeng (School of Medicine and Health, Harbin Institute of Technology, Harbin 150001, China; Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou 450000, China; Faculty of Computing, Harbin Institute of Technology, Harbin 150001, China)
Source
《智能计算机与应用》
2025, No. 3, pp. 7-17 (11 pages)
Intelligent Computer and Applications
Funding
National Natural Science Foundation of China, General Program (6237011159)
China Postdoctoral Science Foundation project (2024M754208).
Keywords
image instance harmonization
pretrained diffusion model
natural language guidance (natural language descriptions)
high-frequency and low-frequency information