Text-to-image diffusion models have demonstrated impressive capabilities in image generation and have been effectively applied to image inpainting. While text prompts provide intuitive guidance for conditional inpainting, users often want to inpaint a specific object with a customized appearance by providing an exemplar image. Unfortunately, existing methods struggle to achieve high fidelity in exemplar-driven inpainting. To address this, we use a plug-and-play low-rank adaptation (LoRA) module built on a pretrained text-driven inpainting model. The LoRA module is dedicated to learning exemplar-specific concepts through few-shot fine-tuning, which improves the model's ability to fit customized exemplar images without intensive training on large-scale datasets. Additionally, we introduce GPT-4V prompting and prior noise initialization techniques to further improve the fidelity of the inpainting results. In brief, the denoising diffusion process starts from noise derived from a composite exemplar-background image and is subsequently guided by an expressive prompt generated from the exemplar by the GPT-4V model. Extensive experiments demonstrate that our method achieves state-of-the-art performance, both qualitatively and quantitatively, offering users an exemplar-driven inpainting tool with enhanced customization capability.
Funding: Project supported by the National Natural Science Foundation of China (No. 82027801).
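The abstract describes starting the denoising process from noise derived from a composite exemplar-background image. The abstract does not give the exact procedure, so the following is only a minimal sketch under stated assumptions: the composite is formed by pasting the exemplar into the masked region, and the starting noise is obtained with the standard forward-diffusion formula x_T = sqrt(ᾱ_T)·x_0 + sqrt(1 − ᾱ_T)·ε. The function names, the linear beta schedule, and its default values are hypothetical choices for illustration, not the authors' implementation. The arrays may stand for pixels or latents.

```python
import numpy as np


def make_composite(background, exemplar, mask):
    """Paste the exemplar into the masked region of the background.

    Assumption: mask is 1 inside the region to inpaint and 0 elsewhere,
    and all three arrays share the same shape.
    """
    return mask * exemplar + (1.0 - mask) * background


def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule.

    Assumption: a DDPM-style linear schedule; the source does not say
    which schedule is used.
    """
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def prior_noise(composite, alpha_bar_t, rng):
    """Forward-diffuse the composite image to a chosen timestep.

    Implements x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,
    so the starting noise retains a faint trace of the composite.
    """
    eps = rng.standard_normal(composite.shape)
    return np.sqrt(alpha_bar_t) * composite + np.sqrt(1.0 - alpha_bar_t) * eps


# Usage with toy 8x8 single-channel arrays (hypothetical data).
rng = np.random.default_rng(0)
background = np.zeros((8, 8))
exemplar = np.ones((8, 8))
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0

composite = make_composite(background, exemplar, mask)
alpha_bar = linear_alpha_bar()
x_start = prior_noise(composite, alpha_bar[-1], rng)
```

In this sketch, `x_start` would replace the pure Gaussian sample that a diffusion sampler normally starts from. How strongly the composite influences the result depends on the final value of ᾱ under the chosen schedule.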