Funding: Supported in part by the National Natural Science Foundation of China (62176059) and by The Pennsylvania State University.
Abstract: Finding suitable initial noise that retains the original image's information is crucial for image-to-image (I2I) translation using text-to-image (T2I) diffusion models. A common approach is to add random noise directly to the original image, as in SDEdit. However, we have observed that this can result in "semantic discrepancy" issues, wherein T2I diffusion models misinterpret the semantic relationships and generate content not present in the original image. We identify that the noise introduced by SDEdit disrupts the semantic integrity of the image, leading to unintended associations between unrelated regions after U-Net upsampling. Building on the widely used latent diffusion model Stable Diffusion, we propose a training-free, plug-and-play method that alleviates semantic discrepancy and enhances the fidelity of the translated image. By leveraging the deterministic nature of denoising diffusion implicit model (DDIM) inversion, we replace the erroneous features and correlations of the original generative process with accurate ones obtained from DDIM inversion. This approach alleviates semantic discrepancy and surpasses recent DDIM-inversion-based methods such as PnP while requiring fewer priors, achieving an 11.2× speedup in experiments on the COCO, ImageNet, and ImageNet-R datasets across multiple I2I translation tasks.
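The deterministic inversion this abstract relies on can be sketched compactly. Below is a minimal, illustrative DDIM-inversion loop, not the authors' implementation: `eps_model`, the latent shape, and the noise schedule are placeholder assumptions standing in for Stable Diffusion's U-Net and its schedule. The cached trajectory is the kind of information such methods reuse to correct features and correlations in the generative pass.

```python
# Minimal sketch of deterministic DDIM inversion (illustrative only;
# names and schedule are assumptions, not the paper's released code).
import torch

def ddim_invert(x0, eps_model, alphas_cumprod, num_steps):
    """Map a clean latent x0 toward noise by running the DDIM ODE forward.

    alphas_cumprod: 1-D tensor of cumulative alpha-bar values, one per
    timestep, decreasing from ~1 (clean) toward ~0 (pure noise).
    """
    T = alphas_cumprod.shape[0]
    step = T // num_steps
    x = x0
    trajectory = [x0]  # cache intermediates for later feature correction
    for t in range(0, T - step, step):
        a_t, a_next = alphas_cumprod[t], alphas_cumprod[t + step]
        eps = eps_model(x, t)
        # Predict x0 under the current noise estimate, then re-noise
        # deterministically to the next (noisier) timestep.
        x0_pred = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        x = a_next.sqrt() * x0_pred + (1 - a_next).sqrt() * eps
        trajectory.append(x)
    return x, trajectory

if __name__ == "__main__":
    # Dummy predictor stands in for the real U-Net so the sketch runs.
    eps_model = lambda x, t: torch.zeros_like(x)
    alphas_cumprod = torch.linspace(0.999, 0.01, 1000)
    x0 = torch.randn(1, 4, 64, 64)  # Stable Diffusion latent shape
    xT, traj = ddim_invert(x0, eps_model, alphas_cumprod, num_steps=50)
```

Because every step is deterministic, replaying this trajectory recovers the same intermediate latents, which is what makes the correction of an SDEdit-style stochastic pass possible.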
Funding: Supported by the Postdoctoral Fellowship Program of CPSF (GZC20240829).
Abstract: Diffusion-based models have recently achieved remarkable success in style transfer. However, when training data is scarce, existing methods struggle to balance style and content effectively. In this paper, we propose Style-Aware Diffusion (SAD), a novel method that harnesses efficient low-rank adaptation training techniques. Specifically, we extract latent representations of both style and content using DDIM inversion, formulated as an ordinary differential equation. We then use adaptive instance normalization and query–key–value injection to align low-level style features with high-level content semantics. In addition, we propose a parameter-efficient adaptation scheme that mitigates catastrophic forgetting and overfitting by selectively optimizing the weights of the attention layers, ensuring robust and effective performance and achieving a 61.5% relative score increase over the plain model. The proposed method outperforms the high-performance DreamBooth-LoRA model and won the Fourth Jittor Artificial Intelligence Challenge. Our model is implemented using the Jittor framework and is available at https://github.com/liylo/jittor-qwqw-Few_Shot_Style_Transfer.
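The adaptive instance normalization (AdaIN) step the abstract describes has a standard closed form: content features are re-normalized to the per-channel statistics of the style features. The sketch below is illustrative only; the function name and tensor shapes are assumptions, and the official implementation uses Jittor rather than the torch code shown here.

```python
# Minimal AdaIN sketch for style/content feature alignment
# (illustrative; not the SAD repository's code).
import torch

def adain(content, style, eps=1e-5):
    """Shift content features (N, C, H, W) to match the per-channel
    mean and standard deviation of the style features."""
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean

# Toy usage on U-Net-sized feature maps (shapes are assumptions).
content = torch.randn(1, 320, 64, 64)
style = torch.randn(1, 320, 64, 64)
aligned = adain(content, style)
```

In the pipeline the abstract outlines, such statistics transfer handles low-level style, while query–key–value injection in the attention layers carries the high-level content semantics.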