Abstract
With advances in computer vision and generative models, image generation technology has achieved significant breakthroughs and is now widely used in e-commerce product displays to enhance user interaction. Realistic clothing model generation is one of the innovative applications that deeply integrates image generation with e-commerce. However, the technology still faces many challenges in e-commerce settings: when generating high-quality, realistic clothing images, current models struggle to accurately render the factual consistency, texture, and details of garments, and their outputs are less natural and coherent than real images. To improve clothing model generation for e-commerce, this study proposes LoRA-DAE, an improved stable diffusion generative model. It uses low-rank decomposition to optimize the weight adjustment mechanism of the cross-attention and convolution layers, and adds an adaptive enhancement module to the directional diffusion step of the generation process. A fine-grained texture enhancement strategy dynamically adjusts the distribution of texture and detail during generation, addressing the texture blurring and edge distortion seen in current mainstream clothing model image generation models and improving the detail expressiveness and overall realism of the generated clothing images. Experimental results show that LoRA-DAE outperforms mainstream methods on the Fashion Mannequin dataset, with significant improvements in perceptual quality (user evaluation), quantitative metrics (FID, IS, PSNR, SSIM), and multimodal large model VQA evaluation.
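The abstract does not spell out how the low-rank weight adjustment is formulated. As a rough illustration only, the sketch below shows the standard LoRA update, W' = W + (alpha / r) · B · A, applied to a single weight matrix. It is not the authors' LoRA-DAE implementation and does not model the adaptive enhancement module; the function names, the scaling factor `alpha`, and the initialization scheme are assumptions taken from common LoRA practice.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def init_lora(d_out, d_in, r, seed=0):
    """Create LoRA factors: A (r x d_in) small random, B (d_out x r) all zeros.

    With B at zero the low-rank update starts at zero, so the adapted layer
    initially behaves exactly like the frozen base layer.
    """
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
    B = [[0.0] * r for _ in range(d_out)]
    return A, B


def lora_effective_weight(W, A, B, alpha):
    """Return W + (alpha / r) * B @ A, where r is the rank (number of rows of A)."""
    r = len(A)
    scale = alpha / r
    delta = matmul(B, A)  # d_out x d_in low-rank update
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


def trainable_params(d_out, d_in, r):
    """Number of trainable values in the LoRA factors A and B."""
    return r * (d_in + d_out)


if __name__ == "__main__":
    # Hypothetical layer size: a 768 x 768 projection adapted with rank 4.
    print("full fine-tuning:", 768 * 768, "parameters")
    print("LoRA (r=4):", trainable_params(768, 768, 4), "parameters")
```

Only the factors A and B would be trained while W stays frozen, which is why the trainable parameter count grows with r · (d_in + d_out) instead of d_in · d_out.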
Authors
Liu Dawei (刘大伟)
Yu Bihui (于碧辉)
Shi Jiawei (石珈维)
Wei Jingxuan (魏靖烜)
Shi Huiyang (史慧洋)
Jin Hexuan (靳赫烜)
Sun Linzhuang (孙林壮)
Affiliations: Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168, China; University of Chinese Academy of Sciences, Beijing 100049, China; School of Computer Science & Technology, University of Chinese Academy of Sciences, Beijing 101408, China
Source
Application Research of Computers (《计算机应用研究》)
Peking University Core Journal (北大核心)
2025, Issue 8, pp. 2267-2273 (7 pages)
Funding
Shenyang Science and Technology Plan Project (23-407-3-29).
Keywords
stable diffusion
image generation
adaptive enhancement
model fine-tuning
multimodal evaluation