During the image generation phase, the parser-free Flow-Style-VTON model (PF-Flow-Style-VTON), which utilizes distilled appearance flows, faces two main challenges: blurring, deformation, occlusion, or loss of the arm or palm regions in the generated image when these regions of the person occlude the garment; and blurring and deformation in the generated image when the person performs large pose movements and the target garment has complex, detailed patterns. To solve these two problems, an improved virtual try-on network model, denoted as IPF-Flow-Style-VTON, is proposed. Firstly, a target warped garment mask refinement module (M-RM) is introduced to refine the warped garment mask and remove erroneous information in the arm and palm regions, thereby improving the quality of subsequent image generation. Secondly, an improved global attention module (GAM) is integrated into the original image generation network, enhancing the ResUNet's understanding of global context and optimizing the fusion of local features and global information, thereby further improving image generation quality. Finally, the UniPose model is used to provide pose keypoint information for the target person image, guiding task execution during the image generation phase. Experiments conducted on the VITON dataset show that the proposed method outperforms the original Flow-Style-VTON by 5.4%, 0.3%, 6.7%, and 2.2% in Fréchet inception distance (FID), structural similarity index measure (SSIM), learned perceptual image patch similarity (LPIPS), and peak signal-to-noise ratio (PSNR), respectively. Overall, the proposed method effectively remedies the shortcomings of the original network and achieves better visual results.
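The abstract does not detail how the M-RM refines the warped garment mask. As a rough illustrative sketch only (the function name and inputs are hypothetical, not from the paper), the core idea of removing arm and palm regions from a warped garment mask could be expressed as a simple mask subtraction:

```python
import numpy as np

def refine_warped_mask(warped_mask: np.ndarray, arm_palm_mask: np.ndarray) -> np.ndarray:
    """Remove arm/palm pixels from the warped garment mask.

    warped_mask:   binary (H, W) mask of the warped target garment.
    arm_palm_mask: binary (H, W) mask of the person's arm and palm
                   regions (e.g. estimated from pose keypoints).
    """
    # Keep garment pixels only where they are not occluded by arms/palms,
    # so occluding body parts are not overwritten during generation.
    return warped_mask & ~arm_palm_mask

# Toy example: a 4x4 garment mask with an arm crossing the second row.
garment = np.array([[0, 1, 1, 0],
                    [1, 1, 1, 1],
                    [1, 1, 1, 1],
                    [0, 1, 1, 0]], dtype=bool)
arm = np.zeros((4, 4), dtype=bool)
arm[1, :] = True  # the arm occludes row 1
refined = refine_warped_mask(garment, arm)
```

In the actual model the refinement is learned rather than a fixed rule; this snippet only illustrates why erroneous garment pixels in occluded arm/palm regions must be cleared before image synthesis.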
Fund: National Key R&D Program of China (No. 2019YFC1521300).