Funding: National Natural Science Foundation of China, Grant/Award Number: 62176124.
Abstract: Person image synthesis is widely used in fashion and has many application scenarios. The task is to synthesise an image of a person in an arbitrary pose from a single source image. Prior methods generate the person in the target pose well; however, they fail to preserve the fine style details of the source image. To address this problem, a robust style injection (RSI) model is proposed, which is a coarse-to-fine framework for synthesising the target person image. RSI uses a simple and efficient cross-attention-based module to fuse the features of the source semantic styles and the target pose, yielding coarsely aligned features. Adaptive instance normalisation is then employed to enhance the aligned features in conjunction with the source semantic styles. Subsequently, the source semantic styles are further injected into a positional normalisation scheme to avoid the erosion of fine style details caused by extensive convolution. In the training losses, optimal transport theory in the form of energy distance is introduced to constrain the data distribution and refine the texture style details. Additionally, the authors' model can edit the shape and the texture of garments to the target style separately. The experiments demonstrate that the authors' RSI achieves better performance than state-of-the-art methods.
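For readers unfamiliar with two of the building blocks named in the abstract, the following is a minimal sketch of their textbook definitions, not the authors' implementation. It assumes the standard formulation of adaptive instance normalisation, AdaIN(x, y) = sigma(y) * (x - mu(x)) / sigma(x) + mu(y), applied to a single feature channel, and the standard sample energy distance, 2 E||X - Y|| - E||X - X'|| - E||Y - Y'||. The function names `adain`, `mean_pairwise_distance`, and `energy_distance` are hypothetical, and how RSI derives its style statistics or which features enter its loss is not stated in the abstract.

```python
import math
from statistics import fmean, pvariance


def adain(content, style, eps=1e-5):
    """Standard AdaIN on one channel (assumed formulation, not RSI's exact layer).

    Normalises `content` to zero mean and unit variance, then rescales it
    with the mean and standard deviation of `style`.
    """
    mu_c, mu_s = fmean(content), fmean(style)
    sigma_c = math.sqrt(pvariance(content) + eps)
    sigma_s = math.sqrt(pvariance(style) + eps)
    return [sigma_s * (c - mu_c) / sigma_c + mu_s for c in content]


def mean_pairwise_distance(a, b):
    """Average Euclidean distance over all pairs (x, y) with x in a, y in b."""
    total = sum(math.dist(x, y) for x in a for y in b)
    return total / (len(a) * len(b))


def energy_distance(xs, ys):
    """Sample energy distance between two sets of equal-dimension vectors.

    Zero when the two samples coincide; grows as the samples move apart.
    """
    return (
        2.0 * mean_pairwise_distance(xs, ys)
        - mean_pairwise_distance(xs, xs)
        - mean_pairwise_distance(ys, ys)
    )


if __name__ == "__main__":
    # Toy values chosen only for illustration.
    print(adain([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
    print(energy_distance([(0.0, 0.0)], [(3.0, 4.0)]))
```

In a training setting, a term like `energy_distance` would be computed between batches of generated and real feature vectors and minimised, which pulls the two distributions together; the pure-Python loops here favour readability over speed.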