Funding: Supported in part by the National Natural Science Foundation of China (NSFC) (62076093, 61871182, 61302163, 61401154), the Beijing Natural Science Foundation (4192055), the Natural Science Foundation of Hebei Province of China (F2015502062, F2016502101, F2017502016), the Fundamental Research Funds for the Central Universities (2020YJ006, 2020MS099), and the Open Project Program of the National Laboratory of Pattern Recognition (NLPR) (201900051). The authors gratefully acknowledge the support of NVIDIA Corporation with the donation of the GPU used for this research.
Abstract: Facial attribute editing has two main objectives: 1) translating an image from a source domain to a target one, and 2) changing only the facial regions related to a target attribute while preserving the attribute-excluding details. In this work, we propose a multi-attention U-Net-based generative adversarial network (MU-GAN). First, we replace the classic convolutional encoder-decoder in the generator with a symmetric U-Net-like structure, and then apply an additive attention mechanism to build attention-based U-Net connections that adaptively transfer encoder representations, complementing the decoder with attribute-excluding detail and enhancing attribute editing ability. Second, a self-attention (SA) mechanism is incorporated into the convolutional layers to model long-range and multi-level dependencies across image regions. Experimental results indicate that our method balances attribute editing ability against detail preservation and can decouple the correlations among attributes. It outperforms state-of-the-art methods in terms of attribute manipulation accuracy and image quality. Our code is available at https://github.com/SuSir1996/MU-GAN.
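The attention-based U-Net connection described above can be illustrated with an additive attention gate: the decoder (gating) feature is combined additively with the encoder feature, and the resulting coefficient re-weights the encoder feature before it crosses the skip connection. The following is a minimal scalar sketch, not the paper's implementation; the weights `w_x`, `w_g`, and `w_psi` stand in for learned convolutions:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def additive_attention_gate(encoder_feat, decoder_feat, w_x, w_g, w_psi):
    """Toy additive attention over a skip connection.

    Each list element is one spatial position with a single feature value.
    The gate alpha = sigmoid(w_psi * relu(w_x*x + w_g*g)) scales the
    encoder feature x before it is passed on to the decoder.
    """
    gated = []
    for x, g in zip(encoder_feat, decoder_feat):
        pre = max(0.0, w_x * x + w_g * g)   # ReLU on the additive combination
        alpha = sigmoid(w_psi * pre)        # attention coefficient in (0, 1)
        gated.append(alpha * x)             # re-weighted encoder feature
    return gated

feats = additive_attention_gate([1.0, -2.0, 3.0], [0.5, 0.5, -1.0],
                                w_x=1.0, w_g=1.0, w_psi=2.0)
```

In the full model the same idea operates per channel and per pixel, letting the decoder suppress encoder detail that conflicts with the edited attribute while passing attribute-excluding detail through unchanged.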
Funding: Supported by the National Key Research and Development Project of China (Grant No. 2018AAA0100802), the Opening Foundation of the National Engineering Laboratory for Intelligent Video Analysis and Application, and the Experimental Center of Artificial Intelligence of Beijing Normal University.
Abstract: Face attribute classification (FAC) is a high-profile problem in biometric verification and face retrieval. Although recent research has been devoted to extracting more delicate image attribute features and exploiting inter-attribute correlations, significant challenges remain. The wavelet scattering transform (WST) is a promising non-learned feature extractor. It has been shown to yield more discriminative representations and outperforms learned representations in certain tasks. Applied to image classification, the WST enhances subtle image texture information and provides stability to local deformations. This paper designs a scattering-based hybrid block, termed the WS-SE block, which incorporates frequency-domain (WST) and image-domain features in a channel-attention manner (Squeeze-and-Excitation, SE). Compared with a CNN, WS-SE achieves more efficient FAC performance and compensates for the model's sensitivity to small-scale affine transforms. In addition, to further exploit the relationships among the attribute labels, we propose a learning strategy from a causal view. Cause attributes, defined using causality-related information, can be used to infer effect attributes with a high confidence level. Ablation experiments demonstrate the effectiveness of our model, and our hybrid model obtains state-of-the-art results on two public datasets.
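The Squeeze-and-Excitation step that fuses the channels can be sketched in miniature: squeeze each channel (image-domain or WST frequency-domain) to one descriptor by global average pooling, excite the descriptors through a small bottleneck of fully connected layers, and scale each channel by its resulting gate. The weight matrices `w1` and `w2` below are illustrative placeholders, not the learned parameters of the WS-SE block:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fc(vec, weights):
    """Fully connected layer: `weights` is a list of output rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def se_recalibrate(channels, w1, w2):
    """Toy Squeeze-and-Excitation over a list of flat channel maps."""
    # squeeze: global average pool -> one descriptor per channel
    d = [sum(c) / len(c) for c in channels]
    # excite: FC -> ReLU -> FC -> sigmoid gives one gate per channel
    hidden = [max(0.0, h) for h in fc(d, w1)]
    gates = [sigmoid(g) for g in fc(hidden, w2)]
    # scale: reweight each channel by its gate
    return [[g * v for v in c] for g, c in zip(gates, channels)]

out = se_recalibrate([[1.0, 1.0], [2.0, 2.0]],
                     w1=[[1.0, 0.0], [0.0, 1.0]],
                     w2=[[1.0, 0.0], [0.0, 1.0]])
```

Because the excitation mixes descriptors across channels, the block can learn when to emphasize frequency-domain texture evidence over image-domain evidence, and vice versa.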
Funding: This work was supported by the National Natural Science Foundation of China under Grant Nos. 61672520, 61573348, 61620106003, and 61720106006, the Beijing Natural Science Foundation of China under Grant No. 4162056, the National Key Technology Research and Development Program of China under Grant No. 2015BAH53F02, and the CASIA-Tencent YouTu Joint Research Project. The Titan X used for this research was donated by the NVIDIA Corporation.
Abstract: This study introduces a novel conditional recycle generative adversarial network for facial attribute transformation, which can transform high-level semantic face attributes without changing the identity. In our approach, we feed a source facial image to the conditional generator with a target attribute condition to generate a face with the target attribute. We then recycle the generated face back through the same conditional generator with the source attribute condition, producing a face that should match the source face in both personal identity and facial attributes. Hence, we introduce a recycle reconstruction loss that enforces the final generated facial image to be identical to the source facial image. Evaluations on the CelebA dataset demonstrate the effectiveness of our approach. Qualitative results show that our approach can learn and generate high-quality identity-preserving facial images with specified attributes.
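The recycle reconstruction loss can be sketched as follows: edit the source toward the target attribute, feed the result back through the same generator with the source attribute, and penalize any L1 drift from the original. The `toy_generator` below is a hypothetical stand-in (a pixel-wise shift by a scalar attribute condition), not the paper's conditional CNN:

```python
def toy_generator(image, attribute):
    """Hypothetical conditional generator: shifts every pixel by the
    scalar attribute condition. The real model is a conditional CNN."""
    return [p + attribute for p in image]

def l1_loss(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def recycle_reconstruction_loss(source, src_attr, tgt_attr, generator):
    # forward pass: edit the source toward the target attribute
    edited = generator(source, tgt_attr)
    # recycle pass: feed the edited face back with the source attribute
    recycled = generator(edited, src_attr)
    # penalize drift between the recycled face and the original source
    return l1_loss(recycled, source)

# with the additive toy generator, src_attr = -tgt_attr undoes the edit exactly
perfect = recycle_reconstruction_loss([0.1, 0.2], src_attr=-1.0,
                                      tgt_attr=1.0, generator=toy_generator)
drift = recycle_reconstruction_loss([0.1, 0.2], src_attr=-0.5,
                                    tgt_attr=1.0, generator=toy_generator)
```

Minimizing this loss pushes the generator toward edits that are invertible under the source condition, which is what preserves identity and the non-target attributes.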