Funding: supported by the National Science and Technology Major Project (Grant No. 2021ZD0112902), the National Natural Science Foundation of China (Project No. 62220106003), a Research Grant from the Beijing Higher Institution Engineering Research Center, and the Tsinghua–Tencent Joint Laboratory for Internet Innovation Technology.
Abstract: 3D-aware image synthesis has attained high quality and robust 3D consistency. Existing 3D controllable generative models synthesize 3D-aware images through a single modality, such as 2D segmentation maps or sketches, but lack the ability to finely control the generated content, such as texture and age. To enhance user-guided controllability, we propose Multi3D, a 3D-aware controllable image synthesis model that supports multi-modal input. Our model can govern the geometry of the generated image using a 2D label map, such as a segmentation or sketch map, while concurrently regulating its appearance through a textual description. To demonstrate the effectiveness of our method, we conducted experiments on multiple datasets, including CelebAMask-HQ, AFHQ-cat, and ShapeNet-car. Qualitative and quantitative evaluations show that our method outperforms existing state-of-the-art methods.
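To make the multi-modal conditioning interface concrete, below is a minimal, hypothetical PyTorch sketch of how a generator might fuse the two inputs: a 2D label map encoded into a geometry code, and a text embedding (e.g., from a CLIP-style encoder) projected into an appearance code. All module names, dimensions, and the placeholder generator head are illustrative assumptions, not the actual Multi3D architecture; a real 3D-aware model would render through a neural radiance field or tri-plane representation rather than a single linear layer.

import torch
import torch.nn as nn

class Multi3DSketch(nn.Module):
    """Hypothetical sketch of a multi-modal 3D-aware generator interface.

    Geometry is conditioned on a 2D label map (segmentation or sketch);
    appearance is conditioned on a text embedding. Module choices and
    dimensions are assumptions for illustration only.
    """

    def __init__(self, label_channels=19, text_dim=512, style_dim=512):
        super().__init__()
        # Encode the 2D label map into a geometry code.
        self.label_encoder = nn.Sequential(
            nn.Conv2d(label_channels, 64, 3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(128, style_dim),
        )
        # Project the text embedding into an appearance code.
        self.text_proj = nn.Linear(text_dim, style_dim)
        # Placeholder head: maps the fused code plus camera pose to an
        # image; stands in for a NeRF/tri-plane renderer.
        self.generator = nn.Linear(2 * style_dim + 16, 3 * 64 * 64)

    def forward(self, label_map, text_emb, camera_pose):
        geom = self.label_encoder(label_map)   # geometry from label map
        app = self.text_proj(text_emb)         # appearance from text
        cond = torch.cat([geom, app, camera_pose.flatten(1)], dim=1)
        img = self.generator(cond).view(-1, 3, 64, 64)
        return torch.tanh(img)

# Usage: a CelebAMask-HQ-style 19-class segmentation map, a 512-d
# text embedding, and a 4x4 camera pose matrix.
model = Multi3DSketch()
label = torch.randn(1, 19, 64, 64)
text = torch.randn(1, 512)
pose = torch.eye(4).unsqueeze(0)
print(model(label, text, pose).shape)  # torch.Size([1, 3, 64, 64])

Factoring the condition into separate geometry and appearance codes mirrors the abstract's division of labor: editing the label map changes shape while the text embedding is held fixed, and vice versa.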