Funding: Supported by the National Natural Science Foundation for Young Scientists of China Award (No. 62106289).
Abstract: Semantic image synthesis aims to generate high-quality images given semantic conditions, i.e., segmentation masks and style reference images. Existing methods widely adopt generative adversarial networks (GANs), which take all conditional inputs and directly synthesize images in a single forward step. In this paper, semantic image synthesis is instead treated as an image denoising task and handled with a novel image-to-image diffusion model (IIDM).
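The abstract's core idea, recasting synthesis as denoising, follows the standard diffusion setup: a clean image is progressively noised, and a network learns to predict that noise given the conditions (segmentation mask, style reference). A minimal NumPy sketch of the forward process and the noise-to-image inversion; the toy schedule and all names are illustrative assumptions, not the paper's actual IIDM:

```python
import numpy as np

# Toy DDPM-style noise schedule (illustrative values, not from the paper)
T = 10
betas = np.linspace(1e-4, 0.2, T)
alphas_cumprod = np.cumprod(1.0 - betas)

def forward_diffuse(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0): mix the clean image with Gaussian noise."""
    a_bar = alphas_cumprod[t]
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise
    return xt, noise

def recover_x0(xt, eps_pred, t):
    """Invert the forward process given a noise estimate. In a real model,
    eps_pred would come from a network conditioned on the segmentation mask
    and the style reference image."""
    a_bar = alphas_cumprod[t]
    return (xt - np.sqrt(1.0 - a_bar) * eps_pred) / np.sqrt(a_bar)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4))            # stand-in for an image
xt, true_noise = forward_diffuse(x0, T - 1, rng)
x0_hat = recover_x0(xt, true_noise, T - 1)  # oracle noise -> exact recovery
```

With the true noise supplied, the recovery is exact up to floating point; training replaces this oracle with a conditional noise-prediction network, which is what distinguishes the denoising formulation from a GAN's single forward pass.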
Funding: Supported by the Japan Science and Technology Agency Support for Pioneering Research Initiated by the Next Generation (JST SPRING) under Grant No. JPMJSP2124.
Abstract: We present a novel framework for the multi-domain synthesis of artworks from semantic layouts. One of the main limitations of this challenging task is the lack of publicly available segmentation datasets for art synthesis. To address this problem, we propose a dataset called ArtSem that contains 40,000 images of artwork from four different domains, with their corresponding semantic label maps. We first extracted semantic maps from landscape photography and used a conditional generative adversarial network (GAN)-based approach for generating high-quality artwork from semantic maps without requiring paired training data. Furthermore, we propose an artwork-synthesis model that uses domain-dependent variational encoders for high-quality multi-domain synthesis. The model is then complemented with a simple but effective normalization method based on jointly normalizing semantics and style, which we call spatially style-adaptive normalization (SSTAN). Compared with previous methods, which take only the semantic layout as input, our model jointly learns style and semantic representations, improving the quality of the generated artistic images. The results indicate that our model learns to separate the domains in the latent space, so fine-grained control of the synthesized artwork is possible by identifying hyperplanes that separate the different domains. Moreover, by combining the proposed dataset and approach, we generate user-controllable artworks of higher quality than existing approaches, as corroborated by quantitative metrics and a user study.
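The abstract describes SSTAN only at a high level; "jointly normalizing semantics and style" suggests a SPADE-style modulated normalization, where features are normalized and then rescaled by spatially varying maps derived from the semantic layout and style. A hedged NumPy sketch of that modulation step only; the function name, shapes, and the assumption that the scale/shift maps are supplied directly (rather than predicted by convolutions over the layout and style code) are illustrative, not the paper's definition:

```python
import numpy as np

def spatial_modulated_norm(feat, gamma_map, beta_map, eps=1e-5):
    """Per-channel normalization followed by spatially varying modulation.

    feat:      (C, H, W) feature map
    gamma_map: (C, H, W) scale map; in an SSTAN-like layer this would be
               computed from the semantic layout and style code
    beta_map:  (C, H, W) shift map, same provenance
    """
    mu = feat.mean(axis=(1, 2), keepdims=True)
    var = feat.var(axis=(1, 2), keepdims=True)
    normed = (feat - mu) / np.sqrt(var + eps)
    return gamma_map * normed + beta_map

rng = np.random.default_rng(1)
feat = rng.standard_normal((2, 8, 8)) * 3.0 + 5.0
# Identity modulation: gamma=1, beta=0 yields just the normalized features
out = spatial_modulated_norm(feat, np.ones_like(feat), np.zeros_like(feat))
```

Because the modulation maps vary per pixel, different semantic regions (sky, trees, water) can receive different style statistics, which is the usual motivation for this family of normalization layers.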