A simple and effective image inpainting method is proposed in this paper, which proves suitable for target regions ranging from small scraps to large unwanted objects across a wide range of images. It is an important improvement upon traditional image inpainting techniques. By introducing a new bijective-mapping term into the matching cost function, the artificial repetition problem in the final inpainted image is practically solved. In addition, by adopting an inpainting error map, the target pixels are refined gradually during the inpainting process, and overlapping target patches are combined more seamlessly than in previous methods. Finally, the inpainting time is dramatically decreased by a new acceleration method in the matching process.
Image generation is a hot topic in academia recently and has been applied to AI drawing, which can produce vivid AI paintings without labor costs. In image generation, we represent the image as a random vector, assume that images of natural scenes obey an unknown distribution, and aim to estimate that distribution from observed samples. In particular, with the development of the GAN (Generative Adversarial Network), in which the generator and discriminator improve the model's capability through adversarial training, the quality of generated images keeps increasing. Images produced by existing GAN-based generation models are so well rendered that they can pass for genuine ones. After a brief introduction to the concept of GANs, this paper analyzes the main ideas of image synthesis and studies representative state-of-the-art GAN-based image synthesis methods.
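The generator-discriminator game described above can be sketched numerically with the standard binary-cross-entropy objectives. This is a generic illustrative sketch, not any surveyed paper's code; the probabilities below are made-up discriminator outputs.

```python
import numpy as np

def d_loss(d_real, d_fake, eps=1e-8):
    """Discriminator loss: -[log D(x) + log(1 - D(G(z)))], averaged."""
    return -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))

def g_loss(d_fake, eps=1e-8):
    """Non-saturating generator loss: -log D(G(z)), averaged."""
    return -np.mean(np.log(d_fake + eps))

d_real = np.array([0.9, 0.8])   # D is confident that real samples are real
d_fake = np.array([0.1, 0.2])   # D is confident that fakes are fake
print(d_loss(d_real, d_fake))   # low: D is currently winning
print(g_loss(d_fake))           # high: G must improve to fool D
```

Training alternates between minimizing the first quantity over the discriminator's parameters and the second over the generator's, which is the adversarial improvement the abstract refers to.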
Semantic image synthesis aims to generate high-quality images given semantic conditions, i.e., segmentation masks and style reference images. Existing methods widely adopt generative adversarial networks (GANs), which take all conditional inputs and directly synthesize images in a single forward step. In this paper, semantic image synthesis is treated as an image denoising task and is handled with a novel image-to-image diffusion model (IIDM).
Medical image synthesis (MIS) can greatly reduce the economic and time costs of medical diagnosis. However, due to the complexity of medical images and the similar characteristics of different tissue cells, existing methods face great challenges in maintaining biological consistency. To this end, we propose the hybrid augmented generative adversarial network (HAGAN) to preserve the authenticity of structural texture and tissue cells. HAGAN contains an attention-mixed (AttnMix) generator, a hierarchical discriminator, and a reverse skip connection between the discriminator and generator. The AttnMix consistency differentiable regularization encourages perception of structural and textural variations between real and fake images, which improves the pathological integrity of synthetic images and the accuracy of features in local areas. The hierarchical discriminator introduces pixel-by-pixel discriminant feedback to the generator, enhancing the saliency and discriminability of global and local details simultaneously. The reverse skip connection further improves the accuracy of fine details by fusing real and synthetic distribution features. Our experimental evaluations on two datasets of different scales, i.e., ACDC and BraTS2018, demonstrate that HAGAN outperforms existing methods and achieves state-of-the-art performance at both high and low resolution.
Vehicle re-identification (Re-ID) has drawn extensive exploration recently; nevertheless, accurately distinguishing features in latent space across varying vehicle poses remains a challenging hurdle for real-world application of vehicle Re-ID. To address this challenge, we propose projecting vehicle images of various poses into a unified target pose so as to promote the discriminative capability of the vehicle Re-ID model. Acknowledging the labor and cost of collecting paired images of the same vehicle across different traffic surveillance cameras in practical scenarios, we propose the pioneering pair-flexible pose-guided image synthesis method for vehicle Re-ID, denominated VehicleGAN. Our method works in both supervised (paired images of the same vehicle) and unsupervised (unpaired images of any vehicle) settings, and bypasses the need for geometric 3D model information. Furthermore, we propose a novel Joint Metric Learning (JML) method to facilitate the effective fusion of real and synthetic data. Comprehensive experimental analyses conducted on the public VeRi-776 and VehicleID datasets substantiate the precision and efficacy of the proposed VehicleGAN and JML.
This paper presents a survey of image synthesis and editing with generative adversarial networks (GANs). GANs consist of two deep networks, a generator and a discriminator, which are trained in a competitive way. Due to the power of deep networks and the competitive training manner, GANs are capable of producing reasonable and realistic images, and have shown great capability in many image synthesis and editing applications. This paper surveys recent GAN papers regarding topics including, but not limited to, texture synthesis, image inpainting, image-to-image translation, and image editing.
In many applications of computer graphics, art, and design, it is desirable for a user to provide intuitive non-image input, such as text, sketch, stroke, graph, or layout, and have a computer system automatically generate photo-realistic images according to that input. While works that allow such automatic image content generation have classically followed a framework of image retrieval and composition, recent advances in deep generative models such as generative adversarial networks (GANs), variational autoencoders (VAEs), and flow-based methods have enabled more powerful and versatile image generation approaches. This paper reviews recent works on image synthesis given intuitive user input, covering advances in input versatility, image generation methodology, benchmark datasets, and evaluation metrics. This motivates new perspectives on input representation and interactivity, cross-fertilization between major image generation paradigms, and evaluation and comparison of generation methods.
Synthesizing a complex scene image with multiple objects and a background according to a text description is a challenging problem. It requires solving several difficult tasks across the fields of natural language processing and computer vision. We model it as a combination of semantic entity recognition, object retrieval and recombination, and object-status optimization. To reach a satisfactory result, we propose a comprehensive pipeline to convert the input text to its visual counterpart. The pipeline includes text processing, foreground object and background scene retrieval, image synthesis using constrained MCMC, and post-processing. First, we roughly divide the objects parsed from the input text into foreground objects and background scenes. Second, we retrieve the required foreground objects from a foreground object dataset segmented from the Microsoft COCO dataset, and retrieve an appropriate background scene image from a background image dataset extracted from the Internet. Third, to ensure the rationality of foreground objects' positions and sizes in the image synthesis step, we design a cost function and use the Markov chain Monte Carlo (MCMC) method as the optimizer to solve this constrained layout problem. Finally, to make the image look natural and harmonious, we use Poisson-based and relighting-based methods to blend the foreground objects and the background scene image in the post-processing step. The synthesized results and comparisons based on the Microsoft COCO dataset show that our method outperforms some state-of-the-art methods based on generative adversarial networks (GANs) in the visual quality of generated scene images.
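The constrained-MCMC layout step can be illustrated with a toy Metropolis sampler over a single object's position. The cost terms below (a preferred ground line and a forbidden strip) are invented stand-ins for the paper's actual layout constraints, not its cost function.

```python
import math
import random

def cost(x, y):
    """Toy layout cost: sit near the y = 0.8 ground line, avoid a center strip."""
    ground = (y - 0.8) ** 2
    forbidden = 1.0 if 0.4 < x < 0.6 else 0.0
    return ground + forbidden

def mcmc_layout(steps=5000, temp=0.05, seed=0):
    """Metropolis sampling over (x, y) in [0,1]^2, tracking the best layout."""
    rng = random.Random(seed)
    x, y = rng.random(), rng.random()
    best = (x, y, cost(x, y))
    for _ in range(steps):
        # Propose a small Gaussian move, clipped to the unit square.
        nx = min(1.0, max(0.0, x + rng.gauss(0, 0.05)))
        ny = min(1.0, max(0.0, y + rng.gauss(0, 0.05)))
        delta = cost(nx, ny) - cost(x, y)
        # Accept downhill moves always; uphill moves with Boltzmann probability.
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            x, y = nx, ny
            if cost(x, y) < best[2]:
                best = (x, y, cost(x, y))
    return best

x, y, c = mcmc_layout()
```

The accepted-move rule is the Metropolis criterion; the paper's version optimizes the positions and sizes of several objects jointly under its own constraint terms.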
3D-aware image synthesis has attained high quality and robust 3D consistency. Existing 3D controllable generative models are designed to synthesize 3D-aware images through a single modality, such as 2D segmentation or sketches, but lack the ability to finely control generated content, such as texture and age. In pursuit of enhanced user-guided controllability, we propose Multi3D, a 3D-aware controllable image synthesis model that supports multi-modal input. Our model can govern the geometry of the generated image using a 2D label map, such as a segmentation or sketch map, while concurrently regulating the appearance of the generated image through a textual description. To demonstrate the effectiveness of our method, we have conducted experiments on multiple datasets, including CelebAMask-HQ, AFHQ-cat, and shapenet-car. Qualitative and quantitative evaluations show that our method outperforms existing state-of-the-art methods.
Brain tumor segmentation is critical in clinical diagnosis and treatment planning. Existing methods for brain tumor segmentation with missing modalities often struggle when dealing with multiple missing modalities, a common scenario in real-world clinical settings. These methods primarily focus on handling a single missing modality at a time, making them insufficiently robust to the additional complexity of incomplete data containing various missing-modality combinations. Additionally, most existing methods rely on single models, which may limit their performance and increase the risk of overfitting the training data. This work proposes a novel method called the ensemble adversarial co-training neural network (EACNet) for accurate brain tumor segmentation from multi-modal magnetic resonance imaging (MRI) scans with multiple missing modalities. The proposed method consists of three key modules: an ensemble of pre-trained models, which captures diverse feature representations from the MRI data; adversarial learning, a competitive training approach in which a generator model creates realistic missing data while sub-networks acting as discriminators learn to distinguish real data from the generated "fake" data; and a co-training framework, which utilizes the information extracted by the multimodal path (trained on complete scans) to guide the learning process in the path handling missing modalities. The model potentially compensates for missing information through co-training interactions by exploiting the relationships between the available modalities and the tumor segmentation task. EACNet was evaluated on the BraTS2018 and BraTS2020 challenge datasets and achieved state-of-the-art and competitive performance, respectively. Notably, the whole-tumor (WT) dice similarity coefficient (DSC) reached 89.27%, surpassing existing methods. The analysis suggests that the ensemble approach offers potential benefits, and that the adversarial co-training contributes to the increased robustness and accuracy of EACNet for brain tumor segmentation of MRI scans with missing modalities, making it a strong candidate for real-world clinical applications.
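The reported whole-tumor score uses the Dice similarity coefficient, DSC = 2|A∩B| / (|A| + |B|), computed over binary segmentation masks. A minimal sketch with made-up flattened masks:

```python
def dice(pred, target, eps=1e-8):
    """Dice similarity coefficient over flattened binary masks (0/1 values)."""
    inter = sum(p and t for p, t in zip(pred, target))
    return 2.0 * inter / (sum(pred) + sum(target) + eps)

pred   = [1, 1, 0, 1, 0, 0]
target = [1, 0, 0, 1, 1, 0]
print(dice(pred, target))   # 2*2 / (3+3) = 0.666...
```

A perfect prediction gives a DSC of 1, and a DSC of 89.27% corresponds to substantial voxel-level overlap with the ground-truth tumor mask.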
Realistic urban scene generation has been extensively studied for the development of autonomous vehicles. However, the research has primarily focused on the synthesis of vehicles and pedestrians, while the generation of cyclists is rarely addressed due to its complexity. This paper proposes a perspective-aware and realistic cyclist generation method via object retrieval. Images, semantic maps, and depth labels of objects are first collected from existing datasets, categorized by class and perspective, and computed by an algorithm newly designed according to imaging principles. During scene generation, objects with the desired class and perspective are retrieved from the collection and inserted into the background, which is then sent to a modified 2D synthesis model to generate images. This pipeline introduces a perspective computing method, utilizes object retrieval to control perspective accurately, and modifies a diffusion model to achieve high fidelity. Experiments show that our proposal achieves a Fréchet Inception Distance of 2.36, lower than that of competitive methods, indicating superior realism. When these images are used for augmentation in a semantic segmentation task, the performance of ResNet-50 on the target class improves by 4.47%. These results demonstrate that the proposed method can generate cyclists in corner cases to augment model training data, further enhancing the perception capability of autonomous vehicles and improving the safety of autonomous driving technology.
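The reported metric is the Fréchet Inception Distance, FID = ||μ₁ − μ₂||² + Tr(C₁ + C₂ − 2(C₁C₂)^{1/2}), between Gaussian fits of real and generated feature statistics. The sketch below computes the diagonal-covariance special case, where the matrix square root is element-wise; real FID uses full covariances of Inception-network features.

```python
import numpy as np

def fid_diagonal(mu1, var1, mu2, var2):
    """FID between two Gaussians with diagonal covariances (variance vectors)."""
    mu1, var1, mu2, var2 = map(np.asarray, (mu1, var1, mu2, var2))
    mean_term = np.sum((mu1 - mu2) ** 2)
    # For diagonal C1, C2 the trace term reduces to an element-wise expression.
    cov_term = np.sum(var1 + var2 - 2.0 * np.sqrt(var1 * var2))
    return mean_term + cov_term

# Identical feature statistics give FID = 0; a shifted mean increases it.
print(fid_diagonal([0, 0], [1, 1], [0, 0], [1, 1]))   # 0.0
print(fid_diagonal([0, 0], [1, 1], [1, 0], [1, 1]))   # 1.0
```

Lower is better, which is why the paper's FID of 2.36 being below competing methods indicates more realistic generations.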
Images taken underwater mostly present color shift with hazy effects due to the special properties of water. Underwater image enhancement methods have been proposed to handle this issue; however, their enhancement results are only evaluated on a small number of underwater images. The lack of a sufficiently large and diverse dataset for efficient evaluation of underwater image enhancement methods motivates the present paper, which proposes a systematic method to synthesize diverse underwater images that can function as a benchmark dataset. The synthesis is based on the underwater image formation model, which describes the physical degradation process. An indoor RGB-D image dataset is used as the seed for underwater-style image generation. The ambient light is simulated based on the statistical mean of real-world underwater images, and attenuation coefficients for diverse water types are carefully selected. In total, 14,490 underwater images of 10 water types are synthesized. Based on the synthesized database, state-of-the-art image enhancement methods are appropriately evaluated. Moreover, the large, diverse underwater image database is beneficial for the development of learning-based methods.
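The underwater image formation model referred to above is commonly written per color channel as I_c = J_c·t_c + A_c·(1 − t_c), with transmission t_c = exp(−β_c·d) for scene depth d. The sketch below degrades an RGB-D patch accordingly; the β and ambient-light values are illustrative placeholders, not the paper's calibrated water-type coefficients.

```python
import numpy as np

def synthesize_underwater(rgb, depth, beta=(0.4, 0.1, 0.06), ambient=(0.1, 0.5, 0.6)):
    """rgb: HxWx3 clean image in [0,1]; depth: HxW distances in meters."""
    beta = np.asarray(beta, dtype=float)        # per-channel attenuation (R, G, B)
    ambient = np.asarray(ambient, dtype=float)  # blue-green ambient light
    t = np.exp(-depth[..., None] * beta)        # per-channel transmission map
    return rgb * t + ambient * (1.0 - t)

rgb = np.full((2, 2, 3), 0.8)                   # bright gray scene patch
depth = np.array([[1.0, 5.0], [10.0, 20.0]])    # increasing distance
out = synthesize_underwater(rgb, depth)
# Red attenuates fastest, so distant pixels drift toward the blue-green ambient.
```

Varying β across channels reproduces the characteristic color shift, and varying it across water types is how such a model can synthesize a multi-type benchmark from RGB-D seeds.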
In recent years, radiotherapy based only on magnetic resonance (MR) images has become a hot spot in radiotherapy planning research in the medical field. However, computed tomography (CT) is still needed for dose calculation in the clinic. Recent deep-learning approaches to synthesizing CT images from MR images have raised much research interest, making radiotherapy based only on MR images possible. In this paper, we propose a novel unsupervised image synthesis framework with registration networks, which enforces constraints between the reconstructed image and the input image by registering the reconstructed image with the input image and registering the cycle-consistent image with the input image. Furthermore, we add ConvNeXt blocks to the network and use large-kernel convolutional layers to improve the network's ability to extract features. We collected head and neck data from 180 patients with nasopharyngeal carcinoma to train and evaluate the model, and made a quantitative comparison with several commonly used model frameworks. On four evaluation metrics, Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity (SSIM), our model achieves 18.55±1.44, 86.91±4.31, 33.45±0.74, and 0.960±0.005, respectively. Compared with other methods, MAE decreased by 2.17, RMSE decreased by 7.82, PSNR increased by 0.76, and SSIM increased by 0.011. The results show that the proposed model outperforms other methods in the quality of image synthesis. This work provides guidance for the study of MR-only radiotherapy planning.
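Three of the four reported metrics have simple closed forms; a minimal sketch on a made-up prediction/ground-truth pair (SSIM is omitted, since a faithful version requires a windowed implementation):

```python
import numpy as np

def mae(a, b):
    """Mean absolute error."""
    return np.mean(np.abs(a - b))

def rmse(a, b):
    """Root mean square error."""
    return np.sqrt(np.mean((a - b) ** 2))

def psnr(a, b, data_range=1.0):
    """Peak signal-to-noise ratio in dB for a given intensity range."""
    mse = np.mean((a - b) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

gt = np.array([[0.2, 0.4], [0.6, 0.8]])   # toy ground-truth patch
pred = gt + 0.1                           # prediction with a uniform 0.1 bias
print(mae(gt, pred), rmse(gt, pred), psnr(gt, pred))
```

For CT synthesis the `data_range` would be the CT intensity span rather than 1.0; lower MAE/RMSE and higher PSNR/SSIM indicate a better synthetic CT, which is the direction of the improvements the paper reports.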
Intelligent identification of sandstone slice images using deep learning is the development trend of mineral identification, and accurate mineral particle segmentation is the most critical step in intelligent identification. A typical identification model requires many training samples to learn as many distinguishable features as possible. However, the difficulty of data acquisition, the high cost of labeling, and privacy protection lead to sparse samples that cannot meet the training requirements of deep learning image identification models. To increase the number of samples and improve the training of deep learning models, this paper proposes a tight sandstone image data augmentation method that combines the advantages of data deformation and data oversampling, taking the Putaohua reservoir in the Sanzhao Sag of the Songliao Basin as the target area. First, the Style Generative Adversarial Network (StyleGAN) is improved to generate high-resolution tight sandstone images and improve data diversity. Second, the Automatic Data Augmentation (AutoAugment) algorithm is improved to search for the optimal augmentation strategy and expand the data scale. Finally, comparison experiments demonstrate that this method has clear advantages in generated image quality and in improving the identification performance of deep learning models in real application scenarios.
The application of generative artificial intelligence (AI) is bringing notable changes to anime creation. This paper surveys recent advancements and applications of diffusion and language models in anime generation, focusing on their demonstrated potential to enhance production efficiency through automation and personalization. We conduct an in-depth survey of cutting-edge generative AI technologies, encompassing models such as Stable Diffusion and GPT, and appraise pivotal large-scale datasets alongside quantifiable evaluation metrics. The surveyed literature indicates considerable maturity in the capacity of AI models to synthesize high-quality, aesthetically compelling anime images from textual prompts, alongside discernible progress in the generation of coherent narratives. However, achieving long-form consistency, mitigating artifacts such as flickering in video sequences, and enabling fine-grained artistic control remain critical ongoing challenges. Building on these advancements, research efforts have increasingly pivoted toward the synthesis of higher-dimensional content, such as video and three-dimensional assets, with recent studies demonstrating significant progress in this burgeoning field. Formidable challenges nevertheless endure, foremost the substantial computational costs of training and deploying these models, which are particularly pronounced in high-dimensional generation such as video synthesis. Additional persistent hurdles include maintaining spatial-temporal consistency across complex scenes and addressing ethical considerations surrounding bias and the preservation of human creative autonomy. This research underscores the transformative potential and inherent complexities of AI-driven synergy within the creative industries. We posit that future research should be dedicated to the synergistic fusion of diffusion and autoregressive models, the integration of multimodal inputs, and the balanced consideration of ethical implications, thereby establishing a robust foundation for the advancement of anime creation and the broader landscape of AI-driven content generation.
Background: Three-dimensional terrain models are essential in domains such as video game development and film production. Because surface color is often correlated with terrain geometry, capturing this relationship is critical for generating realistic results. However, most existing methods synthesize either a heightmap or a texture without adequately modeling their inherent correlation. Methods: We propose a method that jointly generates terrain heightmaps and textures using a latent diffusion model. First, we train the model in an unsupervised manner to randomly generate paired heightmaps and textures. Then, we perform supervised learning on an external adapter to enable user control via hand-drawn sketches. Results: Experiments demonstrate that our approach supports intuitive terrain generation while preserving the correlation between heightmaps and textures. Conclusion: Our method outperforms two-stage and GAN-based baselines by ensuring structural coherence, in which textures naturally align with geometry, successfully accommodating both realistic landscapes and extreme user-defined shapes.
Polyp datasets involve the confidentiality of medical records, so it can be difficult to obtain datasets with accurate annotations. This problem can be effectively addressed by expanding a polyp dataset algorithmically. Traditional polyp dataset expansion schemes usually require two models or traditional vision methods, which are tedious and struggle to provide new polyp features for training data. Our research therefore aims to efficiently generate high-quality polyp samples so as to effectively expand the polyp dataset. In this study, we first added an attention mechanism to the generation model and improved the loss function to reduce the interference caused by reflections during image generation. Meanwhile, we used the improved generation model to remove polyps from the original images. In addition, we used masks of different shapes, generated by random combinations, to generate polyps with more characteristic information. The same generation model was used for both the removal and the generation of polyps. Each generated polyp image carries its own annotation, which allows us to use the expanded dataset directly for training. Finally, we verified the effectiveness of the improved model and the dataset expansion scheme through a series of comparative experiments on a public dataset. The results show that training on the dataset we generate significantly improves the main performance indicators.
The performance of the deconvolution algorithm plays a crucial role in data processing for radio interferometers. Multi-scale multi-frequency synthesis (MSMFS) CLEAN is a widely used deconvolution algorithm for radio interferometric imaging, which combines the advantages of wide-band synthesis imaging and multi-scale imaging and can substantially improve performance. However, effectively determining the optimal scale is an important problem when implementing the MSMFS CLEAN algorithm. In this study, we propose a Gaussian fitting method for multiple sources based on the gradient descent algorithm, taking into account the influence of the point spread function (PSF). After fitting, we analyze the fitted components statistically to derive reasonable scale information from the model parameters. A series of simulation validations demonstrates that the scales extracted by the proposed algorithm are accurate and reasonable. The proposed method can be applied within the deconvolution algorithm and provides modeling analysis for Gaussian sources, offering data support for source extraction algorithms.
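A toy version of Gaussian fitting by gradient descent, reduced to a single noiseless 1-D Gaussian with a known width and no PSF (all of which the paper's multi-source 2-D method handles), illustrates the optimization idea:

```python
import math

def gaussian(x, amp, mu, sigma):
    """Evaluate amp * exp(-(x - mu)^2 / (2 sigma^2))."""
    return amp * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

def fit_gaussian(xs, ys, sigma=1.0, lr=0.05, steps=4000):
    """Fit amplitude and center by gradient descent on mean squared error."""
    amp, mu = 1.0, 0.0                       # crude initial guess
    n = len(xs)
    for _ in range(steps):
        g_amp = g_mu = 0.0
        for x, y in zip(xs, ys):
            basis = math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
            err = amp * basis - y
            g_amp += 2 * err * basis                                   # dE/d amp
            g_mu += 2 * err * amp * basis * (x - mu) / sigma ** 2      # dE/d mu
        amp -= lr * g_amp / n
        mu -= lr * g_mu / n
    return amp, mu

xs = [i * 0.5 - 3 for i in range(13)]            # grid on [-3, 3]
ys = [gaussian(x, 2.0, 0.7, 1.0) for x in xs]    # noiseless synthetic source
amp, mu = fit_gaussian(xs, ys)
```

In the paper's setting, the fitted widths of many such components (after accounting for the PSF) are what get aggregated statistically into scale information for MSMFS CLEAN.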
Recently, the evolution of generative adversarial networks (GANs) has embarked on a journey of revolutionizing the field of artificial and computational intelligence. To improve the generating ability of GANs, various loss functions have been introduced to measure the degree of similarity between the samples produced by the generator and the real data samples, and the choice of loss function strongly affects the generating ability of a GAN. In this paper, we present a detailed survey of the loss functions used in GANs, with a critical analysis of their pros and cons. First, the basic theory of GANs and their training mechanism are introduced. Then, the most commonly used loss functions in GANs are introduced and analyzed. Third, experimental analyses and comparisons of these loss functions across different GAN architectures are presented. Finally, several suggestions on choosing suitable loss functions for image synthesis tasks are given.
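As a rough illustration of how the surveyed loss families differ, the sketch below evaluates three common discriminator losses (minimax/BCE, hinge, and Wasserstein) on the same made-up critic scores; sign and reduction conventions vary across papers, so this is one conventional choice rather than the survey's definitions.

```python
import numpy as np

def bce_d_loss(real_logits, fake_logits):
    """Standard (minimax) GAN discriminator loss with a sigmoid, written via
    the softplus identity: -log s(r) - log(1 - s(f)) = softplus(-r) + softplus(f)."""
    softplus = lambda x: np.logaddexp(0.0, x)
    return np.mean(softplus(-real_logits) + softplus(fake_logits))

def hinge_d_loss(real_logits, fake_logits):
    """Hinge loss: max(0, 1 - D(x)) + max(0, 1 + D(G(z)))."""
    return np.mean(np.maximum(0.0, 1.0 - real_logits) +
                   np.maximum(0.0, 1.0 + fake_logits))

def wasserstein_d_loss(real_logits, fake_logits):
    """WGAN critic loss: E[D(G(z))] - E[D(x)]; requires a Lipschitz
    constraint (e.g., gradient penalty), omitted in this sketch."""
    return np.mean(fake_logits) - np.mean(real_logits)

r = np.array([2.0, 1.5])    # critic scores on real samples
f = np.array([-1.0, -0.5])  # critic scores on fakes
for fn in (bce_d_loss, hinge_d_loss, wasserstein_d_loss):
    print(fn.__name__, fn(r, f))
```

Each loss induces a different gradient signal for the generator, which is one reason the choice matters for image synthesis quality.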
Arbitrary style transfer aims to perceptually reflect the style of a reference image in artistic creations with visual aesthetics. Traditional style transfer models, particularly those using an adaptive instance normalization (AdaIN) layer, rely on global statistics, which often fail to capture the spatially local color distribution, leading to outputs that lack variation despite geometric transformations. To address this, we introduce Patchified AdaIN, a color-inspired style transfer method that applies AdaIN to localized patches, using local statistics to capture the spatial color distribution of the reference image. This approach enables enhanced color awareness in style transfer, adapting dynamically to geometric transformations by leveraging local image statistics. Since Patchified AdaIN builds on AdaIN, it integrates seamlessly into existing frameworks without additional training, allowing users to control output quality through adjustable blending parameters. Our comprehensive experiments demonstrate that Patchified AdaIN reflects geometric transformations (e.g., translation, rotation, flipping) of images for style transfer, achieving superior results compared with state-of-the-art methods. Additional experiments show that Patchified AdaIN can be integrated into existing networks to enable spatially color-aware arbitrary style transfer by replacing the conventional AdaIN layer.
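The AdaIN operation the paper builds on is AdaIN(x, y) = σ(y)·(x − μ(x))/σ(x) + μ(y), and a patch-wise variant in the spirit of Patchified AdaIN simply applies the same transform per patch using local statistics. The sketch below works on single-channel arrays, and the 2×2 patch grid is an illustrative choice, not the paper's configuration.

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """AdaIN: renormalize content statistics to match the style statistics."""
    cm, cs = content.mean(), content.std()
    sm, ss = style.mean(), style.std()
    return ss * (content - cm) / (cs + eps) + sm

def patchified_adain(content, style, grid=2, eps=1e-5):
    """Apply AdaIN per patch, so each region adopts the style's LOCAL statistics."""
    h, w = content.shape
    ph, pw = h // grid, w // grid
    out = np.empty((h, w), dtype=float)
    for i in range(grid):
        for j in range(grid):
            rows, cols = slice(i * ph, (i + 1) * ph), slice(j * pw, (j + 1) * pw)
            out[rows, cols] = adain(content[rows, cols], style[rows, cols], eps)
    return out
```

Because global AdaIN matches only one mean/std pair, its output is invariant to where style colors sit in the reference; the patch-wise version preserves that spatial layout, which is the property the paper exploits for geometric transformations.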
Funding: Supported by the National Natural Science Foundation of China (Nos. 60403044, 60373070) and partly funded by Microsoft Research Asia, Project 2004-Image-01.
Funding: Supported by the National Natural Science Foundation for Young Scientists of China (No. 62106289).
Abstract: Semantic image synthesis aims to generate high-quality images given semantic conditions, i.e., segmentation masks and style reference images. Existing methods widely adopt generative adversarial networks (GANs), which take all conditional inputs and directly synthesize images in a single forward step. In this paper, semantic image synthesis is treated as an image denoising task and is handled with a novel image-to-image diffusion model (IIDM).
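The "synthesis as denoising" framing rests on the standard diffusion forward/reverse processes. The sketch below shows only that machinery with a linear noise schedule and an oracle noise predictor standing in for the trained, condition-guided network; the actual IIDM architecture and schedule are not specified in the abstract and are not reproduced here.

```python
import numpy as np

# Linear beta schedule (an assumption; IIDM's real schedule may differ).
T = 50
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def q_sample(x0, t, noise):
    """Forward process: corrupt a clean image x0 to noise level t."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

rng = np.random.default_rng(0)
x0 = rng.random((8, 8))                 # clean "image"
noise = rng.standard_normal((8, 8))
x_T = q_sample(x0, T - 1, noise)        # fully noised sample

# Reverse process with an oracle noise predictor: given the exact noise, the
# clean image is recovered by inverting the forward equation. A trained model
# would predict this noise from (x_T, segmentation mask, style reference).
x0_hat = (x_T - np.sqrt(1.0 - alpha_bars[T - 1]) * noise) / np.sqrt(alpha_bars[T - 1])
print(np.allclose(x0_hat, x0))
```

The contrast with GANs in the abstract is exactly this loop: instead of one forward pass, the image is produced by iterating denoising steps conditioned on the semantic inputs.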
Funding: Supported in part by the National Natural Science Foundation of China (Nos. 62376037, 62306042, 62006227, and 82202244) and the Open Project Program of the Key Laboratory of Artificial Intelligence for Perception and Understanding, Liaoning Province (AIPU), China (No. 20230006).
Abstract: Medical image synthesis (MIS) can greatly reduce the economic and time costs of medical diagnosis. However, due to the complexity of medical images and the similar characteristics of different tissue cells, existing methods face great challenges in maintaining biological consistency. To this end, we propose the hybrid augmented generative adversarial network (HAGAN) to maintain the authenticity of structural texture and tissue cells. HAGAN contains an attention-mixed (AttnMix) generator, a hierarchical discriminator, and a reverse skip connection between the discriminator and generator. The AttnMix consistency differentiable regularization encourages the perception of structural and textural variations between real and fake images, which improves the pathological integrity of synthetic images and the accuracy of features in local areas. The hierarchical discriminator introduces pixel-by-pixel discriminant feedback to the generator, enhancing the saliency and discriminability of global and local details simultaneously. The reverse skip connection further improves the accuracy of fine details by fusing real and synthetic distribution features. Our experimental evaluations on two datasets of different scales, i.e., ACDC and BraTS2018, demonstrate that HAGAN outperforms existing methods and achieves state-of-the-art performance at both high and low resolutions.
Abstract: Vehicle re-identification (Re-ID) has drawn extensive exploration recently; nevertheless, accurately distinguishing features in latent space across varying vehicle poses remains a challenging hurdle for the real-world application of vehicle Re-ID. To address this challenge, we propose projecting vehicle images of various poses into a unified target pose so as to promote the discriminative capability of the vehicle Re-ID model. Acknowledging the labor and cost of obtaining paired images of the same vehicle across different traffic surveillance cameras in practical scenarios, we propose the pioneering Pair-flexible Pose Guided Image Synthesis method for vehicle Re-ID, denominated VehicleGAN. Our method is adept at both supervised (paired images of the same vehicle) and unsupervised (unpaired images of any vehicle) settings, and bypasses the need for geometric 3D model information. Furthermore, we propose a novel Joint Metric Learning (JML) method to facilitate the effective fusion of real and synthetic data. Comprehensive experimental analyses conducted on the public VeRi-776 and VehicleID datasets substantiate the precision and efficacy of the proposed VehicleGAN and JML.
Funding: Supported by the National Key Technology R&D Program (No. 2016YFB1001402), the National Natural Science Foundation of China (No. 61521002), the Joint NSFC-ISF Research Program (No. 61561146393), a Research Grant of the Beijing Higher Institution Engineering Research Center and the Tsinghua-Tencent Joint Laboratory for Internet Innovation Technology, and the EPSRC CDE (No. EP/L016540/1).
Abstract: This paper presents a survey of image synthesis and editing with Generative Adversarial Networks (GANs). GANs consist of two deep networks, a generator and a discriminator, which are trained in a competitive way. Due to the power of deep networks and the competitive training manner, GANs are capable of producing reasonable and realistic images, and have shown great capability in many image synthesis and editing applications. This paper surveys recent GAN papers on topics including, but not limited to, texture synthesis, image inpainting, image-to-image translation, and image editing.
Funding: Supported by the National Natural Science Foundation of China (Project Nos. 61521002 and 61772298).
Abstract: In many applications of computer graphics, art, and design, it is desirable for a user to provide intuitive non-image input, such as text, sketch, stroke, graph, or layout, and have a computer system automatically generate photo-realistic images according to that input. While, classically, works that allow such automatic image content generation have followed a framework of image retrieval and composition, recent advances in deep generative models such as generative adversarial networks (GANs), variational autoencoders (VAEs), and flow-based methods have enabled more powerful and versatile image generation approaches. This paper reviews recent works for image synthesis given intuitive user input, covering advances in input versatility, image generation methodology, benchmark datasets, and evaluation metrics. This motivates new perspectives on input representation and interactivity, cross-fertilization between major image generation paradigms, and evaluation and comparison of generation methods.
Funding: Supported by the Key Technological Innovation Projects of Hubei Province of China under Grant No. 2018AAA062, the Wuhan Science and Technology Plan Project of Hubei Province of China under Grant No. 2017010201010109, the National Key Research and Development Program of China under Grant No. 2017YFB1002600, and the National Natural Science Foundation of China under Grant Nos. 61672390 and 61972298.
Abstract: Synthesizing a complex scene image with multiple objects and a background according to a text description is a challenging problem. It requires solving several difficult tasks across the fields of natural language processing and computer vision. We model it as a combination of semantic entity recognition, object retrieval and recombination, and object-status optimization. To reach a satisfactory result, we propose a comprehensive pipeline to convert the input text to its visual counterpart. The pipeline includes text processing, foreground object and background scene retrieval, image synthesis using constrained MCMC, and post-processing. Firstly, we roughly divide the objects parsed from the input text into foreground objects and background scenes. Secondly, we retrieve the required foreground objects from a foreground object dataset segmented from the Microsoft COCO dataset, and retrieve an appropriate background scene image from a background image dataset extracted from the Internet. Thirdly, in order to ensure the rationality of the foreground objects' positions and sizes in the image synthesis step, we design a cost function and use the Markov Chain Monte Carlo (MCMC) method as the optimizer to solve this constrained layout problem. Finally, to make the image look natural and harmonious, we further use Poisson-based and relighting-based methods to blend the foreground objects and the background scene image in the post-processing step. The synthesized results and comparisons based on the Microsoft COCO dataset show that our method outperforms some state-of-the-art methods based on generative adversarial networks (GANs) in the visual quality of the generated scene images.
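The constrained layout step can be illustrated with a one-variable Metropolis-Hastings search: a toy cost (squared distance to a preferred location plus a large out-of-canvas penalty) stands in for the paper's layout cost, which is not given in the abstract. Both cost terms and all parameters below are illustrative assumptions.

```python
import math
import random

def cost(x, preferred=0.6, canvas=(0.0, 1.0)):
    """Toy layout cost for one object's x-position: squared distance to a
    preferred location, plus a large penalty for leaving the canvas."""
    c = (x - preferred) ** 2
    if not canvas[0] <= x <= canvas[1]:
        c += 10.0  # hard constraint expressed as a soft penalty
    return c

def mcmc_layout(steps=2000, temp=0.05, seed=1):
    """Metropolis-Hastings search over the layout variable, tracking the best
    state seen (the optimizer role MCMC plays in the pipeline)."""
    rng = random.Random(seed)
    x = rng.random()
    best = x
    for _ in range(steps):
        proposal = x + rng.gauss(0.0, 0.1)
        # Accept with probability min(1, exp(-(C(x') - C(x)) / temp)).
        if rng.random() < math.exp(min(0.0, -(cost(proposal) - cost(x)) / temp)):
            x = proposal
        if cost(x) < cost(best):
            best = x
    return best

best = mcmc_layout()
print(abs(best - 0.6) < 0.1)
```

The real system optimizes positions and sizes of several objects jointly under the designed cost, but the accept/reject loop is the same mechanism.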
Funding: Supported by the National Science and Technology Major Project (Grant No. 2021ZD0112902), the National Natural Science Foundation of China (Project No. 62220106003), a Research Grant from the Beijing Higher Institution Engineering Research Center, and the Tsinghua-Tencent Joint Laboratory for Internet Innovation Technology.
Abstract: 3D-aware image synthesis has attained high quality and robust 3D consistency. Existing 3D controllable generative models are designed to synthesize 3D-aware images through a single modality, such as 2D segmentation or sketches, but lack the ability to finely control generated content, such as texture and age. In pursuit of enhanced user-guided controllability, we propose Multi3D, a 3D-aware controllable image synthesis model that supports multi-modal input. Our model can govern the geometry of the generated image using a 2D label map, such as a segmentation or sketch map, while concurrently regulating the appearance of the generated image through a textual description. To demonstrate the effectiveness of our method, we have conducted experiments on multiple datasets, including CelebAMask-HQ, AFHQ-cat, and shapenet-car. Qualitative and quantitative evaluations show that our method outperforms existing state-of-the-art methods.
Funding: Supported by the Gansu Natural Science Foundation Programme (No. 24JRRA231), the National Natural Science Foundation of China (No. 62061023), and the Gansu Provincial Education, Science and Technology Innovation and Industry programme (No. 2021CYZC-04).
Abstract: Brain tumor segmentation is critical in clinical diagnosis and treatment planning. Existing methods for brain tumor segmentation with missing modalities often struggle when dealing with multiple missing modalities, a common scenario in real-world clinical settings. These methods primarily focus on handling a single missing modality at a time, making them insufficiently robust to the additional complexity of incomplete data containing various missing-modality combinations. Additionally, most existing methods rely on single models, which may limit their performance and increase the risk of overfitting the training data. This work proposes a novel method called the ensemble adversarial co-training neural network (EACNet) for accurate brain tumor segmentation from multi-modal magnetic resonance imaging (MRI) scans with multiple missing modalities. The proposed method consists of three key modules. The ensemble of pre-trained models captures diverse feature representations from the MRI data. Adversarial learning leverages a competitive training approach involving two models: a generator creates realistic missing data, while sub-networks acting as discriminators learn to distinguish real data from the generated "fake" data. The co-training framework utilizes the information extracted by the multimodal path (trained on complete scans) to guide the learning process in the path handling missing modalities; the model can thus compensate for missing information through co-training interactions by exploiting the relationships between the available modalities and the tumor segmentation task. EACNet was evaluated on the BraTS2018 and BraTS2020 challenge datasets, achieving state-of-the-art and competitive performance, respectively. Notably, the dice similarity coefficient (DSC) for whole-tumor (WT) segmentation reached 89.27%, surpassing existing methods. The analysis suggests that the ensemble approach offers potential benefits, and that the adversarial co-training contributes to the increased robustness and accuracy of EACNet for brain tumor segmentation of MRI scans with missing modalities. The experimental results show that EACNet delivers promising results on this task and is a strong candidate for real-world clinical applications.
Funding: Supported by the Cultivation Program for Major Scientific Research Projects of Harbin Institute of Technology (ZDXMPY20180109).
Abstract: Realistic urban scene generation has been extensively studied for the development of autonomous vehicles. However, the research has primarily focused on the synthesis of vehicles and pedestrians, while the generation of cyclists is rarely presented due to its complexity. This paper proposes a perspective-aware and realistic cyclist generation method via object retrieval. Images, semantic maps, and depth labels of objects are first collected from existing datasets, categorized by class and perspective, and calculated by an algorithm newly designed according to imaging principles. During scene generation, objects of the desired class and perspective are retrieved from the collection and inserted into the background, which is then sent to a modified 2D synthesis model to generate images. This pipeline introduces a perspective computing method, utilizes object retrieval to control the perspective accurately, and modifies a diffusion model to achieve high fidelity. Experiments show that our proposal achieves a Fréchet Inception Distance of 2.36, lower than that of competing methods, indicating superior realism. When these images are used for augmentation in the semantic segmentation task, the performance of ResNet-50 on the target class improves by 4.47%. These results demonstrate that the proposed method can be used to generate cyclists in corner cases to augment model training data, further enhancing the perception capability of autonomous vehicles and improving the safety of autonomous driving technology.
Abstract: Images taken underwater mostly present color shift with hazy effects due to the special properties of water. Underwater image enhancement methods have been proposed to handle this issue; however, their enhancement results are evaluated on only a small number of underwater images. The lack of a sufficiently large and diverse dataset for efficient evaluation of underwater image enhancement methods motivates the present paper, which proposes an organized method to synthesize diverse underwater images that can function as a benchmark dataset. The synthesis is based on the underwater image formation model, which describes the physical degradation process. An indoor RGB-D image dataset is used as the seed for underwater-style image generation. The ambient light is simulated based on the statistical mean of real-world underwater images, and attenuation coefficients for diverse water types are carefully selected. In total, 14490 underwater images of 10 water types are synthesized. Based on the synthesized database, state-of-the-art image enhancement methods are appropriately evaluated. Besides, the large, diverse underwater image database is beneficial for the development of learning-based methods.
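The underwater image formation model used for this kind of synthesis is, per color channel c: I_c(x) = J_c(x) t_c(x) + A_c (1 - t_c(x)), with transmission t_c = exp(-eta_c d(x)) from the depth map d. The sketch below applies it to a toy RGB-D pair; the attenuation coefficients and ambient light are illustrative placeholders, not the paper's fitted per-water-type values.

```python
import numpy as np

def synthesize_underwater(rgb, depth, eta=(0.30, 0.10, 0.05), ambient=(0.1, 0.5, 0.6)):
    """Apply the underwater image formation model to a clean RGB image and its
    depth map. eta (per-channel attenuation, red strongest) and ambient (the
    backscattered veiling light) are assumed values for illustration."""
    eta = np.asarray(eta).reshape(1, 1, 3)
    ambient = np.asarray(ambient).reshape(1, 1, 3)
    t = np.exp(-eta * depth[..., None])   # transmission from the depth map
    return rgb * t + ambient * (1.0 - t)  # direct signal + backscatter

rng = np.random.default_rng(0)
clean = rng.random((4, 4, 3))         # stand-in for an indoor RGB-D seed image
depth = np.full((4, 4), 5.0)          # uniform 5 m depth for simplicity
out = synthesize_underwater(clean, depth)
# Red attenuates fastest, so the output shifts toward the blue-green ambient light.
print(out[..., 0].mean() < out[..., 2].mean())
```

Varying eta and ambient per water type is exactly how the 10 synthesized water types differ.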
Funding: Supported by the National Science Foundation for Young Scientists of China (Grant No. 61806060, 2019-2021), the Basic and Applied Basic Research Foundation of Guangdong Province (2021A1515220140), and the Youth Innovation Project of Sun Yat-sen University Cancer Center (QNYCPY32).
Abstract: In recent years, radiotherapy based only on magnetic resonance (MR) images has become a hot spot of radiotherapy planning research in the medical field. However, computed tomography (CT) is still needed for dose calculation in the clinic. Recent deep-learning approaches that synthesize CT images from MR images have raised much research interest, making radiotherapy based only on MR images possible. In this paper, we propose a novel unsupervised image synthesis framework with registration networks. It enforces the constraints between the reconstructed image and the input image by registering the reconstructed image with the input image, and by registering the cycle-consistent image with the input image. Furthermore, we add ConvNeXt blocks to the network and use large-kernel convolutional layers to improve the network's ability to extract features. We used collected head and neck data of 180 patients with nasopharyngeal carcinoma to train and evaluate the model with four evaluation metrics, and made a quantitative comparison against several commonly used model frameworks. The model achieves a Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity (SSIM) of 18.55±1.44, 86.91±4.31, 33.45±0.74, and 0.960±0.005, respectively. Compared with other methods, MAE decreased by 2.17, RMSE decreased by 7.82, PSNR increased by 0.76, and SSIM increased by 0.011. The results show that the proposed model outperforms the other methods in the quality of image synthesis. This work provides guidance for the study of MR-only radiotherapy planning.
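Three of the four reported metrics have simple closed forms and can be sketched directly (SSIM is omitted here because it involves windowed local statistics). The intensity range and example arrays below are illustrative, not CT data.

```python
import numpy as np

def mae(a, b):
    """Mean Absolute Error."""
    return np.mean(np.abs(a - b))

def rmse(a, b):
    """Root Mean Square Error."""
    return np.sqrt(np.mean((a - b) ** 2))

def psnr(a, b, data_range=255.0):
    """Peak Signal-to-Noise Ratio in dB for the given intensity range."""
    return 10.0 * np.log10(data_range ** 2 / np.mean((a - b) ** 2))

truth = np.full((16, 16), 100.0)
pred = truth + 5.0  # constant error of 5 intensity units
print(mae(truth, pred))                 # 5.0
print(rmse(truth, pred))                # 5.0 (equals MAE for constant error)
print(round(psnr(truth, pred), 2))      # 34.15
```

Note that RMSE exceeds MAE whenever the error is non-constant, which is why the abstract's RMSE (86.91) is much larger than its MAE (18.55): the synthesis error is concentrated in a minority of pixels.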
Funding: This research was funded by the National Natural Science Foundation of China (Project No. 42172161), the Heilongjiang Provincial Natural Science Foundation of China (Project No. LH2020F003), the Heilongjiang Provincial Department of Education Project of China (Project No. UNPYSCT-2020144), and the Northeast Petroleum University Guided Innovation Fund (2021YDL-12).
Abstract: Intelligent identification of sandstone slice images using deep learning is the development trend of mineral identification, and accurate mineral particle segmentation is its most critical step. A typical identification model requires many training samples to learn as many distinguishable features as possible. However, limited by the difficulty of data acquisition, the high cost of labeling, and privacy protection, the available samples are sparse and cannot meet the training requirements of deep-learning image identification models. In order to increase the number of samples and improve the training effect, this paper proposes a tight sandstone image data augmentation method that combines the advantages of data deformation and data oversampling, taking the Putaohua reservoir in the Sanzhao Sag of the Songliao Basin as the target area. First, the Style Generative Adversarial Network (StyleGAN) is improved to generate high-resolution tight sandstone images and improve data diversity. Second, we improve the Automatic Data Augmentation (AutoAugment) algorithm to search for the optimal augmentation strategy and expand the data scale. Finally, we design comparison experiments that demonstrate that this method has clear advantages in generated image quality and in improving the identification performance of deep-learning models in real application scenarios.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62202210).
Abstract: The application of generative artificial intelligence (AI) is bringing notable changes to anime creation. This paper surveys recent advancements and applications of diffusion and language models in anime generation, focusing on their demonstrated potential to enhance production efficiency through automation and personalization. We conduct an in-depth survey of cutting-edge generative AI technologies, encompassing models such as Stable Diffusion and GPT, and appraise pivotal large-scale datasets alongside quantifiable evaluation metrics. The surveyed literature indicates considerable maturity in the capacity of AI models to synthesize high-quality, aesthetically compelling anime images from textual prompts, alongside discernible progress in the generation of coherent narratives. However, achieving long-form consistency, mitigating artifacts such as flickering in video sequences, and enabling fine-grained artistic control remain critical ongoing challenges. Building upon these advancements, research efforts have increasingly pivoted towards the synthesis of higher-dimensional content, such as video and three-dimensional assets, with recent studies demonstrating significant progress in this burgeoning field. Nevertheless, formidable challenges endure. Foremost among these are the substantial computational demands of training and deploying these sophisticated models, particularly pronounced in high-dimensional generation such as video synthesis. Additional persistent hurdles include maintaining spatial-temporal consistency across complex scenes and addressing ethical considerations surrounding bias and the preservation of human creative autonomy. This research underscores the transformative potential and inherent complexities of AI-driven synergy within the creative industries. We posit that future research should be dedicated to the synergistic fusion of diffusion and autoregressive models, the integration of multimodal inputs, and the balanced consideration of ethical implications, thereby establishing a robust foundation for the advancement of anime creation and the broader landscape of AI-driven content generation.
Abstract: Background: Three-dimensional terrain models are essential in domains such as video game development and film production. Because surface color is often correlated with terrain geometry, capturing this relationship is critical for generating realistic results. However, most existing methods synthesize either a heightmap or a texture without adequately modeling their inherent correlation. Methods: We propose a method that jointly generates terrain heightmaps and textures using a latent diffusion model. First, we train the model in an unsupervised manner to randomly generate paired heightmaps and textures. Then, we perform supervised learning on an external adapter to enable user control via hand-drawn sketches. Results: Experiments demonstrate that our approach supports intuitive terrain generation while preserving the correlation between heightmaps and textures. Conclusion: Our method outperforms two-stage and GAN-based baselines by ensuring structural coherence, in which textures naturally align with geometry, successfully accommodating both realistic landscapes and extreme user-defined shapes.
Funding: Supported by the Natural Science Foundation Project of Fujian Province, China (Grant Nos. 2023J011439 and 2019J01859).
Abstract: Polyp datasets involve the confidentiality of medical records, so it can be difficult to obtain datasets with accurate annotations. This problem can be effectively solved by expanding the polyp dataset algorithmically. Traditional polyp dataset expansion schemes usually require two models or traditional vision methods; these are both tedious and unable to provide new polyp features for the training data. Our research therefore aims to efficiently generate high-quality polyp samples so as to effectively expand the polyp dataset. In this study, we first added an attention mechanism to the generation model and improved the loss function to reduce the interference caused by reflections during image generation. Meanwhile, we used the improved generation model to remove polyps from the original images. In addition, we used masks of different shapes, generated by random combination, to generate polyps with richer characteristic information. The same generation model was used for both the removal and the generation of polyps. Each generated polyp image carries its own annotation, which allows the expanded dataset to be used for training directly. Finally, we verified the effectiveness of the improved model and the dataset expansion scheme through a series of comparative experiments on a public dataset. The results showed that training on the generated dataset significantly improves the main performance indicators.
Funding: Supported by the China National SKA Program (2020SKA0110300), the National Natural Science Foundation of China (12433012 and 12373097), and the Guangdong Province Basic and Applied Basic Research Foundation Project (2024A1515011503).
Abstract: The performance of the deconvolution algorithm plays a crucial role in the data processing of radio interferometers. Multi-scale multi-frequency synthesis (MSMFS) CLEAN is a widely used deconvolution algorithm for radio interferometric imaging; it combines the advantages of wide-band synthesis imaging and multi-scale imaging and can substantially improve performance. However, effectively determining the optimal scale is an important problem when implementing the MSMFS CLEAN algorithm. In this study, we propose a Gaussian fitting method for multiple sources based on gradient descent, taking into account the influence of the point spread function (PSF). After fitting, we analyze the fitted components statistically to derive reasonable scale information from the model parameters. A series of simulation validations demonstrates that the scales extracted by the proposed algorithm are accurate and reasonable. The proposed method can be applied within the deconvolution algorithm and provides modeling analysis for Gaussian sources, offering data support for source extraction algorithms.
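The core of the method, fitting Gaussian parameters by gradient descent, can be illustrated in one dimension for a single noiseless source. The paper fits multiple 2D sources and accounts for the PSF; this sketch only shows the optimization loop, and the learning rate and step count are assumptions.

```python
import numpy as np

# Synthetic "source": a 1D Gaussian with unknown amplitude and width.
x = np.linspace(-5.0, 5.0, 101)
true_amp, true_sigma = 2.0, 1.5
y = true_amp * np.exp(-x**2 / (2.0 * true_sigma**2))

amp, sigma = 1.0, 1.0   # initial guess
lr = 0.1
mse0 = np.mean((amp * np.exp(-x**2 / (2.0 * sigma**2)) - y) ** 2)
for _ in range(5000):
    model = amp * np.exp(-x**2 / (2.0 * sigma**2))
    r = model - y  # residuals of the current fit
    # Analytic gradients of the MSE w.r.t. amplitude and width.
    amp -= lr * np.mean(2.0 * r * model / amp)
    sigma -= lr * np.mean(2.0 * r * model * x**2 / sigma**3)
mse1 = np.mean((amp * np.exp(-x**2 / (2.0 * sigma**2)) - y) ** 2)
print(mse1 < mse0, round(amp, 2), round(sigma, 2))
```

In the paper's setting, the fitted sigma values of the components are what feed the statistical analysis that yields the CLEAN scale sizes.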
Abstract: Recently, the evolution of Generative Adversarial Networks (GANs) has revolutionized the field of artificial and computational intelligence. To improve the generative ability of GANs, various loss functions have been introduced to measure the degree of similarity between the samples produced by the generator and the real data samples. In this paper, we present a detailed survey of the loss functions used in GANs, with a critical analysis of their pros and cons. First, the basic theory of GANs and their training mechanism are introduced. Then, the most commonly used loss functions in GANs are introduced and analyzed. Third, experimental analyses and comparisons of these loss functions are presented across different GAN architectures. Finally, several suggestions on choosing suitable loss functions for image synthesis tasks are given.
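Three of the most commonly surveyed generator losses differ only in how they score the discriminator's output on fake samples, which can be shown numerically without any training. The example probabilities below are illustrative.

```python
import numpy as np

def g_loss_minimax(d):
    """Original minimax generator loss: E[log(1 - D(G(z)))]."""
    return np.mean(np.log(1.0 - d))

def g_loss_nonsaturating(d):
    """Non-saturating heuristic: E[-log D(G(z))]."""
    return -np.mean(np.log(d))

def g_loss_lsgan(d):
    """Least-squares GAN generator loss: E[(D(G(z)) - 1)^2]."""
    return np.mean((d - 1.0) ** 2)

# Early in training the discriminator easily rejects fakes: D(G(z)) is small.
d_fake = np.array([0.01, 0.05, 0.10])
# The non-saturating loss stays large in magnitude (strong gradient signal)
# exactly where the minimax loss is nearly flat, which is one of the standard
# pros/cons trade-offs such surveys analyze.
print(g_loss_nonsaturating(d_fake) > abs(g_loss_minimax(d_fake)))
```

LSGAN's quadratic loss sits between the two: bounded like minimax, but with gradients that do not vanish as long as D(G(z)) is far from 1.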
Funding: Supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2022R1A2C1004657, contribution rate: 50%), and by the Culture, Sports and Tourism R&D Program through the Korea Creative Content Agency grant funded by the Ministry of Culture, Sports and Tourism in 2024 (project name: Developing Professionals for R&D in Contents Production Based on Generative AI and Cloud; project number: RS-2024-00352578; contribution rate: 50%).
Abstract: Arbitrary style transfer aims to perceptually reflect the style of a reference image in artistic creations with visual aesthetics. Traditional style transfer models, particularly those using the adaptive instance normalization (AdaIN) layer, rely on global statistics, which often fail to capture the spatially local color distribution, leading to outputs that lack variation despite geometric transformations. To address this, we introduce Patchified AdaIN, a color-inspired style transfer method that applies AdaIN to localized patches, utilizing local statistics to capture the spatial color distribution of the reference image. This approach enables enhanced color awareness in style transfer, adapting dynamically to geometric transformations by leveraging local image statistics. Since Patchified AdaIN builds on AdaIN, it integrates seamlessly into existing frameworks without additional training, allowing users to control the output quality through adjustable blending parameters. Our comprehensive experiments demonstrate that Patchified AdaIN can reflect geometric transformations (e.g., translation, rotation, flipping) of images for style transfer, achieving superior results compared to state-of-the-art methods. Additional experiments show that Patchified AdaIN can be integrated into existing networks to enable spatially color-aware arbitrary style transfer by replacing the conventional AdaIN layer.
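The patchified idea can be sketched by applying AdaIN's mean/std renormalization per local patch instead of over the whole image. This toy version works on single-channel arrays in pixel space with a fixed patch size; the real method operates on network feature maps, so those details are simplifying assumptions.

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Standard AdaIN: renormalize content statistics to style statistics."""
    c_mu, c_std = content.mean(), content.std() + eps
    s_mu, s_std = style.mean(), style.std() + eps
    return (content - c_mu) / c_std * s_std + s_mu

def patchified_adain(content, style, patch=4):
    """Apply AdaIN per co-located patch so local style statistics survive."""
    out = np.empty_like(content)
    h, w = content.shape
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            out[i:i + patch, j:j + patch] = adain(
                content[i:i + patch, j:j + patch],
                style[i:i + patch, j:j + patch])
    return out

rng = np.random.default_rng(0)
content = rng.random((8, 8))
# Style with a strong spatial split: dark left half, bright right half.
style = np.concatenate(
    [np.zeros((8, 4)), 0.9 + 0.1 * rng.random((8, 4))], axis=1)
out = patchified_adain(content, style)
# The output preserves the style's left-dark / right-bright layout, which a
# single global AdaIN would average away into one flat statistic.
print(out[:, :4].mean() < out[:, 4:].mean())
```

This is also why the method tracks geometric transformations of the reference: flipping the style image flips which patches supply which local statistics.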