Generative adversarial networks (GANs) with gaming abilities have been widely applied in image generation. However, gamistic generators and discriminators may reduce the robustness of the obtained GANs in image generation under varying scenes. Enhancing the relation of hierarchical information in a generation network and enlarging the differences between network architectures can contribute more structural information and thus improve the generation effect. In this paper, we propose an enhanced GAN with an improved generator for image generation (EIGGAN). EIGGAN applies spatial attention in the generator to extract salient information and enhance the truthfulness of the generated images. Taking the contextual relation into account, parallel residual operations are fused into the generation network to extract more structural information from different layers. Finally, a mixed loss function is exploited to trade off speed against accuracy and generate more realistic images. Experimental results show that the proposed method is superior to popular methods such as Wasserstein GAN with gradient penalty (WGAN-GP) on many indexes, including Fréchet Inception Distance, Learned Perceptual Image Patch Similarity, Multi-Scale Structural Similarity Index Measure, Kernel Inception Distance, Number of Statistically-Different Bins, and Inception Score, as well as on visual comparisons.
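The abstract does not detail the attention design; as a rough illustration of the kind of spatial attention a generator block might apply (the module, shapes, and 1x1-convolution scoring are assumptions, not EIGGAN's actual architecture):

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Weights each spatial location by a learned saliency map (illustrative sketch)."""
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolution that collapses the feature channels into one attention map
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = torch.sigmoid(self.score(x))  # (N, 1, H, W) saliency weights in [0, 1]
        return x * attn                      # re-weight features spatially

# Usage inside a generator block
feat = torch.randn(2, 64, 32, 32)
out = SpatialAttention(64)(feat)
print(out.shape)  # torch.Size([2, 64, 32, 32])
```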
The applications of machine learning (ML) in the medical domain are often hindered by the limited availability of high-quality data. To address this challenge, we explore the synthetic generation of echocardiography images (echoCG) using state-of-the-art generative models. We conduct a comprehensive evaluation of three prominent methods: Cycle-consistent generative adversarial network (CycleGAN), Contrastive Unpaired Translation (CUT), and Stable Diffusion 1.5 with Low-Rank Adaptation (LoRA). Our research presents the data generation methodology, image samples, and evaluation strategy, followed by an extensive user study involving licensed cardiologists and surgeons who assess the perceived quality and medical soundness of the generated images. Our findings indicate that Stable Diffusion outperforms both CycleGAN and CUT in generating images that are nearly indistinguishable from real echoCG images, making it a promising tool for augmenting medical datasets. However, we also identify limitations in the synthetic images generated by CycleGAN and CUT, which are easily distinguishable as non-realistic by medical professionals. This study highlights the potential of diffusion models in medical imaging and their applicability in addressing data scarcity, while also outlining areas for future improvement.
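The fine-tuning recipe is not reproduced here; a minimal sketch of how a LoRA adapter is commonly attached to Stable Diffusion 1.5 at inference time with the diffusers library (the base model ID is the public checkpoint; the LoRA path and prompt are hypothetical placeholders):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the public Stable Diffusion 1.5 checkpoint
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Attach LoRA weights fine-tuned on echocardiography images (path is hypothetical)
pipe.load_lora_weights("./echocg-lora")

image = pipe("apical four-chamber echocardiogram view",
             num_inference_steps=30).images[0]
image.save("synthetic_echocg.png")
```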
Remote sensing images often need to be merged into a larger mosaic image to support analysis over large areas in many applications. However, the quality of the mosaic may be severely restricted if many areas are cloud-covered or if the images used for merging span a long time period. Therefore, this paper proposes an image selection method for full-coverage image (i.e., a mosaic image with no cloud-contaminated pixels) generation. Specifically, a novel High-Frequency-Aware (HFA)-Net based on Swin-Transformer is presented for region quality grading, providing a data basis for image selection. Spatiotemporal constraints are then introduced to optimize the selection. In the temporal dimension, a shortest-time-span constraint shortens the time span of the selected images, markedly improving the timeliness of the selection results. In the spatial dimension, a spatial continuity constraint selects data with better quality and larger area, improving the radiometric continuity of the results. Experiments on GF-1 images indicate that the proposed method shortens the average time span by 76.1% and 38.7% compared with the Improved Coverage-oriented Retrieval (MICR) and Retrieval Method based on Grid Compensation (RMGC) methods, respectively. Moreover, the proposed method reduces the residual cloud amount by an average of 91.2%, 89.8%, and 83.4% compared with the MICR, RMGC, and Pixel-based Time-series Synthesis Method (PTSM) methods, respectively.
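To make the shortest-time-span constraint concrete, here is a toy brute-force selector over candidate scenes; the data layout and exhaustive search are illustrative simplifications, not the paper's optimized algorithm:

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Scene:
    acq_day: int        # acquisition day
    coverage: frozenset # grid cells of the target area this scene covers cloud-free

def select_full_coverage(scenes, target_cells):
    """Pick the scene subset covering all cells with the shortest time span (toy search)."""
    best, best_span = None, float("inf")
    for r in range(1, len(scenes) + 1):
        for combo in combinations(scenes, r):
            covered = set().union(*(s.coverage for s in combo))
            if covered >= target_cells:
                span = max(s.acq_day for s in combo) - min(s.acq_day for s in combo)
                if span < best_span:
                    best, best_span = combo, span
    return best, best_span

scenes = [Scene(1, frozenset({0, 1})), Scene(3, frozenset({2, 3})),
          Scene(40, frozenset({1, 2, 3}))]
subset, span = select_full_coverage(scenes, {0, 1, 2, 3})
print(span)  # 2 -> prefers the two scenes acquired close together in time
```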
A phase-aware cross-modal framework is presented that synthesizes UWF_FA from non-invasive UWF_RI for diabetic retinopathy (DR) stratification. A curated cohort of 1198 patients (2915 UWF_RI and 17,854 UWF_FA images) with strict registration quality supports training across three angiographic phases (initial, mid, final). The generator is based on a modified pix2pixHD with an added Gradient Variance Loss to better preserve microvasculature, and is evaluated using MAE, PSNR, SSIM, and MS-SSIM on held-out pairs. Quantitatively, the mid phase achieves the lowest MAE (98.76±42.67), while SSIM remains high across phases. Expert review shows substantial agreement (Cohen's κ=0.78–0.82) and Turing-style misclassification of 50%–70% of synthetic images as real, indicating strong perceptual realism. For downstream DR stratification, fusing multi-phase synthetic UWF_FA with UWF_RI in a Swin Transformer classifier yields significant gains over a UWF_RI-only baseline, with the full-phase setting (Set D) reaching AUC=0.910 and accuracy=0.829. These results support synthetic UWF_FA as a scalable, non-invasive complement to dye-based angiography that enhances screening accuracy while avoiding injection-related risks.
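The exact Gradient Variance Loss is not given in the abstract; a sketch following the common formulation (patch-wise variance of Sobel gradient maps; the kernel and patch size are assumptions):

```python
import torch
import torch.nn.functional as F

def gradient_maps(img):
    """Sobel horizontal/vertical gradients of a (N, 1, H, W) image."""
    kx = torch.tensor([[[[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]]])
    ky = kx.transpose(2, 3)
    return F.conv2d(img, kx, padding=1), F.conv2d(img, ky, padding=1)

def gradient_variance_loss(pred, target, patch=8):
    """Match the patch-wise variance of gradient maps between prediction and target."""
    loss = 0.0
    for gp, gt in zip(gradient_maps(pred), gradient_maps(target)):
        # unfold into non-overlapping patches and compare their variances
        vp = F.unfold(gp, patch, stride=patch).var(dim=1)
        vt = F.unfold(gt, patch, stride=patch).var(dim=1)
        loss = loss + F.mse_loss(vp, vt)
    return loss

pred, gt = torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64)
print(gradient_variance_loss(pred, gt))
```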
Training generative adversarial networks is data-demanding, which limits the development of these models on target domains with inadequate training data. Recently, researchers have leveraged generative models pretrained on sufficient data and fine-tuned them on small training samples, thus reducing data requirements. However, because these methods lack an explicit focus on target styles and concentrate disproportionately on generative consistency, they perform poorly at diversity preservation, which reflects the adaptation ability of few-shot generative models. To mitigate this diversity degradation, we propose a framework with two key strategies: 1) to obtain more diverse styles from limited training data, we propose a cross-modal module that explicitly captures the target styles with a style prototype space and text-guided style instructions; 2) to inherit the generation capability of the pretrained model, we constrain the similarity between the generated and source images with a structural discrepancy alignment module that maintains the structure correlation across multiscale areas. Extensive experiments and analyses demonstrate the effectiveness of our method, which outperforms state-of-the-art methods in mitigating diversity degradation.
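As a rough sketch of the structure-correlation idea behind the alignment module (simplified to batch-level cosine-similarity matrices; not the authors' multiscale formulation):

```python
import torch
import torch.nn.functional as F

def structure_correlation(feats):
    """Pairwise cosine-similarity matrix over a batch of feature vectors."""
    f = F.normalize(feats.flatten(1), dim=1)
    return f @ f.t()

def structural_alignment_loss(src_feats, gen_feats):
    """Keep the relational structure of the frozen source generator's features."""
    return F.mse_loss(structure_correlation(gen_feats),
                      structure_correlation(src_feats).detach())

src = torch.randn(8, 512)  # features from the frozen pretrained generator
gen = torch.randn(8, 512)  # features from the adapted generator (same latents)
print(structural_alignment_loss(src, gen))
```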
Image generation models have made remarkable progress, and image evaluation is crucial for explaining and driving the development of these models. Previous studies have extensively explored human and automatic evaluations of image generation. Herein, these studies are comprehensively surveyed in two main parts: evaluation protocols and evaluation methods. First, 10 image generation tasks are summarized, with a focus on how their evaluation requirements differ. On this basis, a novel protocol is proposed to cover the human and automatic evaluation aspects required by the various tasks. Second, automatic evaluation methods from the past five years are reviewed. To our knowledge, this paper presents the first comprehensive summary of human evaluation, encompassing evaluation methods, tools, details, and data analysis methods. Finally, the challenges and potential directions for image generation evaluation are discussed. We hope that this survey will help researchers develop a systematic understanding of image generation evaluation, stay updated with the latest advancements in the field, and encourage further research.
As a form of discrete representation learning, Vector Quantized Variational Autoencoders (VQ-VAE) have increasingly been applied to generative and multimodal tasks due to their ease of embedding and representative capacity. However, existing VQ-VAEs often perform quantization in the spatial domain, ignoring global structural information and potentially suffering from codebook collapse and information coupling. This paper proposes a frequency quantized variational autoencoder (FQ-VAE) to address these issues. The proposed method transforms image features into linear combinations in the frequency domain using a 2D fast Fourier transform (2D-FFT) and performs adaptive quantization on these frequency components to preserve the image's global relationships. The codebook is dynamically optimized, taking into account the usage frequency and dependency of code vectors, to avoid collapse and information coupling. Furthermore, we introduce a post-processing module based on graph convolutional networks to further improve reconstruction quality. Experimental results on four public datasets demonstrate that the proposed method outperforms state-of-the-art approaches in terms of Structural Similarity Index (SSIM), Learned Perceptual Image Patch Similarity (LPIPS), and Reconstruction Fréchet Inception Distance (rFID). On the CIFAR-10 dataset, compared to the baseline VQ-VAE, the proposed method improves these metrics by 4.9%, 36.4%, and 52.8%, respectively.
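A minimal sketch of quantizing features in the frequency domain; the magnitude/phase split, nearest-neighbor lookup, and shapes are simplifying assumptions rather than the FQ-VAE's adaptive scheme:

```python
import torch
import torch.nn as nn

class FrequencyQuantizer(nn.Module):
    """Toy sketch: nearest-codebook quantization of 2D-FFT magnitudes (phase kept as-is)."""
    def __init__(self, num_codes=512, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, feats):                       # feats: (N, C, H, W)
        spec = torch.fft.fft2(feats, norm="ortho")  # to the frequency domain
        mag, phase = spec.abs(), spec.angle()
        flat = mag.permute(0, 2, 3, 1).reshape(-1, mag.shape[1])  # (N*H*W, C)
        idx = torch.cdist(flat, self.codebook.weight).argmin(dim=1)
        q = self.codebook(idx).reshape(mag.shape[0], *mag.shape[2:], mag.shape[1])
        q = q.permute(0, 3, 1, 2)
        # recombine quantized magnitude with the original phase and invert the FFT
        rec = torch.fft.ifft2(torch.polar(q, phase), norm="ortho").real
        return rec, idx

x = torch.randn(2, 64, 16, 16)
rec, idx = FrequencyQuantizer(dim=64)(x)
print(rec.shape)  # torch.Size([2, 64, 16, 16])
```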
In this study, we explore the potential of Multiway Transformers for text-to-image generation, aiming for performance improvements through a concise, decoupled model design and the inference efficiency of bidirectional encoding. We propose a method for improving the image tokenizer using pretrained Vision Transformers. We then employ bidirectional Multiway Transformers to restore masked visual tokens given the unmasked text tokens. On the MS-COCO benchmark, our Multiway Transformers outperform vanilla Transformers, achieving superior FID scores and confirming the efficacy of the modality-specific parameter computation design. Ablation studies reveal that fusing visual and text tokens in bidirectional encoding improves model performance. Additionally, our proposed tokenizer outperforms VQGAN in image reconstruction quality and enhances the text-to-image generation results. By incorporating the additional CC-3M dataset for intermediate finetuning of our 688M-parameter model, we achieve competitive results with a finetuned FID score of 4.98 on MS-COCO.
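The core Multiway idea, shared bidirectional attention with modality-specific feed-forward experts, can be sketched as follows (a simplification with assumed dimensions, not the authors' implementation):

```python
import torch
import torch.nn as nn

class MultiwayBlock(nn.Module):
    """Shared self-attention; separate feed-forward experts for text vs. image tokens."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn_text = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                      nn.Linear(4 * dim, dim))
        self.ffn_image = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                       nn.Linear(4 * dim, dim))

    def forward(self, tokens, is_image):          # is_image: (N, L) bool mask
        h, _ = self.attn(tokens, tokens, tokens)  # bidirectional (unmasked) attention
        x = tokens + h
        # route each token to the expert matching its modality
        out = torch.where(is_image.unsqueeze(-1), self.ffn_image(x), self.ffn_text(x))
        return x + out

tokens = torch.randn(2, 10, 256)
mask = torch.tensor([[False] * 4 + [True] * 6] * 2)  # 4 text tokens, 6 visual tokens
print(MultiwayBlock()(tokens, mask).shape)  # torch.Size([2, 10, 256])
```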
The rapid advancement of autonomous driving technology has reshaped the automotive industry, highlighting the need for diverse and high-quality image data. Existing image datasets for training and improving autonomous driving technologies lack rare scenarios such as extreme weather, limiting their effectiveness and reliability. One possible way of expanding dataset coverage is to augment the existing data with artificial samples, which, however, still suffers from challenges such as limited controllability and unclear corner-case boundaries. To address these challenges, we design and develop an interactive visual analysis system, HuGe, for efficient, semi-automatic, controllable image generation. HuGe incorporates weather transformation models and a novel semi-automatic, knowledge-based controllable object insertion method that leverages the controllability of convex optimization and the variability of diffusion models. We formulate the design requirements, propose an effective framework, and design four coordinated views to support controllable image generation, multidimensional dataset analysis, and evaluation of the generated samples. Two case studies, a metric-based evaluation, and interviews with domain experts demonstrate the practicality and effectiveness of HuGe in controllable image generation for autonomous driving.
Artificial intelligence generated content (AIGC) has emerged as an indispensable tool for producing large-scale content in various forms, such as images, thanks to the significant role that AI plays in imitation and production. However, interpretability and controllability remain challenges. Existing AI methods often struggle to produce images that are both flexible and controllable while respecting the causal relationships within the images. To address this issue, we develop a novel method for causal controllable image generation (CCIG) that combines causal representation learning with bi-directional generative adversarial networks (GANs). This approach enables humans to control image attributes while preserving the rationality and interpretability of the generated images, and also allows for the generation of counterfactual images. The key to CCIG lies in a causal structure learning module that learns the causal relationships between image attributes and is jointly optimized with the encoder, generator, and joint discriminator of the image generation module. In this way, we can learn causal representations in the image's latent space and use causal intervention operations to control image generation. We conduct extensive experiments on a real-world dataset, CelebA. The experimental results illustrate the effectiveness of CCIG.
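A toy illustration of a causal intervention (do-operation) over attribute variables; the attribute graph and linear mechanism are invented for illustration and are not learned as in CCIG:

```python
import torch

# Toy linear SCM over three face attributes (illustrative, not the paper's graph):
# attribute 1 ("mouth_open") depends on attribute 0 ("smiling").
A = torch.tensor([[0., 0., 0.],   # row i lists the parents of attribute i
                  [1., 0., 0.],
                  [0., 0., 0.]])

def sample_attributes(noise, intervene=None):
    """Ancestral sampling in topological order; `intervene` pins an attribute (do-op)."""
    attrs = noise.clone()
    for i in range(A.shape[0]):
        attrs[:, i] = attrs[:, i] + (A[i] * attrs).sum(dim=1)
        if intervene and i in intervene:
            attrs[:, i] = intervene[i]  # cut incoming edges: do(attr_i = v)
    return attrs

noise = torch.randn(4, 3)
print(sample_attributes(noise, intervene={0: 1.0}))  # counterfactual: force "smiling"
```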
As computer graphics technology supports the pursuit of a photorealistic style, replicated artworks in that style overwhelmingly predominate in the computer-generated art circle. As generative technology progresses, this trend may turn generative art into a virtual world of photorealistic fakes, in which a single criterion of expressive style reduces art to a single boring stereotype. This article examines the issue of style diversity and its technical feasibility through artistic experiments generating flower images in StyleGAN. The author argues that neither photographic technology nor artistic style should be confined merely to realistic purposes. This proposition was validated in the GAN generation experiments by changing the training materials.
The difficulty of collecting bumblebee data and the laborious nature of annotating it sometimes result in a lack of training data, which impairs the effectiveness of deep-learning-based counting methods. Given that current data augmentation methods struggle to produce detailed background information in generated bumblebee images, this paper proposes a generative adversarial network that combines a joint multi-scale convolutional neural network with multi-channel attention (MMGAN). MMGAN generates a bumblebee image in accordance with a corresponding density map marking the bumblebee positions. Specifically, the multi-scale convolutional neural network (CNN) module uses multiple convolution kernels to fully extract features of different scales from the input bumblebee image and density map. To generate various targets in the output image, the multi-channel attention module builds numerous intermediate generation layers and attention maps; these targets are then stacked to produce a bumblebee image with a specific number of bumblebees. The proposed model achieves the best performance on bumblebee image generation tasks, and the generated images considerably improve the effectiveness of deep-learning-based counting methods in bumblebee counting applications.
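The multi-scale CNN module can be sketched as parallel convolution branches with different kernel sizes; the channel counts and kernel sizes here are assumptions:

```python
import torch
import torch.nn as nn

class MultiScaleConv(nn.Module):
    """Extract features at several receptive fields and concatenate them (illustrative)."""
    def __init__(self, in_ch=4, out_ch=16):
        super().__init__()
        # parallel branches with different kernel sizes, padded to keep spatial size
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2) for k in (3, 5, 7)
        ])

    def forward(self, x):
        return torch.cat([b(x) for b in self.branches], dim=1)

# Input: bumblebee image (3 channels) stacked with its density map (1 channel)
img_and_density = torch.randn(2, 4, 128, 128)
print(MultiScaleConv()(img_and_density).shape)  # torch.Size([2, 48, 128, 128])
```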
Medical image generation has recently garnered significant interest among researchers. However, the primary generative models, such as Generative Adversarial Networks (GANs), often encounter challenges during training, including mode collapse. To address these issues, we propose the AE-COT-GAN model (Autoencoder-based Conditional Optimal Transport Generative Adversarial Network) for generating medical images of specific categories. The training process comprises three fundamental components. First, we employ an autoencoder to obtain a low-dimensional manifold representation of the real images. Second, we apply extended semi-discrete optimal transport to map the Gaussian noise distribution to the latent-space distribution and obtain the corresponding labels effectively; this procedure yields new latent codes with known labels. Finally, we integrate a GAN to further train the decoder to generate medical images. To evaluate the AE-COT-GAN model, we conducted experiments on two medical image datasets, DermaMNIST and BloodMNIST, and compared its performance with state-of-the-art generative models. The results show that AE-COT-GAN generates medical images of excellent quality and effectively addresses the common issues associated with traditional GANs.
We propose a systematic analysis of the neglected spectral bias in the frequency domain. Traditional generative adversarial networks (GANs) try to reproduce image details by designing specific network architectures or losses, focusing on generating visually qualitative images. The convolution theorem shows that image processing in the frequency domain is parallelizable and performs better and faster than in the spatial domain. However, little work discusses the bias of frequency features between generated images and real ones. In this paper, we first empirically demonstrate the general distribution bias across datasets and across GANs with different sampling methods. Then, we explain the causes of the spectral bias through a deduction that reconsiders the sampling process of the GAN generator. Based on these studies, we provide a low-spectral-bias hybrid generative model that reduces the spectral bias and improves the quality of the generated images.
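A common probe for such spectral bias is the azimuthally averaged power spectrum; a minimal version follows (random arrays stand in for real and generated images):

```python
import torch

def radial_spectrum(img):
    """Azimuthally averaged power spectrum of a grayscale image."""
    power = torch.fft.fftshift(torch.fft.fft2(img)).abs() ** 2
    h, w = img.shape
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    radius = ((yy - h // 2) ** 2 + (xx - w // 2) ** 2).float().sqrt().long().flatten()
    nbins = int(radius.max()) + 1
    total = torch.zeros(nbins).scatter_add_(0, radius, power.flatten())
    counts = torch.bincount(radius, minlength=nbins).clamp(min=1)
    return total / counts  # mean power per spatial-frequency ring

real, fake = torch.rand(64, 64), torch.rand(64, 64)
# Divergence at high-frequency rings indicates spectral bias in the generated image
print((radial_spectrum(real) - radial_spectrum(fake)).abs().mean())
```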
The rapid advancements in computer vision (CV) technology have transformed the traditional approaches to material microstructure analysis. This review outlines the history of CV and explores the applications of deep-learning (DL)-driven CV in four key areas of materials science: microstructure-based performance prediction, microstructure information generation, microstructure defect detection, and crystal-structure-based property prediction. CV has significantly reduced the cost of the traditional experimental methods used in material performance prediction. Moreover, recent progress in generating microstructure images and detecting microstructural defects with CV has increased the efficiency and reliability of material performance assessments. DL-driven CV models can accelerate the design of new materials with optimized performance by integrating predictions based on both crystal and microstructural data, enabling the discovery and innovation of next-generation materials. Finally, the review provides insights into the rapid interdisciplinary developments in materials science and future prospects.
In the context of high compression rates applied to Joint Photographic Experts Group (JPEG) images through lossy compression techniques, image-blocking artifacts may manifest, necessitating restoration of the image to its original quality. The challenge lies in regenerating heavily compressed images into a state in which they become identifiable. Therefore, this study focuses on the restoration of JPEG images subjected to substantial degradation from maximum lossy compression using Generative Adversarial Networks (GAN). The generator in this network is based on the U-Net architecture and features a new hourglass structure that preserves the characteristics of the deep layers. In addition, the network incorporates two loss functions to generate natural, high-quality images: a Low Frequency (LF) loss and a High Frequency (HF) loss. The HF loss uses a pretrained VGG-16 network configured at the specific layer that best represents features, enhancing performance in the high-frequency region; the LF loss handles the low-frequency region. Together, the two loss functions let the generator produce images that mislead the discriminator while accurately reproducing both high- and low-frequency regions. Consequently, by removing the blocking artifacts from maximally compressed images, images in which identities can be recognized are generated. This study represents a significant improvement over previous research in terms of image resolution performance.
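A plausible sketch of the two losses: the HF term compares features from an intermediate VGG-16 layer while the LF term works in pixel space; the chosen layer (relu3_3) and the loss weight are assumptions:

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

# Frozen VGG-16 feature extractor up to relu3_3 (layer choice is an assumption)
vgg_features = vgg16(weights=VGG16_Weights.DEFAULT).features[:16].eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)

def hf_loss(restored, original):
    """Match deep VGG feature maps of the restored and original images."""
    return F.l1_loss(vgg_features(restored), vgg_features(original))

def lf_loss(restored, original):
    """Plain pixel-space L1 handles the low-frequency content."""
    return F.l1_loss(restored, original)

restored, original = torch.rand(1, 3, 224, 224), torch.rand(1, 3, 224, 224)
total = lf_loss(restored, original) + 0.1 * hf_loss(restored, original)  # weight assumed
print(total)
```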
Near infrared-visible (NIR-VIS) face recognition matches an NIR face image to a VIS image. Its main challenges are the gap caused by cross-modality and the lack of sufficient paired NIR-VIS face images for training. This paper focuses on generating paired NIR-VIS face images and proposes a dual variational generator based on ResNeSt (RS-DVG). RS-DVG can generate a large number of paired NIR-VIS face images from noise, and these generated images can be used for training together with the real NIR-VIS face images. In addition, a triplet loss function is introduced, and a novel triplet selection method is proposed specifically for training the face recognition model, maximizing the inter-class distance and minimizing the intra-class distance among the input face images. The proposed method was evaluated on the CASIA NIR-VIS 2.0 and BUAA-VisNir datasets, where it obtained relatively good results.
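The triplet loss the abstract refers to has a standard form; a minimal version (the margin value is assumed):

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Pull same-identity embeddings together, push different identities apart."""
    d_pos = F.pairwise_distance(anchor, positive)  # intra-class distance (minimize)
    d_neg = F.pairwise_distance(anchor, negative)  # inter-class distance (maximize)
    return F.relu(d_pos - d_neg + margin).mean()

# Embeddings of an NIR image, its paired VIS image, and another identity's image
a, p, n = torch.randn(8, 128), torch.randn(8, 128), torch.randn(8, 128)
print(triplet_loss(a, p, n))
```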
Measurement of blood flow velocity is key to understanding physiology and pathology in vivo. While most measurements are performed at the middle of the blood vessel, little research has characterized the instantaneous blood flow velocity distribution, mainly due to the lack of measurement technology with sufficiently high spatial and temporal resolution. Here, we tackle this problem with our recently developed dual-wavelength line-scan third-harmonic generation (THG) imaging technology. Simultaneous acquisition of dual-wavelength THG line-scanning signals enables measurement of blood flow velocities at two radially symmetric positions in both venules and arterioles in mouse brain in vivo. Our results clearly show that the instantaneous blood flow velocity is not symmetric under general conditions.
With the development of computer graphics, realistic computer graphics (CG) have become more and more common in our field of vision, and such rendered images are often indistinguishable from photographs to the naked eye. How to effectively distinguish CG from natural images (NI) has therefore become a new issue in the field of digital forensics. In recent years, a series of deep learning frameworks have shown great advantages in the image domain, providing a good basis for solving this problem. This paper tracks the latest developments and applications of deep learning in CG and NI forensics. It first introduces the background of deep learning and convolutional neural networks, to establish the basic model structures used in image applications, and then outlines the mainstream frameworks; it next briefly reviews the application of deep learning to CG and NI forensics, and finally points out the open problems in this field and prospects for the future.
For traffic object detection in foggy environments based on convolutional neural networks (CNN), datasets collected in fog-free environments are generally used to train the network directly. As a result, the network cannot learn the characteristics of objects in fog, and detection performance suffers. To improve traffic object detection in foggy environments, we propose a method for generating foggy images from fog-free images from the perspective of dataset construction. First, taking the KITTI object detection dataset as the original fog-free imagery, we generate a depth image for each original image using an improved Monodepth unsupervised depth estimation method. Then, a geometric prior depth template is constructed to fuse the image entropy, taken as a weight, with the depth image. After that, a foggy image is produced from the depth image based on the atmospheric scattering model. Finally, we take two typical object-detection frameworks, the two-stage Faster region-based convolutional neural network (Faster-RCNN) and the one-stage network YOLOv4, and train them on the original dataset, the foggy dataset, and the mixed dataset, respectively. Test results on the RESIDE-RTTS dataset, captured in outdoor natural foggy environments, show that the models trained on the mixed dataset perform best: mean average precision (mAP) increases by 5.6% for YOLOv4 and by 5.0% for Faster-RCNN. This proves that the proposed method can effectively improve object detection in foggy environments.
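The atmospheric scattering model the method relies on is standard: I(x) = J(x)t(x) + A(1−t(x)), with transmission t(x) = exp(−βd(x)). A minimal NumPy rendering of it (the β and airlight A values are illustrative):

```python
import numpy as np

def add_fog(image, depth, beta=0.05, airlight=0.8):
    """Atmospheric scattering model: I = J * t + A * (1 - t), t = exp(-beta * d)."""
    t = np.exp(-beta * depth)  # per-pixel transmission from the depth map
    return image * t[..., None] + airlight * (1.0 - t[..., None])

rng = np.random.default_rng(0)
clear = rng.random((64, 64, 3))              # fog-free RGB image in [0, 1]
depth = rng.uniform(10, 100, size=(64, 64))  # depth map in meters (e.g., from Monodepth)
foggy = add_fog(clear, depth)
print(foggy.shape, foggy.min() >= 0, foggy.max() <= 1)
```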