Journal articles
37 articles found
An Enhanced GAN for Image Generation (cited by: 1)
1
Authors: Chunwei Tian, Haoyang Gao, Pengwei Wang, Bob Zhang. Computers, Materials & Continua (SCIE, EI), 2024, No. 7, pp. 105-118 (14 pages)
Generative adversarial networks (GANs) with gaming abilities have been widely applied in image generation. However, the competing generators and discriminators may reduce the robustness of the obtained GANs in image generation under varying scenes. Enhancing the relation of hierarchical information in a generation network and enlarging the differences between network architectures can provide more structural information and improve the generation effect. In this paper, we propose an enhanced GAN via improving a generator for image generation (EIGGAN). EIGGAN applies spatial attention to a generator to extract salient information and enhance the truthfulness of the generated images. Taking the relation of the context into account, parallel residual operations are fused into a generation network to extract more structural information from the different layers. Finally, a mixed loss function in a GAN is exploited to make a tradeoff between speed and accuracy to generate more realistic images. Experimental results show that the proposed method is superior to popular methods, e.g., Wasserstein GAN with gradient penalty (WGAN-GP), in terms of many indexes, i.e., Frechet Inception Distance, Learned Perceptual Image Patch Similarity, Multi-Scale Structural Similarity Index Measure, Kernel Inception Distance, Number of Statistically-Different Bins, Inception Score, and some visual images for image generation.
Keywords: Generative adversarial networks spatial attention mixed loss image generation
Evaluation of Modern Generative Networks for EchoCG Image Generation
2
Authors: Sabina Rakhmetulayeva, Zhandos Zhanabekov, Aigerim Bolshibayeva. Computers, Materials & Continua (SCIE, EI), 2024, No. 12, pp. 4503-4523 (21 pages)
The applications of machine learning (ML) in the medical domain are often hindered by the limited availability of high-quality data. To address this challenge, we explore the synthetic generation of echocardiography images (echoCG) using state-of-the-art generative models. We conduct a comprehensive evaluation of three prominent methods: Cycle-consistent generative adversarial network (CycleGAN), Contrastive Unpaired Translation (CUT), and Stable Diffusion 1.5 with Low-Rank Adaptation (LoRA). Our research presents the data generation methodology, image samples, and evaluation strategy, followed by an extensive user study involving licensed cardiologists and surgeons who assess the perceived quality and medical soundness of the generated images. Our findings indicate that Stable Diffusion outperforms both CycleGAN and CUT in generating images that are nearly indistinguishable from real echoCG images, making it a promising tool for augmenting medical datasets. However, we also identify limitations in the synthetic images generated by CycleGAN and CUT, which are easily distinguishable as non-realistic by medical professionals. This study highlights the potential of diffusion models in medical imaging and their applicability in addressing data scarcity, while also outlining the areas for future improvement.
Keywords: Synthetic image generation synthetic echocardiography generative adversarial networks CycleGAN latent diffusion models stable diffusion
Spatiotemporal imagery selection for full coverage image generation over a large area with HFA-Net based quality grading
3
Authors: Jun Pan, Liangyu Chen, Qidi Shu, Qiang Zhao, Jin Yang, Shuying Jin. Geo-Spatial Information Science (CSCD), 2024, No. 5, pp. 1524-1541 (18 pages)
Remote sensing images often need to be merged into a larger mosaic image to support analysis on large areas in many applications. However, the performance of the mosaic imagery may be severely restricted if there are many areas with cloud coverage or if the images used for merging span a long time. Therefore, this paper proposes a method of image selection for full coverage image (i.e., a mosaic image with no cloud-contaminated pixels) generation. Specifically, a novel High-Frequency-Aware (HFA)-Net based on Swin-Transformer for region quality grading is presented to provide a data basis for image selection. Spatiotemporal constraints are presented to optimize the image selection. In the temporal dimension, the shortest-time-span constraint shortens the time span of the selected images, improving the timeliness of the image selection results. In the spatial dimension, a spatial continuity constraint is proposed to select data with better quality and larger area, thus improving the radiometric continuity of the results. Experiments on the GF-1 images indicate that the proposed method reduces the shortest time span by an average of 76.1% and 38.7% compared to the Improved Coverage-oriented Retrieval algorithm (MICR) and Retrieval Method based on Grid Compensation (RMGC) methods, respectively. Moreover, the proposed method also reduces the residual cloud amount by an average of 91.2%, 89.8%, and 83.4% when compared to the MICR, RMGC, and Pixel-based Time-series Synthesis Method (PTSM) methods, respectively.
Keywords: image selection spatiotemporal constraints full coverage image generation High-Frequency-Aware (HFA)-Net regional quality grading
CMSL: Cross-modal Style Learning for Few-shot Image Generation
4
Authors: Yue Jiang, Yueming Lyu, Bo Peng, Wei Wang, Jing Dong. Machine Intelligence Research, 2025, No. 4, pp. 752-768 (17 pages)
Training generative adversarial networks is data-demanding, which limits the development of these models on target domains with inadequate training data. Recently, researchers have leveraged generative models pretrained on sufficient data and fine-tuned them using small training samples, thus reducing data requirements. However, because they lack an explicit focus on target styles and concentrate disproportionately on generative consistency, these methods do not perform well in diversity preservation, which represents the adaptation ability of few-shot generative models. To mitigate the diversity degradation, we propose a framework with two key strategies: 1) To obtain more diverse styles from limited training data effectively, we propose a cross-modal module that explicitly obtains the target styles with a style prototype space and text-guided style instructions. 2) To inherit the generation capability from the pretrained model, we constrain the similarity between the generated and source images with a structural discrepancy alignment module by maintaining the structure correlation in multiscale areas. Extensive experiments and analyses demonstrate the effectiveness of our method, which outperforms state-of-the-art methods in mitigating diversity degradation.
Keywords: Few-shot image generation cross-modal learning prototype learning contrastive learning computer vision
Image generation evaluation: a comprehensive survey of human and automatic evaluations
5
Authors: Qi LIU, Shuanglin YANG, Zejian LI, Lefan HOU, Chenye MENG, Ying ZHANG, Lingyun SUN. Frontiers of Information Technology & Electronic Engineering, 2025, No. 7, pp. 1027-1065 (39 pages)
Image generation models have made remarkable progress, and image evaluation is crucial for explaining and driving the development of these models. Previous studies have extensively explored human and automatic evaluations of image generation. Herein, these studies are comprehensively surveyed in two main parts: evaluation protocols and evaluation methods. First, 10 image generation tasks are summarized with a focus on their differences in evaluation aspects. Based on this, a novel protocol is proposed to cover the human and automatic evaluation aspects required for various image generation tasks. Second, the review of automatic evaluation methods in the past five years is highlighted. To our knowledge, this paper presents the first comprehensive summary of human evaluation, encompassing evaluation methods, tools, details, and data analysis methods. Finally, the challenges and potential directions for image generation evaluation are discussed. We hope that this survey will help researchers develop a systematic understanding of image generation evaluation, stay updated with the latest advancements in the field, and encourage further research.
Keywords: image generation evaluation Human evaluation Automatic evaluation Evaluation protocols Evaluation aspects
Frequency-Quantized Variational Autoencoder Based on 2D-FFT for Enhanced Image Reconstruction and Generation
6
Authors: Jianxin Feng, Xiaoyao Liu. Computers, Materials & Continua, 2025, No. 5, pp. 2087-2107 (21 pages)
As a form of discrete representation learning, Vector Quantized Variational Autoencoders (VQ-VAE) have increasingly been applied to generative and multimodal tasks due to their ease of embedding and representative capacity. However, existing VQ-VAEs often perform quantization in the spatial domain, ignoring global structural information and potentially suffering from codebook collapse and information coupling issues. This paper proposes a frequency quantized variational autoencoder (FQ-VAE) to address these issues. The proposed method transforms image features into linear combinations in the frequency domain using a 2D fast Fourier transform (2D-FFT) and performs adaptive quantization on these frequency components to preserve the image's global relationships. The codebook is dynamically optimized to avoid collapse and information coupling issues by considering the usage frequency and dependency of code vectors. Furthermore, we introduce a post-processing module based on graph convolutional networks to further improve reconstruction quality. Experimental results on four public datasets demonstrate that the proposed method outperforms state-of-the-art approaches in terms of Structural Similarity Index (SSIM), Learned Perceptual Image Patch Similarity (LPIPS), and Reconstruction Fréchet Inception Distance (rFID). In the experiments on the CIFAR-10 dataset, compared to the baseline method VQ-VAE, the proposed method improves the above metrics by 4.9%, 36.4%, and 52.8%, respectively.
Keywords: VAE 2D-FFT image reconstruction image generation
Controllable image generation based on causal representation learning (cited by: 2)
7
Authors: Shanshan HUANG, Yuanhao WANG, Zhili GONG, Jun LIAO, Shu WANG, Li LIU. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2024, No. 1, pp. 135-148 (14 pages)
Artificial intelligence generated content (AIGC) has emerged as an indispensable tool for producing large-scale content in various forms, such as images, thanks to the significant role that AI plays in imitation and production. However, interpretability and controllability remain challenges. Existing AI methods often struggle to produce images that are both flexible and controllable while considering causal relationships within the images. To address this issue, we have developed a novel method for causal controllable image generation (CCIG) that combines causal representation learning with bi-directional generative adversarial networks (GANs). This approach enables humans to control image attributes while considering the rationality and interpretability of the generated images and also allows for the generation of counterfactual images. The key to our approach, CCIG, lies in the use of a causal structure learning module to learn the causal relationships between image attributes, jointly optimized with the encoder, generator, and joint discriminator in the image generation module. By doing so, we can learn causal representations in the image's latent space and use causal intervention operations to control image generation. We conduct extensive experiments on a real-world dataset, CelebA. The experimental results illustrate the effectiveness of CCIG.
Keywords: image generation Controllable image editing Causal structure learning Causal representation learning
Autoencoder-based conditional optimal transport generative adversarial network for medical image generation
8
Authors: Jun Wang, Bohan Lei, Liya Ding, Xiaoyin Xu, Xianfeng Gu, Min Zhang. Visual Informatics (EI), 2024, No. 1, pp. 15-25 (11 pages)
Medical image generation has recently garnered significant interest among researchers. However, the primary generative models, such as Generative Adversarial Networks (GANs), often encounter challenges during training, including mode collapse. To address these issues, we proposed the AE-COT-GAN model (Autoencoder-based Conditional Optimal Transport Generative Adversarial Network) for the generation of medical images belonging to specific categories. The training process of our model comprises three fundamental components. First, we employ an autoencoder model to obtain a low-dimensional manifold representation of real images. Second, we apply extended semi-discrete optimal transport to map the Gaussian noise distribution to the latent space distribution and obtain corresponding labels effectively. This procedure leads to the generation of new latent codes with known labels. Finally, we integrate a GAN to train the decoder further to generate medical images. To evaluate the performance of the AE-COT-GAN model, we conducted experiments on two medical image datasets, namely DermaMNIST and BloodMNIST. The model's performance was compared with state-of-the-art generative models. Results show that the AE-COT-GAN model had excellent performance in generating medical images. Moreover, it effectively addressed the common issues associated with traditional GANs.
Keywords: Medical image generation Mode collapse Mode mixing Optimal transport Generative adversarial networks
On generated artistic styles: Image generation experiments with GAN algorithms (cited by: 1)
9
Author: Jianheng Xiang. Visual Informatics (EI), 2023, No. 4, pp. 36-40 (5 pages)
As computer graphics technology supports pursuing a photorealistic style, replicated artworks with a photorealistic style overwhelmingly predominate in the computer-generated art circle. Along with the progression of generative technology, this trend may make generative art a virtual world of photorealistic fakes, in which the single criterion of expressive style reduces art to a single boring stereotype. This article focuses on the issue of style diversity and its technical feasibility through artistic experiments of generating flower images in StyleGAN. The author insists that both technology and artistic style should not be confined merely to realistic purposes. This proposition was validated in the GAN generation experiment by changing the training materials.
Keywords: CG art Virtual realistic image generation Deep learning Machine replication
Data augmentation via joint multi-scale CNN and multi-channel attention for bumblebee image generation (cited by: 1)
10
Authors: Du Rong, Chen Shudong, Li Weiwei, Zhang Xueting, Wang Xianhui, Ge Jin. The Journal of China Universities of Posts and Telecommunications (EI, CSCD), 2023, No. 3, pp. 32-40, 98 (10 pages)
The difficulty of collecting bumblebee data and the laborious nature of annotating it sometimes result in a lack of training data, which impairs the effectiveness of deep learning based counting methods. Given that it is challenging to produce detailed background information in the generated bumblebee images using current data augmentation methods, this paper proposes a joint multi-scale convolutional neural network and multi-channel attention based generative adversarial network (MMGAN). MMGAN generates the bumblebee image in accordance with the corresponding density map marking the bumblebee positions. Specifically, the multi-scale convolutional neural network (CNN) module utilizes multiple convolution kernels to extract features of different scales from the input bumblebee image and density map. To generate various targets in the generated image, the multi-channel attention module builds numerous intermediate generation layers and attention maps. These targets are then stacked to produce a bumblebee image with a specific number of bumblebees. The proposed model obtains the best performance in bumblebee image generation tasks, and the generated bumblebee images considerably improve the efficiency of deep learning based counting methods in bumblebee counting applications.
Keywords: data augmentation image generation attention mechanism
Restoration of the JPEG Maximum Lossy Compressed Face Images with Hourglass Block-GAN (cited by: 2)
11
Authors: Jongwook Si, Sungyoung Kim. Computers, Materials & Continua (SCIE, EI), 2024, No. 3, pp. 2893-2908 (16 pages)
In the context of high compression rates applied to Joint Photographic Experts Group (JPEG) images through lossy compression techniques, image-blocking artifacts may manifest. This necessitates the restoration of the image to its original quality. The challenge lies in regenerating significantly compressed images into a state in which they become identifiable. Therefore, this study focuses on the restoration of JPEG images subjected to substantial degradation caused by maximum lossy compression using Generative Adversarial Networks (GAN). The generator in this network is based on the U-Net architecture. It features a new hourglass structure that preserves the characteristics of the deep layers. In addition, the network incorporates two loss functions to generate natural and high-quality images: Low Frequency (LF) loss and High Frequency (HF) loss. HF loss uses a pretrained VGG-16 network and is configured using a specific layer that best represents features. This can enhance the performance in the high-frequency region. In contrast, LF loss is used to handle the low-frequency region. The two loss functions help the generator produce images that can mislead the discriminator while accurately generating high- and low-frequency regions. Consequently, by removing the blocking effects from maximum lossy compressed images, images in which identities can be recognized are generated. This study represents a significant improvement over previous research in terms of the image resolution performance.
Keywords: JPEG lossy compression RESTORATION image generation GAN
A Low Spectral Bias Generative Adversarial Model for Image Generation
12
Authors: Lei Xu, Zhentao Liu, Peng Liu, Liyan Cai. 《国际计算机前沿大会会议论文集》, 2022, No. 1, pp. 354-362 (9 pages)
We propose a systematic analysis of the neglected spectral bias in the frequency domain in this paper. Traditional generative adversarial networks (GANs) try to fill in the details of images by designing specific network architectures or losses, focusing on generating visually qualitative images. The convolution theorem shows that image processing in the frequency domain is parallelizable and performs better and faster than that in the spatial domain. However, there is little work discussing the bias of frequency features between the generated images and the real ones. In this paper, we first empirically demonstrate the general distribution bias across datasets and GANs with different sampling methods. Then, we explain the causes of the spectral bias through a deduction that reconsiders the sampling process of the GAN generator. Based on these studies, we provide a low-spectral-bias hybrid generative model to reduce the spectral bias and improve the quality of the generated images.
Keywords: Deep learning applications image generation models Generative adversarial network
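Illustration (not from the paper): the abstract above appeals to the convolution theorem. The following minimal NumPy sketch only checks the theorem numerically for circular 2D convolution; the function names and test sizes are hypothetical, and it makes no claim about the authors' model or about relative speed.

```python
import numpy as np

def circular_conv2d_spatial(x, k):
    """Circular 2D convolution computed directly in the spatial domain."""
    out = np.zeros(x.shape, dtype=np.float64)
    for i in range(k.shape[0]):
        for j in range(k.shape[1]):
            # np.roll shifts x so that out[n] accumulates k[i, j] * x[n - (i, j)]
            out += k[i, j] * np.roll(x, shift=(i, j), axis=(0, 1))
    return out

def circular_conv2d_fft(x, k):
    """Same result via the convolution theorem: multiply the 2D spectra."""
    k_spectrum = np.fft.fft2(k, s=x.shape)  # zero-pads the kernel to x's shape
    return np.real(np.fft.ifft2(np.fft.fft2(x) * k_spectrum))

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))
same = np.allclose(circular_conv2d_spatial(image, kernel),
                   circular_conv2d_fft(image, kernel))
```

The two routes agree up to floating-point error; the frequency-domain route replaces the nested kernel loop with one elementwise product.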
In vivo label-free measurement of blood flow velocity symmetry based on dual line scanning third-harmonic generation microscopy excited at the 1700 nm window (cited by: 1)
13
Authors: Hui Cheng, Jincheng Zhong, Ping Qiu, Ke Wang. Journal of Innovative Optical Health Sciences (SCIE, EI, CSCD), 2024, No. 1, pp. 61-68 (8 pages)
Measurement of blood flow velocity is key to understanding physiology and pathology in vivo. While most measurements are performed at the middle of the blood vessel, little research has been done on characterizing the instantaneous blood flow velocity distribution. This is mainly due to the lack of measurement technology with high spatial and temporal resolution. Here, we tackle this problem with our recently developed dual-wavelength line-scan third-harmonic generation (THG) imaging technology. Simultaneous acquisition of dual-wavelength THG line-scanning signals enables measurement of blood flow velocities at two radially symmetric positions in both venules and arterioles in mouse brain in vivo. Our results clearly show that the instantaneous blood flow velocity is not symmetric under general conditions.
Keywords: 1700 nm-Window third-harmonic generation imaging blood flow velocity
Research on multi-view collaborative detection system for UAV swarms based on Pix2Pix framework and BAM attention mechanism
14
Authors: Yan Ding, Qingxin Cao, Bozhi Zhang, Peilin Li, Zhongjiao Shi. Defence Technology(防务技术), 2025, No. 4, pp. 213-226 (14 pages)
Drone swarm systems, equipped with photoelectric imaging and intelligent target perception, are essential for reconnaissance and strike missions in complex and high-risk environments. They excel in information sharing, anti-jamming capabilities, and combat performance, making them critical for future warfare. However, varied perspectives in collaborative combat scenarios pose challenges to object detection, hindering traditional detection algorithms and reducing accuracy. Limited angle-prior data and sparse samples further complicate detection. This paper presents the Multi-View Collaborative Detection System, which tackles the challenges of multi-view object detection in collaborative combat scenarios. The system is designed to enhance multi-view image generation and detection algorithms, thereby improving the accuracy and efficiency of object detection across varying perspectives. First, an observation model for three-dimensional targets through line-of-sight angle transformation is constructed, and a multi-view image generation algorithm based on the Pix2Pix network is designed. For object detection, YOLOX is utilized, and a deep feature extraction network, BA-RepCSPDarknet, is developed to address challenges related to small target scale and feature extraction. Additionally, a feature fusion network, NS-PAFPN, is developed to mitigate the issue of deep feature map information loss in UAV images. A visual attention module (BAM) is employed to manage appearance differences under varying angles, while a feature mapping module (DFM) prevents fine-grained feature loss. These advancements lead to the development of BA-YOLOX, a multi-view object detection network model suitable for drone platforms, enhancing accuracy and effectively targeting small objects.
Keywords: Drone swarm systems Reconnaissance and strike image generation Multi-view detection Pix2Pix framework Attention mechanism
HRAM-VITON: High-Resolution Virtual Try-On with Attention Mechanism
15
Authors: Yue Chen, Xiaoman Liang, Mugang Lin, Fachao Zhang, Huihuang Zhao. Computers, Materials & Continua, 2025, No. 2, pp. 2753-2768 (16 pages)
The objective of image-based virtual try-on is to seamlessly integrate clothing onto a target image, generating a realistic representation of the character in the specified attire. However, existing virtual try-on methods frequently encounter challenges, including misalignment between the body and clothing, noticeable artifacts, and the loss of intricate garment details. To overcome these challenges, we introduce a two-stage high-resolution virtual try-on framework that integrates an attention mechanism, comprising a garment warping stage and an image generation stage. During the garment warping stage, we incorporate a channel attention mechanism to effectively retain the critical features of the garment, addressing challenges such as the loss of patterns, colors, and other essential details commonly observed in virtual try-on images produced by existing methods. During the image generation stage, to make the most of the information provided by the input image, the input features undergo double sampling within the normalization procedure, thereby enhancing the detail fidelity and clothing alignment of the output image. Experimental evaluations conducted on high-resolution datasets validate the effectiveness of the proposed method. Results demonstrate significant improvements in preserving garment details, reducing artifacts, and achieving superior alignment between the clothing and body compared to baseline methods, establishing its advantage in generating realistic and high-quality virtual try-on images.
Keywords: Virtual try-on attention mechanism HIGH-RESOLUTION image generation
Dual Variational Generation Based ResNeSt for Near Infrared-Visible Face Recognition
16
Authors: DING Xiangwu, LIU Chao, QIN Yanxia. Journal of Donghua University (English Edition) (CAS), 2022, No. 2, pp. 156-162 (7 pages)
Near infrared-visible (NIR-VIS) face recognition matches an NIR face image to a VIS image. The main challenges of NIR-VIS face recognition are the gap caused by cross-modality and the lack of sufficient paired NIR-VIS face images to train models. This paper focuses on the generation of paired NIR-VIS face images and proposes a dual variational generator based on ResNeSt (RS-DVG). RS-DVG can generate a large number of paired NIR-VIS face images from noise, and these generated NIR-VIS face images can be used as the training set together with the real NIR-VIS face images. In addition, a triplet loss function is introduced and a novel triplet selection method is proposed specifically for the training of the current face recognition model, which maximizes the inter-class distance and minimizes the intra-class distance in the input face images. The method proposed in this paper was evaluated on the datasets CASIA NIR-VIS 2.0 and BUAA-VisNir, and relatively good results were obtained.
Keywords: near infrared-visible face recognition face image generation ResNeSt triplet loss function attention mechanism
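Illustration (not from the paper): the abstract above mentions a triplet loss that pulls same-identity faces together and pushes different identities apart. A minimal NumPy sketch of the standard hinge-form triplet loss follows; the squared Euclidean distance, the margin value, and the function name are assumptions, and the paper's triplet selection method is not reproduced here.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss over a batch of embeddings (one row per sample).

    anchor/positive share an identity; negative is a different identity.
    The margin of 0.2 is an illustrative default, not a value from the paper.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=1)  # intra-class distance
    d_neg = np.sum((anchor - negative) ** 2, axis=1)  # inter-class distance
    # Zero loss once the negative is farther than the positive by the margin
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))

a = np.array([[0.0, 0.0]])
p = np.array([[0.0, 0.0]])
n = np.array([[1.0, 0.0]])
easy = triplet_loss(a, p, n)   # negative already far enough away
hard = triplet_loss(a, n, p)   # roles swapped: violates the margin
```

Minimizing this quantity shrinks the intra-class distance and grows the inter-class distance, which is the behavior the abstract describes.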
Text to image generation with bidirectional Multiway Transformers
17
Authors: Hangbo Bao, Li Dong, Songhao Piao, Furu Wei. Computational Visual Media, 2025, No. 2, pp. 405-422 (18 pages)
In this study, we explore the potential of Multiway Transformers for text-to-image generation to achieve performance improvements through a concise and efficient decoupled model design and the inference efficiency provided by bidirectional encoding. We propose a method for improving the image tokenizer using pretrained Vision Transformers. Next, we employ bidirectional Multiway Transformers to restore the masked visual tokens combined with the unmasked text tokens. On the MS-COCO benchmark, our Multiway Transformers outperform vanilla Transformers, achieving superior FID scores and confirming the efficacy of the modality-specific parameter computation design. Ablation studies reveal that the fusion of visual and text tokens in bidirectional encoding contributes to improved model performance. Additionally, our proposed tokenizer outperforms VQGAN in image reconstruction quality and enhances the text-to-image generation results. By incorporating the additional CC-3M dataset for intermediate finetuning on our model with 688M parameters, we achieve competitive results with a finetuned FID score of 4.98 on MS-COCO.
Keywords: text to image generation VQ-VAE Transformer generative models
Deep Learning for Distinguishing Computer Generated Images and Natural Images: A Survey (cited by: 4)
18
作者 Bingtao Hu Jinwei Wang 《Journal of Information Hiding and Privacy Protection》 2020年第2期95-105,共11页
With the development of computer graphics,realistic computer graphics(CG)have become more and more common in our field of vision.This rendered image is invisible to the naked eye.How to effectively identify CG and nat... With the development of computer graphics,realistic computer graphics(CG)have become more and more common in our field of vision.This rendered image is invisible to the naked eye.How to effectively identify CG and natural images(NI)has been become a new issue in the field of digital forensics.In recent years,a series of deep learning network frameworks have shown great advantages in the field of images,which provides a good choice for us to solve this problem.This paper aims to track the latest developments and applications of deep learning in the field of CG and NI forensics in a timely manner.Firstly,it introduces the background of deep learning and the knowledge of convolutional neural networks.The purpose is to understand the basic model structure of deep learning applications in the image field,and then outlines the mainstream framework;secondly,it briefly introduces the application of deep learning in CG and NI forensics,and finally points out the problems of deep learning in this field and the prospects for the future. 展开更多
Keywords: deep learning; convolutional neural network; image forensics; computer generated image; natural image
A method to generate foggy optical images based on unsupervised depth estimation
19
Authors: WANG Xiangjun, LIU Linghao, NI Yubo, WANG Lin. Journal of Measurement Science and Instrumentation (CAS, CSCD), 2021, No. 1, pp. 44-52 (9 pages)
For traffic object detection in foggy environments based on convolutional neural networks (CNNs), data sets collected in fog-free environments are generally used to train the network directly. As a result, the network cannot learn the characteristics of objects in fog from the training set, and detection performance is poor. To improve traffic object detection in foggy environments, we propose a method of generating foggy images from fog-free images, from the perspective of data set construction. First, taking the KITTI object detection data set as the original fog-free images, we generate a depth image for each original image using an improved Monodepth unsupervised depth estimation method. Then, a geometric prior depth template is constructed and fused with the depth image, with the image entropy taken as the weight. After that, a foggy image is obtained from the depth image based on the atmospheric scattering model. Finally, we take two typical object-detection frameworks, namely the two-stage Faster region-based convolutional neural network (Faster-RCNN) and the one-stage network YOLOv4, and train them on the original data set, the foggy data set, and the mixed data set, respectively. According to the test results on the RESIDE-RTTS data set, collected in outdoor natural foggy environments, the models trained on the mixed data set perform best. The mean average precision (mAP) values increase by 5.6% and 5.0% for the YOLOv4 model and the Faster-RCNN network, respectively. This shows that the proposed method can effectively improve object identification in foggy environments.
Keywords: traffic object detection; foggy image generation; unsupervised depth estimation; YOLOv4 model; Faster region-based convolutional neural network (Faster-RCNN)
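The fog-synthesis step in the abstract above relies on the atmospheric scattering model, which in its commonly used form is I(x) = J(x) t(x) + A (1 - t(x)) with transmission t(x) = exp(-beta d(x)). The sketch below applies that formula per pixel to a grayscale image, as an illustration only. The function name `add_fog` and the default values of `beta` and `airlight` are assumptions; the abstract does not give the paper's parameter choices, and the depth-fusion step with the geometric prior template is not reproduced.

```python
import math


def add_fog(image, depth, beta=1.0, airlight=1.0):
    """Apply I = J * t + A * (1 - t), with t = exp(-beta * d), per pixel.

    image:    2D list of fog-free intensities J in [0, 1] (grayscale)
    depth:    2D list of scene depths d, same shape as image
    beta:     scattering coefficient (illustrative default)
    airlight: global atmospheric light A (illustrative default)
    """
    foggy = []
    for img_row, depth_row in zip(image, depth):
        row = []
        for j, d in zip(img_row, depth_row):
            t = math.exp(-beta * d)  # transmission decays with depth
            row.append(j * t + airlight * (1.0 - t))
        foggy.append(row)
    return foggy
```

At zero depth the pixel is unchanged, and as depth grows the pixel value approaches the airlight, which is why an accurate depth map matters for realistic synthetic fog.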
An Interactive Collaborative Creation System for Shadow Puppets Based on Smooth Generative Adversarial Networks
20
Authors: Cheng Yang, Miaojia Lou, Xiaoyu Chen, Zixuan Ren. Computers, Materials & Continua (SCIE, EI), 2024, No. 6, pp. 4107-4126 (20 pages)
Chinese shadow puppetry has been recognized as a world intangible cultural heritage. However, it faces substantial challenges in its preservation and advancement due to the intricate and labor-intensive nature of crafting shadow puppets. To ensure the inheritance and development of this cultural heritage, it is imperative to enable traditional art to flourish in the digital era. This paper presents an Interactive Collaborative Creation System for shadow puppets, designed to facilitate the creation of high-quality shadow puppet images with greater ease. The system comprises four key functions: image contour extraction, intelligent reference recommendation, generation network, and color adjustment, all aimed at assisting users in various aspects of the creative process, including drawing, inspiration, and content generation. Additionally, we propose an enhanced algorithm called Smooth Generative Adversarial Networks (SmoothGAN), which exhibits more stable gradient training and a greater capacity for generating high-resolution shadow puppet images. Furthermore, we have built a new dataset comprising high-quality shadow puppet images to train the shadow puppet generation model. Both qualitative and quantitative experimental results demonstrate that SmoothGAN significantly improves the quality of image generation, while our system efficiently assists users in creating high-quality shadow puppet images, with a SUS scale score of 84.4. This study provides a valuable theoretical and practical reference for the digital creation of shadow puppet art.
Keywords: shadow puppets; deep learning; image generation; co-creation
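The SUS score reported in the abstract above presumably refers to the System Usability Scale. For readers unfamiliar with it, the sketch below shows the standard scoring of a single 10-item questionnaire (responses from 1 to 5); a study-level figure such as 84.4 would then be the mean over participants. The function name `sus_score` is an assumption, and the paper's raw responses are not available here.

```python
def sus_score(responses):
    """Score one System Usability Scale questionnaire (10 items, each 1-5).

    Odd-numbered items contribute (response - 1), even-numbered items
    contribute (5 - response); the sum is scaled by 2.5 to a 0-100 range.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    total = 0
    for index, response in enumerate(responses, start=1):
        if not 1 <= response <= 5:
            raise ValueError("each response must be between 1 and 5")
        total += (response - 1) if index % 2 == 1 else (5 - response)
    return total * 2.5
```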