With the development of computer graphics, realistic computer graphics (CG) have become increasingly common in our field of vision. Such rendered images are hard to tell apart from natural images with the naked eye. How to effectively distinguish CG from natural images (NI) has become a new issue in the field of digital forensics. In recent years, a series of deep learning network frameworks have shown great advantages in the field of images, which provides a good option for solving this problem. This paper aims to track the latest developments and applications of deep learning in the field of CG and NI forensics in a timely manner. First, it introduces the background of deep learning and the basics of convolutional neural networks, in order to explain the basic model structures used by deep learning applications in the image field, and then outlines the mainstream frameworks. Second, it briefly introduces the application of deep learning in CG and NI forensics. Finally, it points out the problems of deep learning in this field and the prospects for the future.
Objective To develop a facial image generation method based on a facial color-preserving generative adversarial network (FCP-GAN) that effectively decouples identity features from diagnostic facial complexion characteristics in traditional Chinese medicine (TCM) inspection, thereby addressing the critical challenge of privacy preservation in medical image analysis. Methods A facial image dataset was constructed from participants at Nanjing University of Chinese Medicine between April 23 and June 10, 2023, using TCM full-body inspection data acquisition equipment under controlled illumination. The proposed FCP-GAN model was designed to achieve the dual objectives of removing identity features and preserving colors through three key components: (i) a multi-space combination module that comprehensively extracts color attributes from the red, green, blue (RGB), hue, saturation, value (HSV), and Lab spaces; (ii) a generator incorporating an efficient channel attention (ECA) mechanism to enhance the representation of diagnostically critical color channels; and (iii) a dual-loss function that combines an adversarial loss for de-identification with a dedicated color preservation loss. The model was trained and evaluated using a stratified 5-fold cross-validation strategy and compared against four baseline generative models: conditional GAN (CGAN), deep convolutional GAN (DCGAN), dual discriminator CGAN (DDCGAN), and medical GAN (MedGAN). Performance was assessed in terms of image quality [peak signal-to-noise ratio (PSNR) and structural similarity (SSIM)], distribution similarity [Fréchet inception distance (FID)], privacy protection (face recognition accuracy), and diagnostic consistency [mean squared error (MSE) and Pearson correlation coefficient (PCC)]. Results The final analysis included facial images from 216 participants. Compared with baseline models, FCP-GAN achieved superior performance, with PSNR = 31.02 dB and SSIM = 0.908, representing improvements of 1.21 dB in PSNR and 0.034 in SSIM over the strongest baseline (MedGAN). The FID value (23.45) was also the lowest among all models, indicating superior distributional similarity to real images. The multi-space feature fusion and the ECA mechanism contributed significantly to these performance gains, as evidenced by ablation studies. The stratified 5-fold cross-validation confirmed the model's robustness, with results reported as mean ± standard deviation (SD) across all folds. The model effectively protected privacy by reducing face recognition accuracy from 95.2% (original images) to 60.1% (generated images). Critically, it maintained high diagnostic fidelity, as evidenced by a low MSE (< 0.051) and a high PCC (> 0.98) for key TCM facial features between original and generated images. Conclusion The FCP-GAN model provides an effective technical solution for ensuring privacy in TCM diagnostic imaging, successfully removing identity features while preserving clinically vital facial color features. This study offers significant value for developing intelligent and secure TCM telemedicine systems.
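For readers unfamiliar with the metrics named in this abstract, the following is a minimal sketch of how MSE, PSNR, and the Pearson correlation coefficient are commonly computed. It is not the authors' evaluation code: the function names, the flat-list pixel representation, and the 8-bit peak value of 255 are assumptions made for illustration.

```python
import math


def mse(a, b):
    """Mean squared error between two equal-length sequences of pixel values."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a, b, max_value=255.0):
    """Peak signal-to-noise ratio in dB; max_value=255 assumes 8-bit images."""
    err = mse(a, b)
    if err == 0:
        return math.inf  # identical inputs: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / err)


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical pixel values, for illustration only.
original = [52, 60, 61, 200, 210, 198]
generated = [50, 62, 60, 202, 208, 199]
print(psnr(original, generated), pearson(original, generated))
```

Higher PSNR and a PCC close to 1 both indicate that the generated values track the originals closely, which is the sense in which the abstract uses them.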
The rapid advancements in computer vision (CV) technology have transformed the traditional approaches to material microstructure analysis. This review outlines the history of CV and explores the applications of deep-learning (DL)-driven CV in four key areas of materials science: microstructure-based performance prediction, microstructure information generation, microstructure defect detection, and crystal structure-based property prediction. CV has significantly reduced the cost of traditional experimental methods used in material performance prediction. Moreover, recent progress in generating microstructure images and detecting microstructural defects using CV has led to increased efficiency and reliability in material performance assessments. DL-driven CV models can accelerate the design of new materials with optimized performance by integrating predictions based on both crystal and microstructural data, thereby allowing for the discovery and innovation of next-generation materials. Finally, the review provides insights into the rapid interdisciplinary developments in the field of materials science and future prospects.
During the image generation phase, the parser-free Flow-Style-VTON model (PF-Flow-Style-VTON), which utilizes distilled appearance flows, faces two main challenges: (i) blurring, deformation, occlusion, or loss of the arm or palm regions in the generated image when these regions of the person occlude the garment; and (ii) blurring and deformation in the generated image when the person performs large pose movements and the target garment is complex with detailed patterns. To solve these two problems, an improved virtual try-on network model, denoted as IPF-Flow-Style-VTON, is proposed. Firstly, a target warped garment mask refinement module (M-RM) is introduced to refine the warped garment mask and remove erroneous information in the arm and palm regions, thereby improving the quality of subsequent image generation. Secondly, an improved global attention module (GAM) is integrated into the original image generation network, enhancing the ResUNet's understanding of global context and optimizing the fusion of local features and global information, thereby further improving image generation quality. Finally, the UniPose model is used to provide the pose keypoint information of the target person image, guiding the task execution during the image generation phase. Experiments conducted on the VITON dataset show that the proposed method outperforms the original method, Flow-Style-VTON, by 5.4%, 0.3%, 6.7%, and 2.2% in Fréchet inception distance (FID), structural similarity index measure (SSIM), learned perceptual image patch similarity (LPIPS), and peak signal-to-noise ratio (PSNR), respectively. Overall, the proposed method effectively improves upon the shortcomings of the original network and achieves better visual results.
With the rapid development of image-generative AI (artificial intelligence) technology, its application in undergraduate Landscape Architecture education has demonstrated significant potential. Based on this, the present study explores the implications of integrating image-generative AI into Landscape Architecture courses from three perspectives: stimulating students' creative design potential, expanding approaches to form and concept generation, and enhancing the visualization of spatial scenes. Furthermore, it discusses application strategies from three dimensions: AI-assisted conceptual generation, human-machine collaboration for design refinement, and optimization of scheme presentation and evaluation. This paper aims to provide relevant educators with insights and references.
In the context of high compression rates applied to Joint Photographic Experts Group (JPEG) images through lossy compression techniques, image-blocking artifacts may manifest. This necessitates the restoration of the image to its original quality. The challenge lies in regenerating significantly compressed images into a state in which they become identifiable. Therefore, this study focuses on the restoration of JPEG images subjected to substantial degradation caused by maximum lossy compression using Generative Adversarial Networks (GAN). The generator in this network is based on the U-Net architecture. It features a new hourglass structure that preserves the characteristics of the deep layers. In addition, the network incorporates two loss functions to generate natural and high-quality images: Low Frequency (LF) loss and High Frequency (HF) loss. HF loss uses a pretrained VGG-16 network and is configured using a specific layer that best represents features. This can enhance the performance in the high-frequency region. In contrast, LF loss is used to handle the low-frequency region. The two loss functions help the generator produce images that can mislead the discriminator while accurately reproducing high- and low-frequency regions. Consequently, by removing the blocking effects from maximum lossy compressed images, images in which identities can be recognized are generated. This study represents a significant improvement over previous research in terms of image resolution performance.
For traffic object detection in foggy environments based on convolutional neural networks (CNN), data sets collected in fog-free environments are generally used to train the network directly. As a result, the network cannot learn the object characteristics of foggy environments from the training set, and the detection performance is poor. To improve traffic object detection in foggy environments, we propose a method of generating foggy images from fog-free images from the perspective of data set construction. First, taking images from the KITTI object detection data set as the original fog-free images, we generate the depth image of each original image by using an improved Monodepth unsupervised depth estimation method. Then, a geometric prior depth template is constructed and fused with the depth image, using the image entropy as the weight. After that, a foggy image is obtained from the depth image based on the atmospheric scattering model. Finally, we take two typical object-detection frameworks, that is, the two-stage object-detection Faster region-based convolutional neural network (Faster-RCNN) and the one-stage object-detection network YOLOv4, and train them on the original data set, the foggy data set, and the mixed data set, respectively. According to the test results on the RESIDE-RTTS data set of outdoor natural foggy scenes, the models trained on the mixed data set perform best. The mean average precision (mAP) values are increased by 5.6% and by 5.0% under the YOLOv4 model and the Faster-RCNN network, respectively. This shows that the proposed method can effectively improve object identification ability in foggy environments.
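The atmospheric scattering model mentioned in this abstract is usually written as I(x) = J(x)·t(x) + A·(1 − t(x)), with transmission t(x) = exp(−β·d(x)), where J is the fog-free pixel, d the depth, A the airlight, and β the scattering coefficient. The sketch below applies that standard formula per pixel. It is not the authors' pipeline: the function names, the default values of `beta` and `airlight`, and the nested-list image representation are assumptions for illustration, and the depth estimation and entropy-weighted fusion steps are not reproduced.

```python
import math


def add_fog(pixel, depth, beta=0.05, airlight=255.0):
    """Apply I = J*t + A*(1 - t) with t = exp(-beta * depth) to one RGB pixel."""
    t = math.exp(-beta * depth)  # transmission falls off with depth
    return tuple(c * t + airlight * (1.0 - t) for c in pixel)


def fog_image(image, depth_map, beta=0.05, airlight=255.0):
    """Apply add_fog to every pixel of a nested-list image and its depth map."""
    return [
        [add_fog(p, d, beta, airlight) for p, d in zip(row, depth_row)]
        for row, depth_row in zip(image, depth_map)
    ]


# Hypothetical 1x2 image: the second pixel is farther away, so it is foggier.
image = [[(100, 150, 200), (100, 150, 200)]]
depth = [[5.0, 60.0]]
print(fog_image(image, depth))
```

With this formulation, pixels at zero depth are left unchanged and distant pixels converge to the airlight value, which is why a depth map is the key input for synthesizing fog.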
In pathological examinations, tissue must first be stained to meet specific diagnostic requirements, a meticulous process demanding significant time and expertise from specialists. With advancements in deep learning, this staining process can now be achieved through computational methods known as virtual staining. This technique replicates the visual effects of traditional histological staining in pathological imaging, enhancing efficiency and reducing costs. Extensive research in virtual staining for pathology has already demonstrated its effectiveness in generating clinically relevant stained images across a variety of diagnostic scenarios. Unlike previous reviews that broadly cover the clinical applications of virtual staining, this paper focuses on the technical methodologies, encompassing current models, datasets, and evaluation methods. It highlights the unique challenges of virtual staining compared to traditional image translation, discusses limitations in existing work, and explores future perspectives. Adopting a macro perspective, we avoid overly intricate technical details to make the content accessible to clinical experts. Additionally, we provide a brief introduction to the purpose of virtual staining from a medical standpoint, which may inspire algorithm-focused researchers. This paper aims to promote a deeper understanding of interdisciplinary knowledge between algorithm developers and clinicians, fostering the integration of technical solutions and medical expertise in the development of virtual staining models. This collaboration seeks to create more efficient, generalized, and versatile virtual staining models for a wide range of clinical applications.
The objective of image-based virtual try-on is to seamlessly integrate clothing onto a target image, generating a realistic representation of the character in the specified attire. However, existing virtual try-on methods frequently encounter challenges, including misalignment between the body and clothing, noticeable artifacts, and the loss of intricate garment details. To overcome these challenges, we introduce a two-stage high-resolution virtual try-on framework that integrates an attention mechanism, comprising a garment warping stage and an image generation stage. During the garment warping stage, we incorporate a channel attention mechanism to effectively retain the critical features of the garment, addressing challenges such as the loss of patterns, colors, and other essential details commonly observed in virtual try-on images produced by existing methods. During the image generation stage, with the aim of maximizing the utilization of the information proffered by the input image, the input features undergo double sampling within the normalization procedure, thereby enhancing the detail fidelity and clothing alignment efficacy of the output image. Experimental evaluations conducted on high-resolution datasets validate the effectiveness of the proposed method. Results demonstrate significant improvements in preserving garment details, reducing artifacts, and achieving superior alignment between the clothing and body compared to baseline methods, establishing its advantage in generating realistic and high-quality virtual try-on images.
As a form of discrete representation learning, Vector Quantized Variational Autoencoders (VQ-VAE) have increasingly been applied to generative and multimodal tasks due to their ease of embedding and representative capacity. However, existing VQ-VAEs often perform quantization in the spatial domain, ignoring global structural information and potentially suffering from codebook collapse and information coupling issues. This paper proposes a frequency quantized variational autoencoder (FQ-VAE) to address these issues. The proposed method transforms image features into linear combinations in the frequency domain using a 2D fast Fourier transform (2D-FFT) and performs adaptive quantization on these frequency components to preserve the image's global relationships. The codebook is dynamically optimized to avoid collapse and information coupling issues by considering the usage frequency and dependency of code vectors. Furthermore, we introduce a post-processing module based on graph convolutional networks to further improve reconstruction quality. Experimental results on four public datasets demonstrate that the proposed method outperforms state-of-the-art approaches in terms of Structural Similarity Index (SSIM), Learned Perceptual Image Patch Similarity (LPIPS), and Reconstruction Fréchet Inception Distance (rFID). In the experiments on the CIFAR-10 dataset, compared to the baseline method VQ-VAE, the proposed method improves the above metrics by 4.9%, 36.4%, and 52.8%, respectively.
Drone swarm systems, equipped with photoelectric imaging and intelligent target perception, are essential for reconnaissance and strike missions in complex and high-risk environments. They excel in information sharing, anti-jamming capabilities, and combat performance, making them critical for future warfare. However, varied perspectives in collaborative combat scenarios pose challenges to object detection, hindering traditional detection algorithms and reducing accuracy. Limited angle-prior data and sparse samples further complicate detection. This paper presents the Multi-View Collaborative Detection System, which tackles the challenges of multi-view object detection in collaborative combat scenarios. The system is designed to enhance multi-view image generation and detection algorithms, thereby improving the accuracy and efficiency of object detection across varying perspectives. First, an observation model for three-dimensional targets through line-of-sight angle transformation is constructed, and a multi-view image generation algorithm based on the Pix2Pix network is designed. For object detection, YOLOX is utilized, and a deep feature extraction network, BA-RepCSPDarknet, is developed to address challenges related to small target scale and feature extraction. Additionally, a feature fusion network, NS-PAFPN, is developed to mitigate the issue of deep feature map information loss in UAV images. A visual attention module (BAM) is employed to manage appearance differences under varying angles, while a feature mapping module (DFM) prevents fine-grained feature loss. These advancements lead to the development of BA-YOLOX, a multi-view object detection network model suitable for drone platforms, enhancing accuracy and effectively targeting small objects.
A phase-aware cross-modal framework is presented that synthesizes UWF_FA from non-invasive UWF_RI for diabetic retinopathy (DR) stratification. A curated cohort of 1198 patients (2915 UWF_RI and 17,854 UWF_FA images) with strict registration quality supports training across three angiographic phases (initial, mid, final). The generator is based on a modified pix2pixHD with an added Gradient Variance Loss to better preserve microvasculature, and is evaluated using MAE, PSNR, SSIM, and MS-SSIM on held-out pairs. Quantitatively, the mid phase achieves the lowest MAE (98.76 ± 42.67), while SSIM remains high across phases. Expert review shows substantial agreement (Cohen's κ = 0.78–0.82) and Turing-style misclassification of 50%–70% of synthetic images as real, indicating strong perceptual realism. For downstream DR stratification, fusing multi-phase synthetic UWF_FA with UWF_RI in a Swin Transformer classifier yields significant gains over a UWF_RI-only baseline, with the full-phase setting (Set D) reaching AUC = 0.910 and accuracy = 0.829. These results support synthetic UWF_FA as a scalable, non-invasive complement to dye-based angiography that enhances screening accuracy while avoiding injection-related risks.
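The agreement statistic quoted in this abstract, Cohen's κ, is defined as κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement between two raters and p_e is the agreement expected by chance from their label frequencies. The following is a minimal sketch of that textbook definition. The function name and the example labels are hypothetical, and nothing here reproduces the study's actual rating protocol.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("ratings must be non-empty and of equal length")
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical "real" vs "synthetic" judgements from two reviewers.
a = ["real", "real", "synthetic", "synthetic", "real", "synthetic"]
b = ["real", "real", "synthetic", "real", "real", "synthetic"]
print(cohen_kappa(a, b))
```

Values near 1 indicate agreement well beyond chance, while values near 0 indicate agreement no better than chance.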
Measurement of blood flow velocity is key to understanding physiology and pathology in vivo. While most measurements are performed at the middle of the blood vessel, little research has been done on characterizing the instantaneous blood flow velocity distribution. This is mainly due to the lack of measurement technology with high spatial and temporal resolution. Here, we tackle this problem with our recently developed dual-wavelength line-scan third-harmonic generation (THG) imaging technology. Simultaneous acquisition of dual-wavelength THG line-scanning signals enables measurement of blood flow velocities at two radially symmetric positions in both venules and arterioles in mouse brain in vivo. Our results clearly show that the instantaneous blood flow velocity is not symmetric under general conditions.
Generative adversarial networks (GANs) with gaming abilities have been widely applied in image generation. However, game-based generators and discriminators may reduce the robustness of the obtained GANs in image generation under varying scenes. Enhancing the relation of hierarchical information in a generation network and enlarging the differences between network architectures can provide more structural information to improve the generation effect. In this paper, we propose an enhanced GAN via improving a generator for image generation (EIGGAN). EIGGAN applies spatial attention to a generator to extract salient information and enhance the truthfulness of the generated images. Taking the relation of context into account, parallel residual operations are fused into a generation network to extract more structural information from the different layers. Finally, a mixed loss function in a GAN is exploited to make a tradeoff between speed and accuracy and to generate more realistic images. Experimental results show that the proposed method is superior to popular methods, such as Wasserstein GAN with gradient penalty (WGAN-GP), in terms of many indexes, namely Fréchet Inception Distance, Learned Perceptual Image Patch Similarity, Multi-Scale Structural Similarity Index Measure, Kernel Inception Distance, Number of Statistically-Different Bins, and Inception Score, as well as in visual comparisons of generated images.
In this paper, an evaluation of the influence of the luminance L* in the L*a*b* color space during color segmentation is presented. A comparative study is made between the behavior of segmentation in color images using only the Euclidean metric of a* and b* and an adaptive color similarity function defined as a product of Gaussian functions in a modified HSI color space. For the evaluation, synthetic images were specifically designed to accurately assess the performance of the color segmentation. The testing system can be used either to explore the behavior of a similarity function (or metric) in different color spaces or to explore different metrics (or similarity functions) in the same color space. The results show that the color parameters a* and b* are not independent of the luminance parameter L*, as one might initially assume.
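The first of the two approaches compared in this abstract, segmentation using only the Euclidean metric of a* and b*, can be sketched as a threshold on chromatic distance that ignores L*. This is an illustrative sketch rather than the paper's testing system: the function names, the `(L, a, b)` tuple layout, and the threshold value are assumptions, and the adaptive similarity function in the modified HSI space is not reproduced.

```python
import math


def ab_distance(lab1, lab2):
    """Euclidean distance in the a*b* plane; the L* component is ignored."""
    return math.hypot(lab1[1] - lab2[1], lab1[2] - lab2[2])


def segment(lab_pixels, reference, threshold):
    """Mark each (L, a, b) pixel whose chromatic distance to the reference is within the threshold."""
    return [ab_distance(p, reference) <= threshold for p in lab_pixels]


# Hypothetical pixels: same chroma at two luminance levels, plus a different hue.
pixels = [(30.0, 40.0, 20.0), (80.0, 40.0, 20.0), (50.0, -30.0, 10.0)]
print(segment(pixels, reference=(50.0, 40.0, 20.0), threshold=5.0))
```

Because L* never enters the distance, this metric treats a dark and a bright pixel with the same (a*, b*) values as the same color, which is the independence assumption the study calls into question.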
A new nonlinear optical third-harmonic imaging technology operating in reflection mode in bio-tissues by using the cascading effect, a process whereby second-order effects combine to contribute to a third-order nonlinear process, has been analyzed. The performance of the reflected optical third-harmonic imaging enhanced by the cascading effect in bio-tissues is analyzed with the semi-classical theory. The microscopic understanding of the enhancement of cascaded optical third-harmonic imaging in reflection mode in bio-tissues is discussed. Some ideas for further enhancement are given.
Chinese shadow puppetry has been recognized as a world intangible cultural heritage. However, it faces substantial challenges in its preservation and advancement due to the intricate and labor-intensive nature of crafting shadow puppets. To ensure the inheritance and development of this cultural heritage, it is imperative to enable traditional art to flourish in the digital era. This paper presents an Interactive Collaborative Creation System for shadow puppets, designed to facilitate the creation of high-quality shadow puppet images with greater ease. The system comprises four key functions: image contour extraction, intelligent reference recommendation, generation network, and color adjustment, all aimed at assisting users in various aspects of the creative process, including drawing, inspiration, and content generation. Additionally, we propose an enhanced algorithm called Smooth Generative Adversarial Networks (SmoothGAN), which exhibits more stable gradient training and a greater capacity for generating high-resolution shadow puppet images. Furthermore, we have built a new dataset comprising high-quality shadow puppet images to train the shadow puppet generation model. Both qualitative and quantitative experimental results demonstrate that SmoothGAN significantly improves the quality of image generation, while our system efficiently assists users in creating high-quality shadow puppet images, with a SUS scale score of 84.4. This study provides a valuable theoretical and practical reference for the digital creation of shadow puppet art.
The applications of machine learning (ML) in the medical domain are often hindered by the limited availability of high-quality data. To address this challenge, we explore the synthetic generation of echocardiography images (echoCG) using state-of-the-art generative models. We conduct a comprehensive evaluation of three prominent methods: Cycle-consistent generative adversarial network (CycleGAN), Contrastive Unpaired Translation (CUT), and Stable Diffusion 1.5 with Low-Rank Adaptation (LoRA). Our research presents the data generation methodology, image samples, and evaluation strategy, followed by an extensive user study involving licensed cardiologists and surgeons who assess the perceived quality and medical soundness of the generated images. Our findings indicate that Stable Diffusion outperforms both CycleGAN and CUT in generating images that are nearly indistinguishable from real echoCG images, making it a promising tool for augmenting medical datasets. However, we also identify limitations in the synthetic images generated by CycleGAN and CUT, which are easily distinguishable as non-realistic by medical professionals. This study highlights the potential of diffusion models in medical imaging and their applicability in addressing data scarcity, while also outlining the areas for future improvement.
Near infrared-visible (NIR-VIS) face recognition aims to match an NIR face image to a VIS image. The main challenges of NIR-VIS face recognition are the gap caused by cross-modality and the lack of sufficient paired NIR-VIS face images to train models. This paper focuses on the generation of paired NIR-VIS face images and proposes a dual variational generator based on ResNeSt (RS-DVG). RS-DVG can generate a large number of paired NIR-VIS face images from noise, and these generated NIR-VIS face images can be used as the training set together with the real NIR-VIS face images. In addition, a triplet loss function is introduced and a novel triplet selection method is proposed specifically for the training of the current face recognition model, which maximizes the inter-class distance and minimizes the intra-class distance in the input face images. The method proposed in this paper was evaluated on the datasets CASIA NIR-VIS 2.0 and BUAA-VisNir, and relatively good results were obtained.
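The triplet loss mentioned in this abstract is commonly written as max(0, d(a, p) − d(a, n) + m), where a is an anchor embedding, p an embedding of the same identity, n an embedding of a different identity, and m a margin. The sketch below implements that generic form with plain Euclidean distance. The function names, the margin value, and the toy embeddings are assumptions, some formulations use squared distances instead, and the paper's triplet selection method is not reproduced here.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: zero once the negative is farther than the positive by at least the margin."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical 2-D embeddings: the positive is close to the anchor, the negative is far.
anchor, positive, negative = (0.0, 0.0), (0.1, 0.0), (1.0, 1.0)
print(triplet_loss(anchor, positive, negative))
```

Minimizing this quantity pulls same-identity embeddings together and pushes different-identity embeddings apart, which corresponds to the intra-class and inter-class distance goals stated in the abstract.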
Remote sensing images often need to be merged into a larger mosaic image to support analysis of large areas in many applications. However, the usefulness of the mosaic imagery may be severely restricted if there are many areas with cloud coverage or if the images used for merging span a long period of time. Therefore, this paper proposes a method of image selection for full-coverage image (i.e., a mosaic image with no cloud-contaminated pixels) generation. Specifically, a novel High-Frequency-Aware (HFA)-Net based on Swin-Transformer for region quality grading is presented to provide a data basis for image selection. Spatiotemporal constraints are presented to optimize the image selection. In the temporal dimension, the shortest-time-span constraint shortens the time span of the selected images, improving the timeliness of the image selection results. In the spatial dimension, a spatial continuity constraint is proposed to select data with better quality and larger area, thus improving the radiometric continuity of the results. Experiments on GF-1 images indicate that the proposed method reduces the average time span by 76.1% and 38.7% compared with the Improved Coverage-oriented Retrieval algorithm (MICR) and the Retrieval Method based on Grid Compensation (RMGC), respectively. Moreover, the proposed method also reduces the residual cloud amount by an average of 91.2%, 89.8%, and 83.4% compared with MICR, RMGC, and the Pixel-based Time-series Synthesis Method (PTSM), respectively.
Funding: Supported by the National Natural Science Foundation of China (62072250).
文摘With the development of computer graphics,realistic computer graphics(CG)have become more and more common in our field of vision.This rendered image is invisible to the naked eye.How to effectively identify CG and natural images(NI)has been become a new issue in the field of digital forensics.In recent years,a series of deep learning network frameworks have shown great advantages in the field of images,which provides a good choice for us to solve this problem.This paper aims to track the latest developments and applications of deep learning in the field of CG and NI forensics in a timely manner.Firstly,it introduces the background of deep learning and the knowledge of convolutional neural networks.The purpose is to understand the basic model structure of deep learning applications in the image field,and then outlines the mainstream framework;secondly,it briefly introduces the application of deep learning in CG and NI forensics,and finally points out the problems of deep learning in this field and the prospects for the future.
Funding: National Key Research and Development Program of China (2022YFC3502302); Graduate Research Innovation Program of Jiangsu Province (KYCX25_2269).
Abstract: Objective: To develop a facial image generation method based on a facial color-preserving generative adversarial network (FCP-GAN) that effectively decouples identity features from diagnostic facial complexion characteristics in traditional Chinese medicine (TCM) inspection, thereby addressing the critical challenge of privacy preservation in medical image analysis. Methods: A facial image dataset was constructed from participants at Nanjing University of Chinese Medicine between April 23 and June 10, 2023, using TCM full-body inspection data acquisition equipment under controlled illumination. The proposed FCP-GAN model was designed to achieve the dual objectives of removing identity features and preserving colors through three key components: (i) a multi-space combination module that comprehensively extracts color attributes from the red, green, blue (RGB), hue, saturation, value (HSV), and Lab spaces; (ii) a generator incorporating an efficient channel attention (ECA) mechanism to enhance the representation of diagnostically critical color channels; and (iii) a dual-loss function that combines an adversarial loss for de-identification with a dedicated color preservation loss. The model was trained and evaluated using a stratified 5-fold cross-validation strategy and compared against four baseline generative models: conditional GAN (CGAN), deep convolutional GAN (DCGAN), dual discriminator CGAN (DDCGAN), and medical GAN (MedGAN). Performance was assessed in terms of image quality [peak signal-to-noise ratio (PSNR) and structural similarity (SSIM)], distribution similarity [Fréchet inception distance (FID)], privacy protection (face recognition accuracy), and diagnostic consistency [mean squared error (MSE) and Pearson correlation coefficient (PCC)]. Results: The final analysis included facial images from 216 participants. Compared with baseline models, FCP-GAN achieved superior performance, with PSNR = 31.02 dB and SSIM = 0.908, representing improvements of 1.21 dB in PSNR and 0.034 in SSIM over the strongest baseline (MedGAN). The FID value (23.45) was also the lowest among all models, indicating superior distributional similarity to real images. The multi-space feature fusion and the ECA mechanism contributed significantly to these performance gains, as evidenced by ablation studies. The stratified 5-fold cross-validation confirmed the model's robustness, with results reported as mean ± standard deviation (SD) across all folds. The model effectively protected privacy by reducing face recognition accuracy from 95.2% (original images) to 60.1% (generated images). Critically, it maintained high diagnostic fidelity, as evidenced by a low MSE (< 0.051) and a high PCC (> 0.98) for key TCM facial features between original and generated images. Conclusion: The FCP-GAN model provides an effective technical solution for ensuring privacy in TCM diagnostic imaging, successfully removing identity features while preserving clinically vital facial color features. This study offers significant value for developing intelligent and secure TCM telemedicine systems.
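Three of the metrics named in the FCP-GAN abstract above (MSE, PSNR, and PCC) have standard textbook definitions. The following is a minimal sketch of those definitions on flat pixel or feature sequences, not the authors' evaluation code. The function names and the 8-bit default `max_value=255.0` are assumptions made for illustration.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized flat sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a, b, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE).

    max_value is the largest possible sample value (255 assumed for 8-bit data).
    Identical inputs give infinity because the MSE is zero.
    """
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


def pearson(a, b):
    """Pearson correlation coefficient between two equally sized sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_a * var_b)
```

A higher PSNR and a PCC close to 1 both indicate that the generated values track the originals closely, which is how the abstract uses them.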
Funding: Financially supported by the National Science Fund for Distinguished Young Scholars, China (No. 52025041); the National Natural Science Foundation of China (Nos. 52450003, U2341267, and 52174294); the National Postdoctoral Program for Innovative Talents, China (No. BX20240437); and the Fundamental Research Funds for the Central Universities, China (Nos. FRF-IDRY-23-037 and FRF-TP-20-02C2).
Abstract: The rapid advancements in computer vision (CV) technology have transformed the traditional approaches to material microstructure analysis. This review outlines the history of CV and explores the applications of deep-learning (DL)-driven CV in four key areas of materials science: microstructure-based performance prediction, microstructure information generation, microstructure defect detection, and crystal structure-based property prediction. CV has significantly reduced the cost of the traditional experimental methods used in material performance prediction. Moreover, recent progress in generating microstructure images and detecting microstructural defects using CV has increased the efficiency and reliability of material performance assessments. DL-driven CV models can accelerate the design of new materials with optimized performance by integrating predictions based on both crystal and microstructural data, thereby enabling the discovery of next-generation materials. Finally, the review provides insights into the rapid interdisciplinary developments in the field of materials science and their future prospects.
Funding: National Key R&D Program of China (No. 2019YFC1521300).
Abstract: During the image generation phase, the parser-free Flow-Style-VTON model (PF-Flow-Style-VTON), which utilizes distilled appearance flows, faces two main challenges: (1) blurring, deformation, occlusion, or loss of the arm or palm regions in the generated image when these regions of the person occlude the garment; and (2) blurring and deformation in the generated image when the person performs large pose movements and the target garment is complex with detailed patterns. To solve these two problems, an improved virtual try-on network model, denoted IPF-Flow-Style-VTON, is proposed. First, a target warped garment mask refinement module (M-RM) is introduced to refine the warped garment mask and remove erroneous information in the arm and palm regions, thereby improving the quality of subsequent image generation. Second, an improved global attention module (GAM) is integrated into the original image generation network, enhancing the ResUNet's understanding of global context and optimizing the fusion of local features and global information, thereby further improving image generation quality. Finally, the UniPose model is used to provide the pose keypoint information of the target person image, guiding the image generation phase. Experiments conducted on the VITON dataset show that the proposed method outperforms the original method, Flow-Style-VTON, by 5.4%, 0.3%, 6.7%, and 2.2% in Fréchet inception distance (FID), structural similarity index measure (SSIM), learned perceptual image patch similarity (LPIPS), and peak signal-to-noise ratio (PSNR), respectively. Overall, the proposed method effectively addresses the shortcomings of the original network and achieves better visual results.
Funding: Supported by the Applied Brand Course of Mianyang Teacher's College (Investigation and Monitoring of Natural Resources).
Abstract: With the rapid development of image-generative AI (artificial intelligence) technology, its application in undergraduate Landscape Architecture education has demonstrated significant potential. Based on this, the present study explores the implications of integrating image-generative AI into Landscape Architecture courses from three perspectives: stimulating students' creative design potential, expanding approaches to form and concept generation, and enhancing the visualization of spatial scenes. Furthermore, it discusses application strategies from three dimensions: AI-assisted conceptual generation, human-machine collaboration for design refinement, and optimization of scheme presentation and evaluation. This paper aims to provide relevant educators with insights and references.
Funding: Supported by the Technology Development Program (S3344882) funded by the Ministry of SMEs and Startups (MSS, Korea).
Abstract: When high compression rates are applied to Joint Photographic Experts Group (JPEG) images through lossy compression, blocking artifacts may appear, and the image must then be restored toward its original quality. The challenge lies in regenerating heavily compressed images to a state in which their content becomes identifiable. Therefore, this study focuses on the restoration of JPEG images subjected to substantial degradation caused by maximum lossy compression, using Generative Adversarial Networks (GAN). The generator in this network is based on the U-Net architecture. It features a new hourglass structure that preserves the characteristics of the deep layers. In addition, the network incorporates two loss functions to generate natural and high-quality images: a Low Frequency (LF) loss and a High Frequency (HF) loss. The HF loss uses a pretrained VGG-16 network and is configured using a specific layer that best represents features, which can enhance performance in the high-frequency region. In contrast, the LF loss is used to handle the low-frequency region. Together, the two loss functions help the generator produce images that can mislead the discriminator while accurately reproducing both high- and low-frequency regions. Consequently, by removing the blocking effects from maximum lossy compressed images, images in which identities can be recognized are generated. This study represents a significant improvement over previous research in terms of image resolution performance.
Abstract: For traffic object detection in foggy environments based on convolutional neural networks (CNN), data sets collected in fog-free environments are generally used to train the network directly. As a result, the network cannot learn the object characteristics of foggy environments from the training set, and the detection performance is poor. To improve traffic object detection in foggy environments, we propose a method of generating foggy images from fog-free images from the perspective of data set construction. First, taking the KITTI object detection data set as the original fog-free images, we generate the depth image of each original image using an improved Monodepth unsupervised depth estimation method. Then, a geometric prior depth template is constructed and fused with the depth image, with the image entropy taken as the weight. After that, a foggy image is obtained from the depth image based on the atmospheric scattering model. Finally, we take two typical object-detection frameworks, namely the two-stage Faster region-based convolutional neural network (Faster-RCNN) and the one-stage network YOLOv4, and train them on the original data set, the foggy data set, and the mixed data set, respectively. According to the test results on the RESIDE-RTTS data set of outdoor natural foggy scenes, the model trained on the mixed data set performs best. The mean average precision (mAP) values are increased by 5.6% and 5.0% for the YOLOv4 model and the Faster-RCNN network, respectively. This shows that the proposed method can effectively improve object identification ability in foggy environments.
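The fog-synthesis step in the abstract above relies on the atmospheric scattering model, commonly written as I(x) = J(x)·t(x) + A·(1 − t(x)) with transmission t(x) = exp(−β·d(x)). The sketch below applies that standard formula to a flat list of pixel intensities and depths. It is an illustration under stated assumptions, not the authors' pipeline: the scattering coefficient `beta`, the airlight value `airlight`, and the function name are hypothetical, and the depth estimation and entropy-weighted fusion steps are not reproduced.

```python
import math


def add_fog(pixels, depths, beta=1.0, airlight=255.0):
    """Apply the atmospheric scattering model to a flat list of intensities.

    pixels   -- fog-free intensities J(x)
    depths   -- scene depth d(x) for each pixel, same length as pixels
    beta     -- scattering coefficient (assumed value; larger means denser fog)
    airlight -- global atmospheric light A (assumed value)
    """
    if len(pixels) != len(depths):
        raise ValueError("pixels and depths must have the same length")
    foggy = []
    for clear, depth in zip(pixels, depths):
        transmission = math.exp(-beta * depth)  # t(x) = exp(-beta * d(x))
        foggy.append(clear * transmission + airlight * (1.0 - transmission))
    return foggy
```

Pixels at zero depth are left unchanged, while distant pixels converge to the airlight value, which is the washed-out look of fog.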
Funding: Supported by the National Natural Science Foundation of China under Grant 62371409 and the Fujian Provincial Natural Science Foundation of China under Grant 2023J01005.
Abstract: In pathological examinations, tissue must first be stained to meet specific diagnostic requirements, a meticulous process demanding significant time and expertise from specialists. With advancements in deep learning, this staining process can now be achieved through computational methods known as virtual staining. This technique replicates the visual effects of traditional histological staining in pathological imaging, enhancing efficiency and reducing costs. Extensive research in virtual staining for pathology has already demonstrated its effectiveness in generating clinically relevant stained images across a variety of diagnostic scenarios. Unlike previous reviews that broadly cover the clinical applications of virtual staining, this paper focuses on the technical methodologies, encompassing current models, datasets, and evaluation methods. It highlights the unique challenges of virtual staining compared to traditional image translation, discusses limitations in existing work, and explores future perspectives. Adopting a macro perspective, we avoid overly intricate technical details to make the content accessible to clinical experts. Additionally, we provide a brief introduction to the purpose of virtual staining from a medical standpoint, which may inspire algorithm-focused researchers. This paper aims to promote a deeper understanding of interdisciplinary knowledge between algorithm developers and clinicians, fostering the integration of technical solutions and medical expertise in the development of virtual staining models. This collaboration seeks to create more efficient, generalized, and versatile virtual staining models for a wide range of clinical applications.
Funding: Supported by the National Natural Science Foundation of China (61772179); the Hunan Provincial Natural Science Foundation of China (2022JJ50016, 2023JJ50095); the Science and Technology Plan Project of Hunan Province (2016TP1020); and the Double First-Class University Project of Hunan Province (Xiangjiaotong [2018]469, [2020]248).
Abstract: The objective of image-based virtual try-on is to seamlessly integrate clothing onto a target image, generating a realistic representation of the character in the specified attire. However, existing virtual try-on methods frequently encounter challenges, including misalignment between the body and clothing, noticeable artifacts, and the loss of intricate garment details. To overcome these challenges, we introduce a two-stage high-resolution virtual try-on framework that integrates an attention mechanism, comprising a garment warping stage and an image generation stage. During the garment warping stage, we incorporate a channel attention mechanism to effectively retain the critical features of the garment, addressing the loss of patterns, colors, and other essential details commonly observed in virtual try-on images produced by existing methods. During the image generation stage, to make full use of the information provided by the input image, the input features undergo double sampling within the normalization procedure, thereby enhancing the detail fidelity and clothing alignment of the output image. Experimental evaluations conducted on high-resolution datasets validate the effectiveness of the proposed method. Results demonstrate significant improvements in preserving garment details, reducing artifacts, and achieving superior alignment between the clothing and body compared to baseline methods, establishing its advantage in generating realistic and high-quality virtual try-on images.
Funding: Supported by the Interdisciplinary Project of Dalian University, DLUXK-2023-ZD-001.
Abstract: As a form of discrete representation learning, Vector Quantized Variational Autoencoders (VQ-VAE) have increasingly been applied to generative and multimodal tasks due to their ease of embedding and representative capacity. However, existing VQ-VAEs often perform quantization in the spatial domain, ignoring global structural information and potentially suffering from codebook collapse and information coupling issues. This paper proposes a frequency quantized variational autoencoder (FQ-VAE) to address these issues. The proposed method transforms image features into linear combinations in the frequency domain using a 2D fast Fourier transform (2D-FFT) and performs adaptive quantization on these frequency components to preserve the image's global relationships. The codebook is dynamically optimized to avoid collapse and information coupling by considering the usage frequency and dependency of code vectors. Furthermore, we introduce a post-processing module based on graph convolutional networks to further improve reconstruction quality. Experimental results on four public datasets demonstrate that the proposed method outperforms state-of-the-art approaches in terms of Structural Similarity Index (SSIM), Learned Perceptual Image Patch Similarity (LPIPS), and Reconstruction Fréchet Inception Distance (rFID). In the experiments on the CIFAR-10 dataset, compared to the baseline method VQ-VAE, the proposed method improves the above metrics by 4.9%, 36.4%, and 52.8%, respectively.
Funding: Supported by the Natural Science Foundation of China, Grant No. 62103052.
Abstract: Drone swarm systems, equipped with photoelectric imaging and intelligent target perception, are essential for reconnaissance and strike missions in complex and high-risk environments. They excel in information sharing, anti-jamming capabilities, and combat performance, making them critical for future warfare. However, varied perspectives in collaborative combat scenarios pose challenges to object detection, hindering traditional detection algorithms and reducing accuracy. Limited angle-prior data and sparse samples further complicate detection. This paper presents the Multi-View Collaborative Detection System, which tackles the challenges of multi-view object detection in collaborative combat scenarios. The system is designed to enhance multi-view image generation and detection algorithms, thereby improving the accuracy and efficiency of object detection across varying perspectives. First, an observation model for three-dimensional targets through line-of-sight angle transformation is constructed, and a multi-view image generation algorithm based on the Pix2Pix network is designed. For object detection, YOLOX is utilized, and a deep feature extraction network, BA-RepCSPDarknet, is developed to address challenges related to small target scale and feature extraction. Additionally, a feature fusion network, NS-PAFPN, is developed to mitigate the loss of deep feature map information in UAV images. A visual attention module (BAM) is employed to manage appearance differences under varying angles, while a feature mapping module (DFM) prevents fine-grained feature loss. These advancements lead to the development of BA-YOLOX, a multi-view object detection network model suitable for drone platforms, enhancing accuracy and effectively targeting small objects.
Funding: Funded by the Deanship of Research and Graduate Studies at King Khalid University through a Large Research Project under grant number RGP2/417/46.
Abstract: A phase-aware cross-modal framework is presented that synthesizes UWF_FA from non-invasive UWF_RI for diabetic retinopathy (DR) stratification. A curated cohort of 1198 patients (2915 UWF_RI and 17,854 UWF_FA images) with strict registration quality supports training across three angiographic phases (initial, mid, final). The generator is based on a modified pix2pixHD with an added Gradient Variance Loss to better preserve microvasculature, and is evaluated using MAE, PSNR, SSIM, and MS-SSIM on held-out pairs. Quantitatively, the mid phase achieves the lowest MAE (98.76 ± 42.67), while SSIM remains high across phases. Expert review shows substantial agreement (Cohen's κ = 0.78–0.82) and Turing-style misclassification of 50%–70% of synthetic images as real, indicating strong perceptual realism. For downstream DR stratification, fusing multi-phase synthetic UWF_FA with UWF_RI in a Swin Transformer classifier yields significant gains over a UWF_RI-only baseline, with the full-phase setting (Set D) reaching AUC = 0.910 and accuracy = 0.829. These results support synthetic UWF_FA as a scalable, non-invasive complement to dye-based angiography that enhances screening accuracy while avoiding injection-related risks.
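The expert agreement in the abstract above is reported as Cohen's κ, which corrects the observed agreement between two raters for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). The following is a minimal sketch of that standard two-rater formula. The function name and the sample labels are hypothetical, and the abstract does not state how the ratings were actually structured.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    p_o is the fraction of items on which the raters agree; p_e is the
    agreement expected if both raters labelled at random according to
    their own label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n)
        for label in labels
    )
    if p_expected == 1.0:
        # Both raters used one identical label for everything; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1.0 - p_expected)
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, so the reported range of 0.78 to 0.82 sits well above chance level.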
Funding: Funded by the National Natural Science Foundation of China (Grant/Award Numbers 62075135 and 61975126); the Science and Technology Innovation Commission of Shenzhen (Grant/Award Numbers JCYJ20190808174819083 and JCYJ20190808175201640); and the Shenzhen Science and Technology Planning Project (ZDSYS 20210623092006020).
Abstract: Measurement of blood flow velocity is key to understanding physiology and pathology in vivo. While most measurements are performed at the middle of the blood vessel, little research has been done on characterizing the instantaneous blood flow velocity distribution. This is mainly due to the lack of measurement technology with high spatial and temporal resolution. Here, we tackle this problem with our recently developed dual-wavelength line-scan third-harmonic generation (THG) imaging technology. Simultaneous acquisition of dual-wavelength THG line-scanning signals enables measurement of blood flow velocities at two radially symmetric positions in both venules and arterioles in the mouse brain in vivo. Our results clearly show that the instantaneous blood flow velocity is not symmetric under general conditions.
Funding: Supported in part by the Science and Technology Development Fund, Macao S.A.R (FDCT) 0028/2023/RIA1; in part by the Leading Talents in Gusu Innovation and Entrepreneurship Grant ZXL2023170; in part by the TCL Science and Technology Innovation Fund under Grant D5140240118; and in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2021A1515110079.
Abstract: Generative adversarial networks (GANs), which are trained as a game between two networks, have been widely applied in image generation. However, the game between generators and discriminators may reduce the robustness of the resulting GANs when generating images of varying scenes. Strengthening the relation between hierarchical information in a generation network and enlarging the differences between network architectures can provide more structural information and improve the generation result. In this paper, we propose an enhanced GAN that improves the generator for image generation (EIGGAN). EIGGAN applies spatial attention in the generator to extract salient information and enhance the realism of the generated images. To take the relation of the context into account, parallel residual operations are fused into the generation network to extract more structural information from the different layers. Finally, a mixed loss function is used in the GAN to trade off speed against accuracy and generate more realistic images. Experimental results show that the proposed method is superior to popular methods such as Wasserstein GAN with gradient penalty (WGAN-GP) in terms of many metrics, namely Fréchet Inception Distance, Learned Perceptual Image Patch Similarity, Multi-Scale Structural Similarity Index Measure, Kernel Inception Distance, Number of Statistically-Different Bins, and Inception Score, as well as in visual comparisons of generated images.
Abstract: This paper evaluates the influence of the luminance L* in the L*a*b* color space during color segmentation. A comparative study is made between the behavior of segmentation in color images using only the Euclidean metric of a* and b* and an adaptive color similarity function defined as a product of Gaussian functions in a modified HSI color space. For the evaluation, synthetic images were specifically designed to accurately assess the performance of the color segmentation. The testing system can be used either to explore the behavior of a similarity function (or metric) in different color spaces or to explore different metrics (or similarity functions) in the same color space. The results show that the color parameters a* and b* are not independent of the luminance parameter L*, as one might initially assume.
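One of the two approaches compared in the abstract above is segmentation by the Euclidean metric of a* and b* alone, ignoring L*. The sketch below shows what that metric computes for pixels given as (L*, a*, b*) tuples. The function names, the reference color, and the threshold are hypothetical, and the adaptive similarity function in the modified HSI space is not reproduced here.

```python
import math


def chroma_distance(lab1, lab2):
    """Euclidean distance between two L*a*b* colors using only a* and b*."""
    _, a1, b1 = lab1
    _, a2, b2 = lab2
    return math.hypot(a1 - a2, b1 - b2)


def segment_by_chroma(pixels, reference, threshold):
    """Return a boolean mask: True where a pixel's chroma is within
    `threshold` of the reference color. L* is deliberately ignored."""
    return [chroma_distance(p, reference) <= threshold for p in pixels]
```

Because L* is dropped, two pixels with identical a* and b* but very different lightness are always assigned to the same segment. That is exactly the assumption of independence from L* that the study's results call into question.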
Abstract: A new nonlinear optical third-harmonic imaging technique for bio-tissues in reflection mode is analyzed. It uses the cascading effect, a process whereby second-order effects combine to contribute to a third-order nonlinear process. The performance of reflected optical third-harmonic imaging enhanced by the cascading effect in bio-tissues is analyzed with semi-classical theory. The microscopic understanding of the enhancement of cascaded optical third-harmonic imaging in reflection mode in bio-tissues is discussed. Some ideas for further enhancement are given.
Funding: Supported by the Scientific Research Foundation of Hangzhou City University under Grant No. X-202203 and the Zhejiang Provincial Natural Science Foundation of China under Grant No. LTGY24F030002.
Abstract: Chinese shadow puppetry has been recognized as a world intangible cultural heritage. However, it faces substantial challenges in its preservation and advancement due to the intricate and labor-intensive nature of crafting shadow puppets. To ensure the inheritance and development of this cultural heritage, it is imperative to enable traditional art to flourish in the digital era. This paper presents an Interactive Collaborative Creation System for shadow puppets, designed to make it easier to create high-quality shadow puppet images. The system comprises four key functions: image contour extraction, intelligent reference recommendation, a generation network, and color adjustment, all aimed at assisting users in various aspects of the creative process, including drawing, inspiration, and content generation. Additionally, we propose an enhanced algorithm called Smooth Generative Adversarial Networks (SmoothGAN), which exhibits more stable gradient training and a greater capacity for generating high-resolution shadow puppet images. Furthermore, we have built a new dataset comprising high-quality shadow puppet images to train the shadow puppet generation model. Both qualitative and quantitative experimental results demonstrate that SmoothGAN significantly improves the quality of image generation, while our system efficiently assists users in creating high-quality shadow puppet images, with a System Usability Scale (SUS) score of 84.4. This study provides a valuable theoretical and practical reference for the digital creation of shadow puppet art.
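The usability result in the abstract above is a SUS score. Under the standard System Usability Scale scoring rule, each respondent answers ten items on a 1 to 5 scale; odd-numbered items contribute (response − 1), even-numbered items contribute (5 − response), and the sum is multiplied by 2.5 to give a score from 0 to 100. The sketch below implements that rule for a single respondent. The function name is hypothetical, and the abstract does not say how the study aggregated respondents, although a reported score is usually the mean over all of them.

```python
def sus_score(responses):
    """Compute one respondent's System Usability Scale score (0 to 100).

    responses -- ten integers from 1 (strongly disagree) to 5 (strongly agree),
                 in questionnaire order (item 1 first).
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if not 1 <= response <= 5:
            raise ValueError("each response must be between 1 and 5")
        if item_number % 2 == 1:
            total += response - 1   # positively worded (odd) items
        else:
            total += 5 - response   # negatively worded (even) items
    return total * 2.5
```

For example, `sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1])` is the best possible response pattern and a neutral answer of 3 on every item gives the midpoint of the scale.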
Funding: Funded by the Science Committee of the Ministry of Science and Higher Education of the Republic of Kazakhstan (Grant No. AP13068032, Development of Methods and Algorithms for Machine Learning for Predicting Pathologies of the Cardiovascular System Based on Echocardiography and Electrocardiography).
Abstract: The applications of machine learning (ML) in the medical domain are often hindered by the limited availability of high-quality data. To address this challenge, we explore the synthetic generation of echocardiography images (echoCG) using state-of-the-art generative models. We conduct a comprehensive evaluation of three prominent methods: Cycle-consistent generative adversarial network (CycleGAN), Contrastive Unpaired Translation (CUT), and Stable Diffusion 1.5 with Low-Rank Adaptation (LoRA). Our research presents the data generation methodology, image samples, and evaluation strategy, followed by an extensive user study involving licensed cardiologists and surgeons who assess the perceived quality and medical soundness of the generated images. Our findings indicate that Stable Diffusion outperforms both CycleGAN and CUT in generating images that are nearly indistinguishable from real echoCG images, making it a promising tool for augmenting medical datasets. However, we also identify limitations in the synthetic images generated by CycleGAN and CUT, which are easily distinguishable as non-realistic by medical professionals. This study highlights the potential of diffusion models in medical imaging and their applicability in addressing data scarcity, while also outlining areas for future improvement.
Funding: National Natural Science Foundation of China (No. 62006039); National Key Research and Development Program of China (No. 2019YFE0190500).
Abstract: Near infrared-visible (NIR-VIS) face recognition aims to match an NIR face image to a VIS image. The main challenges of NIR-VIS face recognition are the gap caused by the difference in modality and the lack of sufficient paired NIR-VIS face images to train models. This paper focuses on the generation of paired NIR-VIS face images and proposes a dual variational generator based on ResNeSt (RS-DVG). RS-DVG can generate a large number of paired NIR-VIS face images from noise, and these generated images can be used as the training set together with the real NIR-VIS face images. In addition, a triplet loss function is introduced and a novel triplet selection method is proposed specifically for training the face recognition model, which maximizes the inter-class distance and minimizes the intra-class distance in the input face images. The proposed method was evaluated on the CASIA NIR-VIS 2.0 and BUAA-VisNir datasets, and relatively good results were obtained.
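The triplet loss mentioned in the abstract above is commonly written as max(0, d(a, p) − d(a, n) + m), where a is an anchor embedding, p an embedding of the same identity, n an embedding of a different identity, and m a margin. The sketch below implements that common form with Euclidean distance. The margin value and function names are assumptions, and the paper's novel triplet selection method is not described in the abstract, so it is not reproduced.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equally sized embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: zero once the negative is at least `margin`
    farther from the anchor than the positive is.

    margin -- assumed value for illustration only.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

Minimizing this loss pulls same-identity embeddings together and pushes different-identity embeddings apart, which matches the abstract's stated goal of a small intra-class distance and a large inter-class distance.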
Funding: Supported by the National Natural Science Foundation of China [grant numbers 41971422 and 42090010] and the Fundamental Research Funds for the Central Universities, China [grant number 2042022dx0001].
Abstract: Remote sensing images often need to be merged into a larger mosaic image to support analysis of large areas in many applications. However, the usefulness of the mosaic imagery may be severely restricted if there are many areas with cloud coverage or if the images used for merging span a long time period. Therefore, this paper proposes a method of image selection for full coverage image (i.e., a mosaic image with no cloud-contaminated pixels) generation. Specifically, a novel High-Frequency-Aware (HFA)-Net based on Swin-Transformer for region quality grading is presented to provide a data basis for image selection. Spatiotemporal constraints are presented to optimize the image selection. In the temporal dimension, the shortest-time-span constraint shortens the time span of the selected images, improving the timeliness of the image selection results. In the spatial dimension, a spatial continuity constraint is proposed to select data with better quality and larger area, thus improving the radiometric continuity of the results. Experiments on GF-1 images indicate that the proposed method reduces the average time span by 76.1% and 38.7% compared to the Improved Coverage-oriented Retrieval algorithm (MICR) and the Retrieval Method based on Grid Compensation (RMGC), respectively. Moreover, the proposed method also reduces the residual cloud amount by an average of 91.2%, 89.8%, and 83.4% compared to MICR, RMGC, and the Pixel-based Time-series Synthesis Method (PTSM), respectively.