Funding: National Key Research and Development Program of China (2022YFC3502302); Graduate Research Innovation Program of Jiangsu Province (KYCX25_2269).
Abstract: Objective To develop a facial image generation method based on a facial color-preserving generative adversarial network (FCP-GAN) that effectively decouples identity features from diagnostic facial complexion characteristics in traditional Chinese medicine (TCM) inspection, thereby addressing the critical challenge of privacy preservation in medical image analysis. Methods A facial image dataset was constructed from participants at Nanjing University of Chinese Medicine between April 23 and June 10, 2023, using TCM full-body inspection data acquisition equipment under controlled illumination. The proposed FCP-GAN model was designed to achieve the dual objectives of removing identity features and preserving colors through three key components: (i) a multi-space combination module that comprehensively extracts color attributes from the red-green-blue (RGB), hue-saturation-value (HSV), and Lab spaces; (ii) a generator incorporating an efficient channel attention (ECA) mechanism to enhance the representation of diagnostically critical color channels; and (iii) a dual-loss function that combines an adversarial loss for de-identification with a dedicated color preservation loss. The model was trained with a stratified 5-fold cross-validation strategy and evaluated against four baseline generative models: conditional GAN (CGAN), deep convolutional GAN (DCGAN), dual discriminator CGAN (DDCGAN), and medical GAN (MedGAN). Performance was assessed in terms of image quality [peak signal-to-noise ratio (PSNR) and structural similarity (SSIM)], distribution similarity [Fréchet inception distance (FID)], privacy protection (face recognition accuracy), and diagnostic consistency [mean squared error (MSE) and Pearson correlation coefficient (PCC)]. Results The final analysis included facial images from 216 participants. Compared with the baseline models, FCP-GAN achieved superior performance, with PSNR = 31.02 dB and SSIM = 0.908, improvements of 1.21 dB in PSNR and 0.034 in SSIM over the strongest baseline (MedGAN). Its FID (23.45) was also the lowest among all models, indicating superior distributional similarity to real images. Ablation studies showed that the multi-space feature fusion and the ECA mechanism contributed significantly to these performance gains. The stratified 5-fold cross-validation confirmed the model's robustness, with results reported as mean ± standard deviation (SD) across all folds. The model effectively protected privacy by reducing face recognition accuracy from 95.2% (original images) to 60.1% (generated images). Critically, it maintained high diagnostic fidelity, as evidenced by a low MSE (< 0.051) and a high PCC (> 0.98) for key TCM facial features between original and generated images. Conclusion The FCP-GAN model provides an effective technical solution for ensuring privacy in TCM diagnostic imaging, successfully removing identity features while preserving clinically vital facial color features. This study offers significant value for developing intelligent and secure TCM telemedicine systems.
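To make the two mechanisms above concrete, the following is a minimal sketch of a multi-space combination step and the dual loss, assuming kornia for differentiable color-space conversion. The paper's actual module design and loss weighting are not given here, so the per-space rescaling and lambda_color are illustrative assumptions, not FCP-GAN's settings.

```python
import torch
import torch.nn.functional as F
from kornia.color import rgb_to_hsv, rgb_to_lab  # pip install kornia


def multi_space_combination(rgb: torch.Tensor) -> torch.Tensor:
    """Stack RGB, HSV, and Lab into one (N, 9, H, W) tensor; rgb is in [0, 1]."""
    hsv = rgb_to_hsv(rgb)                                    # H in [0, 2*pi]
    hsv = torch.cat([hsv[:, :1] / (2 * torch.pi), hsv[:, 1:]], dim=1)
    lab = rgb_to_lab(rgb)                                    # L in [0, 100]
    lab = torch.cat([lab[:, :1] / 100.0, lab[:, 1:] / 128.0], dim=1)
    # Rough per-space rescaling (an assumption) so no one space dominates.
    return torch.cat([rgb, hsv, lab], dim=1)


def generator_loss(d_fake_logits: torch.Tensor,
                   real_rgb: torch.Tensor,
                   fake_rgb: torch.Tensor,
                   lambda_color: float = 10.0) -> torch.Tensor:
    """Adversarial de-identification term plus a color preservation term."""
    adv = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))
    color = F.mse_loss(multi_space_combination(fake_rgb),
                       multi_space_combination(real_rgb))
    return adv + lambda_color * color
```

Only the generator side is shown; the discriminator is updated with the usual adversarial objective.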
Funding: Supported by the Natural Science Foundation of Shandong Province of China under Grant No. ZR2023MF041; the National Natural Science Foundation of China under Grant No. 62072469; the Shandong Data Open Innovative Application Laboratory; the Spanish Ministry of Economy and Competitiveness (MINECO); and the European Regional Development Fund (ERDF) under Project No. PID2020-120611RBI00/AEI/10.13039/501100011033.
Abstract: Facial expression generation from pure textual descriptions is widely applied in human-computer interaction, computer-aided design, assisted education, etc. However, this task is challenging due to the intricate facial structure and the complex mapping between texts and images. Existing methods face limitations in generating high-resolution images or capturing diverse facial expressions. In this study, we propose a novel generation approach, named FaceCLIP, to tackle these problems. The proposed method utilizes a CLIP-based multi-stage generative adversarial model to produce vivid, high-resolution facial expressions. With strong semantic priors from multi-modal textual and visual cues, the method effectively disentangles facial attributes, enabling attribute editing and semantic reasoning. To facilitate text-to-expression generation, we build a new dataset, the FET dataset, which contains facial expression images and corresponding textual descriptions. Experiments on the dataset demonstrate improved image quality and semantic consistency compared with state-of-the-art methods.
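FaceCLIP's multi-stage generator itself is not reproduced here. The minimal sketch below shows only the CLIP-based conditioning idea described above, assuming OpenAI's clip package: textual descriptions are encoded into embeddings that can condition a generator, and a CLIP similarity term rewards semantic consistency between generated faces and their prompts. Both function names are hypothetical.

```python
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)


def text_condition(descriptions: list[str]) -> torch.Tensor:
    """Encode expression descriptions into CLIP text embeddings (N, 512)."""
    tokens = clip.tokenize(descriptions).to(device)
    with torch.no_grad():
        return clip_model.encode_text(tokens).float()


def clip_consistency_loss(images: torch.Tensor,
                          text_emb: torch.Tensor) -> torch.Tensor:
    """One minus cosine similarity between generated images and their prompts.

    `images` must already be resized to 224x224 and normalized the way
    CLIP's preprocessing pipeline expects.
    """
    img_emb = clip_model.encode_image(images).float()
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    return (1.0 - (img_emb * txt_emb).sum(dim=-1)).mean()
```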
Funding: Supported by the National Natural Science Foundation of China (61403422, 61273102); the Hubei Provincial Natural Science Foundation of China (2015CFA010); the 111 Project (B17040); and the Fundamental Research Funds for National University, China University of Geosciences (Wuhan).
Abstract: A facial expression emotion recognition based human-robot interaction (FEER-HRI) system is proposed, for which a four-layer system framework is designed. The FEER-HRI system enables robots not only to recognize human emotions but also to generate facial expressions for adapting to them. A facial emotion recognition method based on 2D-Gabor filters, the uniform local binary pattern (LBP) operator, and a multiclass extreme learning machine (ELM) classifier is presented, which is applied to real-time facial expression recognition for robots. Facial expressions of robots are represented by simple cartoon symbols displayed on an LED screen equipped in the robots, which can be easily understood by humans. Four scenarios, i.e., guiding, entertainment, home service, and scene simulation, are performed in the human-robot interaction experiment, in which smooth communication is realized by facial expression recognition of humans and facial expression generation of robots within 2 seconds. As prospective applications, the FEER-HRI system can be applied to home service, smart home, safe driving, and so on.
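A minimal sketch of the recognition pipeline named above (2D-Gabor filtering, uniform-LBP histograms, and an ELM classifier), assuming scikit-image for the filters; the filter-bank parameters, LBP settings, and hidden-layer width are illustrative choices rather than the paper's.

```python
import numpy as np
from skimage.filters import gabor
from skimage.feature import local_binary_pattern


def gabor_lbp_features(gray: np.ndarray) -> np.ndarray:
    """Uniform-LBP histograms computed over a small 2D-Gabor filter bank."""
    feats = []
    for theta in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):
        real, _ = gabor(gray, frequency=0.3, theta=theta)
        lbp = local_binary_pattern(real, P=8, R=1, method="uniform")
        # Uniform LBP with P=8 yields codes 0..9, hence 10 histogram bins.
        hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
        feats.append(hist)
    return np.concatenate(feats)


class ELM:
    """Single-hidden-layer ELM: random projection + least-squares readout."""

    def __init__(self, n_hidden: int = 256, seed: int = 0):
        self.n_hidden, self.rng = n_hidden, np.random.default_rng(seed)

    def fit(self, X: np.ndarray, y: np.ndarray) -> "ELM":
        n_classes = int(y.max()) + 1
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = np.tanh(X @ self.W + self.b)          # random hidden activations
        T = np.eye(n_classes)[y]                  # one-hot targets
        self.beta = np.linalg.pinv(H) @ T         # closed-form output weights
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return np.argmax(np.tanh(X @ self.W + self.b) @ self.beta, axis=1)
```

Training reduces to one least-squares solve (ELM(n_hidden=256).fit(X_train, y_train).predict(X_test)), which is what makes the ELM attractive for real-time use on robots.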
Funding: This research was funded by the College Student Innovation and Entrepreneurship Training Program (grant numbers 2021055Z and S202110082031) and by the Special Project for Cultivating Scientific and Technological Innovation Ability of College and Middle School Students in Hebei Province (grant number 2021H011404).
Abstract: To generate realistic three-dimensional animation of a virtual character, capturing real facial expressions is the primary task. Due to diverse facial expressions and complex backgrounds, facial landmarks recognized by existing strategies suffer from deviations and low accuracy. Therefore, a facial expression capture method based on a two-stage neural network is proposed in this paper, taking advantage of an improved multi-task cascaded convolutional network (MTCNN) and a high-resolution network. First, the convolution operations of the traditional MTCNN are improved: face information in the input image is quickly filtered by feature fusion in the first stage, and Octave Convolution replaces the standard convolutions in the second stage to enhance the feature extraction ability of the network and further reject a large number of false candidates. The model thus outputs more accurate face candidate windows and locates the faces for better landmark recognition. The images cropped after face detection are then input into the high-resolution network. Multi-scale feature fusion is realized by connecting multi-resolution streams in parallel, yielding rich high-resolution heatmaps of facial landmarks. Finally, changes in the recognized facial landmarks are tracked in real time; the expression parameters are extracted and transmitted to the Unity3D engine to drive the virtual character's face, realizing synchronized facial expression animation. Extensive experimental results on the WFLW database demonstrate the superiority of the proposed method in terms of accuracy and robustness, especially for diverse expressions and complex backgrounds. The method can accurately capture facial expressions and generate three-dimensional animation effects, making online entertainment and social interaction more immersive in shared virtual spaces.
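Neither the improved MTCNN (feature fusion, Octave Convolution) nor the high-resolution heatmap network is reproduced here. The sketch below shows only the two-stage data flow described above, detect, crop, then decode landmark heatmaps, using the facenet-pytorch MTCNN as a stand-in detector; the heatmap decoder applies to any (N, K, H, W) output such as a high-resolution network produces.

```python
import torch
from facenet_pytorch import MTCNN  # pip install facenet-pytorch
from PIL import Image

detector = MTCNN(keep_all=False)  # stage 1: a stand-in face detector


def detect_and_crop(img: Image.Image, size: int = 256) -> Image.Image:
    """Detect the most confident face and crop/resize it for stage 2."""
    boxes, _ = detector.detect(img)
    if boxes is None:
        raise ValueError("no face detected")
    x1, y1, x2, y2 = boxes[0]
    return img.crop((x1, y1, x2, y2)).resize((size, size))


def decode_heatmaps(heatmaps: torch.Tensor) -> torch.Tensor:
    """Argmax-decode (N, K, H, W) landmark heatmaps into (N, K, 2) x-y coords."""
    n, k, h, w = heatmaps.shape
    idx = heatmaps.flatten(2).argmax(dim=-1)                  # (N, K)
    x = idx % w
    y = torch.div(idx, w, rounding_mode="floor")
    return torch.stack([x, y], dim=-1).float()
```

The decoded coordinates are what would be tracked frame to frame and mapped to expression parameters for the Unity3D-driven character.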