Journal Articles
8 articles found
1. Super-Resolution Generative Adversarial Network with Pyramid Attention Module for Face Generation
Authors: Parvathaneni Naga Srinivasu, G. JayaLakshmi, Sujatha Canavoy Narahari, Victor Hugo C. de Albuquerque, Muhammad Attique Khan, Hee-Chan Cho, Byoungchol Chang. Computers, Materials & Continua, 2025, Issue 10, pp. 2117–2139 (23 pages)
The generation of high-quality, realistic faces has emerged as a key field of research in computer vision. This paper proposes a robust approach that combines a Super-Resolution Generative Adversarial Network (SRGAN) with a Pyramid Attention Module (PAM) to enhance the quality of deep face generation. The SRGAN framework is designed to improve the resolution of generated images, addressing common challenges such as blurriness and a lack of intricate detail. The Pyramid Attention Module complements the process by focusing on multi-scale feature extraction, enabling the network to capture finer details and complex facial features more effectively. The proposed method was trained and evaluated over 100 epochs on the CelebA dataset, demonstrating consistent improvements in image quality and a marked decrease in generator and discriminator losses, reflecting the model's capacity to learn and synthesize high-quality images effectively given adequate computational resources. Experimental results demonstrate that the SRGAN model with the PAM module outperforms the alternatives, yielding an aggregate discriminator loss of 0.055 for real images and 0.043 for fake images, and a generator loss of 10.58 after training for 100 epochs. The model yielded a structural similarity index measure of 0.923, outperforming the other models considered in this study.
Keywords: artificial intelligence, generative adversarial network, pyramid attention module, face generation, deep learning
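
The abstract gives no implementation detail for the attention module, but pyramid attention of the kind it describes is commonly built by computing attention maps over progressively downsampled copies of a feature map and fusing them. Below is a minimal PyTorch sketch under that assumption; the class name, layer sizes, and fusion scheme are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidAttentionModule(nn.Module):
    """Multi-scale spatial attention: attention logits are computed on
    progressively downsampled copies of the feature map, upsampled back,
    and fused. Sizes are illustrative, not the paper's configuration."""
    def __init__(self, channels: int, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        # One lightweight attention branch per pyramid level.
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels // 4, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // 4, 1, kernel_size=1),
            )
            for _ in scales
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        attn_logits = 0
        for scale, branch in zip(self.scales, self.branches):
            feat = F.avg_pool2d(x, scale) if scale > 1 else x
            a = branch(feat)                       # (B, 1, h/s, w/s)
            attn_logits = attn_logits + F.interpolate(
                a, size=(h, w), mode="bilinear", align_corners=False)
        attn = torch.sigmoid(attn_logits)          # fuse levels, squash to (0, 1)
        return x * attn + x                        # attended features + residual

# Usage: refine generator features before an SRGAN-style upsampling stage.
feats = torch.randn(2, 64, 32, 32)
print(PyramidAttentionModule(64)(feats).shape)    # torch.Size([2, 64, 32, 32])
```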
2. Dual Variational Generation Based ResNeSt for Near Infrared-Visible Face Recognition
Authors: DING Xiangwu, LIU Chao, QIN Yanxia. Journal of Donghua University (English Edition), CAS, 2022, Issue 2, pp. 156–162 (7 pages)
Near infrared-visible (NIR-VIS) face recognition is the task of matching an NIR face image to a VIS image. Its main challenges are the gap caused by cross-modality and the lack of sufficient paired NIR-VIS face images with which to train models. This paper focuses on the generation of paired NIR-VIS face images and proposes a dual variational generator based on ResNeSt (RS-DVG). RS-DVG can generate a large number of paired NIR-VIS face images from noise, and these generated images can be used as a training set together with real NIR-VIS face images. In addition, a triplet loss function is introduced and a novel triplet selection method is proposed specifically for training the face recognition model, maximizing the inter-class distance and minimizing the intra-class distance among the input face images. The proposed method was evaluated on the CASIA NIR-VIS 2.0 and BUAA-VisNir datasets, where it obtained relatively good results.
Keywords: near infrared-visible face recognition, face image generation, ResNeSt, triplet loss function, attention mechanism
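
The paper's triplet selection method is not described in the abstract, but the triplet loss it builds on is standard: pull an anchor embedding toward a positive (same identity, possibly the other modality) and push it away from a negative by at least a margin. A minimal sketch follows; the margin and embedding size are illustrative, and PyTorch also ships this as nn.TripletMarginLoss.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin: float = 0.3):
    """Hinge on the gap between anchor-positive and anchor-negative
    distances; minimizing it shrinks intra-class distance and grows
    inter-class distance, as the abstract describes."""
    d_ap = F.pairwise_distance(anchor, positive)   # same identity (e.g., NIR vs. VIS)
    d_an = F.pairwise_distance(anchor, negative)   # different identity
    return F.relu(d_ap - d_an + margin).mean()

# Illustrative 128-d embeddings for a batch of 8 face images.
a, p, n = (torch.randn(8, 128) for _ in range(3))
print(triplet_loss(a, p, n).item())
```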
3. The Real-time and High-resolution Interactive Digital Human
Authors: Haiquan Fang, Dian Yu. Journal of Harbin Institute of Technology (New Series), 2025, Issue 5, pp. 41–51 (11 pages)
Synthesizing a real-time, high-resolution, lip-synced digital human is a challenging task. Although the Wav2Lip model represents a remarkable advance in real-time lip-sync, its visual clarity is still limited. To address this, we enhanced the Wav2Lip model and trained it on a high-resolution video dataset produced in our laboratory. Experimental results indicate that the improved Wav2Lip model produces digital humans with greater clarity than the original model while maintaining its real-time performance and accurate lip-sync. We implemented the improved Wav2Lip model in a government interface application, generating a government digital human. Testing revealed that this digital human can interact seamlessly with users in real time, delivering clear visuals and synthesized speech that closely resembles a human voice.
Keywords: digital human, lip-sync, high-resolution, video generation, talking face generation
4. Effect of Richardson number on entropy generation over backward facing step
Authors: CHEN Sheng (陈胜). Applied Mathematics and Mechanics (English Edition), SCIE, EI, 2012, Issue 11, pp. 1431–1440 (10 pages)
The flow over a backward facing step (BFS) has been taken as a useful prototype for investigating the intrinsic mechanisms of separated flow with heat transfer. To date, however, the open literature contains no study of the effect of Richardson number on entropy generation over the BFS, even though the flow pattern and heat transfer characteristics are both significantly influenced by variations in Richardson number in many practical applications, such as microelectromechanical systems and aircraft. This paper reports the effect of Richardson number on entropy generation in the BFS flow for the first time. The entropy generation analysis is conducted by numerically solving the entropy generation equation. The velocity and temperature fields, which are the inputs to the entropy generation equation, are evaluated by the lattice Boltzmann method. It is found that the distributions of the local entropy generation number and the Bejan number are significantly influenced by the variation of Richardson number. The total entropy generation number is a monotonically decreasing function of Richardson number, whereas the average Bejan number is a monotonically increasing function of Richardson number.
Keywords: entropy generation, backward facing step, Richardson number, lattice Boltzmann method
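
Although the abstract omits formulas, the quantities it tracks have standard definitions in this literature: local volumetric entropy generation splits into a heat-transfer term and a fluid-friction term, and the Bejan number is the heat-transfer share. A common 2D incompressible form is given below; the paper's exact nondimensionalization may differ.

```latex
% Local volumetric entropy generation for 2D incompressible convective flow:
% a heat-transfer (conduction) term plus a fluid-friction (viscous) term.
S'''_{\mathrm{gen}}
  = \frac{k}{T_0^{2}}\left[\left(\frac{\partial T}{\partial x}\right)^{2}
                         + \left(\frac{\partial T}{\partial y}\right)^{2}\right]
  + \frac{\mu}{T_0}\left\{2\left[\left(\frac{\partial u}{\partial x}\right)^{2}
                               + \left(\frac{\partial v}{\partial y}\right)^{2}\right]
          + \left(\frac{\partial u}{\partial y}
                + \frac{\partial v}{\partial x}\right)^{2}\right\}

% Bejan number: fraction of entropy generation due to heat transfer.
% Be -> 1 means conduction dominates; Be -> 0 means friction dominates.
\mathrm{Be} = \frac{S_{\mathrm{heat}}}{S_{\mathrm{heat}} + S_{\mathrm{friction}}}
```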
5. DialogueNeRF: towards realistic avatar face-to-face conversation video generation (cited by 1)
Authors: Yichao Yan, Zanwei Zhou, Zi Wang, Jingnan Gao, Xiaokang Yang. Visual Intelligence, 2024, Issue 1, pp. 282–296 (15 pages)
Conversation is an essential component of virtual avatar activities in the metaverse. With the development of natural language processing, significant breakthroughs have been made in text and voice conversation generation. However, face-to-face conversations account for the vast majority of daily conversations, while most existing methods have focused on single-person talking head generation. In this work, we take a step further and consider generating realistic face-to-face conversation videos. Conversation generation is more challenging than single-person talking head generation because it requires not only photo-realistic individual talking heads but also the listener's response to the speaker. We propose a novel unified framework based on the neural radiance field (NeRF) to address these challenges. Specifically, we model both the speaker and the listener with a NeRF framework under different conditions to control individual expressions. The speaker is driven by the audio signal, while the response of the listener depends on both visual and acoustic information. In this way, face-to-face conversation videos are generated between human avatars, with all interlocutors modeled within the same network. To facilitate future research on this task, we also collected a new human conversation dataset containing 34 video clips. Quantitative and qualitative experiments evaluate our method in different respects, e.g., image quality, pose sequence trend, and natural rendering of the scene in the generated videos. Experimental results demonstrate that the avatars in the resulting videos can carry on a realistic conversation and maintain individual styles.
Keywords: talking face generation, neural radiance field, face reenactment, conversation generation
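
As a rough illustration of the conditioning the abstract describes, a NeRF-style radiance MLP can take a per-frame condition code alongside the encoded 3D point and view direction: audio features for the speaker, audio plus visual features for the listener. The sketch below is an assumption-laden toy, not DialogueNeRF's network; all sizes are illustrative.

```python
import torch
import torch.nn as nn

class ConditionalNeRF(nn.Module):
    """NeRF-style MLP mapping (encoded point, view dir, condition code)
    to (density, RGB). The condition code stands in for per-frame audio
    (speaker) or audio+visual (listener) features; sizes are illustrative."""
    def __init__(self, pos_dim=63, dir_dim=27, cond_dim=64, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(pos_dim + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma = nn.Linear(hidden, 1)                    # volume density
        self.rgb = nn.Sequential(
            nn.Linear(hidden + dir_dim, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),         # color in [0, 1]
        )

    def forward(self, x_enc, d_enc, cond):
        h = self.trunk(torch.cat([x_enc, cond], dim=-1))
        return self.sigma(h), self.rgb(torch.cat([h, d_enc], dim=-1))

# One batch of 1024 sampled points, all sharing a frame's condition code.
x = torch.randn(1024, 63); d = torch.randn(1024, 27)
cond = torch.randn(1, 64).expand(1024, -1)
sigma, rgb = ConditionalNeRF()(x, d, cond)
print(sigma.shape, rgb.shape)  # torch.Size([1024, 1]) torch.Size([1024, 3])
```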
6. Mask guided diverse face image synthesis
Authors: Song SUN, Bo ZHAO, Muhammad MATEEN, Xin CHEN, Junhao WEN. Frontiers of Computer Science, SCIE, EI, CSCD, 2022, Issue 3, pp. 67–75 (9 pages)
Recent studies have shown remarkable success in the face image generation task. However, existing approaches offer limited diversity, quality, and controllability in their results. To address these issues, we propose a novel end-to-end learning framework that generates diverse, realistic, and controllable face images guided by face masks. The face mask provides a good geometric constraint for a face by specifying the size and location of its different components, such as the eyes, nose, and mouth. The framework consists of four components: a style encoder, a style decoder, a generator, and a discriminator. The style encoder generates a style code representing the style of the resulting face; the generator translates the input face mask into a real face based on the style code; the style decoder learns to reconstruct the style code from the generated face image; and the discriminator classifies an input face image as real or fake. With the style code, the proposed model can generate different face images matching the input face mask, and by manipulating the face mask we can finely control the generated face image. We empirically demonstrate the effectiveness of our approach on the mask guided face image synthesis task.
Keywords: face image generation, image translation, generative adversarial networks
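
The four components the abstract names imply a training-time data flow: encode or sample a style code, generate a face from mask plus style, reconstruct the style code from the generated face, and discriminate real from fake. Below is a schematic PyTorch sketch of that flow with deliberately tiny networks; every module and size is illustrative rather than the paper's model.

```python
import torch
import torch.nn as nn

def code_net(style_dim=8):
    """Small conv net mapping an RGB face to a style code; used both as
    the style encoder (on reference faces) and as the style decoder
    (reconstructing the code from generated faces). Sizes illustrative."""
    return nn.Sequential(nn.Conv2d(3, 16, 4, 2, 1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                         nn.Linear(16, style_dim))

class MaskGuidedGenerator(nn.Module):
    def __init__(self, mask_ch=19, style_dim=8):
        super().__init__()
        self.net = nn.Conv2d(mask_ch + style_dim, 3, 3, padding=1)
    def forward(self, mask, style):
        # Broadcast the style code over the mask's spatial grid.
        s = style[:, :, None, None].expand(-1, -1, *mask.shape[-2:])
        return torch.tanh(self.net(torch.cat([mask, s], dim=1)))

style_encoder, style_decoder = code_net(), code_net()
generator = MaskGuidedGenerator()
discriminator = nn.Sequential(nn.Conv2d(3, 16, 4, 2, 1), nn.ReLU(),
                              nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                              nn.Linear(16, 1))

mask = torch.zeros(2, 19, 64, 64)         # e.g., 19 facial-component classes
style = torch.randn(2, 8)                  # sampling new codes -> diversity
fake = generator(mask, style)              # mask fixes geometry, style the look
style_rec = style_decoder(fake)            # trained to match `style`
realness = discriminator(fake)
print(fake.shape, style_rec.shape, realness.shape)
```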
7. MagicTalk: Implicit and explicit correlation learning for diffusion-based emotional talking face generation
Authors: Chenxu Zhang, Chao Wang, Jianfeng Zhang, Hongyi Xu, Guoxian Song, You Xie, Linjie Luo, Yapeng Tian, Jiashi Feng, Xiaohu Guo. Computational Visual Media, 2025, Issue 4, pp. 763–779 (17 pages)
Generating emotional talking faces from a single portrait image remains a significant challenge. Simultaneously achieving expressive emotional talking and accurate lip-sync is particularly difficult, as expressiveness is often compromised for lip-sync accuracy: prevailing generative works usually struggle to produce both subtle variations of emotional expression and lip-synchronized talking. To address these challenges, we model the implicit and explicit correlations between audio and emotional talking faces within a unified framework. As human emotional expressions usually bear subtle, implicit relations to speech audio, we propose incorporating audio and emotional-style embeddings into the diffusion-based generation process, enabling realistic generation while concentrating on emotional expression. We then propose lip-based explicit correlation learning to construct a strong mapping from audio to lip motions, ensuring lip-audio synchronization. Furthermore, we deploy a video-to-video rendering module to transfer expressions and lip motions from a proxy 3D avatar to an arbitrary portrait. Both quantitatively and qualitatively, MagicTalk outperforms state-of-the-art methods in expressiveness, lip-sync, and perceptual quality.
Keywords: emotions, talking face generation, diffusion model, images, implicit and explicit correlation learning
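
The abstract's central design choice, injecting audio and emotional-style embeddings into a diffusion-based generator, can be illustrated with a toy denoiser over a facial-motion latent. All names and dimensions below are assumptions for illustration, not MagicTalk's architecture.

```python
import torch
import torch.nn as nn

class ConditionedDenoiser(nn.Module):
    """Toy diffusion denoiser over a facial-motion latent, conditioned on
    a timestep embedding plus audio and emotion-style embeddings, in the
    spirit of the abstract. All sizes are illustrative assumptions."""
    def __init__(self, latent=128, audio=64, emotion=32, t_dim=32, hidden=256):
        super().__init__()
        self.t_embed = nn.Embedding(1000, t_dim)  # 1000 diffusion steps
        self.net = nn.Sequential(
            nn.Linear(latent + audio + emotion + t_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, latent),            # predicts the added noise
        )

    def forward(self, z_t, t, audio_emb, emo_emb):
        cond = torch.cat([z_t, audio_emb, emo_emb, self.t_embed(t)], dim=-1)
        return self.net(cond)

# One denoising step: predict the noise for a batch of noisy motion latents.
model = ConditionedDenoiser()
z_t = torch.randn(4, 128)
t = torch.randint(0, 1000, (4,))
eps = model(z_t, t, torch.randn(4, 64), torch.randn(4, 32))
print(eps.shape)                                  # torch.Size([4, 128])
```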
8. MILG: Realistic lip-sync video generation with audio-modulated image inpainting
Authors: Han Bao, Xuhong Zhang, Qinying Wang, Kangming Liang, Zonghui Wang, Shouling Ji, Wenzhi Chen. Visual Informatics, EI, 2024, Issue 3, pp. 71–81 (11 pages)
Existing lip synchronization (lip-sync) methods generate accurately synchronized mouths and faces in a generated video. However, they still suffer from artifacts in regions of non-interest (RONI), e.g., the background and other parts of the face, which decrease overall visual quality. To solve these problems, we innovatively introduce diverse image inpainting to lip-sync generation. We propose the Modulated Inpainting Lip-sync GAN (MILG), an audio-constrained inpainting network that predicts synchronized mouths. MILG utilizes prior knowledge of the RONI and audio sequences to predict lip shape instead of generating whole images, which keeps the RONI consistent. Specifically, we integrate modulated spatially probabilistic diversity normalization (MSPD Norm) into our inpainting network, which helps the network generate fine-grained, diverse mouth movements guided by continuous audio features. Furthermore, to lower the training overhead, we modify the contrastive loss in lip-sync to support small-batch-size and few-sample training. Extensive experiments demonstrate that our approach outperforms the existing state of the art in image quality and authenticity while preserving lip-sync.
Keywords: lip-sync, image inpainting, face generation, modulated SPD normalization
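
MSPD Norm's formulation is not given in the abstract; the sketch below guesses at the general shape of an audio-modulated, spatially varying normalization (SPADE-style), with a noise input standing in for the "diversity" component. Treat every detail as an assumption rather than the published layer.

```python
import torch
import torch.nn as nn

class AudioModulatedNorm(nn.Module):
    """SPADE-style normalization: instance-normalize features, then apply
    spatially varying scale/shift predicted from an audio code plus noise
    (the noise supplies diversity). A guess at the general shape of
    MSPD Norm, not its published formulation."""
    def __init__(self, channels=64, audio_dim=32, noise_dim=16):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.to_gamma = nn.Conv2d(audio_dim + noise_dim, channels, 3, padding=1)
        self.to_beta = nn.Conv2d(audio_dim + noise_dim, channels, 3, padding=1)

    def forward(self, x, audio_code, noise):
        h, w = x.shape[-2:]
        # Broadcast the (audio, noise) condition over the feature grid; a
        # real inpainting net might use a spatial condition map instead.
        cond = torch.cat([audio_code, noise], dim=1)[:, :, None, None]
        cond = cond.expand(-1, -1, h, w)
        return self.norm(x) * (1 + self.to_gamma(cond)) + self.to_beta(cond)

x = torch.randn(2, 64, 32, 32)                # mouth-region features to inpaint
out = AudioModulatedNorm()(x, torch.randn(2, 32), torch.randn(2, 16))
print(out.shape)                              # torch.Size([2, 64, 32, 32])
```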