MagicTalk: Implicit and explicit correlation learning for diffusion-based emotional talking face generation
Authors: Chenxu Zhang, Chao Wang, Jianfeng Zhang, Hongyi Xu, Guoxian Song, You Xie, Linjie Luo, Yapeng Tian, Jiashi Feng, Xiaohu Guo. Computational Visual Media, 2025, No. 4, pp. 763-779.
Generating emotional talking faces from a single portrait image remains a significant challenge. Achieving expressive emotional talking and accurate lip-sync simultaneously is particularly difficult, as expressiveness is often compromised for lip-sync accuracy. Prevailing generative works usually struggle to balance subtle variations of emotional expression against lip-synchronized talking. To address these challenges, we propose modeling the implicit and explicit correlations between audio and emotional talking faces in a unified framework. Because human emotional expressions usually bear subtle, implicit relations to speech audio, we incorporate audio and emotional style embeddings into the diffusion-based generation process, enabling realistic generation while concentrating on emotional expressions. We then propose lip-based explicit correlation learning to construct a strong mapping from audio to lip motions, ensuring lip-audio synchronization. Furthermore, we deploy a video-to-video rendering module to transfer expressions and lip motions from a proxy 3D avatar to an arbitrary portrait. Both quantitatively and qualitatively, MagicTalk outperforms state-of-the-art methods in expressiveness, lip-sync, and perceptual quality.
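To illustrate the style of conditioning the abstract describes, below is a minimal PyTorch sketch of a denoiser that injects audio and emotion-style embeddings at each diffusion step. This is purely illustrative: the module names, dimensions, and the additive fusion scheme are assumptions for the sketch, not the paper's published architecture.

```python
import torch
import torch.nn as nn

class ConditionedDenoiser(nn.Module):
    """Toy diffusion denoiser fusing audio and emotion-style embeddings.

    Hypothetical sketch: real talking-face diffusion models typically use
    larger backbones and attention-based fusion; here the two conditioning
    signals are simply projected and added to the noisy latent.
    """
    def __init__(self, feat_dim=256, audio_dim=128, emo_dim=64):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, feat_dim)  # implicit audio cue
        self.emo_proj = nn.Linear(emo_dim, feat_dim)      # emotional style cue
        self.time_embed = nn.Sequential(nn.Linear(1, feat_dim), nn.SiLU())
        self.backbone = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.SiLU(),
            nn.Linear(feat_dim, feat_dim),
        )

    def forward(self, x_t, t, audio_emb, emo_emb):
        # x_t: noisy face-motion latent at diffusion step t
        cond = self.audio_proj(audio_emb) + self.emo_proj(emo_emb)
        h = x_t + cond + self.time_embed(t.unsqueeze(-1).float())
        return self.backbone(h)  # predicted noise (epsilon)

# Usage: one denoising step on dummy tensors.
model = ConditionedDenoiser()
x_t = torch.randn(4, 256)           # batch of noisy latents
t = torch.randint(0, 1000, (4,))    # diffusion timesteps
audio = torch.randn(4, 128)         # per-frame audio features
emotion = torch.randn(4, 64)        # emotion style embedding
eps_hat = model(x_t, t, audio, emotion)
```

In the paper's pipeline, the explicit lip-audio mapping and the video-to-video rendering module are separate components; this sketch only shows where the two conditioning signals would enter a diffusion generator.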
Keywords: emotions; talking face generation; diffusion model; images; implicit and explicit correlation learning