Abstract: Since Denoising Diffusion Probabilistic Models (DDPM) outperformed Generative Adversarial Networks (GANs), diffusion models have evolved into the backbone of text-guided visual generation, with Stable Diffusion and DALL·E 2 alleviating key technical constraints. Despite remarkable advances in Text-to-Image (T2I) and Text-to-Video (T2V) tasks, critical gaps remain unaddressed. This paper conducts a systematic review of diffusion-based T2I and T2V technologies, synthesises the latest advances in related technologies, and proposes a "Technical Module-Application-Evaluation" framework to link technical breakthroughs with real-world applications. It also highlights under-researched fields and corresponding evaluation benchmarks, offering an integrated technical landscape to guide the equitable and reliable industrialisation of text-driven visual generation technologies.