Text to image generation with bidirectional Multiway Transformers
Authors: Hangbo Bao, Li Dong, +1 more author, Songhao Piao, Furu Wei. Computational Visual Media, 2025, No. 2, pp. 405-422 (18 pages).
In this study, we explore the potential of Multiway Transformers for text-to-image generation. Our aim is to improve performance through a concise, efficient decoupled model design and through the inference efficiency that bidirectional encoding provides. We propose a method for improving the image tokenizer using pretrained Vision Transformers. We then employ bidirectional Multiway Transformers to restore masked visual tokens from a joint input sequence that also contains the unmasked text tokens. On the MS-COCO benchmark, our Multiway Transformers outperform vanilla Transformers, achieving better FID scores and confirming the efficacy of the modality-specific parameter computation design. Ablation studies reveal that fusing visual and text tokens in bidirectional encoding contributes to improved model performance. Additionally, our proposed tokenizer outperforms VQGAN in image reconstruction quality and improves text-to-image generation results. By using the additional CC-3M dataset for intermediate finetuning of our 688M-parameter model, we achieve competitive results, with a finetuned FID score of 4.98 on MS-COCO.
Keywords: text-to-image generation; VQ-VAE; Transformer; generative models
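The abstract describes masked visual tokens being restored by a bidirectional Multiway Transformer that reads text and image tokens in one sequence, with modality-specific parameters. The following is a minimal NumPy sketch of one such layer, assuming the BEiT-3/VLMo-style Multiway design: a self-attention block shared by all tokens, followed by separate text and vision feed-forward "experts". Everything else is hypothetical and chosen only for illustration: the toy sizes, a single attention head, random weights, no positional embeddings, and a 3x3 grid of visual tokens standing in for tokenizer output. This is not the authors' implementation. It omits their ViT-based tokenizer and any sampling schedule.

```python
import numpy as np

# Hypothetical toy sizes; not the paper's configuration.
D = 16                 # hidden size
TEXT_VOCAB = 100       # text vocabulary size
IMAGE_VOCAB = 64       # codebook size of an assumed visual tokenizer
MASK_ID = IMAGE_VOCAB  # extra id reserved for the [MASK] visual token

rng = np.random.default_rng(0)


def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(-1, keepdims=True) + eps)


def log_softmax(x):
    x = x - x.max(-1, keepdims=True)  # subtract row max for numerical stability
    return x - np.log(np.exp(x).sum(-1, keepdims=True))


def init_ffn(d, hidden):
    return {"w1": rng.normal(0, 0.02, (d, hidden)),
            "w2": rng.normal(0, 0.02, (hidden, d))}


def ffn(x, p):
    return np.maximum(x @ p["w1"], 0.0) @ p["w2"]  # two-layer ReLU MLP


params = {name: rng.normal(0, 0.02, (D, D)) for name in ("wq", "wk", "wv", "wo")}
params["ffn_text"] = init_ffn(D, 4 * D)   # text expert
params["ffn_image"] = init_ffn(D, 4 * D)  # vision expert


def multiway_block(h, is_image, p):
    """Shared bidirectional self-attention, then a per-modality FFN."""
    x = layer_norm(h)
    q, k, v = x @ p["wq"], x @ p["wk"], x @ p["wv"]
    scores = q @ k.T / np.sqrt(D)  # no causal mask: every token attends to every token
    h = h + np.exp(log_softmax(scores)) @ v @ p["wo"]
    x = layer_norm(h)
    out = np.empty_like(x)
    out[~is_image] = ffn(x[~is_image], p["ffn_text"])  # text tokens -> text FFN
    out[is_image] = ffn(x[is_image], p["ffn_image"])   # visual tokens -> vision FFN
    return h + out


# Toy inputs: 4 text tokens followed by 9 visual tokens (a 3x3 grid).
text_ids = np.array([5, 17, 42, 8])
image_ids = rng.integers(0, IMAGE_VOCAB, size=9)
mask = np.zeros(9, dtype=bool)
mask[rng.choice(9, size=5, replace=False)] = True  # mask 5 of the 9 visual tokens
masked_image = np.where(mask, MASK_ID, image_ids)

text_emb = rng.normal(0, 0.02, (TEXT_VOCAB, D))
image_emb = rng.normal(0, 0.02, (IMAGE_VOCAB + 1, D))  # +1 row for [MASK]
h_in = np.concatenate([text_emb[text_ids], image_emb[masked_image]])
is_image = np.concatenate([np.zeros(len(text_ids), bool), np.ones(9, bool)])

h_out = multiway_block(h_in, is_image, params)

# Predict the original codebook ids only at the masked visual positions.
w_head = rng.normal(0, 0.02, (D, IMAGE_VOCAB))
logits = h_out[is_image][mask] @ w_head
loss = -log_softmax(logits)[np.arange(mask.sum()), image_ids[mask]].mean()
print(logits.shape, round(float(loss), 3))
```

In this sketch, the shared attention is where text and visual tokens fuse, while each token's feed-forward computation uses only its own modality's weights. Because no causal mask is applied, all masked positions are predicted in a single forward pass rather than one token at a time. This is the kind of inference-efficiency argument the abstract attributes to bidirectional encoding.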