Integrating Speech-to-Text for Image Generation Using Generative Adversarial Networks
Authors: Smita Mahajan, Shilpa Gite, Biswajeet Pradhan, Abdullah Alamri, Shaunak Inamdar, deva shriyansh, Akshat Ashish Shah, Shruti Agarwal. Computer Modeling in Engineering & Sciences, 2025, Issue 5, pp. 2001-2026 (26 pages).
The development of generative architectures has resulted in numerous novel deep-learning models that generate images from text inputs. However, humans naturally use speech for visualization prompts. Therefore, this paper proposes an architecture that accepts speech prompts as input to an image-generation Generative Adversarial Network (GAN) model, leveraging Speech-to-Text translation along with the CLIP+VQGAN model. The proposed method translates speech prompts into text, which is then used by the Contrastive Language-Image Pretraining (CLIP) + Vector Quantized Generative Adversarial Network (VQGAN) model to generate images. This paper outlines the steps required to implement such a model and describes in detail the methods used for evaluating it. The GAN model successfully generates artwork from descriptions given as speech and text prompts. Experimental outcomes of synthesized images demonstrate that the proposed methodology can produce beautiful abstract visuals containing elements from the input prompts. The model achieved a Fréchet Inception Distance (FID) score of 28.75, showcasing its capability to produce high-quality and diverse images. The proposed model can find numerous applications in educational, artistic, and design spaces due to its ability to generate images from speech and the distinct abstract artistry of the output images. This capability is demonstrated by giving the model out-of-the-box prompts to generate never-before-seen images with plausible realistic qualities.
Keywords: generative adversarial networks; speech-to-image translation; visualization; transformers; prompt engineering
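The two-stage flow described in the abstract (speech is first translated to text, and the text then drives the CLIP+VQGAN generator) can be sketched as a small composition of two callables. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the names `speech_to_image`, `normalize_prompt`, `GeneratedImage`, `Transcriber`, and `Generator` are hypothetical, the whitespace-normalization step is an added assumption, and the stubs at the bottom stand in for a real speech recognizer and a real CLIP+VQGAN model.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical interfaces: the abstract names no concrete APIs.
Transcriber = Callable[[bytes], str]               # speech audio -> text prompt
Generator = Callable[[str], "GeneratedImage"]      # text prompt -> image


@dataclass
class GeneratedImage:
    """Placeholder result type; a real model would return pixel data."""
    prompt: str
    pixels: List[int] = field(default_factory=list)


def normalize_prompt(text: str) -> str:
    """Collapse runs of whitespace (an assumed cleanup step, not from the paper)."""
    return " ".join(text.split())


def speech_to_image(audio: bytes, transcribe: Transcriber,
                    generate: Generator) -> GeneratedImage:
    """Stage 1: speech -> text. Stage 2: text -> image."""
    prompt = normalize_prompt(transcribe(audio))
    if not prompt:
        raise ValueError("speech recognizer returned an empty prompt")
    return generate(prompt)


# Stubs standing in for a speech-to-text engine and a CLIP+VQGAN model.
def fake_transcribe(audio: bytes) -> str:
    return "  a castle   in the clouds "


def fake_generate(prompt: str) -> GeneratedImage:
    return GeneratedImage(prompt=prompt)


result = speech_to_image(b"\x00\x01", fake_transcribe, fake_generate)
```

Passing the two stages in as callables keeps the sketch independent of any particular speech recognizer or generator, so either stage can be swapped without touching the other.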