Journal Articles
2 articles found
Implementation of Image Captioning Based on Neural Networks
1
Authors: 张大任 (Zhang Daren), 艾山·吾买尔 (Aishan Wumaier). 《现代计算机》 (Modern Computer), 2021, No. 19, pp. 117-123 (7 pages)
As a deep learning task that bridges the image and text domains, image captioning has broad application scenarios. Existing image captioning models target large English-language datasets, while small datasets and datasets in other languages have received little attention. This paper implements image captioning with several neural network models and verifies the effectiveness of different captioning models on other languages. Experiments show that an adaptive-attention captioning model with a pretrained ResNet101 encoder performs well on a multilingual Flickr8k dataset, reaching a BLEU-4 score of 20.4 and a CIDEr score of 54, and the model is also effective on non-English languages such as Chinese and Russian.
Keywords: neural networks, image captioning, attention, Flickr8k, multilingual
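The abstract above mentions an adaptive-attention captioning model. The core idea behind adaptive attention (as popularized by "knowing-when-to-look" style models) is that at each decoding step the model mixes attended image-region features with a "visual sentinel" from the language model, so it can fall back on linguistic context for non-visual words. The sketch below is only an illustration of that weighting scheme, not the paper's actual implementation; the function name, toy scoring inputs, and pure-Python representation are assumptions for demonstration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def adaptive_attention(region_feats, scores, sentinel, sentinel_score):
    """Blend attended image regions with a 'visual sentinel'.

    region_feats   : one feature vector per image region
    scores         : relevance score per region (from the decoder state)
    sentinel       : feature vector of the language-model sentinel
    sentinel_score : relevance score of the sentinel
    Returns (context_vector, beta), where beta is the weight placed on
    the sentinel, i.e. how much the model "looks away" from the image.
    """
    # One softmax over regions plus the sentinel slot.
    weights = softmax(scores + [sentinel_score])
    beta = weights[-1]
    dim = len(sentinel)
    # Region weights sum to (1 - beta), so this already scales the
    # visual context by how much the model attends to the image.
    attended = [sum(w * feat[d] for w, feat in zip(weights[:-1], region_feats))
                for d in range(dim)]
    context = [attended[d] + beta * sentinel[d] for d in range(dim)]
    return context, beta
```

With equal region scores and a strongly negative sentinel score, beta collapses toward zero and the context is just the mean of the region features; with a strongly positive sentinel score, the context approaches the sentinel vector.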
Image Captioning Using Multimodal Deep Learning Approach
2
Authors: Rihem Farkh, Ghislain Oudinet, Yasser Foued. Computers, Materials & Continua (SCIE, EI), 2024, No. 12, pp. 3951-3968 (18 pages)
The process of generating descriptive captions for images has witnessed significant advancements in recent years, owing to progress in deep learning techniques. Despite these advancements, thoroughly grasping image content and producing coherent, contextually relevant captions remains a substantial challenge. In this paper, we introduce a novel multimodal method for image captioning that integrates three powerful deep learning architectures: YOLOv8 (You Only Look Once) for robust object detection, EfficientNetB7 for efficient feature extraction, and Transformers for effective sequence modeling. The proposed model combines the strengths of YOLOv8 in detecting objects, the superior feature-representation capabilities of EfficientNetB7, and the contextual understanding and sequential generation abilities of Transformers. We conduct extensive experiments on standard benchmark datasets to evaluate the approach, demonstrating that it generates informative and semantically rich captions for diverse images and achieves state-of-the-art results in image captioning tasks. The significance of this approach lies in its ability to produce coherent, contextually relevant captions while achieving a comprehensive understanding of image content, demonstrating the synergistic benefits of multimodal fusion. This work opens new avenues for research in multimodal deep learning and paves the way for more sophisticated, context-aware image captioning systems, with potential contributions to fields including human-computer interaction, computer vision, and natural language processing.
Keywords: image captioning, multimodal methods, YOLOv8, EfficientNetB7, feature extraction, Transformers, encoder, decoder, Flickr8k
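The second abstract describes a three-stage pipeline: an object detector, a global feature extractor, and a sequence decoder conditioned on both. The skeleton below only illustrates how such stages might be wired together; every function here is a hypothetical stub standing in for the real YOLOv8, EfficientNetB7, and Transformer components, and the toy detections and caption template are invented for demonstration.

```python
def detect_objects(image):
    """Stub for a YOLOv8-style detector: returns (label, confidence) pairs."""
    return [("dog", 0.92), ("ball", 0.55)]

def extract_features(image):
    """Stub for an EfficientNetB7-style backbone: returns a global feature vector."""
    return [0.1] * 8

def decode_caption(features, object_labels):
    """Stub for a Transformer decoder. A real decoder would attend to the
    feature vector; this stub only uses the detected labels."""
    if not object_labels:
        return "an image"
    return "a photo of a " + " and a ".join(object_labels)

def caption_image(image, conf_threshold=0.5):
    """Wire the three stages together: detect, extract, decode."""
    detections = detect_objects(image)
    labels = [label for label, conf in detections if conf >= conf_threshold]
    features = extract_features(image)
    return decode_caption(features, labels)
```

Raising `conf_threshold` filters low-confidence detections out of the caption, which mirrors one practical knob in detection-conditioned captioning pipelines.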