Research on Image Captioning Based on Double Attention Model

Cited by: 9
Abstract: In existing image captioning methods, the attention model usually relies on word-level attention: it extracts local features from the image as the visual input for generating the current word, and therefore lacks accurate guidance from global image information. To address this problem, this paper proposes an image captioning method based on sentence-level attention, which uses a self-attention mechanism to extract sentence-level attention information from the image to represent the global image information needed to generate a sentence. On this basis, a double attention model that combines sentence-level and word-level attention is further proposed to generate more accurate image descriptions. Supervision and optimization are applied at the intermediate stage of the model to resolve interference between the two kinds of attention information. In addition, reinforcement learning is applied in two-stage training to optimize the model's evaluation metrics. Experiments on two benchmark datasets, MSCOCO and Flickr30K, show that the proposed method generates more accurate and richer captions and outperforms several existing attention-based methods on each of the reported evaluation metrics.
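The abstract describes the model only at a high level, so the following is an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that word-level attention is scaled dot-product attention over region features queried by the decoder's hidden state, that the sentence-level (global) context is parameter-free self-attention among regions followed by mean pooling, that the two contexts are fused with a hypothetical fixed weight `beta`, and that the reinforcement-learning objective is a self-critical (SCST-style) policy-gradient loss using the greedy caption's score as baseline. All function names are hypothetical; learned projections, the decoder itself, and the intermediate supervision are omitted.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights: List[float], vectors: List[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def word_level_attention(regions: List[Vector], hidden: Vector) -> Vector:
    # Local context for the current word (hypothetical helper): score every
    # region feature against the decoder hidden state with a scaled dot
    # product (an assumption) and return the attention-weighted average.
    # In a decoder this is recomputed at every step.
    scale = math.sqrt(len(hidden))
    weights = softmax([dot(r, hidden) / scale for r in regions])
    return weighted_sum(weights, regions)


def sentence_level_attention(regions: List[Vector]) -> Vector:
    # Global context for the whole sentence (hypothetical helper):
    # self-attention among regions (queries = keys = values = region
    # features, no learned projections), then mean pooling into one vector.
    # It depends only on the image, so it can be computed once per image.
    scale = math.sqrt(len(regions[0]))
    attended = [
        weighted_sum(softmax([dot(q, k) / scale for k in regions]), regions)
        for q in regions
    ]
    dim = len(regions[0])
    return [sum(v[d] for v in attended) / len(attended) for d in range(dim)]


def double_attention_context(regions: List[Vector], hidden: Vector,
                             beta: float = 0.5) -> Vector:
    # Hypothetical fusion: a fixed convex combination of the global
    # (sentence-level) and local (word-level) contexts.
    global_ctx = sentence_level_attention(regions)
    local_ctx = word_level_attention(regions, hidden)
    return [beta * g + (1.0 - beta) * l for g, l in zip(global_ctx, local_ctx)]


def self_critical_loss(sample_log_probs: List[float], sample_reward: float,
                       greedy_reward: float) -> float:
    # Assumed SCST-style objective: the advantage is the metric score
    # (e.g. CIDEr) of a sampled caption minus that of the greedy caption.
    # Minimizing -advantage * sum(log p) raises the probability of sampled
    # captions that beat the greedy baseline and lowers it otherwise.
    advantage = sample_reward - greedy_reward
    return -advantage * sum(sample_log_probs)


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0]]  # two toy region features
    hidden = [1.0, 0.0]                 # toy decoder hidden state
    print(double_attention_context(regions, hidden))
    print(self_critical_loss([-1.0, -2.0], sample_reward=0.8, greedy_reward=0.5))
```

Keeping the sentence-level context independent of the decoder state mirrors the abstract's description of it as global guidance for the whole sentence, while the word-level context changes with each generated word.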
Authors: ZHUO Ya-qi (卓亚琦); WEI Jia-hui (魏家辉); LI Zhi-xin (李志欣)
Affiliations: College of Science, Guilin University of Technology, Guilin, Guangxi 541004, China; Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin, Guangxi 541004, China
Source: Acta Electronica Sinica (《电子学报》), 2022, No. 5, pp. 1123-1130 (8 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.
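Funding: National Natural Science Foundation of China (No. 61966004, No. 61866004); Guangxi Natural Science Foundation (No. 2019GXNSFDA245018); Guangxi Graduate Education Innovation Program (No. XY-CBZ2021002).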
Keywords: image captioning; encoder-decoder architecture; word-level attention; sentence-level attention; double attention model; reinforcement learning