Abstract
Image semantic description (image captioning) models usually adopt an encoder-decoder architecture. Such models make insufficient use of image features and extract too little location information about the objects in an image. To address these problems, an image semantic description algorithm is proposed that integrates an attention mechanism into the encoder: the context information of the decoder is used to allocate attention weights over different image features, which improves the expressive ability of the generated descriptions. Experiments on the Flickr30k and MSCOCO datasets show that the model improves the BLEU-4 score by 1.9% and 0.8%, respectively, demonstrating the effectiveness of the proposed algorithm.
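The abstract describes the attention step only at a high level: decoder context information assigns weights to different image features. The sketch below illustrates that idea with the common additive (Bahdanau-style) soft-attention formulation, which is an assumption because the abstract does not state the paper's scoring function. In this formulation, e_i = w^T tanh(W_v v_i + W_h h), alpha = softmax(e), and c = sum_i alpha_i v_i. Here the decoder hidden state h stands in for the "context information", and the N region feature vectors v_i stand in for the CNN encoder output. The function names and the weight matrices (`w_feat`, `w_hid`, `w_score`) are hypothetical and are not taken from the paper. Plain Python lists are used so the example needs no tensor library.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(features: List[Vector], hidden: Vector,
                       w_feat: Matrix, w_hid: Matrix,
                       w_score: Vector) -> Tuple[Vector, Vector]:
    """Hypothetical additive soft attention over image-region features.

    features : N region feature vectors v_i (encoder output), each of size D
    hidden   : decoder hidden state h (e.g. the LSTM state), size H
    w_feat   : A x D projection of each feature (assumed learned weights)
    w_hid    : A x H projection of the hidden state (assumed learned weights)
    w_score  : length-A scoring vector (assumed learned weights)
    Returns (attention weights alpha, context vector c).
    """
    proj_h = matvec(w_hid, hidden)          # W_h h, shared by all regions
    scores = []
    for v in features:
        proj_v = matvec(w_feat, v)          # W_v v_i
        joint = [math.tanh(a + b) for a, b in zip(proj_v, proj_h)]
        scores.append(sum(w * j for w, j in zip(w_score, joint)))
    weights = softmax(scores)               # alpha_i, non-negative and summing to 1
    dim = len(features[0])
    # Context vector: attention-weighted sum of the region features.
    context = [sum(weights[i] * features[i][d] for i in range(len(features)))
               for d in range(dim)]
    return weights, context


if __name__ == "__main__":
    # Toy example with made-up values: 3 regions with 2-d features,
    # a 2-d decoder state, and identity projections (A = D = H = 2).
    regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    h = [0.5, -0.5]
    eye = [[1.0, 0.0], [0.0, 1.0]]
    alpha, ctx = additive_attention(regions, h, eye, eye, [1.0, 1.0])
    print("attention weights:", [round(a, 3) for a in alpha])
    print("context vector:", [round(c, 3) for c in ctx])
```

With these toy values the third region receives the largest weight. In a trained captioner, the projections would be learned jointly with the CNN encoder and the LSTM decoder, and the context vector would be fed to the decoder at each time step.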
Authors
Guo Lie; Zhang Tuanshan; Sun Weizhen; Guo Jielong
(Xi'an Key Laboratory of Modern Intelligent Textile Equipment, College of Mechanical and Electrical Engineering, Xi'an Polytechnic University, Xi'an, Shaanxi 710600, China; Quanzhou Institute of Equipment Manufacturing, Haixi Institutes, Chinese Academy of Sciences, Quanzhou, Fujian 362216, China)
Source
Laser & Optoelectronics Progress
CSCD
Peking University Core Journal
2021, No. 12, pp. 313-322 (10 pages)
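Funding
National Natural Science Foundation of China, Youth Science Fund (61806186)
State Key Laboratory of Robotics and System (SKLRS-2019-KF-15)
"Construction of the Fujian Intelligent Logistics Industry Technology Research Institute" project (2018H2001)
Quanzhou Science and Technology Plan Project (2019C112, 2019STS08)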
Keywords
image processing
attention mechanism
deep convolutional neural network
long short-term memory