Abstract
In recent years, deep neural networks composed of an encoder and a decoder have performed well on image captioning. The encoder is typically a deep convolutional neural network and the decoder a recurrent neural network. Recurrent neural networks suffer from the vanishing gradient problem. In image captioning, this means that words generated at later time steps receive little guidance from earlier information. To address this, a memory aid method is proposed, together with a multimodal neural network model for a large-scale Chinese dataset. The model uses deep convolutional neural networks (Inception-v4, Inception-ResNet-v2) and an attention mechanism to extract visual features from images. It also introduces the memory aid into the recurrent neural network to guide sentence generation. Experiments show that the model significantly improves each of the evaluation metrics on the AI CHALLENGER test set.
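The abstract says an attention mechanism is applied to the CNN image features but does not give its formulation. As a rough illustration only, here is a minimal sketch of generic dot-product soft attention. The decoder state scores each image region, the scores are normalized with a softmax, and the regions are averaged into a single context vector. The function names `softmax` and `soft_attention` are hypothetical. The sketch assumes the decoder hidden state has already been projected to the feature dimension, and it does not model the paper's memory aid. It is not the authors' implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(features: List[Vector], hidden: Vector) -> Tuple[Vector, List[float]]:
    """Hypothetical dot-product soft attention over CNN region features.

    features: one vector per image region (e.g. the flattened cells of a CNN's
              last convolutional feature map), each of dimension D.
    hidden:   the decoder's current hidden state, assumed already projected to D.
    Returns (context vector, attention weights).
    """
    if not features:
        raise ValueError("features must be non-empty")
    # Score each region by its dot product with the decoder state.
    scores = [sum(f_i * h_i for f_i, h_i in zip(f, hidden)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    # Context vector: attention-weighted sum of the region features.
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return context, weights
```

For example, `soft_attention([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [1.0, 0.0])` gives the first region the largest weight, because it is the only one aligned with the hidden state. Published captioning models often use an additive (MLP) scoring function instead of a plain dot product. The dot product is used here only to keep the sketch short.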
Authors
GUO Shu-tao (郭淑涛)
ZHAO De-xin (赵德新)
School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300384, China
Source
Journal of Tianjin University of Technology (天津理工大学学报)
2020, No. 3, pp. 30-35 (6 pages)
Funding
National Natural Science Foundation of China (61571328)
Keywords
Chinese image caption
deep learning
convolutional neural networks
recurrent neural networks