Abstract
Image captioning has broad application prospects in human-computer interaction, multimedia search, and automatic image annotation. This paper proposes an image captioning method based on Tibetan syllables. First, the encoder extracts image features from the input image through multiple residual convolutional layers. Second, an attention mechanism selects the relevant feature vectors from the encoder and combines them by weighted summation, which enhances feature extraction. Finally, an LSTM decoder decodes the Tibetan syllable feature vectors to generate the image caption. The method achieves BLEU_4 scores of 20.6 on the Flickr8K test set and 24.4 on the Flickr30K test set, which are 2.3 and 4.2 higher, respectively, than those of the abbreviated case-auxiliary word segmentation method. The generated captions are fluent, grammatically correct, and describe the core meaning of the image well.
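The two ideas the abstract leans on, syllable-level caption tokens and an attention-weighted sum over encoder features, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that syllables are obtained by splitting on the Tibetan tsheg mark (U+0F0B), and it uses plain dot-product scoring as a stand-in for whatever attention scoring function the paper actually uses. The names split_syllables and attend are hypothetical.

```python
import math

TSHEG = "\u0f0b"  # Tibetan intersyllabic mark (tsheg)
SHAD = "\u0f0d"   # Tibetan clause-ending mark (shad)


def split_syllables(text):
    """Split Tibetan text into syllables on tsheg; shad and spaces also act as boundaries.

    Assumption: this simple rule stands in for the paper's syllable segmentation.
    """
    cleaned = text.replace(SHAD, TSHEG).replace(" ", TSHEG)
    return [s for s in cleaned.split(TSHEG) if s]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, query):
    """Soft attention: score each feature vector against the query,
    normalise the scores with softmax, and return (weights, weighted sum).

    Assumption: dot-product scoring, chosen only for brevity.
    """
    scores = [sum(f_k * q_k for f_k, q_k in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * f[k] for w, f in zip(weights, features)) for k in range(dim)]
    return weights, context
```

In an encoder-decoder captioner of this kind, features would be the per-region vectors produced by the convolutional encoder and query the decoder's current hidden state. The resulting context vector is fed to the LSTM at each step, and the LSTM emits one syllable token per step.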
Authors
Huaque-Cairang (华却才让)
BAI Ying (白颖)
ZHOU Ziqi (周子琦)
Cairang-Dangzhi (才让当知)
Wanme-Cuo (完么措)
Affiliations: School of Computer Science and Technology, Qinghai Normal University, Xining 810008, China; The State Key Laboratory of Tibetan Intelligent Information Processing and Application, Qinghai Normal University, Xining 810008, China; Tibetan Information Processing and Machine Translation Key Laboratory of Qinghai Province, Xining 810008, China
Source
Plateau Science Research (《高原科学研究》)
CSCD
2024, Issue 3, pp. 102-109 (8 pages)
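Funding
National Natural Science Foundation of China (62166034)
State Key Laboratory of Tibetan Intelligent Information Processing and Application Project (2020-ZJ-Y05)
Qinghai Province Basic Research Program (2020-0301-ZJC-0042)
Qinghai Province Applied Basic Research Program (2021-ZJ-727)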
Keywords
image
caption
Tibetan syllables
attention mechanism