Abstract
To address the difficulty that current visual question answering (VQA) models have with questions requiring external knowledge, this paper proposes a question-guided mechanism for querying external knowledge (QGK). The aim is to integrate key knowledge to enrich the question text and thereby improve the accuracy of VQA models. First, QGK is developed to expand the text feature representation within the model and strengthen its ability to handle complex questions; it is a multi-stage pipeline comprising keyword extraction, query construction, and knowledge screening and refinement. Second, visual common sense features are introduced to validate the effectiveness of the proposed method. Experimental results show that the proposed query mechanism supplies important external knowledge and significantly improves model accuracy on the VQA v2.0 dataset. When the query mechanism alone is added to the baseline model, accuracy rises to 71.05%; when visual common sense features are combined with the external knowledge query mechanism, accuracy rises further to 71.38%. These results confirm that the proposed method improves VQA model performance.
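The stages named in the abstract (keyword extraction, query construction, knowledge screening and refinement) can be sketched as a toy pipeline. This is a minimal illustration only, not the authors' implementation: the stop-word list, the in-memory `KNOWLEDGE_BASE`, the word-overlap scoring rule, and every function name below are hypothetical assumptions, and the abstract does not specify how each stage is actually realized.

```python
import re

# Hypothetical stop-word list (assumption, for illustration only).
STOP_WORDS = {"what", "is", "the", "a", "an", "of", "in", "on", "this",
              "that", "are", "for", "used", "to", "do", "does", "which"}

# Hypothetical stand-in for an external knowledge base (assumption).
KNOWLEDGE_BASE = {
    "umbrella": ["An umbrella is used for protection from rain.",
                 "An umbrella can provide shade from the sun."],
    "dog": ["A dog is a domesticated animal."],
}


def extract_keywords(question):
    """Stage 1: keep lowercase word tokens that are not stop words."""
    tokens = re.findall(r"[a-z]+", question.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_queries(keywords):
    """Stage 2: one query per distinct keyword, preserving order."""
    return list(dict.fromkeys(keywords))


def retrieve(queries):
    """Look up each query in the toy knowledge base."""
    facts = []
    for query in queries:
        facts.extend(KNOWLEDGE_BASE.get(query, []))
    return facts


def screen_and_refine(facts, keywords, top_k=2):
    """Stage 3: rank facts by keyword overlap and keep the top_k matches."""
    keyword_set = set(keywords)

    def overlap(fact):
        return len(set(re.findall(r"[a-z]+", fact.lower())) & keyword_set)

    ranked = sorted(facts, key=overlap, reverse=True)
    return [fact for fact in ranked if overlap(fact) > 0][:top_k]


def enrich_question(question, top_k=2):
    """Append the selected facts to the question text."""
    keywords = extract_keywords(question)
    facts = screen_and_refine(retrieve(build_queries(keywords)), keywords, top_k)
    return " ".join([question] + facts)


if __name__ == "__main__":
    print(enrich_question("What is the umbrella used for?"))
```

In a full VQA model the enriched question text would then be passed to the text encoder; that step is outside the scope of this sketch.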
Authors
XU Yutao (徐钰涛)
TANG Shouguo (汤守国)
Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, China; Yunnan Key Laboratory of Computer Technologies Application, Kunming 650504, China
Source
Computer Science (《计算机科学》)
Peking University Core Journal (北大核心)
2025, Issue S1, pp. 247-254 (8 pages)
Funding
Yunnan Provincial Basic Research Special Project (202201AS070029)
Yunnan Provincial Major Special Project Program (202302AD080002)
Keywords
Visual question answering
External knowledge base
Query mechanism
Long short-term memory network
Text feature