
Image Description of Transmission Line Scenes Based on Knowledge Enhancement
Abstract: To address the lack of potential-risk characterization and of domain-knowledge support in describing transmission line scenes, a knowledge-graph-enhanced cross-modal image description model is proposed. A dataset of images paired with English descriptions is built around four common scene objects (power towers, trees, vehicles, and insulators) through manual annotation and image enhancement methods. In the encoding stage, a domain-specific knowledge graph is constructed from this dataset to strengthen the visual feature extraction of the CLIP cross-modal model. To improve interaction between the modalities, a lightweight LLM-Adapter module is designed that dynamically fuses visual and textual features through a gated cross-attention mechanism. The resulting language-aware visual features are fed into a large language model as a prefix to generate the final description. Experimental results on the self-built dataset and the public MS COCO 2014 dataset show that the proposed model improves on the comparison models in BLEU-4, CIDEr, SPICE, and other evaluation metrics, with BLEU-4 reaching 0.45, and that the generated descriptions are more coherent and richer in content.
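The abstract states only that the LLM-Adapter fuses visual and textual features through gated cross-attention and passes the resulting language-aware visual features to the LLM as a prefix. The sketch below shows one common form of that idea, modeled on the tanh-gated residual cross-attention used in Flamingo, under these assumptions: visual tokens are the queries, text features are the keys and values, there is a single attention head, and random weights stand in for trained ones. The names `GatedCrossAttention`, `prepend_prefix`, and all dimensions are hypothetical illustrations, not the authors' implementation.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b for lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: list[float]) -> list[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Random weights scaled by 1/sqrt(rows), standing in for trained parameters."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(rows)) for _ in range(cols)] for _ in range(rows)]


class GatedCrossAttention:
    """Hypothetical single-head, tanh-gated cross-attention block.

    Visual tokens are the queries and text tokens the keys/values, so the output
    keeps the visual sequence length: visual + tanh(gate) * attend(visual, text).
    """

    def __init__(self, d_vis: int, d_txt: int, d_attn: int,
                 gate_init: float = 0.0, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w_q = random_matrix(d_vis, d_attn, rng)
        self.w_k = random_matrix(d_txt, d_attn, rng)
        self.w_v = random_matrix(d_txt, d_attn, rng)
        self.w_o = random_matrix(d_attn, d_vis, rng)  # project back to the visual width
        self.gate = gate_init  # would be a learnable scalar in a trained model
        self.scale = 1.0 / math.sqrt(d_attn)

    def __call__(self, visual: Matrix, text: Matrix) -> Matrix:
        q = matmul(visual, self.w_q)
        k = matmul(text, self.w_k)
        v = matmul(text, self.w_v)
        # scores[i][j]: scaled dot product of visual token i with text token j
        scores = [[self.scale * sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]
        weights = [softmax(row) for row in scores]
        update = matmul(matmul(weights, v), self.w_o)
        g = math.tanh(self.gate)
        # Gated residual: with gate = 0 the visual features pass through unchanged.
        return [[x + g * u for x, u in zip(xr, ur)] for xr, ur in zip(visual, update)]


def prepend_prefix(prefix: Matrix, token_embeddings: Matrix) -> Matrix:
    """Hypothetical helper: place fused visual tokens before the text token embeddings."""
    if prefix and token_embeddings and len(prefix[0]) != len(token_embeddings[0]):
        raise ValueError("prefix width must match the LLM embedding width")
    return prefix + token_embeddings


if __name__ == "__main__":
    rng = random.Random(1)
    visual = random_matrix(4, 8, rng)  # e.g. 4 visual tokens of width 8
    text = random_matrix(6, 5, rng)    # e.g. 6 text-feature tokens of width 5
    block = GatedCrossAttention(d_vis=8, d_txt=5, d_attn=4, gate_init=0.5)
    prefix = block(visual, text)
    llm_input = prepend_prefix(prefix, random_matrix(3, 8, rng))
    print(len(llm_input), len(llm_input[0]))  # 7 8
```

Initializing the gate at 0 makes tanh(gate) = 0, so the block starts as an identity on the visual features and training can open the gate gradually; this is the usual motivation for zero-initialized gates. A real adapter would also need a projection to the LLM's embedding width, which the sketch sidesteps by assuming the widths already match.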
Authors: YU Pingping, ZHOU Jingjing, ZHANG Liyan, SU He
Affiliations: College of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang 050018, China; College of Electrical Engineering, Hebei University of Technology, Tianjin 300131, China
Source: Software Guide (《软件导刊》), 2025, No. 9, pp. 206-212 (7 pages)
Funding: Youth Fund Project of the Education Department of Hebei Province (QN2024052).
Keywords: image description; cross-modal; gated cross-attention; large language model; knowledge graph
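BLEU-4, the headline metric in the abstract, multiplies the geometric mean of the clipped 1- to 4-gram precisions p_n by a brevity penalty: BLEU-4 = BP · exp(Σ_{n=1..4} ¼ log p_n), with BP = 1 if the candidate length c exceeds the effective reference length r, else exp(1 − r/c). The sketch below computes it for a single candidate caption, assuming lowercase whitespace tokenization and no smoothing; the function names are hypothetical. Reported caption results such as the 0.45 above are typically corpus-level scores from a dedicated evaluation toolkit, so this sentence-level sketch is not expected to reproduce them.

```python
import math
from collections import Counter


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu4(candidate: str, references: list[str]) -> float:
    """Unsmoothed sentence-level BLEU-4 with uniform weights and a brevity penalty.

    Tokenization is a plain lowercase whitespace split (an assumption; caption
    benchmarks normally apply their own tokenizer).
    """
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    if not cand or not refs:
        return 0.0
    log_sum = 0.0
    for n in range(1, 5):
        cand_counts = ngram_counts(cand, n)
        total = sum(cand_counts.values())
        max_ref = Counter()
        for ref in refs:
            max_ref |= ngram_counts(ref, n)  # Counter union keeps the max count per n-gram
        # Modified precision: each n-gram is credited at most as often as it
        # appears in any single reference.
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, one zero precision zeroes BLEU-4
        log_sum += 0.25 * math.log(clipped / total)
    c = len(cand)
    # Effective reference length: the closest one, preferring the shorter on ties.
    r = min((len(ref) for ref in refs), key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity_penalty * math.exp(log_sum)


score = sentence_bleu4("a tree grows near the power line",
                       ["a tall tree grows near the power line"])
print(round(score, 4))
```

In this example every unigram matches, while 5/6 bigrams, 4/5 trigrams, and 3/4 four-grams match, so the precision term is (1 · 5/6 · 4/5 · 3/4)^(1/4) = 0.5^(1/4); the 7-token candidate against an 8-token reference gives BP = exp(1 − 8/7), for a score of about 0.73.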