Abstract
To address the misalignment between image patch features and text features in cross-modal data from power supply systems, this paper proposes TFPN, a large-model-based image-text retrieval method with multi-level feature alignment. First, the objective function for image-text retrieval is defined. Then, with the ViT model as the visual backbone, separately extracted global features guide the text information used in feature learning, and a diversion and splicing attention mechanism performs cross-modal data collaboration and feature interaction. Finally, image-text retrieval is carried out based on the similarity between global image features and sentence-level text features. On the CCKS2018_Task3 dataset, the method achieves R@1 and R@10 of 77.44 and 95.18 for text retrieval and 55.27 and 96.01 for image retrieval, all higher than those of an image-text retrieval method based on positional-information graph reasoning and text-guided feature selection, a Transformer-GAN image-text retrieval method, and an image-text retrieval method based on intra-modal fine-grained feature relationship extraction. The analysis indicates that the method achieves multi-stage fusion of a multi-level text feature pyramid with image patch features, which further improves collaborative analysis and visual interaction for cross-modal data in power supply systems and demonstrates the method's effectiveness.
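The final retrieval step and the R@K metric reported above can be illustrated with a minimal sketch. It assumes cosine similarity as the similarity measure (the abstract does not name one) and uses made-up toy vectors; the function names are hypothetical, and this is not the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_candidates(query, candidates):
    """Return candidate indices sorted by descending similarity to the query."""
    scores = [cosine(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)

def recall_at_k(queries, candidates, ground_truth, k):
    """Percentage of queries whose correct candidate is in the top-k results."""
    hits = sum(
        1
        for q, gt in zip(queries, ground_truth)
        if gt in rank_candidates(q, candidates)[:k]
    )
    return 100.0 * hits / len(queries)

# Toy data (assumed for illustration only): sentence-level text features
# act as queries, global image features act as retrieval candidates.
text_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
image_feats = [[0.9, 0.1, 0.0], [0.7, 0.7, 0.1], [0.1, 0.9, 0.0]]
truth = [0, 2, 2]  # matching image index per text query (two texts share image 2)

r_at_1 = recall_at_k(text_feats, image_feats, truth, k=1)
r_at_2 = recall_at_k(text_feats, image_feats, truth, k=2)
print(f"R@1 = {r_at_1:.2f}, R@2 = {r_at_2:.2f}")
```

Swapping the roles of `text_feats` and `image_feats` gives the opposite retrieval direction, so the same two functions cover both columns of an R@K table.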
Authors
余芸
张喜铭
林志达
梁寿愚
赵翔宇
YU Yun; ZHANG Ximing; LIN Zhida; LIANG Shouyu; ZHAO Xiangyu (China Southern Power Grid Company Limited, Guangzhou 510663, China; China Southern Power Grid Artificial Intelligence Technology Co., Ltd., Guangzhou 510663, China)
Source
《自动化与仪器仪表》
2026, Issue 1, pp. 336-340 (5 pages)
Automation & Instrumentation
Keywords
cross-modal data
image-text retrieval
diversion and splicing attention mechanism
collaborative analysis
visual interaction