Abstract
To detect operating violations by substation staff, this paper studies a visual language model based on human-object interaction relationships. The model deeply fuses the semantic vectors of the target text with image feature vectors to accurately identify people, objects, and the interactions between the two. First, human-object interaction behaviors are annotated as subject-predicate-object triples. Next, a pre-trained text encoder and an image encoder encode the human-object interaction targets to be recognized, and an iterative multimodal fusion mechanism then helps the model detect violations by substation staff. Finally, experiments compare the human-object interaction visual language model with traditional object detection models on the violation detection task. The results show that the human-object interaction visual language model improves recognition accuracy by 10% while the recall rate decreases by 4%, and that overall model performance is best with 6 iterative fusion layers, verifying the method's superior performance in detecting complex violations in substations.
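The pipeline summarized in the abstract (triple annotation, encoding, iterative fusion) can be illustrated with a minimal sketch. This is not the authors' implementation: the class `HOITriple`, the prompt format, the example annotation, and the toy 3-dimensional feature vectors are all hypothetical, and the fusion step is a plain residual cross-attention without learned weights, standing in for whatever fusion layers the paper actually uses. Only the overall shape (subject-predicate-object triples, text and image features, 6 fusion rounds) comes from the abstract.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class HOITriple:
    """One annotated human-object interaction: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str

    def to_prompt(self) -> str:
        # Text form a text encoder would receive (this format is an assumption).
        return f"{self.subject} {self.predicate} {self.obj}"


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, contexts):
    """Residual cross-attention with no learned weights (toy simplification)."""
    dim = len(queries[0])
    scale = math.sqrt(dim)
    updated = []
    for q in queries:
        weights = softmax(
            [sum(a * b for a, b in zip(q, c)) / scale for c in contexts]
        )
        mixed = [sum(w * c[i] for w, c in zip(weights, contexts)) for i in range(dim)]
        updated.append([qi + mi for qi, mi in zip(q, mixed)])
    return updated


def l2_normalize(vectors):
    """Scale each vector to unit length (zero vectors are left unchanged)."""
    out = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        out.append([x / norm for x in v])
    return out


def iterative_fusion(text_feats, image_feats, num_layers=6):
    """Update text from image and image from text, repeated num_layers times."""
    for _ in range(num_layers):
        new_text = l2_normalize(cross_attend(text_feats, image_feats))
        new_image = l2_normalize(cross_attend(image_feats, text_feats))
        text_feats, image_feats = new_text, new_image
    return text_feats, image_feats


# Hypothetical annotation and toy features, for illustration only.
triple = HOITriple("worker", "climbs", "ladder")
text_feats = [[1.0, 0.0, 0.0]]                    # one text token vector
image_feats = [[0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]  # two image region vectors
fused_text, fused_image = iterative_fusion(text_feats, image_feats, num_layers=6)
```

In a real system the toy vectors would come from the pre-trained text and image encoders, and `num_layers` would be the hyperparameter that the abstract reports as optimal at 6.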
Authors
刘志鹏
赵天成
LIU Zhipeng; ZHAO Tiancheng (State Grid Hubei Electric Power Co., Ltd. Ultra High Voltage Company, Wuhan 430050, China; Binjiang Institute of Zhejiang University, Hangzhou 310053, China)
Source
《粘接》
2025, No. 8, pp. 177-180 (4 pages)
Adhesion
Keywords
human-object interaction
visual language model
substation
violation detection
multimodal