Abstract
Open-world object detection aims to recognize both known and unknown categories in dynamic environments and, once labels for unknown categories become available, to incrementally learn to recognize the newly added categories. Existing methods, however, lack the ability to represent the semantics of unknown categories, and the guidance information for known and unknown categories is mutually coupled, which limits detection performance. To address this, this paper proposes an open-world object detection method based on causal prompt distillation, which combines vision-language models with causal inference to address the semantic bias between categories in open scenes. Specifically, a structural causal model is constructed to reveal, from a causal perspective, the semantic interference paths between known and unknown categories. Causal prompt learning is then proposed: by generating semantic vectors for unknown categories, it explicitly introduces the semantic prior of open scenes and enhances the model's perception of unknown objects. Finally, to address semantic bias in knowledge transfer, a causal distillation mechanism is proposed that uses a dual distillation loss to decouple the teacher model's guidance for known categories from its guidance for unknown categories. Experimental results show that the method performs well on multiple datasets, improving mean average precision (mAP) on known categories by 1.3% and recall on unknown categories (U-Recall) by 6.5%. These results validate the effectiveness and robustness of the proposed approach.
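As a rough illustration of the U-Recall metric reported above, the following minimal sketch computes the fraction of ground-truth unknown objects that are matched by a predicted "unknown" box. It is not the authors' evaluation code: the function names `iou` and `unknown_recall`, the single-image scope, the greedy one-to-one matching, and the IoU threshold of 0.5 are all assumptions made for illustration.

```python
from typing import List, Tuple

# Axis-aligned box as (x1, y1, x2, y2); this layout is an assumption.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def unknown_recall(gt_unknown: List[Box],
                   pred_unknown: List[Box],
                   iou_thr: float = 0.5) -> float:
    """Fraction of ground-truth unknown boxes matched by a predicted
    unknown box (hypothetical helper; greedy one-to-one matching)."""
    if not gt_unknown:
        return 0.0
    used = set()  # indices of predictions already matched
    hits = 0
    for gt in gt_unknown:
        best_j, best_iou = -1, iou_thr
        for j, pred in enumerate(pred_unknown):
            if j in used:
                continue
            overlap = iou(gt, pred)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            used.add(best_j)
            hits += 1
    return hits / len(gt_unknown)


if __name__ == "__main__":
    gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
    pred = [(1, 1, 10, 10), (50, 50, 60, 60)]
    print(unknown_recall(gt, pred))
```

A dataset-level evaluation would typically accumulate matches over all images and may order predictions by confidence score; the sketch omits both for brevity.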
Authors
ZHAO Jia-qi (赵佳琦)
WANG Ping-an (王平安)
ZHOU Yong (周勇)
DU Wen-liang (杜文亮)
YAO Rui (姚睿)
LIU Bing (刘兵)
School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China; Mine Digitization Engineering Research Center of the Ministry of Education, Xuzhou, Jiangsu 221116, China
Source
Acta Electronica Sinica (《电子学报》)
Peking University Core Journal (北大核心)
2025, No. 6, pp. 2079-2089 (11 pages)
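Funding
National Natural Science Foundation of China (No. 62272461, No. 62172417, No. 62276266, No. 62277046)
Double First-Class Project for Independent Innovation and Social Service, China University of Mining and Technology (No. 2022ZZCX06)
Six Talent Peaks Project of Jiangsu Province (No. 2015-DZXX-010, No. 2018-XYDXX-044)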
Keywords
prompt learning
knowledge distillation
causal intervention
open world
object detection
computer vision