Concept-Guided Open-Vocabulary Temporal Action Detection
Authors: Song-Miao Wang, Rui-Ze Han, Wei Feng. Journal of Computer Science & Technology, 2025, Issue 5, pp. 1270-1284 (15 pages).
Abstract: Vision-language models (VLMs) have shown strong open-vocabulary learning abilities in various video understanding tasks. However, when applied to open-vocabulary temporal action detection (OV-TAD), existing methods often struggle to generalize to unseen action categories because they rely primarily on visual features. In this paper, we propose a novel framework, Concept-Guided Semantic Projection (CSP), to enhance the generalization ability of OV-TAD methods. By projecting video features into a unified action concept space, CSP enables the use of abstracted action concepts for action detection, rather than relying solely on visual details. To further improve feature consistency across action categories, we introduce a mutual contrastive loss (MCL), ensuring semantic coherence and better feature discrimination. Extensive experiments on the ActivityNet and THUMOS14 benchmarks demonstrate that our method outperforms state-of-the-art OV-TAD methods. Code and data are available at Concept-Guided-OV-TAD.
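The sketch below illustrates, in generic terms, the two ingredients the abstract names: a projection of video features into a shared action concept space and a contrastive alignment between projected video features and concept/text embeddings. Every name here (ConceptProjection, contrastive_loss, the feature dimensions) is an illustrative assumption and not the authors' actual CSP or MCL implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConceptProjection(nn.Module):
    """Maps visual features into an assumed shared concept embedding space."""
    def __init__(self, visual_dim: int, concept_dim: int):
        super().__init__()
        self.proj = nn.Linear(visual_dim, concept_dim)

    def forward(self, video_feats: torch.Tensor) -> torch.Tensor:
        # (batch, visual_dim) -> (batch, concept_dim), L2-normalized so that
        # dot products below behave as cosine similarities.
        return F.normalize(self.proj(video_feats), dim=-1)

def contrastive_loss(video_emb: torch.Tensor,
                     text_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE-style loss between projected video features and
    concept/text embeddings; a stand-in for the paper's mutual contrastive loss."""
    logits = video_emb @ text_emb.t() / temperature          # (B, B) similarities
    targets = torch.arange(video_emb.size(0), device=video_emb.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage with random tensors standing in for VLM visual and text features.
proj = ConceptProjection(visual_dim=512, concept_dim=256)
video = proj(torch.randn(8, 512))                     # projected video features
concepts = F.normalize(torch.randn(8, 256), dim=-1)   # matching concept embeddings
loss = contrastive_loss(video, concepts)

Normalizing both embeddings before the dot product makes the similarity a cosine score, the usual choice when aligning visual features with CLIP-style text embeddings.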
Keywords: open-vocabulary, temporal action detection (TAD), vision-language model