Journal Articles
2 articles found
1. DyLoRA-TAD: Dynamic Low-Rank Adapter for End-to-End Temporal Action Detection
Authors: Jixin Wu, Mingtao Zhou, Di Wu, Wenqi Ren, Jiatian Mei, Shu Zhang. Computers, Materials & Continua, 2026, Issue 3, pp. 2146-2162 (17 pages).
End-to-end Temporal Action Detection (TAD) has achieved remarkable progress in recent years, driven by innovations in model architectures and the emergence of Video Foundation Models (VFMs). However, existing TAD methods that perform full fine-tuning of pretrained video models often incur substantial computational costs, which become particularly pronounced when processing long video sequences. Moreover, the need for precise temporal boundary annotations makes data labeling extremely expensive. In low-resource settings where annotated samples are scarce, direct fine-tuning tends to cause overfitting. To address these challenges, we introduce Dynamic Low-Rank Adapter (DyLoRA), a lightweight fine-tuning framework tailored specifically for the TAD task. Built upon the Low-Rank Adaptation (LoRA) architecture, DyLoRA adapts only the key layers of the pretrained model via low-rank decomposition, reducing the number of trainable parameters to less than 5% of full fine-tuning methods. This significantly lowers memory consumption and mitigates overfitting in low-resource settings. Notably, DyLoRA enhances the temporal modeling capability of pretrained models by optimizing temporal-dimension weights, thereby alleviating the representation misalignment of temporal features. Experimental results demonstrate that DyLoRA-TAD achieves impressive performance, with 73.9% mAP on THUMOS14, 39.52% on ActivityNet-1.3, and 28.2% on Charades, substantially surpassing the best traditional feature-based methods.
Keywords: temporal action detection; end-to-end training; dynamic low-rank adapter; parameter-efficient fine-tuning; video understanding
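To make the parameter-efficiency claim concrete, below is a minimal PyTorch sketch of the generic LoRA idea the abstract builds on: a frozen pretrained linear layer augmented with a trainable low-rank update. This is not the authors' DyLoRA code; the class name, rank, and scaling choices are illustrative assumptions.

```python
# Minimal sketch of low-rank adaptation (LoRA) as described in the abstract.
# NOT the authors' DyLoRA implementation; names and hyperparameters are illustrative.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen pretrained linear layer with a trainable low-rank update:
    y = W x + (alpha / r) * B A x, where A is (r x d_in) and B is (d_out x r)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze the pretrained weights
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # update starts at zero
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T
```

With rank r, the wrapper trains r * (d_in + d_out) parameters instead of d_in * d_out, which is how adapters of this kind stay well below the abstract's quoted 5% of full fine-tuning (e.g., rank 8 on a 1024-to-1024 layer trains about 1.6% of the layer's weights).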
2. Concept-Guided Open-Vocabulary Temporal Action Detection
Authors: Song-Miao Wang, Rui-Ze Han, Wei Feng. Journal of Computer Science & Technology, 2025, Issue 5, pp. 1270-1284 (15 pages).
Vision-language models (VLMs) have shown strong open-vocabulary learning abilities in various video understanding tasks. However, when applied to open-vocabulary temporal action detection (OV-TAD), existing methods often struggle to generalize to unseen action categories because of their reliance on visual features. In this paper, we propose a novel framework, Concept-Guided Semantic Projection (CSP), to enhance the generalization ability of OV-TAD methods. By projecting video features into a unified action concept space, CSP enables the use of abstracted action concepts for action detection, rather than solely relying on visual details. To further improve feature consistency across action categories, we introduce a mutual contrastive loss (MCL), ensuring semantic coherence and better feature discrimination. Extensive experiments on the ActivityNet and THUMOS14 benchmarks demonstrate that our method outperforms state-of-the-art OV-TAD methods. Code and data are available at Concept-Guided-OV-TAD.
Keywords: open-vocabulary temporal action detection (TAD); vision-language model
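As a rough illustration of the two mechanisms the abstract names, here is a hedged PyTorch sketch: a linear projection of video features into a shared concept space, and a symmetric InfoNCE-style contrastive loss over matched video/concept pairs. This is a generic sketch under assumed shapes and names, not the paper's CSP or MCL implementation.

```python
# Hedged sketch of (1) projecting video features into a shared concept space and
# (2) a symmetric ("mutual") contrastive loss. Generic illustration only; all
# names, shapes, and the InfoNCE formulation are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConceptProjector(nn.Module):
    """Maps video features into the same embedding space as text-derived concepts."""
    def __init__(self, vid_dim: int, concept_dim: int):
        super().__init__()
        self.proj = nn.Linear(vid_dim, concept_dim)

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.proj(v), dim=-1)  # unit norm -> cosine similarity

def mutual_contrastive_loss(video_emb: torch.Tensor,
                            concept_emb: torch.Tensor,
                            temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE: matched (video, concept) pairs lie on the diagonal."""
    logits = video_emb @ concept_emb.T / temperature       # (N, N) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))      # both matching directions
```

Because both embeddings are unit-normalized, the logits are scaled cosine similarities, and the temperature controls how sharply the loss separates matched pairs from the rest of the batch.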