5 articles found
Concept-Guided Open-Vocabulary Temporal Action Detection
1
Authors: Song-Miao Wang, Rui-Ze Han, Wei Feng. Journal of Computer Science & Technology, 2025, No. 5, pp. 1270-1284 (15 pages)
Vision-language models (VLMs) have shown strong open-vocabulary learning abilities in various video understanding tasks. However, when applied to open-vocabulary temporal action detection (OV-TAD), existing methods often struggle to generalize to unseen action categories because they rely on visual features. In this paper, we propose a novel framework, Concept-Guided Semantic Projection (CSP), to improve the generalization ability of OV-TAD methods. By projecting video features into a unified action concept space, CSP detects actions using abstracted action concepts rather than relying solely on visual details. To further improve feature consistency across action categories, we introduce a mutual contrastive loss (MCL) that ensures semantic coherence and better feature discrimination. Extensive experiments on the ActivityNet and THUMOS14 benchmarks demonstrate that our method outperforms state-of-the-art OV-TAD methods. Code and data are available at Concept-Guided-OV-TAD.
Keywords: open-vocabulary temporal action detection (TAD); visual-language model
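The abstract describes detecting actions in a shared concept space instead of directly on visual features. The sketch below is a hypothetical illustration of that general idea, not the CSP implementation: it assumes pre-computed video and class-text embeddings plus a set of concept embeddings (all function names are hypothetical), represents each vector by its cosine similarity to the concepts, and uses a generic symmetric InfoNCE loss as a stand-in, since the abstract does not give the MCL formula.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)

def project_to_concepts(feature, concepts):
    """Describe a feature by its similarity to each (hypothetical) action-concept embedding."""
    return [cosine(feature, c) for c in concepts]

def classify(video_feature, class_text_features, concepts):
    """Return the index of the class whose concept-space vector best matches the video's."""
    v = project_to_concepts(video_feature, concepts)
    scores = [cosine(v, project_to_concepts(t, concepts)) for t in class_text_features]
    return max(range(len(scores)), key=lambda i: scores[i])

def symmetric_info_nce(video_vecs, text_vecs, temperature=0.1):
    """Generic symmetric InfoNCE: video_vecs[i] and text_vecs[i] form the positive pair."""
    n = len(video_vecs)
    sim = [[cosine(v, t) / temperature for t in text_vecs] for v in video_vecs]
    sim_t = [[sim[i][j] for i in range(n)] for j in range(n)]  # text -> video direction

    def cross_entropy(rows):
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(s - m) for s in row))
            total += log_z - row[i]
        return total / n

    return 0.5 * (cross_entropy(sim) + cross_entropy(sim_t))
```

Because classification compares concept-space vectors, a new action class only needs a text embedding at inference time; the contrastive term rewards matched video/text pairs and penalizes mismatched ones in both directions.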
Open environments-aware SLAM based on YOLO-enhanced open-vocabulary object detection
2
Authors: Chengqun SONG, Fuxiang WU, Xiangyang GAO, Jun CHENG, Mengjie YANG, Qiao LIU, Lei WANG. Science China (Technological Sciences), 2025, No. 11, pp. 239-252 (14 pages)
Simultaneous localization and mapping (SLAM) is a pivotal challenge in mobile robotics. Traditional SLAM solutions focus primarily on rapid and accurate localization and mapping, and typically neglect the identification of objects in the environment. This paper introduces a SLAM system enhanced with YOLO-based open-vocabulary object detection. It leverages visual-language alignment, learned from extensive image-text pairs, to identify both known and novel objects. Our approach employs YOLOv8 as a teacher model, balancing speed and accuracy for object detection and bounding-box prediction. These predictions are processed by CLIP encoders to generate high-dimensional vectors, which teach a student model robust image and text embeddings. Novel loss functions align augmented embeddings with supervisory signals, substantially improving detection accuracy and generalization. Additionally, the system integrates depth-map-based scale extraction, 3D mapping of target object positions, and efficient relative pose estimation for loop detection. The direct method it employs improves accuracy and robustness, especially in poorly textured environments. Extensive ablation studies show significant improvements in precision and recall. The resulting SLAM system not only ensures accurate localization and mapping but also enables mobile robots to recognize and interact with a wide variety of objects, making it well suited to practical applications in complex environments.
Keywords: SLAM; open-vocabulary object detection; binocular vision; mobile robotics
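The teacher-student scheme in this abstract can be illustrated, under assumptions, with a minimal sketch: a student's region embeddings are pulled toward teacher (CLIP-style) embeddings with a cosine-distance loss, and a detected region is labeled by matching its embedding against text embeddings for an arbitrary vocabulary. The function names and the choice of loss are hypothetical; the paper's own loss functions are not specified in the abstract.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def distillation_loss(student_embs, teacher_embs):
    """Hypothetical loss: mean (1 - cosine similarity) between student and teacher embeddings."""
    total = 0.0
    for s, t in zip(student_embs, teacher_embs):
        s, t = normalize(s), normalize(t)
        total += 1.0 - sum(a * b for a, b in zip(s, t))
    return total / len(student_embs)

def open_vocab_label(region_emb, vocabulary):
    """Assign the label whose text embedding is most similar to the region embedding.

    `vocabulary` maps label strings to text embeddings, so new labels can be
    added at inference time without retraining the detector.
    """
    r = normalize(region_emb)
    best, best_score = None, -float("inf")
    for label, text_emb in vocabulary.items():
        score = sum(a * b for a, b in zip(r, normalize(text_emb)))
        if score > best_score:
            best, best_score = label, score
    return best
```

For example, `open_vocab_label([0.2, 0.9], {"chair": [1.0, 0.0], "lamp": [0.0, 1.0]})` picks `"lamp"`, since the region embedding points mostly along the lamp text embedding.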
An Analysis of OpenSeeD for Video Semantic Labeling
3
Author: Jenny Zhu. Journal of Computer and Communications, 2025, No. 1, pp. 59-71 (13 pages)
Semantic segmentation is a core task in computer vision that allows AI models to understand and interact with their surrounding environment. Much as humans subconsciously segment scenes, this ability is crucial for scene understanding. However, many semantic learning models face a lack of data: existing video datasets are limited to short, low-resolution videos that are not representative of real-world conditions. Thus, one of our key contributions is a customized semantic segmentation version of the Walking Tours Dataset, which features hour-long, high-resolution, real-world footage from tours of different cities. Additionally, we evaluate the performance of the open-vocabulary semantic model OpenSeeD on our custom dataset and discuss future implications.
Keywords: Semantic Segmentation; Detection; Labeling; OpenSeeD; open-vocabulary; Walking Tours Dataset; Videos
Traversability Analysis for Tracked Vehicles in Unstructured Environments: An Approach Employing Vehicle Traversing Capability and Terrain's Multi-modal Data
4
Authors: Haodong Wang, Biao Ma, Liang Yu, Man Chen, Liyong Wang, Heyan Li. Automotive Innovation, 2025, No. 2, pp. 264-280 (17 pages)
Terrain traversability analysis is a principal task in autonomous navigation, where traversability refers to how suitable a given terrain is for driving over. It is difficult to infer traversability cost from either the semantic types or the geometric properties of the terrain alone; for example, robots may misperceive tall grass and rugged dirt. To address these challenges, this paper proposes a method that generates local traversability maps using onboard LiDAR and cameras. A key idea is to model the interaction between the geometric properties and the types of terrain. This relationship represents how sensitive traversability is to geometry for a given terrain type, and it is used together with semantics and geometry to reason about traversability. Further, to prevent lateral or longitudinal slipping that exceeds the capacity of the vehicle's running gear, vehicle classes and design factors are also incorporated into the traversability cost calculation. Real-world experimental results demonstrate that the proposed method can generate traversability maps in unstructured environments, and ablation studies substantiate its efficacy. Compared with existing methods, our approach produces more reasonable results in complex environments featuring diverse terrains.
Keywords: Terrain traversability; Unstructured environments; Tracked vehicles; open-vocabulary semantic segmentation; Elevation mapping
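One way to picture the terrain-type/geometry interaction described above is a cost in which each terrain class has its own sensitivity to slope and roughness, with a hard cutoff when the slope exceeds a vehicle-specific limit. This is a hypothetical sketch: the terrain classes, parameter values, weights, and the `max_slope_deg` limit are illustrative assumptions, not values or formulas from the paper.

```python
import math

# Hypothetical per-terrain parameters: (base semantic cost, sensitivity to geometry).
TERRAIN_PARAMS = {
    "asphalt": (0.05, 0.5),
    "grass": (0.2, 0.3),
    "dirt": (0.3, 1.0),
}

def traversability_cost(terrain, slope_deg, roughness, max_slope_deg,
                        w_slope=0.5, w_rough=0.5):
    """Combine a terrain-type base cost with a terrain-dependent geometric penalty.

    roughness is assumed normalized to [0, 1]; max_slope_deg stands in for the
    vehicle-class limit. Returns math.inf (untraversable) above that limit,
    otherwise a cost clipped to [0, 1].
    """
    if slope_deg > max_slope_deg:
        return math.inf
    base, sensitivity = TERRAIN_PARAMS[terrain]
    geometry = w_slope * (slope_deg / max_slope_deg) + w_rough * roughness
    return min(1.0, base + sensitivity * geometry)
```

With the same slope and roughness, grass scores lower than dirt here because its assumed sensitivity is smaller, so tall but soft vegetation is not penalized like a rough, rigid surface; the hard cutoff plays the role of the vehicle-capability constraint.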
A Survey of Zero-Shot Object Detection
5
Authors: Weipeng Cao, Xuyang Yao, Zhiwu Xu, Ye Liu, Yinghui Pan, Zhong Ming. Big Data Mining and Analytics, 2025, No. 3, pp. 726-750 (25 pages)
Zero-Shot Object Detection (ZSD), one of the most challenging problems in object detection, aims to accurately identify new categories that are not encountered during training. Recent advances in deep learning and increased computational power have led to significant improvements in object detection systems, which achieve high recognition accuracy on benchmark datasets. However, these systems remain limited in real-world applications because labeled training samples are scarce, making it difficult to detect unseen classes. To address this, researchers have explored various approaches and made promising progress. This article provides a comprehensive review of the current state of ZSD, distinguishing four related approaches (zero-shot, open-vocabulary, open-set, and open-world) based on task objectives and data usage. We highlight representative methods, discuss the technical challenges within each framework, and summarize commonly used evaluation metrics, benchmark datasets, and experimental results. Our review aims to offer readers a clear overview of the latest developments and performance trends in ZSD.
Keywords: Zero-Shot Object Detection (ZSD); open-vocabulary object detection; open-set object detection; open-world object detection