Open environments-aware SLAM based on YOLO-enhanced open-vocabulary object detection
1
Authors: Chengqun SONG, Fuxiang WU, Xiangyang GAO, Jun CHENG, Mengjie YANG, Qiao LIU, Lei WANG. Science China (Technological Sciences), 2025, No. 11, pp. 239-252 (14 pages)
Simultaneous localization and mapping (SLAM) is a pivotal challenge in mobile robotics. Traditional SLAM solutions primarily focus on achieving rapid and accurate localization and mapping while typically neglecting environmental object identification. This paper introduces an innovative SLAM system enhanced with YOLO-based open-vocabulary object detection. It leverages visual-language alignment to identify both known and novel objects using extensive image-text pairs. Our approach employs YOLOv8 as a teacher model, balancing speed and accuracy for object detection and bounding box prediction. These predictions are processed via CLIP encoders to generate high-dimensional vectors, teaching a student model robust image and text embeddings. Novel loss functions align augmented embeddings with supervisory signals, greatly enhancing detection accuracy and generalization. Additionally, the system integrates depth map-based scale extraction, 3D mapping of target object positions, and efficient relative pose estimation for loop detection. The direct method used improves accuracy and robustness, especially in poorly textured environments. Extensive ablation studies show significant improvements in precision and recall metrics. Our advanced SLAM system not only ensures accurate localization and mapping but also enables mobile robots to recognize and interact with a wide variety of objects, making it ideal for practical applications in complex environments.
Keywords: SLAM; open-vocabulary object detection; binocular vision; mobile robotics
Concept-Guided Open-Vocabulary Temporal Action Detection
2
Authors: Song-Miao Wang, Rui-Ze Han, Wei Feng. Journal of Computer Science & Technology, 2025, No. 5, pp. 1270-1284 (15 pages)
Vision-language models (VLMs) have shown strong open-vocabulary learning abilities in various video understanding tasks. However, when applied to open-vocabulary temporal action detection (OV-TAD), existing OV-TAD methods often face challenges in generalizing to unseen action categories due to their reliance on visual features, resulting in limited generalization. In this paper, we propose a novel framework, Concept-Guided Semantic Projection (CSP), to enhance the generalization ability of OV-TAD methods. By projecting video features into a unified action concept space, CSP enables the use of abstracted action concepts for action detection, rather than solely relying on visual details. To further improve feature consistency across action categories, we introduce a mutual contrastive loss (MCL), ensuring semantic coherence and better feature discrimination. Extensive experiments on the ActivityNet and THUMOS14 benchmarks demonstrate that our method outperforms state-of-the-art OV-TAD methods. Code and data are available at Concept-Guided-OV-TAD.
Keywords: open-vocabulary temporal action detection (TAD); visual-language model
An Analysis of OpenSeeD for Video Semantic Labeling
3
Authors: Jenny Zhu. Journal of Computer and Communications, 2025, No. 1, pp. 59-71 (13 pages)
Semantic segmentation is a core task in computer vision that allows AI models to interact with and understand their surrounding environment. Much as humans subconsciously segment scenes, this ability is crucial for scene understanding. However, a challenge many semantic learning models face is the lack of data. Existing video datasets are limited to short, low-resolution videos that are not representative of real-world examples. Thus, one of our key contributions is a customized semantic segmentation version of the Walking Tours Dataset that features hour-long, high-resolution, real-world data from tours of different cities. Additionally, we evaluate the performance of the open-vocabulary semantic model OpenSeeD on our own custom dataset and discuss future implications.
Keywords: Semantic Segmentation; Detection; Labeling; OpenSeeD; Open-Vocabulary; Walking Tours Dataset; Videos
A Survey of Zero-Shot Object Detection
4
Authors: Weipeng Cao, Xuyang Yao, Zhiwu Xu, Ye Liu, Yinghui Pan, Zhong Ming. Big Data Mining and Analytics, 2025, No. 3, pp. 726-750 (25 pages)
Zero-Shot Object Detection (ZSD), one of the most challenging problems in the field of object detection, aims to accurately identify new categories that are not encountered during training. Recent advancements in deep learning and increased computational power have led to significant improvements in object detection systems, achieving high recognition accuracy on benchmark datasets. However, these systems remain limited in real-world applications due to the scarcity of labeled training samples, making it difficult to detect unseen classes. To address this, researchers have explored various approaches, yielding promising progress. This article provides a comprehensive review of the current state of ZSD, distinguishing four related methods (zero-shot, open-vocabulary, open-set, and open-world approaches) based on task objectives and data usage. We highlight representative methods, discuss the technical challenges within each framework, and summarize the commonly used evaluation metrics, benchmark datasets, and experimental results. Our review aims to offer readers a clear overview of the latest developments and performance trends in ZSD.
Keywords: Zero-Shot Object Detection (ZSD); open-vocabulary object detection; open-set object detection; open-world object detection