Enhanced BEV Scene Segmentation: De-Noise Channel Attention for Resource-Constrained Environments
1
Authors: Argho Dey, Yunfei Yin, Zheng Yuan, Zhiwen Zeng, Xianjian Bao, Md Minhazul Islam. Source: Computers, Materials & Continua, 2026, Issue 4, pp. 2161-2180 (20 pages).
Autonomous vehicles rely heavily on accurate and efficient scene segmentation for safe navigation and efficient operations. Traditional Bird's Eye View (BEV) methods on semantic scene segmentation, which leverage multimodal sensor fusion, often struggle with noisy data and demand high-performance GPUs, leading to sensor misalignment and performance degradation. This paper introduces an Enhanced Channel Attention BEV (ECABEV), a novel approach designed to address the challenges under insufficient GPU memory conditions. ECABEV integrates camera and radar data through a de-noise enhanced channel attention mechanism, which utilizes global average and max pooling to effectively filter out noise while preserving discriminative features. Furthermore, an improved fusion approach is proposed to efficiently merge categorical data across modalities. To reduce computational overhead, a bilinear interpolation layer normalization method is devised to ensure spatial feature fidelity. Moreover, a scalable cross-entropy loss function is further designed to handle the imbalanced classes with less computational efficiency sacrifice. Extensive experiments on the nuScenes dataset demonstrate that ECABEV achieves state-of-the-art performance with an IoU of 39.961, using a lightweight ViT-B/14 backbone and lower resolution (224×224). Our approach highlights its cost-effectiveness and practical applicability, even on low-end devices. The code is publicly available at: https://github.com/YYF-CQU/ECABEV.git.
Keywords: Autonomous vehicle; BEV; attention mechanism; sensor fusion; scene segmentation
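The channel attention idea mentioned in the abstract (global average and max pooling used to reweight channels) can be illustrated with a minimal sketch. This is not the ECABEV code: the function name `channel_attention` and the gating rule (a sigmoid of the summed pooled descriptors, with no learned layers) are assumptions made only for illustration.

```python
import math

def channel_attention(features):
    """features: list of channels, each a 2D list (H x W) of floats.

    Returns (weights, scaled), where weights[i] lies in (0, 1) and
    scaled is the input with every channel multiplied by its weight.
    """
    weights = []
    for channel in features:
        values = [v for row in channel for v in row]
        avg_pool = sum(values) / len(values)  # global average pooling
        max_pool = max(values)                # global max pooling
        # Hypothetical gate: sigmoid of the summed descriptors.
        # A real module would pass both descriptors through learned layers.
        weights.append(1.0 / (1.0 + math.exp(-(avg_pool + max_pool))))
    scaled = [
        [[v * w for v in row] for row in channel]
        for channel, w in zip(features, weights)
    ]
    return weights, scaled
```

In this toy version a channel with weak responses gets a weight near 0.5, while a channel with strong responses gets a weight near 1, so it dominates after rescaling.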
DKP-SLAM: A Visual SLAM for Dynamic Indoor Scenes Based on Object Detection and Region Probability (cited by: 1)
2
Authors: Menglin Yin, Yong Qin, Jiansheng Peng. Source: Computers, Materials & Continua (SCIE, EI), 2025, Issue 1, pp. 1329-1347 (19 pages).
In dynamic scenarios, visual simultaneous localization and mapping (SLAM) algorithms often incorrectly incorporate dynamic points during camera pose computation, leading to reduced accuracy and robustness. This paper presents a dynamic SLAM algorithm that leverages object detection and regional dynamic probability. Firstly, a parallel thread employs the YOLOX object detection model to gather 2D semantic information and compensate for missed detections. Next, an improved K-means++ clustering algorithm clusters bounding box regions, adaptively determining the threshold for extracting dynamic object contours as dynamic points change. This process divides the image into low dynamic, suspicious dynamic, and high dynamic regions. In the tracking thread, the dynamic point removal module assigns dynamic probability weights to the feature points in these regions. Combined with geometric methods, it detects and removes the dynamic points. The final evaluation on the public TUM RGB-D dataset shows that the proposed dynamic SLAM algorithm surpasses most existing SLAM algorithms, providing better pose estimation accuracy and robustness in dynamic environments.
Keywords: Visual SLAM; dynamic scene; YOLOX; K-means++ clustering; dynamic probability
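For readers unfamiliar with K-means++, the sketch below shows the standard seeding step only: each new center is sampled with probability proportional to its squared distance from the nearest existing center. It is the textbook procedure, not the paper's improved variant; the function name `kmeans_pp_seeds` and the fixed random seed are illustrative assumptions.

```python
import random

def kmeans_pp_seeds(points, k, seed=0):
    """Pick k initial centers from a list of 2D points (x, y) using
    standard K-means++ seeding. Returns fewer than k centers only if
    every point already coincides with a chosen center."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance of every point to its nearest chosen center.
        d2 = [
            min((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers)
            for p in points
        ]
        total = sum(d2)
        if total == 0:
            break  # all points coincide with existing centers
        r = rng.uniform(0, total)
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            # Skip zero-weight points so a center is never picked twice.
            if w > 0 and acc >= r:
                centers.append(p)
                break
    return centers
```

Far-apart points are favored as seeds, which tends to spread the initial centers across distinct regions before the usual K-means iterations begin.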
MEET: A Million-Scale Dataset for Fine-Grained Geospatial Scene Classification With Zoom-Free Remote Sensing Imagery (cited by: 1)
3
Authors: Yansheng Li, Yuning Wu, Gong Cheng, Chao Tao, Bo Dang, Yu Wang, Jiahao Zhang, Chuge Zhang, Yiting Liu, Xu Tang, Jiayi Ma, Yongjun Zhang. Source: IEEE/CAA Journal of Automatica Sinica, 2025, Issue 5, pp. 1004-1023 (20 pages).
Accurate fine-grained geospatial scene classification using remote sensing imagery is essential for a wide range of applications. However, existing approaches often rely on manually zooming remote sensing images at different scales to create typical scene samples. This approach fails to adequately support the fixed-resolution image interpretation requirements in real-world scenarios. To address this limitation, we introduce the million-scale fine-grained geospatial scene classification dataset (MEET), which contains over 1.03 million zoom-free remote sensing scene samples, manually annotated into 80 fine-grained categories. In MEET, each scene sample follows a scene-in-scene layout, where the central scene serves as the reference, and auxiliary scenes provide crucial spatial context for fine-grained classification. Moreover, to tackle the emerging challenge of scene-in-scene classification, we present the context-aware transformer (CAT), a model specifically designed for this task, which adaptively fuses spatial context to accurately classify the scene samples. CAT adaptively fuses spatial context to accurately classify the scene samples by learning attentional features that capture the relationships between the center and auxiliary scenes. Based on MEET, we establish a comprehensive benchmark for fine-grained geospatial scene classification, evaluating CAT against 11 competitive baselines. The results demonstrate that CAT significantly outperforms these baselines, achieving a 1.88% higher balanced accuracy (BA) with the Swin-Large backbone, and a notable 7.87% improvement with the Swin-Huge backbone. Further experiments validate the effectiveness of each module in CAT and show the practical applicability of CAT in the urban functional zone mapping. The source code and dataset will be publicly available at https://jerrywyn.github.io/project/MEET.html.
Keywords: Fine-grained geospatial scene classification (FGSC); million-scale dataset; remote sensing imagery (RSI); scene-in-scene; transformer
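Balanced accuracy (BA), the metric quoted in the abstract, is commonly defined as the mean of the per-class recalls, which keeps rare classes from being swamped by frequent ones. The sketch below uses that common definition; whether the paper computes BA in exactly this way is an assumption, and the function name is illustrative.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)
```

For example, with eight samples of class "a" and two of class "b", a classifier that always predicts "a" has a plain accuracy of 0.8 but a balanced accuracy of 0.5.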
From "Spatial Reconstruction" to "Scene Construction": Analysis on the Design Pathway of Waterfront Space in Tourism Cities from the Perspective of Scene Theory: A Case Study of the Xuan'en Night Banquet Project in Enshi
4
Authors: Shuyi SHEN. Source: Meteorological and Environmental Research, 2025, Issue 4, pp. 16-19, 25 (5 pages).
With the upgrading of tourism consumption patterns, the traditional renovation models of waterfront recreational spaces centered on landscape design can no longer meet the commercial and humanistic demands of modern cultural and tourism development. Based on scene theory as the analytical framework and taking the Xuan'en Night Banquet Project in Enshi as a case study, this paper explores the design pathway for transforming waterfront areas in tourism cities from "spatial reconstruction" to "scene construction". The study argues that waterfront space renewal should transcend mere physical renovation. By implementing three core strategies (spatial narrative framework, ecological industry creation, and cultural empowerment), it is possible to construct integrated scenarios that blend cultural value, consumption spaces, and lifestyle elements. This approach ultimately fosters sustained vitality in waterfront areas and promotes the high-quality development of the cultural and tourism industry.
Keywords: Scene theory; Tourism city; Comforts; Scene construction; Waterfront space
Accreditation of Crime Scene Investigation under ISO 17020:2012 Standard in Hong Kong, China
5
Authors: Duen-yee Luk, Terence Hok-man Cheung, Wai-nang Cheng, Wai-kit Sze, Man-hung Lo, Joseph Sze-wai Wong, Chi-keung Li. Source: Forensic Science and Technology (刑事技术), 2025, Issue 3, pp. 314-318 (5 pages).
Crime scene investigation (CSI) is an important link in the criminal justice system as it serves as a bridge between establishing the happenings during an incident and possibly identifying the accountable persons, providing light in the dark. The International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) collaborated to develop the ISO/IEC 17020:2012 standard to govern the quality of CSI, a branch of inspection activity. These protocols include the impartiality and competence of the crime scene investigators involved, contemporary recording of scene observations and data obtained, the correct use of resources during scene processing, forensic evidence collection and handling procedures, and the confidentiality and integrity of any scene information obtained from other parties, etc. The preparatory work, the accreditation processes involved and the implementation of new quality measures to the existing quality management system in order to achieve the ISO/IEC 17020:2012 accreditation at the Forensic Science Division of the Government Laboratory in Hong Kong are discussed in this paper.
Keywords: ISO/IEC 17020; crime scene investigation; on-site monitoring; critical findings check; independent check; scene of crime officer (SOCO)
Fusion Prototypical Network for 3D Scene Graph Prediction
6
Authors: Jiho Bae, Bogyu Choi, Sumin Yeon, Suwon Lee. Source: Computer Modeling in Engineering & Sciences, 2025, Issue 6, pp. 2991-3003 (13 pages).
Scene graph prediction has emerged as a critical task in computer vision, focusing on transforming complex visual scenes into structured representations by identifying objects, their attributes, and the relationships among them. Extending this to 3D semantic scene graph (3DSSG) prediction introduces an additional layer of complexity because it requires the processing of point-cloud data to accurately capture the spatial and volumetric characteristics of a scene. A significant challenge in 3DSSG is the long-tailed distribution of object and relationship labels, causing certain classes to be severely underrepresented and leading to suboptimal performance in these rare categories. To address this, we proposed a fusion prototypical network (FPN), which combines the strengths of conventional neural networks for 3DSSG with a Prototypical Network. The former are known for their ability to handle complex scene graph predictions while the latter excels in few-shot learning scenarios. By leveraging this fusion, our approach enhances the overall prediction accuracy and substantially improves the handling of underrepresented labels. Through extensive experiments using the 3DSSG dataset, we demonstrated that the FPN achieves state-of-the-art performance in 3D scene graph prediction as a single model and effectively mitigates the impact of the long-tailed distribution, providing a more balanced and comprehensive understanding of complex 3D environments.
Keywords: 3D scene graph prediction; prototypical network; 3D scene understanding
ERSNet: Lightweight Attention-Guided Network for Remote Sensing Scene Image Classification
7
Authors: LIU Yunyu, YUAN Jinpeng. Source: Journal of Geodesy and Geoinformation Science, 2025, Issue 1, pp. 30-46 (17 pages).
Remote sensing scene image classification is a prominent research area within remote sensing. Deep learning-based methods have been extensively utilized and have shown significant advancements in this field. Recent progress in these methods primarily focuses on enhancing feature representation capabilities to improve performance. The challenge lies in the limited spatial resolution of small-sized remote sensing images, as well as image blurring and sparse data. These factors contribute to lower accuracy in current deep learning models. Additionally, deeper networks with attention-based modules require a substantial number of network parameters, leading to high computational costs and memory usage. In this article, we introduce ERSNet, a lightweight novel attention-guided network for remote sensing scene image classification. ERSNet is constructed using a deep separable convolutional network and incorporates an attention mechanism. It utilizes spatial attention, channel attention, and channel self-attention to enhance feature representation and accuracy, while also reducing computational complexity and memory usage. Experimental results indicate that, compared to existing state-of-the-art methods, ERSNet has a significantly lower parameter count of only 1.2 M and reduced FLOPs. It achieves the highest classification accuracy of 99.14% on the EuroSAT dataset, demonstrating its suitability for application on mobile terminal devices. Furthermore, experimental results from the UCMerced land use dataset and the Brazilian coffee scene also confirm the strong generalization ability of this method.
Keywords: deep learning; remote sensing; scene classification; CNN
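To see why separable convolutions keep the parameter count low, the sketch below compares weight counts for a standard convolution and a depthwise separable one. It assumes the abstract's "deep separable convolutional network" refers to depthwise separable convolutions; the layer sizes are hypothetical, bias terms are ignored, and none of the numbers come from ERSNet itself.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k standard convolution (bias ignored)."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution followed by a 1 x 1
    pointwise convolution (bias ignored)."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
standard = standard_conv_params(3, 64, 128)    # 73728
separable = separable_conv_params(3, 64, 128)  # 8768
```

For this hypothetical layer the separable form needs roughly 8.4 times fewer weights, which is the kind of saving that makes sub-2 M parameter networks feasible.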
A Communication Scene Recognition Framework Based on Deep Learning with Multi-Sensor Fusion
8
Authors: Feng Yufei, Zhong Xiaofeng, Chen Xinwei, Zhou Shidong. Source: China Communications, 2025, Issue 4, pp. 174-201 (28 pages).
This paper presents a comprehensive framework that enables communication scene recognition through deep learning and multi-sensor fusion. This study aims to address the challenge of current communication scene recognition methods that struggle to adapt in dynamic environments, as they typically rely on post-response mechanisms that fail to detect scene changes before users experience latency. The proposed framework leverages data from multiple smartphone sensors, including acceleration sensors, gyroscopes, magnetic field sensors, and orientation sensors, to identify different communication scenes, such as walking, running, cycling, and various modes of transportation. Extensive experimental comparative analysis with existing methods on the open-source SHL-2018 dataset confirmed the superior performance of our approach in terms of F1 score and processing speed. Additionally, tests using a Microsoft Surface Pro tablet and a self-collected Beijing-2023 dataset have validated the framework's efficiency and generalization capability. The results show that our framework achieved an F1 score of 95.15% on SHL-2018 and 94.6% on Beijing-2023, highlighting its robustness across different datasets and conditions. Furthermore, the levels of computational complexity and power consumption associated with the algorithm are moderate, making it suitable for deployment on mobile devices.
Keywords: communication scene recognition; deep learning; sensor fusion; SHL; smartphone-based applications
HybridLSTM: An Innovative Method for Road Scene Categorization Employing Hybrid Features
9
Authors: Sanjay P. Pande, Sarika Khandelwal, Ganesh K. Yenurkar, Rakhi D. Wajgi, Vincent O. Nyangaresi, Pratik R. Hajare, Poonam T. Agarkar. Source: Computers, Materials & Continua, 2025, Issue 9, pp. 5937-5975 (39 pages).
Recognizing road scene context from a single image remains a critical challenge for intelligent autonomous driving systems, particularly in dynamic and unstructured environments. While recent advancements in deep learning have significantly enhanced road scene classification, simultaneously achieving high accuracy, computational efficiency, and adaptability across diverse conditions continues to be difficult. To address these challenges, this study proposes HybridLSTM, a novel and efficient framework that integrates deep learning-based, object-based, and handcrafted feature extraction methods within a unified architecture. HybridLSTM is designed to classify four distinct road scene categories, namely crosswalk (CW), highway (HW), overpass/tunnel (OP/T), and parking (P), by leveraging multiple publicly available datasets, including Places-365, BDD100K, LabelMe, and KITTI, thereby promoting domain generalization. The framework fuses object-level features extracted using YOLOv5 and VGG19, scene-level global representations obtained from a modified VGG19, and fine-grained texture features captured through eight handcrafted descriptors. This hybrid feature fusion enables the model to capture both semantic context and low-level visual cues, which are critical for robust scene understanding. To model spatial arrangements and latent sequential dependencies present even in static imagery, the combined features are processed through a Long Short-Term Memory (LSTM) network, allowing the extraction of discriminative patterns across heterogeneous feature spaces. Extensive experiments conducted on 2725 annotated road scene images, with an 80:20 training-to-testing split, validate the effectiveness of the proposed model. HybridLSTM achieves a classification accuracy of 96.3%, a precision of 95.8%, a recall of 96.1%, and an F1-score of 96.0%, outperforming several existing state-of-the-art methods. These results demonstrate the robustness, scalability, and generalization capability of HybridLSTM across varying environments and scene complexities. Moreover, the framework is optimized to balance classification performance with computational efficiency, making it highly suitable for real-time deployment in embedded autonomous driving systems. Future work will focus on extending the model to multi-class detection within a single frame and optimizing it further for edge-device deployments to reduce computational overhead in practical applications.
Keywords: HybridLSTM; autonomous vehicles; road scene classification; critical requirement; global features; handcrafted features
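The F1-score is the harmonic mean of precision and recall, so the reported figures can be sanity-checked with a few lines. The helper below is a generic sketch; how the paper averaged its F1 across the four classes is not stated in the abstract, so exact agreement should not be expected.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)

# Rounded values quoted in the abstract, in percent.
f1 = f1_score(95.8, 96.1)
```

With the rounded precision and recall from the abstract, the harmonic mean comes out at about 95.95%, close to the reported 96.0%; the small gap may be due to rounding or per-class averaging.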
Monocular visual estimation for autonomous aircraft landing guidance in unknown structured scenes
10
Authors: Zhuo ZHANG, Quanrui CHEN, Qiufu WANG, Xiaoliang SUN, Qifeng YU. Source: Chinese Journal of Aeronautics, 2025, Issue 9, pp. 365-382 (18 pages).
The autonomous landing guidance of fixed-wing aircraft in unknown structured scenes presents a substantial technological challenge, particularly regarding the effectiveness of solutions for monocular visual relative pose estimation. This study proposes a novel airborne monocular visual estimation method based on structured scene features to address this challenge. First, a multitask neural network model is established for segmentation, depth estimation, and slope estimation on monocular images. In addition, a comprehensive three-dimensional information metric for monocular images is designed, encompassing length, span, flatness, and slope information. Subsequently, structured edge features are leveraged to filter candidate landing regions adaptively. By leveraging the three-dimensional information metric, the optimal landing region is accurately and efficiently identified. Finally, sparse two-dimensional key points are used to parameterize the optimal landing region for the first time, and a high-precision relative pose estimation is achieved. Additional measurement information is introduced to provide the autonomous landing guidance information between the aircraft and the optimal landing region. Experimental results obtained from both synthetic and real data demonstrate the effectiveness of the proposed method in monocular pose estimation for autonomous aircraft landing guidance in unknown structured scenes.
Keywords: Automatic landing; Image processing; Monocular camera; Pose measurement; Unknown structured scene
Unsupervised Monocular Depth Estimation with Edge Enhancement for Dynamic Scenes
11
Authors: Peicheng Shi, Yueyue Tang, Yi Li, Xinlong Dong, Yu Sun, Aixi Yang. Source: Computers, Materials & Continua, 2025, Issue 8, pp. 3321-3343 (23 pages).
In the dynamic scene of autonomous vehicles, the depth estimation of monocular cameras often faces the problem of inaccurate edge depth estimation. To solve this problem, we propose an unsupervised monocular depth estimation model based on edge enhancement, which is specifically aimed at the depth perception challenge in dynamic scenes. The model consists of two core networks: a depth prediction network and a motion estimation network, both of which adopt an encoder-decoder architecture. The depth prediction network is based on the U-Net structure of ResNet18, which is responsible for generating the depth map of the scene. The motion estimation network is based on the U-Net structure of Flow-Net, focusing on the motion estimation of dynamic targets. In the decoding stage of the motion estimation network, we innovatively introduce an edge-enhanced decoder, which integrates a convolutional block attention module (CBAM) in the decoding process to enhance the recognition ability of the edge features of moving objects. In addition, we also designed a strip convolution module to improve the model's capture efficiency of discrete moving targets. To further improve the performance of the model, we propose a novel edge regularization method based on the Laplace operator, which effectively accelerates the convergence process of the model. Experimental results on the KITTI and Cityscapes datasets show that compared with the current advanced dynamic unsupervised monocular model, the proposed model has a significant improvement in depth estimation accuracy and convergence speed. Specifically, the root mean square error (RMSE) is reduced by 4.8% compared with the DepthMotion algorithm, while the training convergence speed is increased by 36%, which shows the superior performance of the model in the depth estimation task in dynamic scenes.
Keywords: Dynamic scenes; unsupervised learning; monocular depth; edge enhancement
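The Laplace operator mentioned in the abstract responds strongly where depth changes abruptly and vanishes on flat regions. The sketch below applies the usual 4-neighbor discrete Laplacian to a small depth map and averages its absolute value. The `edge_penalty` term is a hypothetical illustration of an edge-sensitive quantity, not the regularizer defined in the paper.

```python
def laplacian(depth):
    """4-neighbor discrete Laplacian of a 2D list; border pixels stay 0."""
    h, w = len(depth), len(depth[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = (
                depth[y - 1][x] + depth[y + 1][x]
                + depth[y][x - 1] + depth[y][x + 1]
                - 4 * depth[y][x]
            )
    return out

def edge_penalty(depth):
    """Hypothetical edge-sensitive term: mean absolute Laplacian."""
    lap = laplacian(depth)
    n = len(depth) * len(depth[0])
    return sum(abs(v) for row in lap for v in row) / n
```

A constant depth map gives a penalty of zero, while a map with a sharp depth step gives a positive value concentrated along the step.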
BSDNet: Semantic Information Distillation-Based for Bilateral-Branch Real-Time Semantic Segmentation on Street Scene Image
12
Authors: Huan Zeng, Jianxun Zhang, Hongji Chen, Xinwei Zhu. Source: Computers, Materials & Continua, 2025, Issue 11, pp. 3879-3896 (18 pages).
Semantic segmentation in street scenes is a crucial technology for autonomous driving to analyze the surrounding environment. In street scenes, issues such as high image resolution caused by large viewpoints and differences in object scales lead to a decline in real-time performance and difficulties in multi-scale feature extraction. To address this, we propose a bilateral-branch real-time semantic segmentation method based on semantic information distillation (BSDNet) for street scene images. The BSDNet consists of a Feature Conversion Convolutional Block (FCB), a Semantic Information Distillation Module (SIDM), and a Deep Aggregation Atrous Convolution Pyramid Pooling (DASP). FCB reduces the semantic gap between the backbone and the semantic branch. SIDM extracts high-quality semantic information from the Transformer branch to reduce computational costs. DASP aggregates information lost in atrous convolutions, effectively capturing multi-scale objects. Extensive experiments conducted on Cityscapes, CamVid, and ADE20K, achieving an accuracy of 81.7% Mean Intersection over Union (mIoU) at 70.6 Frames Per Second (FPS) on Cityscapes, demonstrate that our method achieves a better balance between accuracy and inference speed.
Keywords: Street scene understanding; real-time semantic segmentation; knowledge distillation; multi-scale feature extraction
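Mean Intersection over Union (mIoU), the accuracy metric quoted above, is usually computed from a confusion matrix: for each class, IoU = TP / (TP + FP + FN), and mIoU is the average over classes. The sketch below follows that common definition with a toy matrix; the function name and the convention of skipping classes with an empty union are assumptions.

```python
def mean_iou(conf):
    """conf[i][j] = number of pixels of true class i predicted as class j.
    Classes that never occur in labels or predictions are skipped."""
    n = len(conf)
    ious = []
    for c in range(n):
        tp = conf[c][c]
        fp = sum(conf[r][c] for r in range(n)) - tp  # predicted c, truly other
        fn = sum(conf[c]) - tp                       # truly c, predicted other
        union = tp + fp + fn
        if union:
            ious.append(tp / union)
    return sum(ious) / len(ious)
```

For a two-class confusion matrix `[[3, 1], [1, 5]]`, the per-class IoUs are 3/5 and 5/7, and the function returns their mean.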
Navigating with Spatial Intelligence: A Survey of Scene Graph-Based Object Goal Navigation
13
Authors: GUO Chi, LI Aolin, MENG Yiyue. Source: Wuhan University Journal of Natural Sciences, 2025, Issue 5, pp. 405-426 (22 pages).
Today, autonomous mobile robots are widely used in all walks of life. Autonomous navigation, as a basic capability of robots, has become a research hotspot. Classical navigation techniques, which rely on pre-built maps, struggle to cope with complex and dynamic environments. With the development of artificial intelligence, learning-based navigation technologies have emerged. Instead of relying on pre-built maps, the agent perceives the environment and makes decisions through visual observation, enabling end-to-end navigation. A key challenge is to enhance the generalization ability of the agent in unfamiliar environments. To tackle this challenge, it is necessary to endow the agent with spatial intelligence. Spatial intelligence refers to the ability of the agent to transform visual observations into insights, insights into understanding, and understanding into actions. To endow the agent with spatial intelligence, relevant research uses scene graphs to represent the environment. We refer to this method as scene graph-based object goal navigation. In this paper, we concentrate on the scene graph, offering a formal description and computational framework of object goal navigation. We provide a comprehensive summary of the methods for constructing and applying scene graphs. Additionally, we present experimental evidence that highlights the critical role of the scene graph in improving navigation success. This paper also delineates promising research directions, all aimed at sharpening the focus on the scene graph. Overall, this paper shows how the scene graph endows the agent with spatial intelligence, aiming to promote the importance of the scene graph in the field of intelligent navigation.
Keywords: object goal navigation; scene graph; spatial intelligence; deep reinforcement learning
Self-Supervised Monocular Depth Estimation with Scene Dynamic Pose
14
Authors: Jing He, Haonan Zhu, Chenhao Zhao, Minrui Zhao. Source: Computers, Materials & Continua, 2025, Issue 6, pp. 4551-4573 (23 pages).
Self-supervised monocular depth estimation has emerged as a major research focus in recent years, primarily due to the elimination of ground-truth depth dependence. However, the prevailing architectures in this domain suffer from inherent limitations: existing pose network branches infer camera ego-motion exclusively under static-scene and Lambertian-surface assumptions. These assumptions are often violated in real-world scenarios due to dynamic objects, non-Lambertian reflectance, and unstructured background elements, leading to pervasive artifacts such as depth discontinuities ("holes"), structural collapse, and ambiguous reconstruction. To address these challenges, we propose a novel framework that integrates scene dynamic pose estimation into the conventional self-supervised depth network, enhancing its ability to model complex scene dynamics. Our contributions are threefold: (1) a pixel-wise dynamic pose estimation module that jointly resolves the pose transformations of moving objects and localized scene perturbations; (2) a physically-informed loss function that couples dynamic pose and depth predictions, designed to mitigate depth errors arising from high-speed distant objects and geometrically inconsistent motion profiles; (3) an efficient SE(3) transformation parameterization that streamlines network complexity and temporal pre-processing. Extensive experiments on the KITTI and NYU-V2 benchmarks show that our framework achieves state-of-the-art performance in both quantitative metrics and qualitative visual fidelity, significantly improving the robustness and generalization of monocular depth estimation under dynamic conditions.
Keywords: Monocular depth estimation; self-supervised learning; scene dynamic pose estimation; dynamic-depth constraint; pixel-wise dynamic pose
Video action recognition meets vision-language models exploring human factors in scene interaction: a review
15
Authors: GUO Yuping, GAO Hongwei, YU Jiahui, GE Jinchao, HAN Meng, JU Zhaojie. Source: Optoelectronics Letters, 2025, Issue 10, pp. 626-640 (15 pages).
Video action recognition (VAR) aims to analyze dynamic behaviors in videos and achieve semantic understanding. VAR faces challenges such as temporal dynamics, action-scene coupling, and the complexity of human interactions. Existing methods can be categorized into motion-level, event-level, and story-level ones based on spatiotemporal granularity. However, single-modal approaches struggle to capture complex behavioral semantics and human factors. Therefore, in recent years, vision-language models (VLMs) have been introduced into this field, providing new research perspectives for VAR. In this paper, we systematically review spatiotemporal hierarchical methods in VAR and explore how the introduction of large models has advanced the field. Additionally, we propose the concept of "Factor" to identify and integrate key information from both visual and textual modalities, enhancing multimodal alignment. We also summarize various multimodal alignment methods and provide in-depth analysis and insights into future research directions.
Keywords: human factors; video action recognition (VAR); vision-language models; analyze dynamic behaviors; spatiotemporal granularity; multimodal alignment; scene interaction
Open-Vocabulary 3D Scene Segmentation via Dual-Modal Interaction
16
Authors: Wuyang Luan, Lei Pan, Junhui Li, Yuan Zheng, Chang Xu. Source: IEEE/CAA Journal of Automatica Sinica, 2025, Issue 10, pp. 2156-2158 (3 pages).
Dear Editor, This letter proposes an innovative open-vocabulary 3D scene understanding model based on a visual-language model. By efficiently integrating 3D point cloud data, image data, and text data, our model effectively overcomes the segmentation problem [1], [2] of traditional models dealing with unknown categories [3]. By deeply learning the deep semantic mapping between vision and language, the network significantly improves its ability to recognize unlabeled categories and exceeds current state-of-the-art methods in the task of open-vocabulary scene understanding.
Keywords: segmentation problem; open vocabulary; recognize unlabeled categories; deeply learning; deep semantic mapping; traditional models; 3D scene segmentation; text data; visual language model
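Tomato Ripeness Grading and Detection Method Based on Color Feature Quantization and Improved YOLO v8
17
Authors: Zhang Lingxian (张领先), Zhou Qin (周沁), Yao Tianyu (姚天雨), Pei Xinda (裴鑫达), Zhao Liqun (赵立群), Man Jie (满杰), Qian Jing (钱井). Source: Transactions of the Chinese Society for Agricultural Machinery (农业机械学报, PKU Core), 2026, Issue 2, pp. 193-202, 224 (11 pages).
The ripeness of tomatoes is closely related to their quality and is an important basis for picking, sorting, and other production steps. To address the limited functionality of crop ripeness grading and detection systems and the high cost of upgrading them manually, this paper takes tomato as an example. A tomato image dataset was collected and built in natural scenes, and a semi-automatic tomato image annotation algorithm based on a tomato fruit ripeness grading algorithm was designed to label the collected data. Building on the YOLO v8 model, the FPN structure was replaced with a BiFPN structure for more efficient multi-scale feature fusion, the SE attention mechanism was used to extract fused features across space and channels, and the Focal SIoU loss function was introduced to measure the angular difference between predicted and ground-truth boxes. The resulting model, YOLO v8BFS, a tomato ripeness grading and detection model based on color feature quantization and improved YOLO v8, identifies five ripeness stages during tomato growth. Experimental results show that the model largely resolves false and missed detections in tomato ripeness grading in complex natural scenes. With only slight increases in floating-point operations (FLOPs), parameter count (Params), and memory footprint, the model reaches a mean average precision of 94.10%, 3.0 percentage points higher than the original YOLO v8. Compared with the Faster R-CNN-Resnet50, YOLO v5, YOLO v7-tiny, YOLO v8, YOLO v10, and YOLO 11 object detection models, the proposed model has a significant advantage in detection accuracy, providing a reliable method for tomato ripeness detection.
Keywords: tomato ripeness; natural scene; color feature quantization; YOLO v8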
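A Segmented and Layered Modeling Method for Interchanges
18
Authors: Ying Shen (应申), Qi Xuan (漆璇), Li Yu (李玉), Wang Runze (王润泽), Lu Yuexin (鲁月新), Tao Lu (陶璐). Source: Bulletin of Surveying and Mapping (测绘通报, PKU Core), 2026, Issue 1, pp. 122-129 (8 pages).
Roads are the core element of urban transportation systems and the infrastructure that connects regions and carries traffic flow. As hub structures in urban road networks, interchanges are a core tool for traffic diversion, so modeling interchange structures plays an important role in fields such as urban traffic simulation and navigation and positioning. However, traditional interchange modeling methods suffer from weak association between semantics and geometry and insufficient expression of three-dimensional topology. To address this, based on an analysis of interchange structures, this paper proposes a general segmented and layered method for constructing interchange models, achieving three-dimensional construction of complex urban traffic scenes at low data cost when elevation data are limited. The method can model different types of interchanges with complex structures, improves interchange modeling efficiency, and accurately depicts the spatial hierarchy of interchanges. It can effectively support simulation applications such as road navigation, autonomous driving, and urban virtual scene simulation.
Keywords: interchange; traffic scene; segmentation and layering; road simulation; autonomous driving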
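Research on an Apple Leaf Disease Detection Method in Natural Scenes Based on ALD-YOLO
19
Authors: Liu Xia (刘霞), Zhou Jiaheng (周家横), Gu Qinghui (古庆辉). Source: Journal of Chinese Agricultural Mechanization (中国农机化学报, PKU Core), 2026, Issue 1, pp. 52-61 (10 pages).
Apple leaf disease detection in natural scenes is challenged by background noise, similar disease appearances, and many small-scale targets, which lead to insufficient detection accuracy. To address this, an ALD-YOLO algorithm is proposed to achieve accurate detection of apple leaf diseases in natural scenes. First, a CBAM-G attention mechanism is proposed and embedded in the backbone network so that, during feature extraction, the network focuses on key disease features and reduces the effect of image deformation. Second, multi-level feature information is fed into an OCR module and passed back to the head network, so that the model obtains richer pixel-level semantic information for accurate classification. Finally, a dynamically weighted IoU loss function is proposed, which raises the loss of small-scale targets by dynamically assigning weights to them, thereby improving detection accuracy for small-scale targets. Results show that on a self-built dataset the algorithm achieves a precision of 84.93%, a recall of 72.88%, a mean average precision of 77.48%, and a processing speed of 24.74 frames/s.
Keywords: apple; leaf disease detection; YOLOX; natural scene; OCR module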
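Construction of a Vehicle-to-Vehicle Interaction Scene Primitive Library for Unsignalized Intersections
20
Authors: Hu Lin (胡林), Chen Xudong (陈旭东), Wang Xinghua (王兴华), Yang Feifan (杨非凡). Source: Automotive Engineering (汽车工程, PKU Core), 2026, Issue 2, pp. 332-341, 477 (11 pages).
Virtual testing offers a highly controllable, low-cost, and efficient way to evaluate and validate the functionality and safety of autonomous driving systems. However, existing virtual test scenarios neglect the description of features such as vehicle speed and heading angle during interaction, which results in insufficient realism and coverage and reduces test reliability. Accurately identifying and describing the features of the vehicle-to-vehicle interaction process is key to improving the realism of test scenarios. This paper therefore builds a vehicle-to-vehicle interaction scene primitive library for unsignalized intersections based on the inD dataset. First, a nonparametric Bayesian model is used to segment two-vehicle interaction sequences into scene primitives, and the multidimensional continuous variables characterizing each primitive are discretized into categorical variables with distinct semantics. Second, a distance matrix is built using one-hot encoding and Hamming distance, and cluster analysis with the K-Medoids algorithm identifies 25 classes of two-vehicle interaction scene primitives with different motion characteristics, systematically revealing the diversity of vehicle-to-vehicle interaction behavior in real traffic. The interaction scene primitive library can further improve the realism and diversity of virtual test scenarios for autonomous driving.
Keywords: unsignalized intersection; vehicle-to-vehicle interaction; nonparametric Bayesian; scene primitive library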
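The distance-matrix step described in the abstract (one-hot encoding of categorical variables followed by Hamming distance) can be sketched as follows. The variable names and category values (`speed`, `heading`) are hypothetical placeholders rather than the paper's actual discretization, and the K-Medoids clustering that would consume the matrix is left out.

```python
def one_hot(labels, vocab):
    """Concatenate one-hot codes for each categorical variable.
    labels: one category per variable; vocab: allowed categories per variable."""
    vec = []
    for label, choices in zip(labels, vocab):
        vec.extend(1 if label == c else 0 for c in choices)
    return vec

def hamming(a, b):
    """Number of positions where two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))

def distance_matrix(samples, vocab):
    codes = [one_hot(s, vocab) for s in samples]
    return [[hamming(a, b) for b in codes] for a in codes]

# Hypothetical discretized primitives: (speed level, heading change).
vocab = [["low", "mid", "high"], ["straight", "left", "right"]]
samples = [("low", "straight"), ("low", "left"), ("high", "right")]
matrix = distance_matrix(samples, vocab)
```

Each variable on which two primitives disagree contributes 2 to the Hamming distance between their one-hot codes, so the matrix effectively counts mismatched variables, which is a natural input for a medoid-based clustering method.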