494 articles found
DCA-YOLO: Detection Algorithm for YOLOv8 Pulmonary Nodules Based on Attention Mechanism Optimization (Cited by: 1)
1
Authors: SONG Yongsheng LIU Guohua 《Journal of Donghua University (English Edition)》 2025, Issue 1, pp. 78-87 (10 pages)
Pulmonary nodules represent an early manifestation of lung cancer. However, pulmonary nodules only constitute a small portion of the overall image, posing challenges for physicians in image interpretation and potentially leading to false positives or missed detections. To solve these problems, the YOLOv8 network is enhanced by adding deformable convolution and atrous spatial pyramid pooling (ASPP), along with the integration of a coordinate attention (CA) mechanism. This allows the network to focus on small targets while expanding the receptive field without losing resolution. At the same time, context information on the target is gathered and feature expression is enhanced by attention modules in different directions. The method effectively improves positioning accuracy and achieves good results on the LUNA16 dataset. Compared with other detection algorithms, it improves the accuracy of pulmonary nodule detection to a certain extent.
Keywords: pulmonary nodule YOLOv8 network object detection deformable convolution atrous spatial pyramid pooling (ASPP) coordinate attention (CA) mechanism
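The claim that atrous (dilated) convolution expands the receptive field without losing resolution can be checked with simple arithmetic. The following is a minimal sketch, not code from the paper; the function names and the example dilation rate of 6 are assumptions chosen for illustration.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel along one axis.

    A kernel with k taps and dilation d leaves (d - 1) gaps between
    neighbouring taps, so it spans k + (k - 1) * (d - 1) input positions.
    """
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def output_size(input_size: int, kernel_size: int, dilation: int,
                stride: int = 1, padding: int = 0) -> int:
    """Output length of a 1-D convolution with the given hyperparameters."""
    k_eff = effective_kernel_size(kernel_size, dilation)
    return (input_size + 2 * padding - k_eff) // stride + 1


if __name__ == "__main__":
    # A 3x3 kernel with dilation 6 (an assumed rate) covers a 13x13 window.
    print(effective_kernel_size(3, 6))
    # With stride 1 and padding equal to the dilation, a 64-wide feature
    # map keeps its width, so resolution is preserved.
    print(output_size(64, 3, 6, stride=1, padding=6))
```

Under these assumptions the window grows from 3 to 13 positions per axis while the feature map stays 64 wide, which is the trade-off the abstract describes.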
GFRF R-CNN: Object Detection Algorithm for Transmission Lines
2
Authors: Xunguang Yan Wenrui Wang Fanglin Lu Hongyong Fan Bo Wu Jianfeng Yu 《Computers, Materials & Continua》 SCIE EI 2025, Issue 1, pp. 1439-1458 (20 pages)
To maintain the reliability of power systems, routine inspections using drones equipped with advanced object detection algorithms are essential for preempting power-related issues. The increasing resolution of drone-captured images has posed a challenge for traditional target detection methods, especially in identifying small objects in high-resolution images. This study presents an enhanced object detection algorithm based on the Faster Region-based Convolutional Neural Network (Faster R-CNN) framework, specifically tailored for detecting small-scale electrical components like insulators, shock hammers, and screws in transmission lines. The algorithm features an improved backbone network for Faster R-CNN, which significantly boosts the feature extraction network's ability to detect fine details. The Region Proposal Network is optimized using a method of guided feature refinement (GFR), which achieves a balance between accuracy and speed. The incorporation of Generalized Intersection over Union (GIOU) and Region of Interest (ROI) Align further refines the model's accuracy. Experimental results demonstrate a notable improvement in mean Average Precision, reaching 89.3%, an 11.1% increase compared to the standard Faster R-CNN. This highlights the effectiveness of the proposed algorithm in identifying electrical components in high-resolution aerial images.
Keywords: Faster R-CNN transmission line object detection GIOU GFR
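Generalized Intersection over Union, which the abstract lists as one ingredient, extends plain IoU with a penalty based on the smallest box enclosing both inputs. The following is a minimal sketch of the standard definition for axis-aligned boxes; it is not the authors' implementation, and the `(x1, y1, x2, y2)` box format and the function name are assumptions.

```python
def giou(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Assumes both boxes have positive area. The result lies in (-1, 1].
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box that encloses both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    # Penalise the share of the enclosing box not covered by the union.
    return iou - (enclose - union) / enclose


if __name__ == "__main__":
    print(giou((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
    print(giou((0, 0, 1, 1), (2, 0, 3, 1)))  # disjoint boxes
```

Unlike plain IoU, the value keeps changing as two disjoint boxes move apart, which is why a loss of the form `1 - giou(...)` still gives a useful signal for non-overlapping predictions.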
MSFNet: A Network for Lunar Impact Crater Detection Based on Enhanced Feature Fusion with Digital Elevation Model
3
Authors: HE Weidong LAI Jialong ZHONG Zhicheng CUI Feifei XU Yi ZHANG Xiaoping 《深空探测学报(中英文)》 PKU Core Journal 2025, Issue 2, pp. 190-204 (15 pages)
Lunar impact crater detection is crucial for lunar surface studies and spacecraft landing missions, yet deep learning still struggles with accurately detecting small craters, especially when relying on incomplete catalogs. In this work, we integrate Digital Elevation Model (DEM) data to construct a high-quality dataset enriched with slope information, enabling a detailed analysis of crater features and effectively improving detection performance in complex terrains and low-contrast areas. Based on this foundation, we propose a novel two-stage detection network, MSFNet, which leverages multi-scale adaptive feature fusion and multi-size ROI pooling to enhance the recognition of craters across various scales. Experimental results demonstrate that MSFNet achieves an F1 score of 74.8% on Test Region1 and a recall rate of 87% for craters with diameters larger than 2 km. Moreover, it shows exceptional performance in detecting sub-kilometer craters by successfully identifying a large number of high-confidence, previously unlabeled targets with a low false detection rate confirmed through manual review. This approach offers an efficient and reliable deep learning solution for lunar impact crater detection.
Keywords: object detection deep learning impact crater DEM
Wheat Pest Detection Based on PSA-YOLO11n
4
Authors: KANG JiChang ZHAO LianJun 《农业大数据学报》 2025, Issue 3, pp. 294-306 (13 pages)
To address the challenges of low detection accuracy caused by the diverse species, significant size variations, and complex growth environments of wheat pests in natural settings, a PSA-YOLO11n algorithm is proposed to enhance detection precision. Building upon the YOLO11n framework, the proposed improvements include three key components: 1) SimCSPSPPF in Backbone: An improved Spatial Pyramid Pooling-Fast (SPPF) module, SimCSPSPPF, is integrated into the Backbone to reduce the number of channels in the hidden layers, thereby accelerating model training. 2) PEC in Neck: The standard convolution layers in the Neck are replaced with Perception Enhancement Convolutions (PEC) to improve multi-scale feature extraction capabilities, enhancing detection speed. 3) AWIoU Loss Function: The regression loss function is replaced with Adequate Wise IoU (AWIoU), addressing issues of bounding box distortion caused by the diversity in pest species and size variations, thereby improving the precision of bounding box localization. Experimental evaluations on the IP102 dataset demonstrate that PSA-YOLO11n achieves a mean Average Precision (mAP) of 89.10%, surpassing YOLO11n by 0.8%. Comparisons with other mainstream algorithms, including Faster R-CNN, RetinaNet, YOLOv5s, YOLOv8n, YOLOv10n, and YOLO11n, confirm that PSA-YOLO11n outperforms all baselines in terms of detection performance. These results highlight the algorithm's capability to significantly improve the detection accuracy of multi-scale wheat pests in natural environments, providing an effective solution for pest management in wheat production.
Keywords: agricultural pests object detection YOLO11 SimCSPSPPF PEC AWIoU
Deep Learning-Based Faulty Wood Detection with Area Attention
5
Authors: Vinh Truong Hoang Viet-Tuan Le Nghia Dinh Kiet Tran-Trung Bay Nguyen Van Ha Duong Thi Hong Thien Ho Huong 《Computers, Materials & Continua》 2025, Issue 10, pp. 1495-1514 (20 pages)
Improving consumer satisfaction with the appearance and surface quality of wood-based products requires inspection methods that are both accurate and efficient. The adoption of artificial intelligence (AI) for surface evaluation has emerged as a promising solution. Since the visual appeal of wooden products directly impacts their market value and overall business success, effective quality control is crucial. However, conventional inspection techniques often fail to meet performance requirements due to limited accuracy and slow processing times. To address these shortcomings, the authors propose a real-time deep learning-based system for evaluating surface appearance quality. The method integrates object detection and classification within an area attention framework and leverages R-ELAN for advanced fine-tuning. This architecture supports precise identification and classification of multiple objects, even under ambiguous or visually complex conditions. Furthermore, the model is computationally efficient and well-suited to moderate or domain-specific datasets commonly found in industrial inspection tasks. Experimental validation on the Zenodo dataset shows that the model achieves an average precision (AP) of 60.6%, outperforming the current state-of-the-art YOLOv12 model (55.3%), with a fast inference time of approximately 70 milliseconds. These results underscore the potential of AI-powered methods to enhance surface quality inspection in the wood manufacturing sector.
Keywords: Object detection deep learning R-ELAN multi-head wood defect computer vision
Point-voxel dual transformer for LiDAR 3D object detection
6
Authors: TONG Jigang YANG Fanhang YANG Sen DU Shengzhi 《Optoelectronics Letters》 2025, Issue 9, pp. 547-554 (8 pages)
In this paper, a two-stage light detection and ranging (LiDAR) three-dimensional (3D) object detection framework is presented, namely point-voxel dual transformer (PV-DT3D), which is a transformer-based method. In the proposed PV-DT3D, point-voxel fusion features are used for proposal refinement. Specifically, keypoints are sampled from the entire point cloud scene and used to encode representative scene features via a proposal-aware voxel set abstraction module. Subsequently, following the generation of proposals by the region proposal networks (RPN), the internal encoded keypoints are fed into the dual transformer encoder-decoder architecture. In 3D object detection, the proposed PV-DT3D takes advantage of both point-wise transformer and channel-wise architecture to capture contextual information from the spatial and channel dimensions. Experiments conducted on the highly competitive KITTI 3D car detection leaderboard show that the PV-DT3D achieves superior detection accuracy among state-of-the-art point-voxel-based methods.
Keywords: proposal refinement encode representative scene features point voxel dual transformer object detection LiDAR 3D object detection generation proposals proposal refinement keypoints
Infrared road object detection algorithm based on spatial depth channel attention network and improved YOLOv8
7
Authors: LI Song SHI Tao JING Fangke CUI Jie 《Optoelectronics Letters》 2025, Issue 8, pp. 491-498 (8 pages)
Aiming at the problems of low detection accuracy and large model size of existing object detection algorithms applied to complex road scenes, an improved you only look once version 8 (YOLOv8) object detection algorithm for infrared images, F-YOLOv8, is proposed. First, a spatial-to-depth network replaces the traditional backbone network's strided convolution or pooling layer. At the same time, it combines with the channel attention mechanism so that the neural network focuses on the channels with large weight values to better extract low-resolution image feature information. Then an improved feature pyramid network, the lightweight bidirectional feature pyramid network (L-BiFPN), is proposed, which can efficiently fuse features of different scales. In addition, a loss function of intersection over union based on the minimum point distance (MPDIoU) is introduced for bounding box regression, which obtains faster convergence speed and more accurate regression results. Experimental results on the FLIR dataset show that the improved algorithm can accurately detect infrared road targets in real time with 3% and 2.2% improvements in mean average precision at 50% IoU (mAP50) and mean average precision at 50% to 95% IoU (mAP50-95), respectively, and 38.1%, 37.3% and 16.9% reductions in the number of model parameters, the model weight, and floating-point operations (FLOPs), respectively. To further demonstrate the detection capability of the improved algorithm, it is tested on the public dataset PASCAL VOC, and the results show that F-YOLOv8 has excellent generalized detection performance.
Keywords: feature pyramid network infrared road object detection infrared images F-YOLOv8 backbone networks channel attention mechanism spatial depth channel attention network object detection improved YOLOv8
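The MPDIoU term named in the abstract above can be illustrated with a short calculation. This is a sketch under the assumption that the paper uses the commonly cited formulation, in which IoU is reduced by the squared distances between the two boxes' top-left corners and between their bottom-right corners, each normalised by the squared image diagonal. The function name, box format, and image size are illustrative, not taken from the paper.

```python
def mpdiou(box_a, box_b, img_w, img_h):
    """MPDIoU of two boxes (x1, y1, x2, y2) inside an img_w x img_h image.

    Assumed formulation:
        IoU - d_tl^2 / (w^2 + h^2) - d_br^2 / (w^2 + h^2)
    where d_tl and d_br are the distances between the top-left and the
    bottom-right corners of the two boxes.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union

    diag_sq = img_w ** 2 + img_h ** 2
    d_tl_sq = (bx1 - ax1) ** 2 + (by1 - ay1) ** 2
    d_br_sq = (bx2 - ax2) ** 2 + (by2 - ay2) ** 2
    return iou - d_tl_sq / diag_sq - d_br_sq / diag_sq


def mpdiou_loss(box_a, box_b, img_w, img_h):
    """Regression loss built from MPDIoU; zero for a perfect match."""
    return 1.0 - mpdiou(box_a, box_b, img_w, img_h)


if __name__ == "__main__":
    print(mpdiou_loss((0, 0, 2, 2), (0, 0, 2, 2), 10, 10))
    print(mpdiou_loss((0, 0, 2, 2), (1, 1, 3, 3), 10, 10))
```

Because the corner-distance terms stay non-zero whenever the boxes differ, the loss distinguishes predictions that share the same IoU but sit at different offsets.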
FastSECOND: Real-Time 3D Detection via Swin-Transformer Enhanced SECOND with Geometry-Aware Learning
8
Authors: Xinyu Li Gang Wan Xinyang Chen Liyue Qie Xinnan Fan Pengfei Shi Jin Wan 《Computer Modeling in Engineering & Sciences》 2025, Issue 7, pp. 1071-1090 (20 pages)
The inherent limitations of 2D object detection, such as inadequate spatial reasoning and susceptibility to environmental occlusions, pose significant risks to the safety and reliability of autonomous driving systems. To address these challenges, this paper proposes an enhanced 3D object detection framework (FastSECOND) based on an optimized SECOND architecture, designed to achieve rapid and accurate perception in autonomous driving scenarios. Key innovations include: (1) Replacing the Rectified Linear Unit (ReLU) activation functions with the Gaussian Error Linear Unit (GELU) during voxel feature encoding and region proposal network stages, leveraging partial convolution to balance computational efficiency and detection accuracy; (2) Integrating a Swin-Transformer V2 module into the voxel backbone network to enhance feature extraction capabilities in sparse data; and (3) Introducing an optimized position regression loss combined with a geometry-aware Focal-EIoU loss function, which incorporates bounding box geometric correlations to accelerate network convergence. While this study currently focuses exclusively on the detection of the Car category, with experiments conducted on the Car class of the KITTI dataset, future work will extend to other categories such as Pedestrian and Cyclist to more comprehensively evaluate the generalization capability of the proposed framework. Extensive experimental results demonstrate that our framework achieves a more effective trade-off between detection accuracy and speed. Compared to the baseline SECOND model, it achieves a 21.9% relative improvement in 3D bounding box detection accuracy on the hard subset, while reducing inference time by 14 ms. These advancements underscore the framework's potential for enabling real-time, high-precision perception in autonomous driving applications.
Keywords: 3D object detection automatic driving Deep Learning SECOND geometry-aware learning
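The ReLU-to-GELU swap described in the abstract above changes how negative pre-activations are treated. The following is a minimal sketch of the two functions using the exact GELU definition x·Φ(x); whether the paper uses this exact form or a tanh approximation is not stated in the abstract, so the choice here is an assumption.

```python
import math


def relu(x: float) -> float:
    """Rectified Linear Unit: passes positives, zeroes out negatives."""
    return max(0.0, x)


def gelu(x: float) -> float:
    """Gaussian Error Linear Unit: x times the standard normal CDF of x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


if __name__ == "__main__":
    # Compare the two activations on a few sample inputs.
    for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"x={x:+.1f}  relu={relu(x):.4f}  gelu={gelu(x):+.4f}")
```

ReLU maps every negative input to exactly zero, whereas GELU lets small negative values through with reduced magnitude and is smooth at the origin.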
Hybrid receptive field network for small object detection on drone view
9
Authors: Zhaodong CHEN Hongbing JI Yongquan ZHANG Wenke LIU Zhigang ZHU 《Chinese Journal of Aeronautics》 2025, Issue 2, pp. 322-338 (17 pages)
Drone-based small object detection is of great significance in practical applications such as military actions, disaster rescue, transportation, etc. However, the severe scale differences in objects captured by drones and lack of detail information for small-scale objects make drone-based small object detection a formidable challenge. To address these issues, we first develop a mathematical model to explore how changing receptive fields impacts the polynomial fitting results. Subsequently, based on the obtained conclusions, we propose a simple but effective Hybrid Receptive Field Network (HRFNet), whose modules include Hybrid Feature Augmentation (HFA), Hybrid Feature Pyramid (HFP) and Dual Scale Head (DSH). Specifically, HFA employs parallel dilated convolution kernels of different sizes to extend shallow features with different receptive fields, committed to improving the multi-scale adaptability of the network; HFP enhances the perception of small objects by capturing contextual information across layers, while DSH reconstructs the original prediction head utilizing a set of high-resolution features and ultrahigh-resolution features. In addition, in order to train HRFNet, the corresponding dual-scale loss function is designed. Finally, comprehensive evaluation results on public benchmarks such as VisDrone-DET and TinyPerson demonstrate the robustness of the proposed method. Most impressively, the proposed HRFNet achieves a mAP of 51.0 on VisDrone-DET with 29.3 M parameters, which outperforms the extant state-of-the-art detectors. HRFNet also performs excellently in complex scenarios captured by drones, achieving the best performance on the CS-Drone dataset we built.
Keywords: Drone remote sensing Object detection on drone view Small object detector Hybrid receptive field Feature pyramid network Feature augmentation Multi-scale object detection
Lightweight real-time micro-object detection framework
10
Authors: GE Haitao ZHANG Mingyao WEI Yonggeng ZHANG Hongshi CAO Xinxin SHI Yong 《黑龙江大学工程学报(中英俄文)》 2025, Issue 2, pp. 56-66 (11 pages)
Accurate defect detection plays a critical role in ensuring product quality and equipment reliability. Small-object detection poses unique challenges due to weak feature representation and significant background interference. To address these issues, this study incorporates three key innovations into the YOLOv8 framework: the use of GhostNet convolution for lightweight and efficient feature extraction, the addition of a P2 detection layer to enhance small-object detection capabilities, and the integration of the Triplet Attention mechanism to capture comprehensive spatial and channel dependencies. These improvements collectively optimize detection performance for small objects while reducing computational complexity. Experimental results demonstrate that the enhanced model achieves a mean average precision (mAP@0.5) of 97.46% and a mAP@0.5:0.95 of 61.84%, representing a performance improvement of 1.9% and 3.2%, respectively, compared to the baseline YOLOv8 model. Additionally, the model achieves a frame rate of 158 FPS, maintaining real-time detection capabilities while reducing the parameter count by 50%, further underscoring its efficiency and suitability for small-object detection in complex scenarios.
Keywords: GhostNet P2 detection layer Triplet Attention YOLOv8 small object detection
Coupling the Power of YOLOv9 with Transformer for Small Object Detection in Remote-Sensing Images
11
Authors: Mohammad Barr 《Computer Modeling in Engineering & Sciences》 2025, Issue 4, pp. 593-616 (24 pages)
Recent years have seen a surge in interest in object detection on remote sensing images for applications such as surveillance and management. However, challenges like small object detection, scale variation, and the presence of closely packed objects in these images hinder accurate detection. Additionally, the motion blur effect further complicates the identification of such objects. To address these issues, we propose enhanced YOLOv9 with a transformer head (YOLOv9-TH). The model introduces an additional prediction head for detecting objects of varying sizes and swaps the original prediction heads for transformer heads to leverage self-attention mechanisms. We further improve YOLOv9-TH using several strategies, including data augmentation, multi-scale testing, multi-model integration, and the introduction of an additional classifier. The cross-stage partial (CSP) method and the ghost convolution hierarchical graph (GCHG) are combined to improve detection accuracy by better utilizing feature maps, widening the receptive field, and precisely extracting multi-scale objects. Additionally, we incorporate the E-SimAM attention mechanism to address low-resolution feature loss. Extensive experiments on the VisDrone2021 and DIOR datasets demonstrate the effectiveness of YOLOv9-TH, showing good improvement in mAP compared to the best existing methods. The YOLOv9-TH-e achieved 54.2% of mAP50 on the VisDrone2021 dataset and 92.3% of mAP on the DIOR dataset. The results confirm the model's robustness and suitability for real-world applications, particularly for small object detection in remote sensing images.
Keywords: Remote sensing images YOLOv9-TH multi-scale object detection transformer heads VisDrone2021 dataset
Salient Object Detection Based on Multi-Strategy Feature Optimization
12
Authors: Libo Han Sha Tao Wen Xia Weixin Sun Li Yan Wanlin Gao 《Computers, Materials & Continua》 2025, Issue 2, pp. 2431-2449 (19 pages)
At present, salient object detection (SOD) has achieved considerable progress. However, the methods that perform well still face the issue of inadequate detection accuracy. For example, sometimes there are problems of missed and false detections. Effectively optimizing features to capture key information and better integrating different levels of features to enhance their complementarity are two significant challenges in the domain of SOD. In response to these challenges, this study proposes a novel SOD method based on multi-strategy feature optimization. We propose the multi-size feature extraction module (MSFEM), which uses the attention mechanism, the multi-level feature fusion, and the residual block to obtain finer features. This module provides robust support for the subsequent accurate detection of the salient object. In addition, we use two rounds of feature fusion and the feedback mechanism to optimize the features obtained by the MSFEM to improve detection accuracy. The first round of feature fusion is applied to integrate the features extracted by the MSFEM to obtain more refined features. Subsequently, the feedback mechanism and the second round of feature fusion are applied to refine the features, thereby providing a stronger foundation for accurately detecting salient objects. To improve the fusion effect, we propose the feature enhancement module (FEM) and the feature optimization module (FOM). The FEM integrates the upper and lower features with the optimized features obtained by the FOM to enhance feature complementarity. The FOM uses different receptive fields, the attention mechanism, and the residual block to more effectively capture key information. Experimental results demonstrate that our method outperforms 10 state-of-the-art SOD methods.
Keywords: Salient object detection multi-strategy feature optimization feedback mechanism
DAFPN-YOLO: An Improved UAV-Based Object Detection Algorithm Based on YOLOv8s
13
Authors: Honglin Wang Yaolong Zhang Cheng Zhu 《Computers, Materials & Continua》 2025, Issue 5, pp. 1929-1949 (21 pages)
UAV-based object detection is rapidly expanding in both civilian and military applications, including security surveillance, disaster assessment, and border patrol. However, challenges such as small objects, occlusions, complex backgrounds, and variable lighting persist due to the unique perspective of UAV imagery. To address these issues, this paper introduces DAFPN-YOLO, an innovative model based on YOLOv8s (You Only Look Once version 8s). The model strikes a balance between detection accuracy and speed while reducing parameters, making it well-suited for multi-object detection tasks from drone perspectives. A key feature of DAFPN-YOLO is the enhanced Drone-AFPN (Adaptive Feature Pyramid Network), which adaptively fuses multi-scale features to optimize feature extraction and enhance spatial and small-object information. To leverage Drone-AFPN's multi-scale capabilities fully, a dedicated 160×160 small-object detection head was added, significantly boosting detection accuracy for small targets. In the backbone, the C2f_Dual (Cross Stage Partial with Cross-Stage Feature Fusion Dual) module and SPPELAN (Spatial Pyramid Pooling with Enhanced Local Attention Network) module were integrated. These components improve feature extraction and information aggregation while reducing parameters and computational complexity, enhancing inference efficiency. Additionally, Shape-IoU (Shape Intersection over Union) is used as the loss function for bounding box regression, enabling more precise shape-based object matching. Experimental results on the VisDrone 2019 dataset demonstrate the effectiveness of DAFPN-YOLO. Compared to YOLOv8s, the proposed model achieves a 5.4 percentage point increase in mAP@0.5, a 3.8 percentage point improvement in mAP@0.5:0.95, and a 17.2% reduction in parameter count. These results highlight DAFPN-YOLO's advantages in UAV-based object detection, offering valuable insights for applying deep learning to UAV-specific multi-object detection tasks.
Keywords: YOLOv8 UAV-based object detection AFPN small-object detection head SPPELAN DualConv loss function
Improving Hornet Detection with the YOLOv7-Tiny Model: A Case Study on Asian Hornets
14
Authors: Yung-Hsiang Hung Chuen-Kai Fan Wen-Pai Wang 《Computers, Materials & Continua》 2025, Issue 5, pp. 2323-2349 (27 pages)
Bees play a crucial role in the global food chain, pollinating over 75% of food and producing valuable products such as bee pollen, propolis, and royal jelly. However, the Asian hornet poses a serious threat to bee populations by preying on them and disrupting agricultural ecosystems. To address this issue, this study developed a modified YOLOv7tiny (You Only Look Once) model for efficient hornet detection. The model incorporated space-to-depth (SPD) and squeeze-and-excitation (SE) attention mechanisms and involved detailed annotation of the hornet's head and full body, significantly enhancing the detection of small objects. The Taguchi method was also used to optimize the training parameters, resulting in optimal performance. Data for this study were collected from the Roboflow platform using a 640×640 resolution dataset. The YOLOv7tiny model was trained on this dataset. After optimizing the training parameters using the Taguchi method, significant improvements were observed in accuracy, precision, recall, F1 score, and mean average precision (mAP) for hornet detection. Without the hornet head label, incorporating the SPD attention mechanism resulted in a peak mAP of 98.7%, representing an 8.58% increase over the original YOLOv7tiny. By including the hornet head label and applying the SPD attention mechanism and Soft-CIOU loss function, the mAP was further enhanced to 97.3%, a 7.04% increase over the original YOLOv7tiny. Furthermore, the Soft-CIOU Loss function contributed to additional performance enhancements during the validation phase.
Keywords: Computer vision object detection YOLOv7tiny SE SPD Asian hornet
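The space-to-depth (SPD) rearrangement mentioned in the abstract above halves the spatial resolution without discarding any pixel, which is the usual argument for using it on small objects. The following is a minimal sketch for a single-channel image stored as nested lists; the function name and the block size of 2 are assumptions, and the paper's full SPD module (which typically follows the rearrangement with a non-strided convolution) is not reproduced here.

```python
def space_to_depth(image, block=2):
    """Rearrange an H x W grid into block*block channels of size H/block x W/block.

    Channel (dy, dx) collects every pixel whose row offset within its
    block is dy and whose column offset is dx, so no value is dropped.
    """
    height, width = len(image), len(image[0])
    if height % block or width % block:
        raise ValueError("image sides must be divisible by the block size")

    channels = []
    for dy in range(block):
        for dx in range(block):
            channels.append([
                [image[y][x] for x in range(dx, width, block)]
                for y in range(dy, height, block)
            ])
    return channels


if __name__ == "__main__":
    img = [
        [1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16],
    ]
    for channel in space_to_depth(img):
        print(channel)
```

A strided convolution or pooling layer at the same position would shrink the 4×4 grid to 2×2 by skipping or merging values; here all 16 values survive, spread across four 2×2 channels.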
Syn-Aug: An Effective and General Synchronous Data Augmentation Framework for 3D Object Detection
15
Authors: Huaijin Liu Jixiang Du Yong Zhang Hongbo Zhang Jiandian Zeng 《CAAI Transactions on Intelligence Technology》 2025, Issue 3, pp. 912-928 (17 pages)
Data augmentation plays an important role in boosting the performance of 3D models, while very few studies handle the 3D point cloud data with this technique. Global augmentation and cut-paste are commonly used augmentation techniques for point clouds, where global augmentation is applied to the entire point cloud of the scene, and cut-paste samples objects from other frames into the current frame. Both types of data augmentation can improve performance, but the cut-paste technique cannot effectively deal with the occlusion relationship between the foreground object and the background scene and the rationality of object sampling, which may be counterproductive and may hurt the overall performance. In addition, LiDAR is susceptible to signal loss, external occlusion, extreme weather and other factors, which can easily cause object shape changes, while global augmentation and cut-paste cannot effectively enhance the robustness of the model. To this end, we propose Syn-Aug, a synchronous data augmentation framework for LiDAR-based 3D object detection. Specifically, we first propose a novel rendering-based object augmentation technique (Ren-Aug) to enrich training data while enhancing scene realism. Second, we propose a local augmentation technique (Local-Aug) to generate local noise by rotating and scaling objects in the scene while avoiding collisions, which can improve generalisation performance. Finally, we make full use of the structural information of 3D labels to make the model more robust by randomly changing the geometry of objects in the training frames. We verify the proposed framework with four different types of 3D object detectors. Experimental results show that our proposed Syn-Aug significantly improves the performance of various 3D object detectors in the KITTI and nuScenes datasets, proving the effectiveness and generality of Syn-Aug. On KITTI, four different types of baseline models using Syn-Aug improved mAP by 0.89%, 1.35%, 1.61% and 1.14% respectively. On nuScenes, four different types of baseline models using Syn-Aug improved mAP by 14.93%, 10.42%, 8.47% and 6.81% respectively. The code is available at https://github.com/liuhuaijjin/Syn-Aug.
Keywords: 3D object detection data augmentation DIVERSITY GENERALIZATION point cloud ROBUSTNESS
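The rotating-and-scaling step that the abstract above attributes to Local-Aug can be pictured as a per-object transform about the object's own centre. The following is a rough sketch only: it is not taken from the Syn-Aug repository, the function name and parameters are hypothetical, and the collision check the abstract mentions is deliberately left out.

```python
import math


def rotate_scale_object(points, center, yaw, scale):
    """Rotate an object's points about the vertical axis and scale them.

    points: iterable of (x, y, z) tuples belonging to one object.
    center: (cx, cy, cz) of the object's bounding box; it stays fixed.
    yaw:    rotation angle in radians around the z axis.
    scale:  uniform scale factor applied relative to the centre.
    """
    cx, cy, cz = center
    cos_a, sin_a = math.cos(yaw), math.sin(yaw)

    transformed = []
    for x, y, z in points:
        # Offset from the centre, scaled first, then rotated in the x-y plane.
        dx, dy, dz = (x - cx) * scale, (y - cy) * scale, (z - cz) * scale
        transformed.append((
            cx + cos_a * dx - sin_a * dy,
            cy + sin_a * dx + cos_a * dy,
            cz + dz,
        ))
    return transformed


if __name__ == "__main__":
    obj = [(2.0, 1.0, 0.0), (1.0, 1.0, 0.5)]
    print(rotate_scale_object(obj, center=(1.0, 1.0, 0.0),
                              yaw=math.pi / 2, scale=2.0))
```

In a full pipeline the object's 3D box label would receive the same yaw offset and scale, and a transformed object that overlaps a neighbour would be rejected or resampled.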
DI-YOLOv5: An Improved Dual-Wavelet-Based YOLOv5 for Dense Small Object Detection
16
Authors: Zi-Xin Li Yu-Long Wang Fei Wang 《IEEE/CAA Journal of Automatica Sinica》 2025, Issue 2, pp. 457-459 (3 pages)
Dear Editor, This letter focuses on the fact that, in object detection tasks, small objects with few pixels disappear in feature maps with large receptive fields as the network deepens. Therefore, the detection of dense small objects is challenging.
Keywords: small objects receptive fields feature maps detection dense small objects object detection dense objects
Comparative Analysis of Deep Learning Models for Banana Plant Detection in UAV RGB and Grayscale Imagery
17
作者 Ching-Lung Fan Yu-Jen Chung Shan-Min Yen 《Computers, Materials & Continua》 2025年第9期4627-4653,共27页
Efficient banana crop detection is crucial for precision agriculture;however,traditional remote sensing methods often lack the spatial resolution required for accurate identification.This study utilizes low-altitude U... Efficient banana crop detection is crucial for precision agriculture;however,traditional remote sensing methods often lack the spatial resolution required for accurate identification.This study utilizes low-altitude Unmanned Aerial Vehicle(UAV)images and deep learning-based object detection models to enhance banana plant detection.A comparative analysis of Faster Region-Based Convolutional Neural Network(Faster R-CNN),You Only Look Once Version 3(YOLOv3),Retina Network(RetinaNet),and Single Shot MultiBox Detector(SSD)was conducted to evaluate their effectiveness.Results show that RetinaNet achieved the highest detection accuracy,with a precision of 96.67%,a recall of 71.67%,and an F1 score of 81.33%.The study further highlights the impact of scale variation,occlusion,and vegetation density on detection performance.Unlike previous studies,this research systematically evaluates multi-scale object detection models for banana plant identification,offering insights into the advantages of UAV-based deep learning applications in agriculture.In addition,this study compares five evaluation metrics across the four detection models using both RGB and grayscale images.Specifically,RetinaNet exhibited the best overall performance with grayscale images,achieving the highest values across all five metrics.Compared to its performance with RGB images,these results represent a marked improvement,confirming the potential of grayscale preprocessing to enhance detection capability. 展开更多
Keywords: Unmanned Aerial Vehicle image object detection deep learning banana crops
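The abstract above reports precision, recall, and F1 score. As a reminder of how these three metrics relate, here is a minimal sketch that computes them from raw detection counts. The function name `detection_scores` and the counts in the usage line are hypothetical and are not taken from the paper, whose exact evaluation protocol is not given in the abstract.

```python
def detection_scores(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from true-positive, false-positive
    and false-negative counts. Empty denominators yield 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only (not from the paper).
p, r, f1 = detection_scores(tp=8, fp=2, fn=2)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

A high precision paired with a lower recall, as in the abstract, means that few detections are wrong but a noticeable share of plants is missed.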
An Infrared-Visible Image Fusion Network with Channel-Switching for Low-Light Object Detection
18
Authors: Tianzhe Jiao, Yuming Chen, Xiaoyue Feng, Chaopeng Guo, Jie Song. Computers, Materials & Continua, 2025, Issue 11, pp. 2681-2700 (20 pages)
Visible-infrared object detection leverages the day-night stable object perception capability of infrared images to enhance detection robustness in low-light environments by fusing the complementary information of visible and infrared images. However, the inherent differences in the imaging mechanisms of visible and infrared modalities make effective cross-modal fusion challenging. Furthermore, constrained by the physical characteristics of sensors and thermal diffusion effects, infrared images generally suffer from blurred object contours and missing details, making it difficult to extract object features effectively. To address these issues, we propose an infrared-visible image fusion network that realizes multimodal information fusion of infrared and visible images through a carefully designed multiscale fusion strategy. First, we design an adaptive gray-radiance enhancement (AGRE) module to strengthen the detail representation in infrared images, improving their usability in complex lighting scenarios. Next, we introduce a channel-spatial feature interaction (CSFI) module, which achieves efficient complementarity between the RGB and infrared (IR) modalities via dynamic channel switching and a spatial attention mechanism. Finally, we propose a multi-scale enhanced cross-attention fusion (MSECA) module, which optimizes the fusion of multi-level features through dynamic convolution and gating mechanisms and captures long-range complementary relationships of cross-modal features on a global scale, thereby enhancing the expressiveness of the fused features. Experiments on the KAIST, M3FD, and FLIR datasets demonstrate that our method delivers outstanding performance in daytime and nighttime scenarios. On the KAIST dataset, the miss rate drops to 5.99%, and further to 4.26% in night scenes. On the FLIR and M3FD datasets, it achieves AP50 scores of 79.4% and 88.9%, respectively.
Keywords: Infrared-visible image fusion channel switching low-light object detection cross-attention fusion
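The AP50 scores quoted in the abstract above are average precision values where a detection counts as correct if its intersection over union (IoU) with a ground-truth box is at least 0.5. A minimal sketch of that matching criterion follows. The names `iou` and `matches_at_50`, and the `(x1, y1, x2, y2)` box layout, are assumptions made for illustration; the abstract does not describe the paper's evaluation code.

```python
Box = tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlap rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches_at_50(pred: Box, gt: Box) -> bool:
    """True if a predicted box would count as a hit under the AP50 criterion."""
    return iou(pred, gt) >= 0.5


# Hypothetical boxes for illustration only.
print(matches_at_50((0, 0, 2, 2), (0, 0, 2, 1.5)))
```

Full AP50 additionally ranks detections by confidence and integrates the precision-recall curve; the sketch covers only the per-box matching step.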
Research Progress on Multi-Modal Fusion Object Detection Algorithms for Autonomous Driving:A Review
19
Authors: Peicheng Shi, Li Yang, Xinlong Dong, Heng Qi, Aixi Yang. Computers, Materials & Continua, 2025, Issue 6, pp. 3877-3917 (41 pages)
As the number and complexity of sensors in autonomous vehicles continue to rise, multimodal fusion-based object detection algorithms are increasingly being used to detect 3D environmental information, significantly advancing the development of perception technology in autonomous driving. To further promote the development of fusion algorithms and improve detection performance, this paper discusses the advantages and recent advancements of multimodal fusion-based object detection algorithms. Starting from single-modal sensor detection, the paper provides a detailed overview of typical sensors used in autonomous driving and introduces object detection methods based on images and point clouds. Image-based detection methods are categorized into monocular detection and binocular detection according to input type. Point cloud-based detection methods are classified into projection-based, voxel-based, point cluster-based, pillar-based, and graph structure-based approaches according to the technical pathway used to process point cloud features. Additionally, multimodal fusion algorithms are divided into Camera-LiDAR fusion, Camera-Radar fusion, Camera-LiDAR-Radar fusion, and other sensor fusion methods based on the types of sensors involved. Furthermore, the paper identifies five key future research directions in this field, aiming to provide insights for researchers engaged in multimodal fusion-based object detection algorithms and to encourage broader attention to the research and application of multimodal fusion-based object detection.
Keywords: Multi-modal fusion 3D object detection deep learning autonomous driving
Bridging 2D and 3D Object Detection:Advances in Occlusion Handling through Depth Estimation
20
Authors: Zainab Ouardirhi, Mostapha Zbakh, Sidi Ahmed Mahmoudi. Computer Modeling in Engineering & Sciences, 2025, Issue 6, pp. 2509-2571 (63 pages)
Object detection in occluded environments remains a core challenge in computer vision (CV), especially in domains such as autonomous driving and robotics. While Convolutional Neural Network (CNN)-based two-dimensional (2D) and three-dimensional (3D) object detection methods have made significant progress, they often fall short under severe occlusion due to depth ambiguities in 2D imagery and the high cost and deployment limitations of 3D sensors such as Light Detection and Ranging (LiDAR). This paper presents a comparative review of recent 2D and 3D detection models, focusing on their occlusion-handling capabilities and the impact of sensor modalities such as stereo vision, Time-of-Flight (ToF) cameras, and LiDAR. In this context, we introduce FuDensityNet, our multimodal occlusion-aware detection framework that combines Red-Green-Blue (RGB) images and LiDAR data to enhance detection performance. As a forward-looking direction, we propose a monocular depth-estimation extension to FuDensityNet, aimed at replacing expensive 3D sensors with a more scalable CNN-based pipeline. Although this enhancement is not experimentally evaluated in this manuscript, we describe its conceptual design and potential for future implementation.
Keywords: Object detection occlusion handling multimodal fusion MONOCULAR 3D sensors depth estimation