Pulmonary nodules represent an early manifestation of lung cancer.However,pulmonary nodules only constitute a small portion of the overall image,posing challenges for physicians in image interpretation and potentially...Pulmonary nodules represent an early manifestation of lung cancer.However,pulmonary nodules only constitute a small portion of the overall image,posing challenges for physicians in image interpretation and potentially leading to false positives or missed detections.To solve these problems,the YOLOv8 network is enhanced by adding deformable convolution and atrous spatial pyramid pooling(ASPP),along with the integration of a coordinate attention(CA)mechanism.This allows the network to focus on small targets while expanding the receptive field without losing resolution.At the same time,context information on the target is gathered and feature expression is enhanced by attention modules in different directions.It effectively improves the positioning accuracy and achieves good results on the LUNA16 dataset.Compared with other detection algorithms,it improves the accuracy of pulmonary nodule detection to a certain extent.展开更多
To maintain the reliability of power systems,routine inspections using drones equipped with advanced object detection algorithms are essential for preempting power-related issues.The increasing resolution of drone-cap...To maintain the reliability of power systems,routine inspections using drones equipped with advanced object detection algorithms are essential for preempting power-related issues.The increasing resolution of drone-captured images has posed a challenge for traditional target detection methods,especially in identifying small objects in high-resolution images.This study presents an enhanced object detection algorithm based on the Faster Regionbased Convolutional Neural Network(Faster R-CNN)framework,specifically tailored for detecting small-scale electrical components like insulators,shock hammers,and screws in transmission line.The algorithm features an improved backbone network for Faster R-CNN,which significantly boosts the feature extraction network’s ability to detect fine details.The Region Proposal Network is optimized using a method of guided feature refinement(GFR),which achieves a balance between accuracy and speed.The incorporation of Generalized Intersection over Union(GIOU)and Region of Interest(ROI)Align further refines themodel’s accuracy.Experimental results demonstrate a notable improvement in mean Average Precision,reaching 89.3%,an 11.1%increase compared to the standard Faster R-CNN.This highlights the effectiveness of the proposed algorithm in identifying electrical components in high-resolution aerial images.展开更多
Lunar impact crater detection is crucial for lunar surface studies and spacecraft landing missions,yet deep learning still struggles with accurately detecting small craters,especially when relying on incomplete catalo...Lunar impact crater detection is crucial for lunar surface studies and spacecraft landing missions,yet deep learning still struggles with accurately detecting small craters,especially when relying on incomplete catalogs.In this work,we integrate Digital Elevation Model(DEM)data to construct a high-quality dataset enriched with slope information,enabling a detailed analysis of crater features and effectively improving detection performance in complex terrains and low-contrast areas.Based on this foundation,we propose a novel two-stage detection network,MSFNet,which leverages multi-scale adaptive feature fusion and multisize ROI pooling to enhance the recognition of craters across various scales.Experimental results demonstrate that MSFNet achieves an F1 score of 74.8%on Test Region1 and a recall rate of 87%for craters with diameters larger than 2 km.Moreover,it shows exceptional performance in detecting sub-kilometer craters by successfully identifying a large number of high-confidence,previously unlabeled targets with a low false detection rate confirmed through manual review.This approach offers an efficient and reliable deep learning solution for lunar impact crater detection.展开更多
To address the challenges of low detection accuracy caused by the diverse species,significant size variations,and complex growth environments of wheat pests in natural settings,a PSA-YOLO11n algorithm is proposed to e...To address the challenges of low detection accuracy caused by the diverse species,significant size variations,and complex growth environments of wheat pests in natural settings,a PSA-YOLO11n algorithm is proposed to enhance detection precision.Building upon the YOLO11n framework,the proposed improvements include three key components:1)SimCSPSPPF in Backbone:An improved Spatial Pyramid Pooling-Fast(SPPF)module,SimCSPSPPF,is integrated into the Backbone to reduce the number of channels in the hidden layers,thereby accelerating model training.2)PEC in Neck:The standard convolution layers in the Neck are replaced with Perception Enhancement Convolutions(PEC)to improve multi-scale feature extraction capabilities,enhancing detection speed.3)AWIoU Loss Function:The regression loss function is replaced with Adequate Wise IoU(AWIoU),addressing issues of bounding box distortion caused by the diversity in pest species and size variations,thereby improving the precision of bounding box localization.Experimental evaluations on the IP102 dataset demonstrate that PSA-YOLO11n achieves a mean Average Precision(mAP)of 89.10%,surpassing YOLO11n by 0.8%.Comparisons with other mainstream algorithms,including Faster R-CNN,RetinaNet,YOLOv5s,YOLOv8n,YOLOv10n,and YOLO11n,confirm that PSA-YOLO11n outperforms all baselines in terms of detection performance.These results highlight the algorithm’s capability to significantly improve the detection accuracy of multi-scale wheat pests in natural environments,providing an effective solution for pest management in wheat production.展开更多
Improving consumer satisfaction with the appearance and surface quality of wood-based products requires inspection methods that are both accurate and efficient.The adoption of artificial intelligence(AI)for surface ev...Improving consumer satisfaction with the appearance and surface quality of wood-based products requires inspection methods that are both accurate and efficient.The adoption of artificial intelligence(AI)for surface evaluation has emerged as a promising solution.Since the visual appeal of wooden products directly impacts their market value and overall business success,effective quality control is crucial.However,conventional inspection techniques often fail to meet performance requirements due to limited accuracy and slow processing times.To address these shortcomings,the authors propose a real-time deep learning-based system for evaluating surface appearance quality.The method integrates object detection and classification within an area attention framework and leverages R-ELAN for advanced fine-tuning.This architecture supports precise identification and classification of multiple objects,even under ambiguous or visually complex conditions.Furthermore,the model is computationally efficient and well-suited to moderate or domain-specific datasets commonly found in industrial inspection tasks.Experimental validation on the Zenodo dataset shows that the model achieves an average precision(AP)of 60.6%,outperforming the current state-of-the-art YOLOv12 model(55.3%),with a fast inference time of approximately 70 milliseconds.These results underscore the potential of AI-powered methods to enhance surface quality inspection in the wood manufacturing sector.展开更多
In this paper,a two-stage light detection and ranging(LiDAR) three-dimensional(3D) object detection framework is presented,namely point-voxel dual transformer(PV-DT3D),which is a transformer-based method.In the propos...In this paper,a two-stage light detection and ranging(LiDAR) three-dimensional(3D) object detection framework is presented,namely point-voxel dual transformer(PV-DT3D),which is a transformer-based method.In the proposed PV-DT3D,point-voxel fusion features are used for proposal refinement.Specifically,keypoints are sampled from entire point cloud scene and used to encode representative scene features via a proposal-aware voxel set abstraction module.Subsequently,following the generation of proposals by the region proposal networks(RPN),the internal encoded keypoints are fed into the dual transformer encoder-decoder architecture.In 3D object detection,the proposed PV-DT3D takes advantage of both point-wise transformer and channel-wise architecture to capture contextual information from the spatial and channel dimensions.Experiments conducted on the highly competitive KITTI 3D car detection leaderboard show that the PV-DT3D achieves superior detection accuracy among state-of-the-art point-voxel-based methods.展开更多
Aiming at the problems of low detection accuracy and large model size of existing object detection algorithms applied to complex road scenes,an improved you only look once version 8(YOLOv8)object detection algorithm f...Aiming at the problems of low detection accuracy and large model size of existing object detection algorithms applied to complex road scenes,an improved you only look once version 8(YOLOv8)object detection algorithm for infrared images,F-YOLOv8,is proposed.First,a spatial-to-depth network replaces the traditional backbone network's strided convolution or pooling layer.At the same time,it combines with the channel attention mechanism so that the neural network focuses on the channels with large weight values to better extract low-resolution image feature information;then an improved feature pyramid network of lightweight bidirectional feature pyramid network(L-BiFPN)is proposed,which can efficiently fuse features of different scales.In addition,a loss function of insertion of union based on the minimum point distance(MPDIoU)is introduced for bounding box regression,which obtains faster convergence speed and more accurate regression results.Experimental results on the FLIR dataset show that the improved algorithm can accurately detect infrared road targets in real time with 3%and 2.2%enhancement in mean average precision at 50%IoU(mAP50)and mean average precision at 50%—95%IoU(mAP50-95),respectively,and 38.1%,37.3%and 16.9%reduction in the number of model parameters,the model weight,and floating-point operations per second(FLOPs),respectively.To further demonstrate the detection capability of the improved algorithm,it is tested on the public dataset PASCAL VOC,and the results show that F-YOLO has excellent generalized detection performance.展开更多
The inherent limitations of 2D object detection,such as inadequate spatial reasoning and susceptibility to environmental occlusions,pose significant risks to the safety and reliability of autonomous driving systems.To...The inherent limitations of 2D object detection,such as inadequate spatial reasoning and susceptibility to environmental occlusions,pose significant risks to the safety and reliability of autonomous driving systems.To address these challenges,this paper proposes an enhanced 3D object detection framework(FastSECOND)based on an optimized SECOND architecture,designed to achieve rapid and accurate perception in autonomous driving scenarios.Key innovations include:(1)Replacing the Rectified Linear Unit(ReLU)activation functions with the Gaussian Error Linear Unit(GELU)during voxel feature encoding and region proposal network stages,leveraging partial convolution to balance computational efficiency and detection accuracy;(2)Integrating a Swin-Transformer V2 module into the voxel backbone network to enhance feature extraction capabilities in sparse data;and(3)Introducing an optimized position regression loss combined with a geometry-aware Focal-EIoU loss function,which incorporates bounding box geometric correlations to accelerate network convergence.While this study currently focuses exclusively on the detection of the Car category,with experiments conducted on the Car class of the KITTI dataset,future work will extend to other categories such as Pedestrian and Cyclist to more comprehensively evaluate the generalization capability of the proposed framework.Extensive experimental results demonstrate that our framework achieves a more effective trade-off between detection accuracy and speed.Compared to the baseline SECOND model,it achieves a 21.9%relative improvement in 3D bounding box detection accuracy on the hard subset,while reducing inference time by 14 ms.These advancements underscore the framework’s potential for enabling real-time,high-precision perception in autonomous driving applications.展开更多
Drone-based small object detection is of great significance in practical applications such as military actions, disaster rescue, transportation, etc. However, the severe scale differences in objects captured by drones...Drone-based small object detection is of great significance in practical applications such as military actions, disaster rescue, transportation, etc. However, the severe scale differences in objects captured by drones and lack of detail information for small-scale objects make drone-based small object detection a formidable challenge. To address these issues, we first develop a mathematical model to explore how changing receptive fields impacts the polynomial fitting results. Subsequently, based on the obtained conclusions, we propose a simple but effective Hybrid Receptive Field Network (HRFNet), whose modules include Hybrid Feature Augmentation (HFA), Hybrid Feature Pyramid (HFP) and Dual Scale Head (DSH). Specifically, HFA employs parallel dilated convolution kernels of different sizes to extend shallow features with different receptive fields, committed to improving the multi-scale adaptability of the network;HFP enhances the perception of small objects by capturing contextual information across layers, while DSH reconstructs the original prediction head utilizing a set of high-resolution features and ultrahigh-resolution features. In addition, in order to train HRFNet, the corresponding dual-scale loss function is designed. Finally, comprehensive evaluation results on public benchmarks such as VisDrone-DET and TinyPerson demonstrate the robustness of the proposed method. Most impressively, the proposed HRFNet achieves a mAP of 51.0 on VisDrone-DET with 29.3 M parameters, which outperforms the extant state-of-the-art detectors. HRFNet also performs excellently in complex scenarios captured by drones, achieving the best performance on the CS-Drone dataset we built.展开更多
Accurate defect detection plays a critical role in ensuring product quality and equipment reliability.Small-object detection poses unique challenges due to weak feature representation and significant background interf...Accurate defect detection plays a critical role in ensuring product quality and equipment reliability.Small-object detection poses unique challenges due to weak feature representation and significant background interference.To address these issues,this study incorporates three key innovations into the YOLOv8 framework:the use of GhostNet convolution for lightweight and efficient feature extraction,the addition of a P2 detection layer to enhance small-object detection capabilities,and the integration of the Triplet Attention mechanism to capture comprehensive spatial and channel dependencies.These improvements collectively optimize detection performance for small objects while reducing computational complexity.Experimental results demonstrate that the enhanced model achieves a mean average precision(mAP@0.5)of 97.46%and a mAP@0.5∶0.95 of 61.84%,representing a performance improvement of 1.9%and 3.2%,respectively,compared to the baseline YOLOv8 model.Additionally,the model achieves a frame rate of 158 FPS,maintaining real-time detection capabilities while reducing the parameter count by 50%,further underscoring its efficiency and suitability for smallobject detection in complex scenarios.展开更多
Recent years have seen a surge in interest in object detection on remote sensing images for applications such as surveillance andmanagement.However,challenges like small object detection,scale variation,and the presen...Recent years have seen a surge in interest in object detection on remote sensing images for applications such as surveillance andmanagement.However,challenges like small object detection,scale variation,and the presence of closely packed objects in these images hinder accurate detection.Additionally,the motion blur effect further complicates the identification of such objects.To address these issues,we propose enhanced YOLOv9 with a transformer head(YOLOv9-TH).The model introduces an additional prediction head for detecting objects of varying sizes and swaps the original prediction heads for transformer heads to leverage self-attention mechanisms.We further improve YOLOv9-TH using several strategies,including data augmentation,multi-scale testing,multi-model integration,and the introduction of an additional classifier.The cross-stage partial(CSP)method and the ghost convolution hierarchical graph(GCHG)are combined to improve detection accuracy by better utilizing feature maps,widening the receptive field,and precisely extracting multi-scale objects.Additionally,we incorporate the E-SimAM attention mechanism to address low-resolution feature loss.Extensive experiments on the VisDrone2021 and DIOR datasets demonstrate the effectiveness of YOLOv9-TH,showing good improvement in mAP compared to the best existing methods.The YOLOv9-TH-e achieved 54.2% of mAP50 on the VisDrone2021 dataset and 92.3% of mAP on the DIOR dataset.The results confirmthemodel’s robustness and suitability for real-world applications,particularly for small object detection in remote sensing images.展开更多
At present, salient object detection (SOD) has achieved considerable progress. However, the methods that perform well still face the issue of inadequate detection accuracy. For example, sometimes there are problems of...At present, salient object detection (SOD) has achieved considerable progress. However, the methods that perform well still face the issue of inadequate detection accuracy. For example, sometimes there are problems of missed and false detections. Effectively optimizing features to capture key information and better integrating different levels of features to enhance their complementarity are two significant challenges in the domain of SOD. In response to these challenges, this study proposes a novel SOD method based on multi-strategy feature optimization. We propose the multi-size feature extraction module (MSFEM), which uses the attention mechanism, the multi-level feature fusion, and the residual block to obtain finer features. This module provides robust support for the subsequent accurate detection of the salient object. In addition, we use two rounds of feature fusion and the feedback mechanism to optimize the features obtained by the MSFEM to improve detection accuracy. The first round of feature fusion is applied to integrate the features extracted by the MSFEM to obtain more refined features. Subsequently, the feedback mechanism and the second round of feature fusion are applied to refine the features, thereby providing a stronger foundation for accurately detecting salient objects. To improve the fusion effect, we propose the feature enhancement module (FEM) and the feature optimization module (FOM). The FEM integrates the upper and lower features with the optimized features obtained by the FOM to enhance feature complementarity. The FOM uses different receptive fields, the attention mechanism, and the residual block to more effectively capture key information. Experimental results demonstrate that our method outperforms 10 state-of-the-art SOD methods.展开更多
UAV-based object detection is rapidly expanding in both civilian and military applications,including security surveillance,disaster assessment,and border patrol.However,challenges such as small objects,occlusions,comp...UAV-based object detection is rapidly expanding in both civilian and military applications,including security surveillance,disaster assessment,and border patrol.However,challenges such as small objects,occlusions,complex backgrounds,and variable lighting persist due to the unique perspective of UAV imagery.To address these issues,this paper introduces DAFPN-YOLO,an innovative model based on YOLOv8s(You Only Look Once version 8s).Themodel strikes a balance between detection accuracy and speed while reducing parameters,making itwell-suited for multi-object detection tasks from drone perspectives.A key feature of DAFPN-YOLO is the enhanced Drone-AFPN(Adaptive Feature Pyramid Network),which adaptively fuses multi-scale features to optimize feature extraction and enhance spatial and small-object information.To leverage Drone-AFPN’smulti-scale capabilities fully,a dedicated 160×160 small-object detection head was added,significantly boosting detection accuracy for small targets.In the backbone,the C2f_Dual(Cross Stage Partial with Cross-Stage Feature Fusion Dual)module and SPPELAN(Spatial Pyramid Pooling with Enhanced LocalAttentionNetwork)modulewere integrated.These components improve feature extraction and information aggregationwhile reducing parameters and computational complexity,enhancing inference efficiency.Additionally,Shape-IoU(Shape Intersection over Union)is used as the loss function for bounding box regression,enabling more precise shape-based object matching.Experimental results on the VisDrone 2019 dataset demonstrate the effectiveness ofDAFPN-YOLO.Compared to YOLOv8s,the proposedmodel achieves a 5.4 percentage point increase inmAP@0.5,a 3.8 percentage point improvement in mAP@0.5:0.95,and a 17.2%reduction in parameter count.These results highlight DAFPN-YOLO’s advantages in UAV-based object detection,offering valuable insights for applying deep learning to UAV-specific multi-object detection tasks.展开更多
Bees play a crucial role in the global food chain,pollinating over 75% of food and producing valuable products such as bee pollen,propolis,and royal jelly.However,theAsian hornet poses a serious threat to bee populati...Bees play a crucial role in the global food chain,pollinating over 75% of food and producing valuable products such as bee pollen,propolis,and royal jelly.However,theAsian hornet poses a serious threat to bee populations by preying on them and disrupting agricultural ecosystems.To address this issue,this study developed a modified YOLOv7tiny(You Only Look Once)model for efficient hornet detection.The model incorporated space-to-depth(SPD)and squeeze-and-excitation(SE)attention mechanisms and involved detailed annotation of the hornet’s head and full body,significantly enhancing the detection of small objects.The Taguchi method was also used to optimize the training parameters,resulting in optimal performance.Data for this study were collected from the Roboflow platformusing a 640×640 resolution dataset.The YOLOv7tinymodel was trained on this dataset.After optimizing the training parameters using the Taguchi method,significant improvements were observed in accuracy,precision,recall,F1 score,andmean average precision(mAP)for hornet detection.Without the hornet head label,incorporating the SPD attentionmechanism resulted in a peakmAP of 98.7%,representing an 8.58%increase over the original YOLOv7tiny.By including the hornet head label and applying the SPD attention mechanism and Soft-CIOU loss function,themAP was further enhanced to 97.3%,a 7.04% increase over the original YOLOv7tiny.Furthermore,the Soft-CIOU Loss function contributed to additional performance enhancements during the validation phase.展开更多
Data augmentation plays an important role in boosting the performance of 3D models,while very few studies handle the 3D point cloud data with this technique.Global augmentation and cut-paste are commonly used augmenta...Data augmentation plays an important role in boosting the performance of 3D models,while very few studies handle the 3D point cloud data with this technique.Global augmentation and cut-paste are commonly used augmentation techniques for point clouds,where global augmentation is applied to the entire point cloud of the scene,and cut-paste samples objects from other frames into the current frame.Both types of data augmentation can improve performance,but the cut-paste technique cannot effectively deal with the occlusion relationship between the foreground object and the background scene and the rationality of object sampling,which may be counterproductive and may hurt the overall performance.In addition,LiDAR is susceptible to signal loss,external occlusion,extreme weather and other factors,which can easily cause object shape changes,while global augmentation and cut-paste cannot effectively enhance the robustness of the model.To this end,we propose Syn-Aug,a synchronous data augmentation framework for LiDAR-based 3D object detection.Specifically,we first propose a novel rendering-based object augmentation technique(Ren-Aug)to enrich training data while enhancing scene realism.Second,we propose a local augmentation technique(Local-Aug)to generate local noise by rotating and scaling objects in the scene while avoiding collisions,which can improve generalisation performance.Finally,we make full use of the structural information of 3D labels to make the model more robust by randomly changing the geometry of objects in the training frames.We verify the proposed framework with four different types of 3D object detectors.Experimental results show that our proposed Syn-Aug significantly improves the performance of various 3D object detectors in the KITTI and nuScenes datasets,proving the effectiveness and generality of Syn-Aug.On KITTI,four different types of baseline models using Syn-Aug improved mAP by 0.89%,1.35%,1.61%and 1.14%respectively.On nuScenes,four different types of baseline models using Syn-Aug improved mAP by 14.93%,10.42%,8.47%and 6.81%respectively.The code is available at https://github.com/liuhuaijjin/Syn-Aug.展开更多
Dear Editor,This letter focuses on the fact that small objects with few pixels disappear in feature maps with large receptive fields, as the network deepens, in object detection tasks. Therefore, the detection of dens...Dear Editor,This letter focuses on the fact that small objects with few pixels disappear in feature maps with large receptive fields, as the network deepens, in object detection tasks. Therefore, the detection of dense small objects is challenging.展开更多
Efficient banana crop detection is crucial for precision agriculture;however,traditional remote sensing methods often lack the spatial resolution required for accurate identification.This study utilizes low-altitude U...Efficient banana crop detection is crucial for precision agriculture;however,traditional remote sensing methods often lack the spatial resolution required for accurate identification.This study utilizes low-altitude Unmanned Aerial Vehicle(UAV)images and deep learning-based object detection models to enhance banana plant detection.A comparative analysis of Faster Region-Based Convolutional Neural Network(Faster R-CNN),You Only Look Once Version 3(YOLOv3),Retina Network(RetinaNet),and Single Shot MultiBox Detector(SSD)was conducted to evaluate their effectiveness.Results show that RetinaNet achieved the highest detection accuracy,with a precision of 96.67%,a recall of 71.67%,and an F1 score of 81.33%.The study further highlights the impact of scale variation,occlusion,and vegetation density on detection performance.Unlike previous studies,this research systematically evaluates multi-scale object detection models for banana plant identification,offering insights into the advantages of UAV-based deep learning applications in agriculture.In addition,this study compares five evaluation metrics across the four detection models using both RGB and grayscale images.Specifically,RetinaNet exhibited the best overall performance with grayscale images,achieving the highest values across all five metrics.Compared to its performance with RGB images,these results represent a marked improvement,confirming the potential of grayscale preprocessing to enhance detection capability.展开更多
Visible-infrared object detection leverages the day-night stable object perception capability of infrared images to enhance detection robustness in low-light environments by fusing the complementary information of vis...Visible-infrared object detection leverages the day-night stable object perception capability of infrared images to enhance detection robustness in low-light environments by fusing the complementary information of visible and infrared images.However,the inherent differences in the imaging mechanisms of visible and infrared modalities make effective cross-modal fusion challenging.Furthermore,constrained by the physical characteristics of sensors and thermal diffusion effects,infrared images generally suffer from blurred object contours and missing details,making it difficult to extract object features effectively.To address these issues,we propose an infrared-visible image fusion network that realizesmultimodal information fusion of infrared and visible images through a carefully designedmultiscale fusion strategy.First,we design an adaptive gray-radiance enhancement(AGRE)module to strengthen the detail representation in infrared images,improving their usability in complex lighting scenarios.Next,we introduce a channelspatial feature interaction(CSFI)module,which achieves efficient complementarity between the RGB and infrared(IR)modalities via dynamic channel switching and a spatial attention mechanism.Finally,we propose a multi-scale enhanced cross-attention fusion(MSECA)module,which optimizes the fusion ofmulti-level features through dynamic convolution and gating mechanisms and captures long-range complementary relationships of cross-modal features on a global scale,thereby enhancing the expressiveness of the fused features.Experiments on the KAIST,M3FD,and FLIR datasets demonstrate that our method delivers outstanding performance in daytime and nighttime scenarios.On the KAIST dataset,the miss rate drops to 5.99%,and further to 4.26% in night scenes.On the FLIR and M3FD datasets,it achieves AP50 scores of 79.4% and 88.9%,respectively.展开更多
As the number and complexity of sensors in autonomous vehicles continue to rise,multimodal fusionbased object detection algorithms are increasingly being used to detect 3D environmental information,significantly advan...As the number and complexity of sensors in autonomous vehicles continue to rise,multimodal fusionbased object detection algorithms are increasingly being used to detect 3D environmental information,significantly advancing the development of perception technology in autonomous driving.To further promote the development of fusion algorithms and improve detection performance,this paper discusses the advantages and recent advancements of multimodal fusion-based object detection algorithms.Starting fromsingle-modal sensor detection,the paper provides a detailed overview of typical sensors used in autonomous driving and introduces object detection methods based on images and point clouds.For image-based detection methods,they are categorized into monocular detection and binocular detection based on different input types.For point cloud-based detection methods,they are classified into projection-based,voxel-based,point cluster-based,pillar-based,and graph structure-based approaches based on the technical pathways for processing point cloud features.Additionally,multimodal fusion algorithms are divided into Camera-LiDAR fusion,Camera-Radar fusion,Camera-LiDAR-Radar fusion,and other sensor fusion methods based on the types of sensors involved.Furthermore,the paper identifies five key future research directions in this field,aiming to provide insights for researchers engaged in multimodal fusion-based object detection algorithms and to encourage broader attention to the research and application of multimodal fusion-based object detection.展开更多
Object detection in occluded environments remains a core challenge in computer vision(CV),especially in domains such as autonomous driving and robotics.While Convolutional Neural Network(CNN)-based twodimensional(2D)a...Object detection in occluded environments remains a core challenge in computer vision(CV),especially in domains such as autonomous driving and robotics.While Convolutional Neural Network(CNN)-based twodimensional(2D)and three-dimensional(3D)object detection methods havemade significant progress,they often fall short under severe occlusion due to depth ambiguities in 2D imagery and the high cost and deployment limitations of 3D sensors such as Light Detection and Ranging(LiDAR).This paper presents a comparative review of recent 2D and 3D detection models,focusing on their occlusion-handling capabilities and the impact of sensor modalities such as stereo vision,Time-of-Flight(ToF)cameras,and LiDAR.In this context,we introduce FuDensityNet,our multimodal occlusion-aware detection framework that combines Red-Green-Blue(RGB)images and LiDAR data to enhance detection performance.As a forward-looking direction,we propose a monocular depth-estimation extension to FuDensityNet,aimed at replacing expensive 3D sensors with a more scalable CNN-based pipeline.Although this enhancement is not experimentally evaluated in this manuscript,we describe its conceptual design and potential for future implementation.展开更多
文摘Pulmonary nodules represent an early manifestation of lung cancer.However,pulmonary nodules only constitute a small portion of the overall image,posing challenges for physicians in image interpretation and potentially leading to false positives or missed detections.To solve these problems,the YOLOv8 network is enhanced by adding deformable convolution and atrous spatial pyramid pooling(ASPP),along with the integration of a coordinate attention(CA)mechanism.This allows the network to focus on small targets while expanding the receptive field without losing resolution.At the same time,context information on the target is gathered and feature expression is enhanced by attention modules in different directions.It effectively improves the positioning accuracy and achieves good results on the LUNA16 dataset.Compared with other detection algorithms,it improves the accuracy of pulmonary nodule detection to a certain extent.
基金supported by the Shanghai Science and Technology Innovation Action Plan High-Tech Field Project(Grant No.22511100601)for the year 2022 and Technology Development Fund for People’s Livelihood Research(Research on Transmission Line Deep Foundation Pit Environmental Situation Awareness System Based on Multi-Source Data).
文摘To maintain the reliability of power systems,routine inspections using drones equipped with advanced object detection algorithms are essential for preempting power-related issues.The increasing resolution of drone-captured images has posed a challenge for traditional target detection methods,especially in identifying small objects in high-resolution images.This study presents an enhanced object detection algorithm based on the Faster Regionbased Convolutional Neural Network(Faster R-CNN)framework,specifically tailored for detecting small-scale electrical components like insulators,shock hammers,and screws in transmission line.The algorithm features an improved backbone network for Faster R-CNN,which significantly boosts the feature extraction network’s ability to detect fine details.The Region Proposal Network is optimized using a method of guided feature refinement(GFR),which achieves a balance between accuracy and speed.The incorporation of Generalized Intersection over Union(GIOU)and Region of Interest(ROI)Align further refines themodel’s accuracy.Experimental results demonstrate a notable improvement in mean Average Precision,reaching 89.3%,an 11.1%increase compared to the standard Faster R-CNN.This highlights the effectiveness of the proposed algorithm in identifying electrical components in high-resolution aerial images.
基金National Natural Science Foundation of China(12103020,12363009)Natural Science Foundation of Jiangxi Province(20224BAB211011)+1 种基金Open Project Program of State Key Laboratory of Lunar and Planetary Sciences(Macao University of Science and Technology)(Macao FDCT grant No.002/2024/SKL)Youth Talent Project of Science and Technology Plan of Ganzhou(2022CXRC9191,2023CYZ26970)。
文摘Lunar impact crater detection is crucial for lunar surface studies and spacecraft landing missions,yet deep learning still struggles with accurately detecting small craters,especially when relying on incomplete catalogs.In this work,we integrate Digital Elevation Model(DEM)data to construct a high-quality dataset enriched with slope information,enabling a detailed analysis of crater features and effectively improving detection performance in complex terrains and low-contrast areas.Based on this foundation,we propose a novel two-stage detection network,MSFNet,which leverages multi-scale adaptive feature fusion and multisize ROI pooling to enhance the recognition of craters across various scales.Experimental results demonstrate that MSFNet achieves an F1 score of 74.8%on Test Region1 and a recall rate of 87%for craters with diameters larger than 2 km.Moreover,it shows exceptional performance in detecting sub-kilometer craters by successfully identifying a large number of high-confidence,previously unlabeled targets with a low false detection rate confirmed through manual review.This approach offers an efficient and reliable deep learning solution for lunar impact crater detection.
文摘To address the challenges of low detection accuracy caused by the diverse species,significant size variations,and complex growth environments of wheat pests in natural settings,a PSA-YOLO11n algorithm is proposed to enhance detection precision.Building upon the YOLO11n framework,the proposed improvements include three key components:1)SimCSPSPPF in Backbone:An improved Spatial Pyramid Pooling-Fast(SPPF)module,SimCSPSPPF,is integrated into the Backbone to reduce the number of channels in the hidden layers,thereby accelerating model training.2)PEC in Neck:The standard convolution layers in the Neck are replaced with Perception Enhancement Convolutions(PEC)to improve multi-scale feature extraction capabilities,enhancing detection speed.3)AWIoU Loss Function:The regression loss function is replaced with Adequate Wise IoU(AWIoU),addressing issues of bounding box distortion caused by the diversity in pest species and size variations,thereby improving the precision of bounding box localization.Experimental evaluations on the IP102 dataset demonstrate that PSA-YOLO11n achieves a mean Average Precision(mAP)of 89.10%,surpassing YOLO11n by 0.8%.Comparisons with other mainstream algorithms,including Faster R-CNN,RetinaNet,YOLOv5s,YOLOv8n,YOLOv10n,and YOLO11n,confirm that PSA-YOLO11n outperforms all baselines in terms of detection performance.These results highlight the algorithm’s capability to significantly improve the detection accuracy of multi-scale wheat pests in natural environments,providing an effective solution for pest management in wheat production.
文摘Improving consumer satisfaction with the appearance and surface quality of wood-based products requires inspection methods that are both accurate and efficient.The adoption of artificial intelligence(AI)for surface evaluation has emerged as a promising solution.Since the visual appeal of wooden products directly impacts their market value and overall business success,effective quality control is crucial.However,conventional inspection techniques often fail to meet performance requirements due to limited accuracy and slow processing times.To address these shortcomings,the authors propose a real-time deep learning-based system for evaluating surface appearance quality.The method integrates object detection and classification within an area attention framework and leverages R-ELAN for advanced fine-tuning.This architecture supports precise identification and classification of multiple objects,even under ambiguous or visually complex conditions.Furthermore,the model is computationally efficient and well-suited to moderate or domain-specific datasets commonly found in industrial inspection tasks.Experimental validation on the Zenodo dataset shows that the model achieves an average precision(AP)of 60.6%,outperforming the current state-of-the-art YOLOv12 model(55.3%),with a fast inference time of approximately 70 milliseconds.These results underscore the potential of AI-powered methods to enhance surface quality inspection in the wood manufacturing sector.
基金supported by the Natural Science Foundation of China (No.62103298)the South African National Research Foundation (Nos.132797 and 137951)。
文摘In this paper,a two-stage light detection and ranging(LiDAR) three-dimensional(3D) object detection framework is presented,namely point-voxel dual transformer(PV-DT3D),which is a transformer-based method.In the proposed PV-DT3D,point-voxel fusion features are used for proposal refinement.Specifically,keypoints are sampled from entire point cloud scene and used to encode representative scene features via a proposal-aware voxel set abstraction module.Subsequently,following the generation of proposals by the region proposal networks(RPN),the internal encoded keypoints are fed into the dual transformer encoder-decoder architecture.In 3D object detection,the proposed PV-DT3D takes advantage of both point-wise transformer and channel-wise architecture to capture contextual information from the spatial and channel dimensions.Experiments conducted on the highly competitive KITTI 3D car detection leaderboard show that the PV-DT3D achieves superior detection accuracy among state-of-the-art point-voxel-based methods.
基金supported by the National Natural Science Foundation of China(No.62103298)。
文摘Aiming at the problems of low detection accuracy and large model size of existing object detection algorithms applied to complex road scenes,an improved you only look once version 8(YOLOv8)object detection algorithm for infrared images,F-YOLOv8,is proposed.First,a spatial-to-depth network replaces the traditional backbone network's strided convolution or pooling layer.At the same time,it combines with the channel attention mechanism so that the neural network focuses on the channels with large weight values to better extract low-resolution image feature information;then an improved feature pyramid network of lightweight bidirectional feature pyramid network(L-BiFPN)is proposed,which can efficiently fuse features of different scales.In addition,a loss function of insertion of union based on the minimum point distance(MPDIoU)is introduced for bounding box regression,which obtains faster convergence speed and more accurate regression results.Experimental results on the FLIR dataset show that the improved algorithm can accurately detect infrared road targets in real time with 3%and 2.2%enhancement in mean average precision at 50%IoU(mAP50)and mean average precision at 50%—95%IoU(mAP50-95),respectively,and 38.1%,37.3%and 16.9%reduction in the number of model parameters,the model weight,and floating-point operations per second(FLOPs),respectively.To further demonstrate the detection capability of the improved algorithm,it is tested on the public dataset PASCAL VOC,and the results show that F-YOLO has excellent generalized detection performance.
基金funded by the National Key R&D Program of China(Grant No.2022YFB4703400)the China Three Gorges Corporation(Grant No.2324020012)+2 种基金the National Natural Science Foundation of China(Grant No.62476080)the Jiangsu Province Natural Science Foundation(Grant No.BK20231186)Key Laboratory about Maritime Intelligent Network Information Technology of the Ministry of Education(EKLMIC202405).
文摘The inherent limitations of 2D object detection,such as inadequate spatial reasoning and susceptibility to environmental occlusions,pose significant risks to the safety and reliability of autonomous driving systems.To address these challenges,this paper proposes an enhanced 3D object detection framework(FastSECOND)based on an optimized SECOND architecture,designed to achieve rapid and accurate perception in autonomous driving scenarios.Key innovations include:(1)Replacing the Rectified Linear Unit(ReLU)activation functions with the Gaussian Error Linear Unit(GELU)during voxel feature encoding and region proposal network stages,leveraging partial convolution to balance computational efficiency and detection accuracy;(2)Integrating a Swin-Transformer V2 module into the voxel backbone network to enhance feature extraction capabilities in sparse data;and(3)Introducing an optimized position regression loss combined with a geometry-aware Focal-EIoU loss function,which incorporates bounding box geometric correlations to accelerate network convergence.While this study currently focuses exclusively on the detection of the Car category,with experiments conducted on the Car class of the KITTI dataset,future work will extend to other categories such as Pedestrian and Cyclist to more comprehensively evaluate the generalization capability of the proposed framework.Extensive experimental results demonstrate that our framework achieves a more effective trade-off between detection accuracy and speed.Compared to the baseline SECOND model,it achieves a 21.9%relative improvement in 3D bounding box detection accuracy on the hard subset,while reducing inference time by 14 ms.These advancements underscore the framework’s potential for enabling real-time,high-precision perception in autonomous driving applications.
基金supported by the National Natural Science Foundation of China(Nos.62276204 and 62203343)the Fundamental Research Funds for the Central Universities(No.YJSJ24011)+1 种基金the Natural Science Basic Research Program of Shanxi,China(Nos.2022JM-340 and 2023-JC-QN-0710)the China Postdoctoral Science Foundation(Nos.2020T130494 and 2018M633470).
文摘Drone-based small object detection is of great significance in practical applications such as military actions, disaster rescue, transportation, etc. However, the severe scale differences in objects captured by drones and lack of detail information for small-scale objects make drone-based small object detection a formidable challenge. To address these issues, we first develop a mathematical model to explore how changing receptive fields impacts the polynomial fitting results. Subsequently, based on the obtained conclusions, we propose a simple but effective Hybrid Receptive Field Network (HRFNet), whose modules include Hybrid Feature Augmentation (HFA), Hybrid Feature Pyramid (HFP) and Dual Scale Head (DSH). Specifically, HFA employs parallel dilated convolution kernels of different sizes to extend shallow features with different receptive fields, committed to improving the multi-scale adaptability of the network;HFP enhances the perception of small objects by capturing contextual information across layers, while DSH reconstructs the original prediction head utilizing a set of high-resolution features and ultrahigh-resolution features. In addition, in order to train HRFNet, the corresponding dual-scale loss function is designed. Finally, comprehensive evaluation results on public benchmarks such as VisDrone-DET and TinyPerson demonstrate the robustness of the proposed method. Most impressively, the proposed HRFNet achieves a mAP of 51.0 on VisDrone-DET with 29.3 M parameters, which outperforms the extant state-of-the-art detectors. HRFNet also performs excellently in complex scenarios captured by drones, achieving the best performance on the CS-Drone dataset we built.
文摘Accurate defect detection plays a critical role in ensuring product quality and equipment reliability.Small-object detection poses unique challenges due to weak feature representation and significant background interference.To address these issues,this study incorporates three key innovations into the YOLOv8 framework:the use of GhostNet convolution for lightweight and efficient feature extraction,the addition of a P2 detection layer to enhance small-object detection capabilities,and the integration of the Triplet Attention mechanism to capture comprehensive spatial and channel dependencies.These improvements collectively optimize detection performance for small objects while reducing computational complexity.Experimental results demonstrate that the enhanced model achieves a mean average precision(mAP@0.5)of 97.46%and a mAP@0.5∶0.95 of 61.84%,representing a performance improvement of 1.9%and 3.2%,respectively,compared to the baseline YOLOv8 model.Additionally,the model achieves a frame rate of 158 FPS,maintaining real-time detection capabilities while reducing the parameter count by 50%,further underscoring its efficiency and suitability for smallobject detection in complex scenarios.
文摘Recent years have seen a surge in interest in object detection on remote sensing images for applications such as surveillance andmanagement.However,challenges like small object detection,scale variation,and the presence of closely packed objects in these images hinder accurate detection.Additionally,the motion blur effect further complicates the identification of such objects.To address these issues,we propose enhanced YOLOv9 with a transformer head(YOLOv9-TH).The model introduces an additional prediction head for detecting objects of varying sizes and swaps the original prediction heads for transformer heads to leverage self-attention mechanisms.We further improve YOLOv9-TH using several strategies,including data augmentation,multi-scale testing,multi-model integration,and the introduction of an additional classifier.The cross-stage partial(CSP)method and the ghost convolution hierarchical graph(GCHG)are combined to improve detection accuracy by better utilizing feature maps,widening the receptive field,and precisely extracting multi-scale objects.Additionally,we incorporate the E-SimAM attention mechanism to address low-resolution feature loss.Extensive experiments on the VisDrone2021 and DIOR datasets demonstrate the effectiveness of YOLOv9-TH,showing good improvement in mAP compared to the best existing methods.The YOLOv9-TH-e achieved 54.2% of mAP50 on the VisDrone2021 dataset and 92.3% of mAP on the DIOR dataset.The results confirmthemodel’s robustness and suitability for real-world applications,particularly for small object detection in remote sensing images.
文摘At present, salient object detection (SOD) has achieved considerable progress. However, the methods that perform well still face the issue of inadequate detection accuracy. For example, sometimes there are problems of missed and false detections. Effectively optimizing features to capture key information and better integrating different levels of features to enhance their complementarity are two significant challenges in the domain of SOD. In response to these challenges, this study proposes a novel SOD method based on multi-strategy feature optimization. We propose the multi-size feature extraction module (MSFEM), which uses the attention mechanism, the multi-level feature fusion, and the residual block to obtain finer features. This module provides robust support for the subsequent accurate detection of the salient object. In addition, we use two rounds of feature fusion and the feedback mechanism to optimize the features obtained by the MSFEM to improve detection accuracy. The first round of feature fusion is applied to integrate the features extracted by the MSFEM to obtain more refined features. Subsequently, the feedback mechanism and the second round of feature fusion are applied to refine the features, thereby providing a stronger foundation for accurately detecting salient objects. To improve the fusion effect, we propose the feature enhancement module (FEM) and the feature optimization module (FOM). The FEM integrates the upper and lower features with the optimized features obtained by the FOM to enhance feature complementarity. The FOM uses different receptive fields, the attention mechanism, and the residual block to more effectively capture key information. Experimental results demonstrate that our method outperforms 10 state-of-the-art SOD methods.
基金supported by the National Natural Science Foundation of China(Grant Nos.62101275 and 62101274).
文摘UAV-based object detection is rapidly expanding in both civilian and military applications,including security surveillance,disaster assessment,and border patrol.However,challenges such as small objects,occlusions,complex backgrounds,and variable lighting persist due to the unique perspective of UAV imagery.To address these issues,this paper introduces DAFPN-YOLO,an innovative model based on YOLOv8s(You Only Look Once version 8s).Themodel strikes a balance between detection accuracy and speed while reducing parameters,making itwell-suited for multi-object detection tasks from drone perspectives.A key feature of DAFPN-YOLO is the enhanced Drone-AFPN(Adaptive Feature Pyramid Network),which adaptively fuses multi-scale features to optimize feature extraction and enhance spatial and small-object information.To leverage Drone-AFPN’smulti-scale capabilities fully,a dedicated 160×160 small-object detection head was added,significantly boosting detection accuracy for small targets.In the backbone,the C2f_Dual(Cross Stage Partial with Cross-Stage Feature Fusion Dual)module and SPPELAN(Spatial Pyramid Pooling with Enhanced LocalAttentionNetwork)modulewere integrated.These components improve feature extraction and information aggregationwhile reducing parameters and computational complexity,enhancing inference efficiency.Additionally,Shape-IoU(Shape Intersection over Union)is used as the loss function for bounding box regression,enabling more precise shape-based object matching.Experimental results on the VisDrone 2019 dataset demonstrate the effectiveness ofDAFPN-YOLO.Compared to YOLOv8s,the proposedmodel achieves a 5.4 percentage point increase inmAP@0.5,a 3.8 percentage point improvement in mAP@0.5:0.95,and a 17.2%reduction in parameter count.These results highlight DAFPN-YOLO’s advantages in UAV-based object detection,offering valuable insights for applying deep learning to UAV-specific multi-object detection tasks.
文摘Bees play a crucial role in the global food chain,pollinating over 75% of food and producing valuable products such as bee pollen,propolis,and royal jelly.However,theAsian hornet poses a serious threat to bee populations by preying on them and disrupting agricultural ecosystems.To address this issue,this study developed a modified YOLOv7tiny(You Only Look Once)model for efficient hornet detection.The model incorporated space-to-depth(SPD)and squeeze-and-excitation(SE)attention mechanisms and involved detailed annotation of the hornet’s head and full body,significantly enhancing the detection of small objects.The Taguchi method was also used to optimize the training parameters,resulting in optimal performance.Data for this study were collected from the Roboflow platformusing a 640×640 resolution dataset.The YOLOv7tinymodel was trained on this dataset.After optimizing the training parameters using the Taguchi method,significant improvements were observed in accuracy,precision,recall,F1 score,andmean average precision(mAP)for hornet detection.Without the hornet head label,incorporating the SPD attentionmechanism resulted in a peakmAP of 98.7%,representing an 8.58%increase over the original YOLOv7tiny.By including the hornet head label and applying the SPD attention mechanism and Soft-CIOU loss function,themAP was further enhanced to 97.3%,a 7.04% increase over the original YOLOv7tiny.Furthermore,the Soft-CIOU Loss function contributed to additional performance enhancements during the validation phase.
基金supported by National Natural Science Foundation of China(61673186 and 61871196)Beijing Normal University Education Reform Project(jx2024040)Guangdong Undergraduate Universities Teaching Quality and Reform Project(jx2024309).
文摘Data augmentation plays an important role in boosting the performance of 3D models,while very few studies handle the 3D point cloud data with this technique.Global augmentation and cut-paste are commonly used augmentation techniques for point clouds,where global augmentation is applied to the entire point cloud of the scene,and cut-paste samples objects from other frames into the current frame.Both types of data augmentation can improve performance,but the cut-paste technique cannot effectively deal with the occlusion relationship between the foreground object and the background scene and the rationality of object sampling,which may be counterproductive and may hurt the overall performance.In addition,LiDAR is susceptible to signal loss,external occlusion,extreme weather and other factors,which can easily cause object shape changes,while global augmentation and cut-paste cannot effectively enhance the robustness of the model.To this end,we propose Syn-Aug,a synchronous data augmentation framework for LiDAR-based 3D object detection.Specifically,we first propose a novel rendering-based object augmentation technique(Ren-Aug)to enrich training data while enhancing scene realism.Second,we propose a local augmentation technique(Local-Aug)to generate local noise by rotating and scaling objects in the scene while avoiding collisions,which can improve generalisation performance.Finally,we make full use of the structural information of 3D labels to make the model more robust by randomly changing the geometry of objects in the training frames.We verify the proposed framework with four different types of 3D object detectors.Experimental results show that our proposed Syn-Aug significantly improves the performance of various 3D object detectors in the KITTI and nuScenes datasets,proving the effectiveness and generality of Syn-Aug.On KITTI,four different types of baseline models using Syn-Aug improved mAP by 0.89%,1.35%,1.61%and 1.14%respectively.On nuScenes,four different types of baseline models using Syn-Aug improved mAP by 14.93%,10.42%,8.47%and 6.81%respectively.The code is available at https://github.com/liuhuaijjin/Syn-Aug.
基金supported in part by the National Science Foundation of China(52371372)the Project of Science and Technology Commission of Shanghai Municipality,China(22JC1401400,21190780300)the 111 Project,China(D18003)
文摘Dear Editor,This letter focuses on the fact that small objects with few pixels disappear in feature maps with large receptive fields, as the network deepens, in object detection tasks. Therefore, the detection of dense small objects is challenging.
文摘Efficient banana crop detection is crucial for precision agriculture;however,traditional remote sensing methods often lack the spatial resolution required for accurate identification.This study utilizes low-altitude Unmanned Aerial Vehicle(UAV)images and deep learning-based object detection models to enhance banana plant detection.A comparative analysis of Faster Region-Based Convolutional Neural Network(Faster R-CNN),You Only Look Once Version 3(YOLOv3),Retina Network(RetinaNet),and Single Shot MultiBox Detector(SSD)was conducted to evaluate their effectiveness.Results show that RetinaNet achieved the highest detection accuracy,with a precision of 96.67%,a recall of 71.67%,and an F1 score of 81.33%.The study further highlights the impact of scale variation,occlusion,and vegetation density on detection performance.Unlike previous studies,this research systematically evaluates multi-scale object detection models for banana plant identification,offering insights into the advantages of UAV-based deep learning applications in agriculture.In addition,this study compares five evaluation metrics across the four detection models using both RGB and grayscale images.Specifically,RetinaNet exhibited the best overall performance with grayscale images,achieving the highest values across all five metrics.Compared to its performance with RGB images,these results represent a marked improvement,confirming the potential of grayscale preprocessing to enhance detection capability.
基金supported by the National Natural Science Foundation of China(Grant No.62302086)the Natural Science Foundation of Liaoning Province(Grant No.2023-MSBA-070)the Fundamental Research Funds for the Central Universities(Grant No.N2317005).
文摘Visible-infrared object detection leverages the day-night stable object perception capability of infrared images to enhance detection robustness in low-light environments by fusing the complementary information of visible and infrared images.However,the inherent differences in the imaging mechanisms of visible and infrared modalities make effective cross-modal fusion challenging.Furthermore,constrained by the physical characteristics of sensors and thermal diffusion effects,infrared images generally suffer from blurred object contours and missing details,making it difficult to extract object features effectively.To address these issues,we propose an infrared-visible image fusion network that realizesmultimodal information fusion of infrared and visible images through a carefully designedmultiscale fusion strategy.First,we design an adaptive gray-radiance enhancement(AGRE)module to strengthen the detail representation in infrared images,improving their usability in complex lighting scenarios.Next,we introduce a channelspatial feature interaction(CSFI)module,which achieves efficient complementarity between the RGB and infrared(IR)modalities via dynamic channel switching and a spatial attention mechanism.Finally,we propose a multi-scale enhanced cross-attention fusion(MSECA)module,which optimizes the fusion ofmulti-level features through dynamic convolution and gating mechanisms and captures long-range complementary relationships of cross-modal features on a global scale,thereby enhancing the expressiveness of the fused features.Experiments on the KAIST,M3FD,and FLIR datasets demonstrate that our method delivers outstanding performance in daytime and nighttime scenarios.On the KAIST dataset,the miss rate drops to 5.99%,and further to 4.26% in night scenes.On the FLIR and M3FD datasets,it achieves AP50 scores of 79.4% and 88.9%,respectively.
基金funded by the Yangtze River Delta Science and Technology Innovation Community Joint Research Project(2023CSJGG1600)the Natural Science Foundation of Anhui Province(2208085MF173)Wuhu“ChiZhu Light”Major Science and Technology Project(2023ZD01,2023ZD03).
文摘As the number and complexity of sensors in autonomous vehicles continue to rise,multimodal fusionbased object detection algorithms are increasingly being used to detect 3D environmental information,significantly advancing the development of perception technology in autonomous driving.To further promote the development of fusion algorithms and improve detection performance,this paper discusses the advantages and recent advancements of multimodal fusion-based object detection algorithms.Starting fromsingle-modal sensor detection,the paper provides a detailed overview of typical sensors used in autonomous driving and introduces object detection methods based on images and point clouds.For image-based detection methods,they are categorized into monocular detection and binocular detection based on different input types.For point cloud-based detection methods,they are classified into projection-based,voxel-based,point cluster-based,pillar-based,and graph structure-based approaches based on the technical pathways for processing point cloud features.Additionally,multimodal fusion algorithms are divided into Camera-LiDAR fusion,Camera-Radar fusion,Camera-LiDAR-Radar fusion,and other sensor fusion methods based on the types of sensors involved.Furthermore,the paper identifies five key future research directions in this field,aiming to provide insights for researchers engaged in multimodal fusion-based object detection algorithms and to encourage broader attention to the research and application of multimodal fusion-based object detection.
文摘Object detection in occluded environments remains a core challenge in computer vision(CV),especially in domains such as autonomous driving and robotics.While Convolutional Neural Network(CNN)-based twodimensional(2D)and three-dimensional(3D)object detection methods havemade significant progress,they often fall short under severe occlusion due to depth ambiguities in 2D imagery and the high cost and deployment limitations of 3D sensors such as Light Detection and Ranging(LiDAR).This paper presents a comparative review of recent 2D and 3D detection models,focusing on their occlusion-handling capabilities and the impact of sensor modalities such as stereo vision,Time-of-Flight(ToF)cameras,and LiDAR.In this context,we introduce FuDensityNet,our multimodal occlusion-aware detection framework that combines Red-Green-Blue(RGB)images and LiDAR data to enhance detection performance.As a forward-looking direction,we propose a monocular depth-estimation extension to FuDensityNet,aimed at replacing expensive 3D sensors with a more scalable CNN-based pipeline.Although this enhancement is not experimentally evaluated in this manuscript,we describe its conceptual design and potential for future implementation.