To address the detection of small objects in unmanned aerial vehicle (UAV) aerial images with complex backgrounds, a general detection method for multi-scale small objects based on the Faster region-based convolutional neural network (Faster R-CNN) is proposed. The bird's nest on the high-voltage tower is taken as the research object. First, an improved ResNet101 convolutional neural network is used to extract object features, and multi-scale sliding windows then generate region proposals on convolutional feature maps of different resolutions. Finally, a deconvolution operation further enhances the selected higher-resolution feature map, which is then taken as the feature mapping layer for the region proposals passed to the object detection sub-network. Detection results for bird's nests in UAV aerial images show that the proposed method can precisely detect small objects in aerial images.
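The deconvolution step described above upsamples a coarse feature map before it is used for region-proposal mapping. A minimal NumPy sketch of a transposed ("deconvolution") operation; the shapes, stride, and kernel values are illustrative, not those of the paper's network:

```python
import numpy as np

def transposed_conv2d(x, k, stride=2):
    """Minimal single-channel transposed convolution: each input value
    scatters a scaled copy of the kernel into the (larger) output map.
    x: (H, W) feature map, k: (kh, kw) kernel."""
    h, w = x.shape
    kh, kw = k.shape
    out = np.zeros((h * stride + kh - stride, w * stride + kw - stride))
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + kh, j * stride:j * stride + kw] += x[i, j] * k
    return out

x = np.ones((3, 3))                      # toy 3x3 feature map
y = transposed_conv2d(x, np.ones((2, 2)), stride=2)   # upsampled to 6x6
```

With a 2x2 kernel and stride 2 the scattered blocks do not overlap, so the toy output is a uniform 6x6 map; real networks learn the kernel values.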
Detecting moving objects against a stationary background is an important problem in visual surveillance systems. However, the traditional background subtraction method fails when the background is not completely stationary and involves certain dynamic changes. In this paper, following the basic steps of the background subtraction method, a novel non-parametric moving object detection method is proposed based on an improved ant colony algorithm using a Markov random field. Concretely, the contributions are as follows: 1) A new non-parametric strategy is utilized to model the background, based on an improved kernel density estimation; this approach uses an adaptive bandwidth, and the fused features combine colours, gradients, and positions. 2) A Markov random field method based on this adaptive background model, constrained by the spatial context, is proposed to extract objects. 3) The posterior function is maximized efficiently using an improved ant colony system algorithm. Extensive experiments show that the proposed method outperforms many existing state-of-the-art methods.
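The kernel-density background model in contribution 1) can be sketched as follows. This is a generic Gaussian KDE over per-pixel feature history; the adaptive bandwidth shown (a scaled median absolute deviation) and the feature choice are illustrative assumptions, not the paper's specific scheme:

```python
import numpy as np

def kde_background_prob(history, pixel, bandwidth=None):
    """Probability that `pixel` belongs to the background, from a Gaussian
    kernel density estimate over `history` samples.
    history: (N, C) past feature vectors for one pixel location."""
    history = np.asarray(history, dtype=float)
    diff = history - np.asarray(pixel, dtype=float)
    if bandwidth is None:
        # crude adaptive bandwidth: per-channel median absolute deviation
        mad = np.median(np.abs(history - np.median(history, axis=0)), axis=0)
        bandwidth = np.maximum(1.4826 * mad, 1e-3)
    # product of per-channel Gaussian kernels, averaged over samples
    kernels = np.exp(-0.5 * (diff / bandwidth) ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    return float(np.mean(np.prod(kernels, axis=1)))

# a pixel close to its history gets a high background probability
hist = np.random.default_rng(0).normal(100.0, 2.0, size=(50, 3))
p_bg = kde_background_prob(hist, [100.0, 100.0, 100.0])   # in-distribution
p_fg = kde_background_prob(hist, [10.0, 10.0, 10.0])      # far from history
```

Thresholding this probability (low probability means foreground) would give the raw detection mask that the MRF step then regularizes spatially.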
Geospatial object detection in complex environments is a challenging problem in remote sensing. In this paper, we derive a multiple-kernel extension of the Relevance Vector Machine (RVM) technique. The proposed method learns an optimal kernel combination and the associated classifier simultaneously. Two feature types are extracted from images, forming basis kernels. These basis kernels are then combined with learned weights, and the resulting composite kernel exploits both interest points and appearance information of objects. The weights and the detection model are learnt jointly by a new algorithm. Experimental results show that the proposed method improves detection accuracy to above 88%, yields a good interpretation of the selected feature subset, and is sparser than traditional single-kernel RVMs.
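The weighted combination of basis kernels described above can be sketched as follows. This is the standard multiple-kernel-learning construction with RBF basis kernels; the kernel widths and simplex-normalized weights are illustrative, and the paper's joint learning algorithm is not reproduced:

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """Gaussian RBF kernel matrix between rows of X and rows of Y."""
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * d2)

def composite_kernel(X, Y, gammas, weights):
    """Weighted sum of basis kernels, one per feature type / width.
    Weights are projected onto the simplex, as is typical in MKL."""
    w = np.asarray(weights, float)
    w = w / w.sum()
    return sum(wi * rbf_kernel(X, Y, g) for wi, g in zip(w, gammas))

X = np.random.default_rng(1).normal(size=(5, 4))
K = composite_kernel(X, X, gammas=[0.1, 1.0], weights=[0.3, 0.7])
```

Because each basis kernel is a valid positive semi-definite kernel and the weights are non-negative, the composite remains a valid kernel for the RVM.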
The article deals with experimental studies of the fuzzy radiation structure of the atmosphere. The rationale for extracting information on the presence of a point-sized thermal object in the atmosphere is given. A fuzzy generalization of the experimentally studied regularities of the spatio-temporal irregular radiation structure in the infrared band is offered. An approach to detecting a point-sized thermal object in the atmosphere by a threshold method under thermodynamic and turbulent process conditions is substantiated, based on solving the inverse problem in a fuzzy setting.
To support the grasping of objects on a tabletop by the blind or by a robotic arm, it is necessary to address fundamental computer vision tasks, such as detecting, recognizing, and locating objects in space, and determining the grasping information. These results can then be used to guide the visually impaired or to execute grasping tasks with a robotic arm. In this paper, we collected, annotated, and published the benchmark TQUGraspingObject dataset for testing, validation, and evaluation of deep learning (DL) models for detecting, recognizing, and localizing graspable objects in 2D and 3D space, especially in 3D point cloud data. Our dataset was collected in a shared room, with common everyday objects placed on the tabletop in jumbled positions, using an Intel RealSense D435 (IR-D435). The dataset includes more than 63k RGB-D pairs and related data such as the normalized 3D object point cloud, the segmented 3D object point cloud, the coordinate-system normalization matrix, and the hand pose for grasping each object. We also conducted experiments on four DL networks with the best performance: SSD-MobileNetV3, ResNet50-Transformer, ResNet101-Transformer, and YOLOv12. The results show that YOLOv12 is the most suitable for detecting and recognizing objects in images. All data, annotations, toolkit, source code, point cloud data, and results are publicly available on our project website: https://github.com/HuaTThanhIT2327Tqu/datasetv2.
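Turning the RGB-D pairs above into point clouds relies on back-projecting depth pixels through the camera model. A minimal sketch using the standard pinhole equations; the intrinsics (`fx`, `fy`, `cx`, `cy`) here are illustrative placeholders, not the D435's calibrated values:

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map (meters) into a 3D point cloud:
    x = (u - cx) * z / fx, y = (v - cy) * z / fy, z = depth."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]          # drop invalid (zero-depth) pixels

depth = np.full((4, 4), 1.0)
depth[0, 0] = 0.0                      # one invalid depth reading
pts = depth_to_points(depth, fx=600.0, fy=600.0, cx=2.0, cy=2.0)
```

Segmenting such a cloud per object and normalizing its coordinate frame would produce data of the kind the dataset publishes.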
Visible and infrared (RGB-IR) fusion object detection plays an important role in security, disaster relief, and similar applications. In recent years, deep-learning-based RGB-IR fusion detection methods have been developing rapidly, but they still struggle with the complex and changing scenarios captured by drones, mainly for two reasons: (A) RGB-IR fusion detectors are susceptible to inferior inputs that degrade performance and stability; (B) they are susceptible to redundant features that reduce accuracy and efficiency. In this paper, an innovative RGB-IR fusion detection framework based on global-local feature optimization, named GLFDet, is proposed to improve the detection performance and efficiency for drone-captured objects. The key components of GLFDet are a Global Feature Optimization (GFO) module, a Local Feature Optimization (LFO) module, and a Channel Separation Fusion (CSF) module. Specifically, GFO calculates the information content of the input image in the frequency domain and optimizes the features holistically. LFO then dynamically selects high-value features and filters out low-value features before fusion, which significantly improves fusion efficiency. Finally, CSF fuses the RGB and IR features across corresponding channels, which avoids rearranging the channel relationships and enhances model stability. Extensive experimental results show that the proposed method achieves the best performance on three popular RGB-IR datasets: DroneVehicle, VEDAI, and LLVIP. In addition, GLFDet is more lightweight than comparable models, making it more appealing for edge devices such as drones. The code is available at https://github.com/laochen330/GLFDet.
Most image-based object detection methods employ horizontal bounding boxes (HBBs) to capture objects in tunnel images. However, these bounding boxes often fail to effectively enclose objects oriented in arbitrary directions, resulting in reduced accuracy and suboptimal detection performance. Moreover, HBBs cannot provide directional information for rotated objects. This study proposes a rotated detection method for identifying apparent defects in shield tunnels. Specifically, the oriented region-based convolutional neural network (oriented R-CNN) is utilized to detect rotated objects in tunnel images. To enhance feature extraction, a novel hybrid backbone combining CNN-based networks with Swin Transformers is proposed, and a feature fusion strategy is employed to integrate the features extracted from both networks. Additionally, a neck network based on the bidirectional feature pyramid network (Bi-FPN) is designed to combine multi-scale object features. A bolt hole dataset is curated to evaluate the efficacy of the proposed method, and a dedicated pre-processing approach is developed for large images to accommodate the rotated, dense, and small-scale characteristics of objects in tunnel images. Experimental results demonstrate that the proposed method achieves a more than 4% improvement in mAP50-95 compared to other rotated detectors and a 6.6%-12.7% improvement over mainstream horizontal detectors. Furthermore, the proposed method outperforms mainstream methods by 6.5%-14.7% in detecting leakage bolt holes, underscoring its significant engineering applicability.
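The oriented boxes discussed above are commonly parameterized as (centre, size, rotation) and converted to corner points for loss computation and visualization. A small sketch of that standard conversion, with toy values rather than real annotations:

```python
import numpy as np

def obb_corners(cx, cy, w, h, angle_rad):
    """Convert an oriented bounding box (centre, width, height, rotation)
    to its four corner points via a 2D rotation matrix."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    R = np.array([[c, -s], [s, c]])
    half = np.array([[-w, -h], [w, -h], [w, h], [-w, h]]) / 2.0
    return half @ R.T + np.array([cx, cy])

# a 4x2 box rotated 90 degrees: its axis-aligned extents swap
corners = obb_corners(10.0, 5.0, 4.0, 2.0, np.pi / 2)
```

Unlike an HBB, this representation keeps the defect's orientation, which is exactly the directional information the abstract notes HBBs cannot provide.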
Ensuring the reliability of power transmission networks depends heavily on the early detection of faults in key components such as insulators, which serve both mechanical and electrical functions. Even a single defective insulator can lead to equipment breakdown, costly service interruptions, and increased maintenance demands. While unmanned aerial vehicles (UAVs) enable rapid and cost-effective collection of high-resolution imagery, accurate defect identification remains challenging due to cluttered backgrounds, variable lighting, and the diverse appearance of faults. To address these issues, we introduce a real-time inspection framework that integrates an enhanced YOLOv10 detector with a Hybrid Quantum-Enhanced Graph Neural Network (HQGNN). The YOLOv10 module, fine-tuned on domain-specific UAV datasets, improves detection precision, while the HQGNN ensures multi-object tracking and temporal consistency across video frames. This synergy enables reliable and efficient identification of faulty insulators under complex environmental conditions. Experimental results show that the proposed YOLOv10-HQGNN model surpasses existing methods across all metrics, achieving a Recall of 0.85 and an Average Precision (AP) of 0.83, with clear gains in both accuracy and throughput. These advancements support automated, proactive maintenance strategies that minimize downtime and contribute to a safer, smarter energy infrastructure.
In the field of smart agriculture, accurate and efficient object detection technology is crucial for automated crop management. A particularly challenging task in this domain is small object detection, such as the identification of immature fruits or early-stage disease spots. These objects pose significant difficulties due to their small pixel coverage, limited feature information, substantial scale variations, and high susceptibility to complex background interference, which frequently result in inadequate accuracy and robustness in current detection models. This study addresses two critical needs in the cashew cultivation industry, fruit maturity and anthracnose detection, by proposing an improved YOLOv11-NSDDil model. The method introduces three key technological innovations: (1) The SDDil module is designed and integrated into the backbone network. This module combines depthwise separable convolution with the SimAM attention mechanism to expand the receptive field and enhance contextual semantic capture at a low computational cost, effectively alleviating the feature deficiency caused by the limited pixel coverage of small objects. Simultaneously, the SD module dynamically enhances discriminative features and suppresses background noise, significantly improving the model's feature discrimination in complex environments. (2) The DynamicScalSeq-Zoom_cat neck network is introduced, significantly improving multi-scale feature fusion. (3) The Minimum Point Distance Intersection over Union (MPDIoU) loss function is optimized, enhancing bounding box localization accuracy by minimizing vertex distances. Experimental results on a self-constructed cashew dataset containing 1123 images demonstrate significant performance improvements in the enhanced model: mAP50 reaches 0.825, a 4.6% increase over the original YOLOv11; mAP50-95 improves to 0.624, a 6.5% increase; and recall rises to 0.777, a 2.4% increase. This provides a reliable technical solution for intelligent quality inspection of agricultural products and holds broad application prospects.
To address critical challenges in nighttime ship detection (high missed-detection rates for small targets, over 20%; insufficient lightweighting; and limited generalization due to scarce, low-quality datasets), this study proposes a systematic solution. First, a high-quality Night-Ships dataset is constructed via CycleGAN-based day-night transfer, combined with a dual-threshold cleaning strategy (Laplacian-variance sharpness filtering and brightness-color deviation screening). Second, a Cross-stage Lightweight Fusion You Only Look Once version 8 (CLF-YOLOv8) model is proposed with key improvements: the neck network is reconstructed by replacing the Cross Stage Partial (CSP) structure with the Cross Stage Partial Multi-Scale Convolutional Block (CSP-MSCB) and integrating the Bidirectional Feature Pyramid Network (BiFPN) for weighted multi-scale fusion to enhance small-target detection; a Lightweight Shared Convolutional and Separated Batch Normalization Detection Head (LSCSBD-Head) with shared convolutions and layer-wise Batch Normalization (BN) reduces parameters to 1.8 M (42% fewer than YOLOv8n); and the Focal Minimum Point Distance Intersection over Union (Focal-MPDIoU) loss combines MPDIoU geometric constraints with Focal weighting to optimize low-overlap targets. Experiments show CLF-YOLOv8 achieves 97.6% mAP@0.5 (0.7% higher than YOLOv8n) with 1.8 M parameters, outperforming mainstream models in small-target detection, overlapping-target discrimination, and adaptability to complex lighting.
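The Laplacian-variance sharpness filter used in the dataset-cleaning step is a standard blur measure: convolve with a Laplacian kernel and take the variance of the response, which is low for blurry images. A self-contained NumPy sketch (the paper's actual threshold value is not stated, so none is hard-coded here):

```python
import numpy as np

# 4-neighbour Laplacian kernel
LAPLACIAN = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=float)

def laplacian_variance(gray):
    """Variance of the valid Laplacian response of a grayscale image;
    low values indicate blur (little high-frequency content)."""
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += LAPLACIAN[dy, dx] * gray[dy:dy + h - 2, dx:dx + w - 2]
    return float(out.var())

rng = np.random.default_rng(0)
sharp = rng.random((32, 32))              # high-frequency content
blurry = np.ones((32, 32)) * 0.5          # flat image, no detail
keep_sharp = laplacian_variance(sharp) > laplacian_variance(blurry)
```

In a cleaning pipeline, images whose score falls below a chosen threshold would be discarded before the brightness-deviation screen is applied.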
Modern manufacturing processes have become more reliant on automation because of the accelerated transition from Industry 3.0 to Industry 4.0. Manual inspection of products on assembly lines remains inefficient, error-prone, and inconsistent, emphasizing the need for a reliable, automated inspection system. Leveraging both object detection and image segmentation, this research proposes a vision-based solution for detecting various kinds of tools in a toolkit using deep learning (DL) models. Two Intel RealSense D455f depth cameras were arranged in a top-down configuration to capture both RGB and depth images of the toolkits. After applying multiple constraints and enhancing the images through preprocessing and augmentation, a dataset of 3300 annotated RGB-D images was generated. Several DL models were selected through a comprehensive assessment of mean Average Precision (mAP), precision-recall balance, inference latency (target ≥30 FPS), and computational burden, favoring YOLO and Region-based Convolutional Neural Network (R-CNN) variants over ViT-based models due to the latter's higher latency and resource requirements. YOLOv5, YOLOv8, YOLOv11, Faster R-CNN, and Mask R-CNN were trained on the annotated dataset and evaluated using key performance metrics (Recall, Accuracy, F1-score, and Precision). YOLOv11 demonstrated balanced excellence with 93.0% precision, 89.9% recall, and a 90.6% F1-score in object detection, as well as 96.9% precision, 95.3% recall, and a 96.5% F1-score in instance segmentation, with an average inference time of 25 ms per frame (≈40 FPS), demonstrating real-time performance. Leveraging these results, a YOLOv11-based Windows application was successfully deployed in a real-time assembly line environment, where it accurately processed live video streams to detect and segment tools within toolkits, demonstrating its practical effectiveness in industrial automation. Beyond detection and segmentation, the application precisely measures socket dimensions by applying edge detection to the YOLOv11 segmentation masks, enabling specification-level quality control directly on the assembly line and improving real-time inspection capability. The implementation represents a significant step toward intelligent manufacturing in the Industry 4.0 paradigm, providing a scalable, efficient, and accurate approach to automated inspection and dimensional verification.
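Measuring a socket's dimensions from a segmentation mask can be sketched as follows. This simplified version uses the mask's tight bounding box rather than full edge detection, and the `mm_per_pixel` scale, which would come from camera calibration, is an illustrative assumption:

```python
import numpy as np

def mask_dimensions_mm(mask, mm_per_pixel):
    """Estimate an object's width and height (in mm) from a binary
    segmentation mask via its tight axis-aligned bounding box."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return 0.0, 0.0
    h_px = ys.max() - ys.min() + 1
    w_px = xs.max() - xs.min() + 1
    return w_px * mm_per_pixel, h_px * mm_per_pixel

mask = np.zeros((100, 100), dtype=bool)
mask[20:60, 30:50] = True                 # a 40 x 20 pixel region
w_mm, h_mm = mask_dimensions_mm(mask, mm_per_pixel=0.5)
```

A production system would refine this with sub-pixel edge localization and account for perspective, but the pixel-to-millimetre conversion is the core of specification-level checking.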
With the rapid expansion of drone applications, accurate detection of objects in aerial imagery has become crucial for intelligent transportation, urban management, and emergency rescue missions. However, existing methods face numerous challenges in practical deployment, including scale-variation handling, feature degradation, and complex backgrounds. To address these issues, we propose Edge-enhanced and Detail-Capturing You Only Look Once (EHDC-YOLO), a novel framework for object detection in Unmanned Aerial Vehicle (UAV) imagery. Based on the You Only Look Once version 11 nano (YOLOv11n) baseline, EHDC-YOLO systematically introduces several architectural enhancements: (1) a Multi-Scale Edge Enhancement (MSEE) module that leverages multi-scale pooling and edge information to enhance boundary feature extraction; (2) an Enhanced Feature Pyramid Network (EFPN) that integrates P2-level features with Cross Stage Partial (CSP) structures and OmniKernel convolutions for better fine-grained representation; and (3) a Dynamic Head (DyHead) with multi-dimensional attention mechanisms for enhanced cross-scale modeling and perspective adaptability. Comprehensive experiments on the Vision meets Drones for Detection (VisDrone-DET) 2019 dataset demonstrate that EHDC-YOLO achieves significant improvements, increasing mean Average Precision (mAP)@0.5 from 33.2% to 46.1% (an absolute improvement of 12.9 percentage points) and mAP@0.5:0.95 from 19.5% to 28.0% (an absolute improvement of 8.5 percentage points) compared with the YOLOv11n baseline, while maintaining a reasonable parameter count (2.81 M vs. the baseline's 2.58 M). Further ablation studies confirm the effectiveness of each proposed component, while visualization results highlight EHDC-YOLO's superior performance in detecting objects and handling occlusions in complex drone scenarios.
Desert shrubs are indispensable in maintaining ecological stability by reducing soil erosion, enhancing water retention, and boosting soil fertility, all critical factors in mitigating desertification. Given the complex topography, variable climate, and difficulty of field surveys in desert regions, this paper proposes YOLO-Desert-Shrub (YOLO-DS), a method for detecting desert shrubs in UAV remote sensing images based on an enhanced YOLOv8n framework. The method accurately identifies shrub species, locations, and coverage. To address the dominance of small individual plants in the dataset, the SPDConv module is introduced in the Backbone and Neck layers of the YOLOv8n model, replacing conventional convolutions. This structural optimization mitigates information degradation in fine-grained data while strengthening discriminative feature capture across spatial scales in desert shrub datasets. Furthermore, a structured state-space model is integrated into the main network, and a MambaLayer is designed to dynamically extract and refine shrub-specific features from remote sensing images, effectively filtering out background noise and irrelevant interference to enhance feature representation. Benchmark evaluations show that YOLO-DS attains 79.56% mAP40, a 2.2% absolute gain over the baseline YOLOv8n architecture, with statistically significant advantages over contemporary detectors in cross-validation trials. The predicted plant coverage is strongly consistent with manually measured coverage, with a coefficient of determination (R^(2)) of 0.9148 and a Root Mean Square Error (RMSE) of 1.8266%. The proposed UAV-based remote sensing method using YOLO-DS effectively identifies and locates desert shrubs, monitors canopy sizes and distribution, and provides technical support for automated desert shrub monitoring.
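The SPDConv module mentioned above is built on a space-to-depth rearrangement: instead of discarding pixels as a strided convolution does, every block of spatial neighbours is moved into the channel axis, so downsampling loses no fine-grained information. A minimal NumPy sketch of that rearrangement (the non-strided convolution that SPD-Conv applies afterwards is omitted):

```python
import numpy as np

def space_to_depth(x, block=2):
    """Rearrange each `block x block` spatial neighbourhood of x into
    channels. x: (C, H, W) with H and W divisible by `block`;
    returns (C * block**2, H/block, W/block)."""
    c, h, w = x.shape
    x = x.reshape(c, h // block, block, w // block, block)
    x = x.transpose(0, 2, 4, 1, 3)                 # (C, b, b, H/b, W/b)
    return x.reshape(c * block * block, h // block, w // block)

x = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
y = space_to_depth(x)                              # (8, 2, 2), lossless
```

Because the transform is a pure permutation of values, it is exactly invertible, which is why it suits small-object features that strided convolutions would blur away.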
Object detection, a major challenge in computer vision and pattern recognition, plays a significant part in many applications spanning artificial intelligence, face recognition, and autonomous driving. It involves the detection, localization, and categorization of targets in images. A particularly important emerging task is distinguishing real animals from toy replicas in real time, mostly for smart camera systems in both urban and natural environments. This task is complicated by factors such as viewing angle, occlusion, lighting variations, and texture differences. To tackle these challenges, this paper proposes Group Sparse YOLOv8 (You Only Look Once version 8), an improved real-time object detection algorithm that augments YOLOv8 with group sparsity regularization. This adjustment improves efficiency and accuracy while reducing computational cost and power consumption. The approach also incorporates a frame selection strategy and a hybrid parallel processing method that merges pipelining with dataflow strategies to improve performance. It was evaluated using a custom dataset of toy and real animal images along with well-known datasets, namely ImageNet, MS COCO, and CIFAR-10/100. The combination of group sparsity with YOLOv8 achieves high detection accuracy with lower latency, providing a practical, resource-efficient solution for intelligent camera systems and improving real-time object detection and classification in environments where real and toy animals must be differentiated.
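Group sparsity regularization of the kind named above is typically a group-lasso penalty: the L2 norms of whole filter groups are summed, which pushes entire filters to zero so they can be removed at inference time. A minimal sketch with output filters as the groups; the grouping choice and penalty strength are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def group_lasso_penalty(weights, lam=1e-3):
    """Group-sparsity (group lasso) regularizer over conv filters:
    sum of per-filter L2 norms, scaled by lam.
    weights: (out_ch, in_ch, k, k)."""
    flat = weights.reshape(weights.shape[0], -1)
    return lam * np.sum(np.linalg.norm(flat, axis=1))

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 4, 3, 3))
w[0] = 0.0                                # one filter already zeroed out
penalty = group_lasso_penalty(w)          # added to the training loss
```

During training this term is simply added to the detection loss; filters whose norm collapses contribute nothing to it, which is what makes the pruned network cheaper to run.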
Online examinations have become a dominant assessment mode, increasing concerns over academic integrity. To address the critical challenge of detecting cheating behaviours, this study proposes a hybrid deep learning approach that combines visual detection and temporal behaviour classification. The methodology utilises object detection models (You Only Look Once (YOLOv12), Faster Region-based Convolutional Neural Network (RCNN), and Single Shot Detector (SSD) MobileNet) integrated with classification models such as Convolutional Neural Networks (CNN), Bidirectional Gated Recurrent Unit (Bi-GRU), and CNN-LSTM (Long Short-Term Memory). Two distinct datasets were used: the Online Exam Proctoring (EOP) dataset from Michigan State University and the School of Computer Science, Duy Tan University (SCS-DTU) dataset collected in a controlled classroom setting. A diverse set of cheating behaviours, including book usage, unauthorised interaction, internet access, and mobile phone use, was categorised. Comprehensive experiments evaluated the models on accuracy, precision, recall, training time, inference speed, and memory usage. We evaluate nine detector-classifier pairings under a unified budget and score them via a calibrated harmonic mean of detection and classification accuracies, enabling deployment-oriented selection under latency and memory constraints. Macro Precision/Recall/F1 and Receiver Operating Characteristic Area Under the Curve (ROC-AUC) are reported for the top configurations, revealing consistent advantages of object-centric pipelines for fine-grained cheating cues. The highest overall score is achieved by YOLOv12+CNN (97.15% accuracy), while SSD-MobileNet+CNN provides the best speed-efficiency trade-off for edge devices. This research provides valuable insights into selecting and deploying appropriate deep learning models for maintaining exam integrity under varying resource constraints.
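The pairing score described above, a harmonic mean of detection and classification accuracies, rewards balance: a pipeline weak at either stage scores low. A minimal sketch; the paper's calibration weights are not specified, so the plain harmonic mean is shown:

```python
def harmonic_score(det_acc, cls_acc, eps=1e-9):
    """Harmonic mean of detection and classification accuracy, used to
    rank detector-classifier pairings. eps guards against division by
    zero when both accuracies are zero."""
    return 2 * det_acc * cls_acc / (det_acc + cls_acc + eps)

balanced = harmonic_score(0.90, 0.90)   # both stages solid
lopsided = harmonic_score(0.99, 0.60)   # strong detector, weak classifier
```

Even though the lopsided pairing has a higher arithmetic mean (0.795 vs 0.90 for the balanced one is lower arithmetically, 0.795 vs 0.90), the harmonic mean ranks the balanced pipeline first, which matches the deployment intuition that the whole cascade is only as good as its weaker stage.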
Traffic sign detection is a critical component of driving systems. Single-stage network-based traffic sign detection algorithms, renowned for their fast detection speeds and high accuracy, have become the dominant approach in current practice. However, in complex and dynamic traffic scenes, particularly with smaller traffic sign objects, missed and false detections can reduce overall detection accuracy. To address this issue, this paper proposes a detection algorithm that integrates edge and shape information. Recognizing that traffic signs have specific shapes and distinct edge contours, the paper introduces an edge feature extraction branch within the backbone network, enabling adaptive fusion with features at the same hierarchical level. Additionally, a shape prior convolution module is designed to replace the first two convolutional modules of the backbone network, enhancing the model's perception of specific-shaped objects and reducing its sensitivity to background noise. The algorithm was evaluated on the CCTSDB and TT100k datasets; compared to YOLOv8s, the mAP50 values increased by 3.0% and 10.4%, respectively, demonstrating the effectiveness of the proposed method in improving the accuracy of traffic sign detection.
Small object detection has been a focus of attention since the emergence of deep learning-based object detection. Although classical object detection frameworks have contributed significantly to the development of object detection, many issues remain in detecting small objects due to the inherent complexity and diversity of real-world visual scenes. In particular, the YOLO (You Only Look Once) series of detection models, renowned for their real-time performance, have undergone numerous adaptations aimed at improving the detection of small targets. In this survey, we summarize the state-of-the-art YOLO-based small object detection methods. The review presents a systematic categorization of YOLO-based approaches for small-object detection, organized into four methodological avenues: attention-based feature enhancement, detection-head optimization, loss functions, and multi-scale feature fusion strategies. We then examine the principal challenges addressed by each category. Finally, we analyze the performance of these methods on public benchmarks and, by comparing current approaches, identify limitations and outline directions for future research.
Deep learning has made significant progress in oriented object detection for remote sensing images. However, existing methods still face challenges with difficult tasks such as multi-scale targets, complex backgrounds, and small objects. Keeping models lightweight to meet the resource constraints of remote sensing scenarios while improving task performance remains a research hotspot. We therefore propose EM-YOLO, an enhanced multi-scale feature extraction lightweight network based on the YOLOv8s architecture, specifically optimized for the large target-scale variations, diverse orientations, and numerous small objects found in remote sensing images. Our innovations lie in two main aspects. First, a dynamic snake convolution (DSC) is introduced into the backbone network to enhance the model's feature extraction capability for oriented targets. Second, an innovative focusing-diffusion module is designed in the feature fusion neck to effectively integrate multi-scale feature information. Finally, we apply the Layer-Adaptive Sparsity for magnitude-based Pruning (LASP) method for lightweight network pruning, so the model performs better in resource-constrained scenarios. Experimental results on the lightweight Orin platform demonstrate that the proposed method significantly outperforms the original YOLOv8s model in oriented remote sensing object detection, and achieves performance comparable or superior to state-of-the-art methods on three authoritative remote sensing datasets (DOTA v1.0, DOTA v1.5, and HRSC2016).
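The magnitude-based pruning underlying the LASP step can be sketched as follows. This shows only the per-layer magnitude pruning; the layer-adaptive part, which derives each layer's sparsity from a normalized magnitude score, is not reproduced and the 0.5 ratio below is an illustrative assumption:

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction (`sparsity`) of a layer's
    weights, the basic operation behind magnitude-based pruning."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    thresh = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    pruned = weights.copy()
    pruned[np.abs(pruned) <= thresh] = 0.0
    return pruned

rng = np.random.default_rng(0)
w = rng.normal(size=(16, 16))
w_pruned = magnitude_prune(w, sparsity=0.5)
achieved = float(np.mean(w_pruned == 0.0))      # fraction actually zeroed
```

In practice pruning is followed by brief fine-tuning to recover accuracy, and the zeroed weights let deployment runtimes on platforms such as Orin skip work.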
In recent years, with the rapid advancement of artificial intelligence, object detection algorithms have made significant strides in accuracy and computational efficiency. Notably, research on and applications of Anchor-Free models have opened new avenues for real-time target detection in optical remote sensing images (ORSIs). However, in the realm of adversarial attacks, developing adversarial techniques tailored to Anchor-Free models remains challenging. Adversarial examples generated with Anchor-Based models often transfer poorly to these new architectures, and the growing diversity of Anchor-Free models poses additional hurdles to achieving robust transferability of adversarial attacks. This study presents an improved cross-conv-block feature fusion You Only Look Once (YOLO) architecture, engineered to extract more comprehensive semantic features during backpropagation. To address the asymmetry between densely distributed objects in ORSIs and the corresponding detector outputs, a novel dense bounding box attack strategy is proposed, which incorporates a dense target bounding-box loss into the adversarial loss functions. Furthermore, by integrating translation-invariant (TI) and momentum-iterative (MI) adversarial methodologies, the proposed framework significantly improves the transferability of adversarial attacks. Experimental results demonstrate that our method achieves superior adversarial attack performance, with adversarial transferability rates (ATR) of 67.53% on the NWPU VHR-10 dataset and 90.71% on the HRSC2016 dataset. Compared to ensemble and cascaded adversarial attack approaches, our method generates adversarial examples in an average of 0.64 s, an approximately 14.5% improvement in efficiency under equivalent conditions.
To solve the false detection and missed detection problems caused by various types and sizes of defects in the detection of steel surface defects,similar defects and background features,and similarities between differ...To solve the false detection and missed detection problems caused by various types and sizes of defects in the detection of steel surface defects,similar defects and background features,and similarities between different defects,this paper proposes a lightweight detection model named multiscale edge and squeeze-and-excitation attention detection network(MSESE),which is built upon the You Only Look Once version 11 nano(YOLOv11n).To address the difficulty of locating defect edges,we first propose an edge enhancement module(EEM),apply it to the process of multiscale feature extraction,and then propose a multiscale edge enhancement module(MSEEM).By obtaining defect features from different scales and enhancing their edge contours,the module uses the dual-domain selection mechanism to effectively focus on the important areas in the image to ensure that the feature images have richer information and clearer contour features.By fusing the squeeze-and-excitation attention mechanism with the EEM,we obtain a lighter module that can enhance the representation of edge features,which is named the edge enhancement module with squeeze-and-excitation attention(EEMSE).This module was subsequently integrated into the detection head.The enhanced detection head achieves improved edge feature enhancement with reduced computational overhead,while effectively adjusting channel-wise importance and further refining feature representation.Experiments on the NEU-DET dataset show that,compared with the original YOLOv11n,the improved model achieves improvements of 4.1%and 2.2%in terms of mAP@0.5 and mAP@0.5:0.95,respectively,and the GFLOPs value decreases from the original value of 6.4 to 6.2.Furthermore,when compared to current mainstream models,Mamba-YOLOT and RTDETR-R34,our 
method achieves superior performance with 6.5%and 8.9%higher mAP@0.5,respectively,while maintaining a more compact parameter footprint.These results collectively validate the effectiveness and efficiency of our proposed approach.展开更多
Funding: National Defense Pre-research Fund Project (No. KMGY318002531).
Abstract: In order to solve the problem of small object detection in unmanned aerial vehicle (UAV) aerial images with complex backgrounds, a general detection method for multi-scale small objects based on the Faster region-based convolutional neural network (Faster R-CNN) is proposed. The bird's nest on the high-voltage tower is taken as the research object. Firstly, we use the improved convolutional neural network ResNet101 to extract object features, and then use multi-scale sliding windows to obtain object region proposals on convolutional feature maps with different resolutions. Finally, a deconvolution operation is added to further enhance the selected feature map with higher resolution, which is then taken as the feature mapping layer from which the region proposals pass to the object detection sub-network. The detection results for bird's nests in UAV aerial images show that the proposed method can precisely detect small objects in aerial images.
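The multi-scale proposal step can be sketched in a few lines (a simplified illustration, not the paper's exact configuration; the grid size, stride, and scales below are assumptions for the example): a window slides over every feature-map cell, and each position is mapped back to image-space boxes at several scales.

```python
def multiscale_proposals(feat_w, feat_h, stride, scales):
    """Slide a window over each feature-map cell and emit one image-space
    box (x1, y1, x2, y2) per scale, centred on that cell."""
    boxes = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx = (x + 0.5) * stride  # cell centre mapped back to image space
            cy = (y + 0.5) * stride
            for s in scales:
                boxes.append((cx - s / 2, cy - s / 2, cx + s / 2, cy + s / 2))
    return boxes

# 4x4 feature map, stride 16, three window scales -> 48 candidate boxes.
proposals = multiscale_proposals(feat_w=4, feat_h=4, stride=16, scales=(16, 32, 64))
```

In a full detector these candidates would then be scored and refined by the proposal sub-network; here they only illustrate how one feature-map position yields several scales of box.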
Funding: Supported in part by the National Natural Science Foundation of China under Grants 61841103, 61673164, and 61602397; in part by the Natural Science Foundation of Hunan Province under Grants 2016JJ2041 and 2019JJ50106; in part by the Key Project of the Education Department of Hunan Province under Grant 18B385; and in part by the Graduate Research Innovation Projects of Hunan Province under Grants CX2018B805 and CX2018B813.
Abstract: Detecting moving objects against a stationary background is an important problem in visual surveillance systems. However, the traditional background subtraction method fails when the background is not completely stationary and involves certain dynamic changes. In this paper, following the basic steps of the background subtraction method, a novel non-parametric moving object detection method is proposed based on an improved ant colony algorithm using a Markov random field. Concretely, the contributions are as follows: 1) A new nonparametric strategy is utilized to model the background, based on an improved kernel density estimation; this approach uses an adaptive bandwidth, and the fused features combine colours, gradients, and positions. 2) A Markov random field method based on this adaptive background model, constrained by the spatial context, is proposed to extract objects. 3) The posterior function is maximized efficiently by an improved ant colony system algorithm. Extensive experiments show that the proposed method outperforms many existing state-of-the-art methods.
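The nonparametric background model can be illustrated with a toy kernel density estimate (a minimal sketch of the idea only: Gaussian kernels, a two-dimensional feature, and the example bandwidths are assumptions; the paper's fused colour/gradient/position features and adaptive bandwidth selection are more involved). A pixel is flagged as foreground when its current feature has low density under the history of past features.

```python
import math

def kde_score(sample, history, bandwidths):
    """Kernel density estimate of `sample` (a feature vector) under
    `history` (past per-pixel feature vectors), Gaussian kernel with a
    per-dimension bandwidth."""
    total = 0.0
    d = len(sample)
    for h in history:
        k = 1.0
        for i in range(d):
            u = (sample[i] - h[i]) / bandwidths[i]
            k *= math.exp(-0.5 * u * u) / (bandwidths[i] * math.sqrt(2 * math.pi))
        total += k
    return total / len(history)

def is_foreground(sample, history, bandwidths, threshold):
    """Low density under the background model => likely a moving object."""
    return kde_score(sample, history, bandwidths) < threshold
```

In the full method this per-pixel decision is not taken independently; the Markov random field couples neighbouring pixels and the ant colony search maximizes the resulting posterior.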
Funding: Supported by the National Natural Science Foundation of China (No. 41001285).
Abstract: Geospatial object detection in complex environments is a challenging problem in remote sensing. In this paper, we derive an extension of the Relevance Vector Machine (RVM) technique to a multiple kernel version. The proposed method learns an optimal kernel combination and the associated classifier simultaneously. Two feature types are extracted from images, forming basis kernels. These basis kernels are then combined with learned weights, and the resulting composite kernel exploits interest points and appearance information of objects simultaneously. The weights and the detection model are finally learnt by a new algorithm. Experimental results show that the proposed method improves detection accuracy to above 88%, yields a good interpretation for the selected subset of features, and appears sparser than traditional single-kernel RVMs.
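The composite kernel is simply a weighted sum of per-feature basis kernels. A minimal sketch (RBF basis kernels and the example weights are assumptions; the paper learns the weights jointly with the classifier):

```python
import math

def rbf(x, y, gamma):
    """Gaussian (RBF) basis kernel over one feature representation."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def composite_kernel(x, y, weights, gammas):
    """Weighted sum of basis kernels, one per feature type; the weights
    are what a multiple-kernel learner would optimise."""
    return sum(w * rbf(x, y, g) for w, g in zip(weights, gammas))
```

With weights summing to one, the composite kernel stays a valid kernel: it is symmetric and evaluates to 1 at identical inputs, so it can be dropped into any kernel machine, RVM included.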
Abstract: This article deals with experimental studies of the fuzzy radiation structure of the atmosphere. The background information for detecting the presence of a point-size thermal object in the atmosphere is substantiated. A fuzzy generalization technique is offered for the regularities found in experimental studies of the space-time irregularity of the radiation structure in the infrared wave range. An approach to detecting a point-size thermal object in the atmosphere by a threshold method, under thermodynamic and turbulent process conditions, is justified based on solving the inverse problem in a fuzzy formulation.
Abstract: To support the process of grasping objects on a tabletop, whether for the blind or for a robotic arm, it is necessary to address fundamental computer vision tasks such as detecting, recognizing, and locating objects in space, and determining the grasping position. These results can then be used to guide the visually impaired or to execute grasping tasks with a robotic arm. In this paper, we collected, annotated, and published the benchmark TQUGraspingObject dataset for testing, validation, and evaluation of deep learning (DL) models for detecting, recognizing, and localizing grasping objects in 2D and 3D space, especially in 3D point cloud data. Our dataset was collected in a shared room, with common everyday objects placed on the tabletop in jumbled positions, using an Intel RealSense D435 (IR-D435). The dataset includes more than 63k RGB-D pairs and related data such as the normalized 3D object point cloud, the segmented 3D object point cloud, the coordinate-system normalization matrix, and the hand pose for grasping each object. We also conducted experiments on four DL networks with the best performance: SSD-MobileNetV3, ResNet50-Transformer, ResNet101-Transformer, and YOLOv12. The results show that YOLOv12 performs best in detecting and recognizing objects in images. All data, annotations, the toolkit, source code, point cloud data, and results are publicly available on our project website: https://github.com/HuaTThanhIT2327Tqu/datasetv2.
Funding: Supported by the National Natural Science Foundation of China (No. 62276204), the Fundamental Research Funds for the Central Universities, China (No. YJSJ24011), the Natural Science Basic Research Program of Shaanxi, China (Nos. 2022JM-340 and 2023-JC-QN-0710), and the China Postdoctoral Science Foundation (Nos. 2020T130494 and 2018M633470).
Abstract: Visible and infrared (RGB-IR) fusion object detection plays an important role in security, disaster relief, etc. In recent years, deep-learning-based RGB-IR fusion detection methods have been developing rapidly, but they still struggle with the complex and changing scenarios captured by drones, mainly for two reasons: (A) RGB-IR fusion detectors are susceptible to inferior inputs that degrade performance and stability; (B) they are susceptible to redundant features that reduce accuracy and efficiency. In this paper, an innovative RGB-IR fusion detection framework based on global-local feature optimization, named GLFDet, is proposed to improve the detection performance and efficiency for drone-captured objects. The key components of GLFDet are a Global Feature Optimization (GFO) module, a Local Feature Optimization (LFO) module, and a Channel Separation Fusion (CSF) module. Specifically, GFO calculates the information content of the input image in the frequency domain and optimizes the features holistically. Then, LFO dynamically selects high-value features and filters out low-value features before fusion, which significantly improves fusion efficiency. Finally, CSF fuses the RGB and IR features across the corresponding channels, which avoids rearranging the channel relationships and enhances model stability. Extensive experimental results show that the proposed method achieves the best performance on three popular RGB-IR datasets: DroneVehicle, VEDAI, and LLVIP. In addition, GLFDet is more lightweight than other comparable models, making it more appealing for edge devices such as drones. The code is available at https://github.com/laochen330/GLFDet.
基金support from the National Natural Science Foundation of China(Grant Nos.52025084 and 52408420)the Beijing Natural Science Foundation(Grant No.8244058).
Abstract: Most image-based object detection methods employ horizontal bounding boxes (HBBs) to capture objects in tunnel images. However, these bounding boxes often fail to effectively enclose objects oriented in arbitrary directions, resulting in reduced accuracy and suboptimal detection performance. Moreover, HBBs cannot provide directional information for rotated objects. This study proposes a rotated detection method for identifying apparent defects in shield tunnels. Specifically, the oriented region-based convolutional neural network (oriented R-CNN) is utilized to detect rotated objects in tunnel images. To enhance feature extraction, a novel hybrid backbone combining CNN-based networks with Swin Transformers is proposed, and a feature fusion strategy is employed to integrate the features extracted by both networks. Additionally, a neck network based on the bidirectional feature pyramid network (Bi-FPN) is designed to combine multi-scale object features. A bolt hole dataset is curated to evaluate the efficacy of the proposed method. In addition, a dedicated pre-processing approach is developed for large-sized images to accommodate the rotated, dense, and small-scale characteristics of objects in tunnel images. Experimental results demonstrate that the proposed method achieves a more than 4% improvement in mAP_(50-95) compared to other rotated detectors and a 6.6%-12.7% improvement over mainstream horizontal detectors. Furthermore, the proposed method outperforms mainstream methods by 6.5%-14.7% in detecting leakage bolt holes, underscoring its significant engineering applicability.
Funding: Supported by Ho Chi Minh City Open University, Vietnam, and Suan Sunandha Rajabhat University, Thailand.
Abstract: Ensuring the reliability of power transmission networks depends heavily on the early detection of faults in key components such as insulators, which serve both mechanical and electrical functions. Even a single defective insulator can lead to equipment breakdown, costly service interruptions, and increased maintenance demands. While unmanned aerial vehicles (UAVs) enable rapid and cost-effective collection of high-resolution imagery, accurate defect identification remains challenging due to cluttered backgrounds, variable lighting, and the diverse appearance of faults. To address these issues, we introduce a real-time inspection framework that integrates an enhanced YOLOv10 detector with a Hybrid Quantum-Enhanced Graph Neural Network (HQGNN). The YOLOv10 module, fine-tuned on domain-specific UAV datasets, improves detection precision, while the HQGNN ensures multi-object tracking and temporal consistency across video frames. This synergy enables reliable and efficient identification of faulty insulators under complex environmental conditions. Experimental results show that the proposed YOLOv10-HQGNN model surpasses existing methods across all metrics, achieving a Recall of 0.85 and an Average Precision (AP) of 0.83, with clear gains in both accuracy and throughput. These advancements support automated, proactive maintenance strategies that minimize downtime and contribute to a safer, smarter energy infrastructure.
Funding: Supported by the Hebei North University Doctoral Research Fund Project (No. BSJJ202315) and the Youth Research Fund Project of Higher Education Institutions in Hebei Province (No. QN2024146).
Abstract: In the field of smart agriculture, accurate and efficient object detection technology is crucial for automated crop management. A particularly challenging task in this domain is small object detection, such as the identification of immature fruits or early-stage disease spots. These objects pose significant difficulties due to their small pixel coverage, limited feature information, substantial scale variations, and high susceptibility to complex background interference. These challenges frequently result in inadequate accuracy and robustness in current detection models. This study addresses two critical needs in the cashew cultivation industry (fruit maturity and anthracnose detection) by proposing an improved YOLOv11-NSDDil model. The method introduces three key technological innovations: (1) The SDDil module is designed and integrated into the backbone network. This module combines depthwise separable convolution with the SimAM attention mechanism to expand the receptive field and enhance contextual semantic capture at low computational cost, effectively alleviating the feature deficiency caused by the limited pixel coverage of small objects. Simultaneously, the SD module dynamically enhances discriminative features and suppresses background noise, significantly improving the model's feature discrimination capability in complex environments. (2) The DynamicScalSeq-Zoom_cat neck network is introduced, significantly improving multi-scale feature fusion. (3) The Minimum Point Distance Intersection over Union (MPDIoU) loss function is adopted, which enhances bounding box localization accuracy by minimizing vertex distances. Experimental results on a self-constructed cashew dataset containing 1123 images demonstrate significant performance improvements in the enhanced model: mAP50 reaches 0.825, a 4.6% increase over the original YOLOv11; mAP50-95 improves to 0.624, a 6.5% increase; and recall rises to 0.777, a 2.4% increase. This provides a reliable technical solution for intelligent quality inspection of agricultural products and holds broad application prospects.
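The MPDIoU idea referred to above can be sketched as follows (one common formulation from the literature, not necessarily the exact variant used in these papers): standard IoU is penalised by the squared distances between the two boxes' top-left and bottom-right corners, normalised by the image diagonal, so that even non-overlapping boxes receive a useful gradient signal.

```python
def mpdiou(box_a, box_b, img_w, img_h):
    """Minimum Point Distance IoU for axis-aligned boxes (x1, y1, x2, y2):
    IoU minus normalised squared corner distances."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection and union for plain IoU.
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0
    # Corner-distance penalties, normalised by the squared image diagonal.
    diag_sq = img_w ** 2 + img_h ** 2
    tl_sq = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2
    br_sq = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2
    return iou - tl_sq / diag_sq - br_sq / diag_sq
```

The corresponding loss is simply 1 - MPDIoU; identical boxes score 1.0, and any corner misalignment lowers the score even when the overlap is the same.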
Funding: Supported by the Shandong Provincial Key Research and Development Program (Grant No. 2024SFGC0201).
Abstract: To address critical challenges in nighttime ship detection (small-target missed-detection rates above 20%, insufficient lightweighting, and limited generalization due to scarce, low-quality datasets) this study proposes a systematic solution. First, a high-quality Night-Ships dataset is constructed via CycleGAN-based day-night transfer, combined with a dual-threshold cleaning strategy (Laplacian-variance sharpness filtering and brightness-color deviation screening). Second, a Cross-stage Lightweight Fusion You Only Look Once version 8 (CLF-YOLOv8) model is proposed with three key improvements: the neck network is reconstructed by replacing the Cross Stage Partial (CSP) structure with the Cross Stage Partial Multi-Scale Convolutional Block (CSP-MSCB) and integrating the Bidirectional Feature Pyramid Network (BiFPN) for weighted multi-scale fusion to enhance small-target detection; a Lightweight Shared Convolutional and Separated Batch Normalization Detection Head (LSCSBD-Head) with shared convolutions and layer-wise Batch Normalization (BN) reduces the parameter count to 1.8M (42% fewer than YOLOv8n); and the Focal Minimum Point Distance Intersection over Union (Focal-MPDIoU) loss combines MPDIoU geometric constraints with Focal weighting to optimize low-overlap targets. Experiments show that CLF-YOLOv8 achieves 97.6% mAP@0.5 (0.7% higher than YOLOv8n) with 1.8M parameters, outperforming mainstream models in small-target detection, overlapping-target discrimination, and adaptability to complex lighting.
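The Laplacian-variance sharpness filter used in the dataset-cleaning step has a simple core: convolve the grayscale image with a Laplacian kernel and threshold the variance of the response, since blurred images have weak second derivatives. A minimal pure-Python sketch (the 3x3 kernel and the example threshold are assumptions; production code would typically use an image library):

```python
def laplacian_variance(img):
    """Variance of the 3x3 Laplacian response of a grayscale image given
    as a list of rows; low values indicate blur."""
    h, w = len(img), len(img[0])
    resp = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # 4-neighbour Laplacian: sum of neighbours minus 4 * centre.
            resp.append(img[y - 1][x] + img[y + 1][x] + img[y][x - 1]
                        + img[y][x + 1] - 4 * img[y][x])
    mean = sum(resp) / len(resp)
    return sum((r - mean) ** 2 for r in resp) / len(resp)

def is_sharp(img, threshold):
    """Dual-threshold cleaning would combine this with a colour check."""
    return laplacian_variance(img) >= threshold
```

Images failing the threshold are discarded before training, which is what keeps the transferred Night-Ships data usable.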
Abstract: Modern manufacturing processes have become more reliant on automation because of the accelerated transition from Industry 3.0 to Industry 4.0. Manual inspection of products on assembly lines remains inefficient, error-prone, and inconsistent, emphasizing the need for a reliable, automated inspection system. Leveraging both object detection and image segmentation, this research proposes a vision-based solution for detecting the various kinds of tools in a toolkit using deep learning (DL) models. Two Intel RealSense D455f depth cameras were arranged in a top-down configuration to capture both RGB and depth images of the toolkits. After applying multiple constraints and enhancing the images through preprocessing and augmentation, a dataset of 3300 annotated RGB-D photos was generated. Several DL models were selected through a comprehensive assessment of mean Average Precision (mAP), precision-recall equilibrium, inference latency (target ≥ 30 FPS), and computational burden, resulting in a preference for YOLO and Region-based Convolutional Neural Network (R-CNN) variants over ViT-based models due to the latter's increased latency and resource requirements. YOLOv5, YOLOv8, YOLOv11, Faster R-CNN, and Mask R-CNN were trained on the annotated dataset and evaluated using key performance metrics (Recall, Accuracy, F1-score, and Precision). YOLOv11 demonstrated balanced excellence with 93.0% precision, 89.9% recall, and a 90.6% F1-score in object detection, as well as 96.9% precision, 95.3% recall, and a 96.5% F1-score in instance segmentation, with an average inference time of 25 ms per frame (≈40 FPS), demonstrating real-time performance. Leveraging these results, a YOLOv11-based Windows application was successfully deployed in a real-time assembly line environment, where it accurately processed live video streams to detect and segment tools within toolkits, demonstrating its practical effectiveness in industrial automation. In addition to detection and segmentation, the application can precisely measure socket dimensions by applying edge detection techniques to the YOLOv11 segmentation masks. This enables specification-level quality control directly on the assembly line, improving real-time inspection capability. The implementation represents a significant step forward for intelligent manufacturing in the Industry 4.0 paradigm, providing a scalable, efficient, and accurate approach to automated inspection and dimensional verification.
Abstract: With the rapid expansion of drone applications, accurate detection of objects in aerial imagery has become crucial for intelligent transportation, urban management, and emergency rescue missions. However, existing methods face numerous challenges in practical deployment, including scale-variation handling, feature degradation, and complex backgrounds. To address these issues, we propose Edge-enhanced and Detail-Capturing You Only Look Once (EHDC-YOLO), a novel framework for object detection in Unmanned Aerial Vehicle (UAV) imagery. Based on the You Only Look Once version 11 nano (YOLOv11n) baseline, EHDC-YOLO systematically introduces several architectural enhancements: (1) a Multi-Scale Edge Enhancement (MSEE) module that leverages multi-scale pooling and edge information to enhance boundary feature extraction; (2) an Enhanced Feature Pyramid Network (EFPN) that integrates P2-level features with Cross Stage Partial (CSP) structures and OmniKernel convolutions for better fine-grained representation; and (3) a Dynamic Head (DyHead) with multi-dimensional attention mechanisms for enhanced cross-scale modeling and perspective adaptability. Comprehensive experiments on the Vision meets Drones for Detection (VisDrone-DET) 2019 dataset demonstrate that EHDC-YOLO achieves significant improvements, increasing mean Average Precision (mAP)@0.5 from 33.2% to 46.1% (an absolute improvement of 12.9 percentage points) and mAP@0.5:0.95 from 19.5% to 28.0% (an absolute improvement of 8.5 percentage points) compared with the YOLOv11n baseline, while maintaining a reasonable parameter count (2.81M vs the baseline's 2.58M). Further ablation studies confirm the effectiveness of each proposed component, while visualization results highlight EHDC-YOLO's superior performance in detecting objects and handling occlusions in complex drone scenarios.
Funding: Supported by the National Public Welfare Forest Desert Shrubbery Monitoring Project.
Abstract: Desert shrubs are indispensable for maintaining ecological stability by reducing soil erosion, enhancing water retention, and boosting soil fertility, all critical factors in mitigating desertification. Because of the complex topography, variable climate, and difficulty of field surveys in desert regions, this paper proposes YOLO-Desert-Shrub (YOLO-DS), a method for identifying desert shrubs in UAV remote sensing images based on an enhanced YOLOv8n framework. The method accurately identifies shrub species, locations, and coverage. To address the issue of small individual plants dominating the dataset, the SPDconv convolution module is introduced into the Backbone and Neck layers of the YOLOv8n model, replacing conventional convolutions. This structural optimization mitigates information degradation in fine-grained data while strengthening discriminative feature capture across spatial scales within desert shrub datasets. Furthermore, a structured state-space model is integrated into the main network, and a MambaLayer is designed to dynamically extract and refine shrub-specific features from remote sensing images, effectively filtering out background noise and irrelevant interference to enhance feature representation. Benchmark evaluations show that the YOLO-DS framework attains 79.56% mAP40weight, a 2.2% absolute gain over the baseline YOLOv8n architecture, with statistically significant advantages over contemporary detectors in cross-validation trials. The predicted plant coverage exhibits strong consistency with manually measured coverage, with a coefficient of determination (R^(2)) of 0.9148 and a Root Mean Square Error (RMSE) of 1.8266%. The proposed UAV-based remote sensing method built on YOLO-DS effectively identifies and locates desert shrubs, monitors canopy sizes and distribution, and provides technical support for automated desert shrub monitoring.
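The coverage-agreement metrics quoted above are standard: the coefficient of determination compares residual error to the variance of the ground truth, and RMSE is the root of the mean squared residual.

```python
import math

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root Mean Square Error between measured and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

Perfect predictions give R^2 = 1 and RMSE = 0; the paper's R^2 of 0.9148 means the model explains about 91% of the variance in manually measured coverage.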
Abstract: Object detection, a major challenge in computer vision and pattern recognition, plays a significant part in many applications spanning artificial intelligence, face recognition, and autonomous driving. It involves the detection, localization, and categorization of targets in images. A particularly important emerging task is distinguishing real animals from toy replicas in real time, especially for smart camera systems in both urban and natural environments. This task is complicated by factors such as viewing angle, occlusion, lighting variations, and texture differences. To tackle these challenges, this paper proposes Group Sparse YOLOv8 (You Only Look Once version 8), an improved real-time object detection algorithm that extends YOLOv8 with group sparsity regularization. This adjustment improves efficiency and accuracy while reducing computational cost and power consumption; the approach also includes a frame selection strategy and a hybrid parallel processing method that merges pipelining with dataflow strategies to improve performance. It is evaluated on a custom dataset of toy and real animal images along with well-known datasets, namely ImageNet, MS COCO, and CIFAR-10/100. The combination of group sparsity with YOLOv8 shows high detection accuracy with lower latency, providing a practical and resource-efficient solution for intelligent camera systems and improving real-time object detection and classification when differentiating between real and toy animals.
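Group sparsity regularization of the kind referred to here is usually a group-lasso penalty: the sum of L2 norms over parameter groups (for example, all weights of one channel), which pushes entire groups to exactly zero so they can be removed at inference time. A minimal sketch (the grouping into channels and the penalty weight are assumptions for illustration):

```python
import math

def group_sparsity_penalty(groups, lam=1.0):
    """Group-lasso penalty: lambda * sum over groups of ||w_g||_2.
    `groups` is a list of weight lists, one per group (e.g. per channel).
    Unlike plain L2, the un-squared norm zeroes out whole groups."""
    return lam * sum(math.sqrt(sum(w * w for w in g)) for g in groups)
```

During training this term is added to the detection loss; after training, groups whose norm has collapsed to zero correspond to channels that can be pruned, which is where the latency and power savings come from.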
Abstract: Online examinations have become a dominant assessment mode, increasing concerns over academic integrity. To address the critical challenge of detecting cheating behaviours, this study proposes a hybrid deep learning approach that combines visual detection and temporal behaviour classification. The methodology utilises object detection models (You Only Look Once (YOLOv12), Faster Region-based Convolutional Neural Network (RCNN), and Single Shot Detector (SSD) MobileNet) integrated with classification models such as Convolutional Neural Networks (CNN), Bidirectional Gated Recurrent Unit (Bi-GRU), and CNN-LSTM (Long Short-Term Memory). Two distinct datasets were used: the Online Exam Proctoring (EOP) dataset from Michigan State University and the School of Computer Science, Duy Tan University (SCS-DTU) dataset collected in a controlled classroom setting. A diverse set of cheating behaviours, including book usage, unauthorised interaction, internet access, and mobile phone use, was categorised. Comprehensive experiments evaluated the models on accuracy, precision, recall, training time, inference speed, and memory usage. We evaluate nine detector-classifier pairings under a unified budget and score them via a calibrated harmonic mean of detection and classification accuracies, enabling deployment-oriented selection under latency and memory constraints. Macro-Precision/Recall/F1 and Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) are reported for the top configurations, revealing consistent advantages of object-centric pipelines for fine-grained cheating cues. The highest overall score is achieved by YOLOv12+CNN (97.15% accuracy), while SSD-MobileNet+CNN provides the best speed-efficiency trade-off for edge devices. This research provides valuable insights into selecting and deploying appropriate deep learning models for maintaining exam integrity under varying resource constraints.
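The harmonic-mean scoring of detector-classifier pairings can be sketched in one function (the paper's "calibrated" variant is not specified here, so this shows only the plain harmonic mean): because the harmonic mean is dragged toward the weaker of the two values, a pairing only scores well if both the detection stage and the classification stage are strong.

```python
def harmonic_score(det_acc, cls_acc):
    """Harmonic mean of detection and classification accuracy;
    penalises pipelines that are strong at only one stage."""
    if det_acc <= 0 or cls_acc <= 0:
        return 0.0
    return 2 * det_acc * cls_acc / (det_acc + cls_acc)
```

For example, a pipeline with perfect detection but 50% classification scores only about 0.667, below a balanced pipeline at 0.8/0.8, which is exactly the behaviour a deployment-oriented ranking wants.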
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62572057, 62272049, and U24A20331), the Beijing Natural Science Foundation (Grant Nos. 4232026 and 4242020), and the Academic Research Projects of Beijing Union University (Grant No. ZK10202404).
Abstract: Traffic sign detection is a critical component of driving systems. Single-stage network-based traffic sign detection algorithms, renowned for their fast detection speeds and high accuracy, have become the dominant approach in current practice. However, in complex and dynamic traffic scenes, particularly with smaller traffic sign objects, missed and false detections can reduce overall detection accuracy. To address this issue, this paper proposes a detection algorithm that integrates edge and shape information. Recognizing that traffic signs have specific shapes and distinct edge contours, the paper introduces an edge feature extraction branch within the backbone network, enabling adaptive fusion with features of the same hierarchical level. Additionally, a shape prior convolution module is designed to replace the first two convolutional modules of the backbone network, aimed at enhancing the model's ability to perceive objects of specific shapes and reducing its sensitivity to background noise. The algorithm was evaluated on the CCTSDB and TT100K datasets; compared to YOLOv8s, the mAP50 values increased by 3.0% and 10.4%, respectively, demonstrating the effectiveness of the proposed method in improving traffic sign detection accuracy.
Funding: Supported in part by the Chongqing Research Program of Basic Research and Frontier Technology under Grant CSTB2025NSCQ-GPX1309.
Abstract: Small object detection has been a focus of attention since the emergence of deep-learning-based object detection. Although classical object detection frameworks have contributed significantly to the development of the field, many issues in detecting small objects remain unresolved due to the inherent complexity and diversity of real-world visual scenes. In particular, the YOLO (You Only Look Once) series of detection models, renowned for their real-time performance, have undergone numerous adaptations aimed at improving the detection of small targets. In this survey, we summarize the state-of-the-art YOLO-based small object detection methods. The review presents a systematic categorization of YOLO-based approaches for small object detection, organized into four methodological avenues: attention-based feature enhancement, detection-head optimization, loss functions, and multi-scale feature fusion strategies. We then examine the principal challenges addressed by each category. Finally, we analyze the performance of these methods on public benchmarks and, by comparing current approaches, identify limitations and outline directions for future research.
Funding: Funded by the Hainan Province Science and Technology Special Fund under Grant ZDYF2024GXJS292.
Abstract: Deep learning has made significant progress in oriented object detection for remote sensing images. However, existing methods still face challenges with difficult tasks such as multi-scale targets, complex backgrounds, and small objects. Keeping models lightweight to address resource constraints in remote sensing scenarios while improving task performance remains a research hotspot. Therefore, we propose EM-YOLO, an enhanced multi-scale feature extraction lightweight network based on the YOLOv8s architecture, specifically optimized for the large target-scale variations, diverse orientations, and numerous small objects in remote sensing images. Our innovations lie in two main aspects. First, dynamic snake convolution (DSC) is introduced into the backbone network to enhance the model's feature extraction capability for oriented targets. Second, an innovative focusing-diffusion module is designed in the feature fusion neck to effectively integrate multi-scale feature information. Finally, we apply the Layer-Adaptive Sparsity for magnitude-based Pruning (LASP) method to perform lightweight network pruning for resource-constrained scenarios. Experimental results on the lightweight Orin platform demonstrate that the proposed method significantly outperforms the original YOLOv8s model in oriented remote sensing object detection tasks and achieves comparable or superior performance to state-of-the-art methods on three authoritative remote sensing datasets (DOTA v1.0, DOTA v1.5, and HRSC2016).
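The core of magnitude-based pruning, on which the layer-adaptive method builds, is easy to sketch (a simplified per-layer illustration only; the actual LASP method additionally decides how much sparsity each layer should receive): the smallest-magnitude weights in a layer are zeroed out, and the resulting sparse layer is cheaper to store and execute.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of a layer's weights with the
    smallest absolute magnitude; the rest are kept unchanged."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    keep = set(order[n_prune:])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]
```

A layer-adaptive scheme would call this with a different `sparsity` per layer, chosen so that sensitive layers keep more of their weights than redundant ones.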
Abstract: In recent years, with the rapid advancement of artificial intelligence, object detection algorithms have made significant strides in accuracy and computational efficiency. Notably, research and applications of Anchor-Free models have opened new avenues for real-time target detection in optical remote sensing images (ORSIs). However, in the realm of adversarial attacks, developing adversarial techniques tailored to Anchor-Free models remains challenging. Adversarial examples generated with Anchor-Based models often exhibit poor transferability to these new model architectures. Furthermore, the growing diversity of Anchor-Free models poses additional hurdles to achieving robust transferability of adversarial attacks. This study presents an improved cross-conv-block feature fusion You Only Look Once (YOLO) architecture, engineered to extract more comprehensive semantic features during the backpropagation process. To address the asymmetry between densely distributed objects in ORSIs and the corresponding detector outputs, a novel dense bounding box attack strategy is proposed, which leverages a dense target bounding-box loss in the adversarial loss function. Furthermore, by integrating translation-invariant (TI) and momentum-iteration (MI) adversarial methodologies, the proposed framework significantly improves the transferability of adversarial attacks. Experimental results demonstrate that our method achieves superior adversarial attack performance, with adversarial transferability rates (ATR) of 67.53% on the NWPU VHR-10 dataset and 90.71% on the HRSC2016 dataset. Compared to ensemble and cascaded adversarial attack approaches, our method generates adversarial examples in an average of 0.64 s, an approximately 14.5% improvement in efficiency under equivalent conditions.
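The momentum-iteration (MI) component mentioned above follows the familiar MI-FGSM update: an L1-normalised gradient is accumulated into a momentum buffer, and the input is stepped along the sign of that buffer to increase the loss. A toy sketch on a scalar objective (the step size, decay factor, and the quadratic toy loss are assumptions; a real attack would use the detector's adversarial loss and clip the perturbation):

```python
def momentum_sign_attack(x, grad_fn, alpha=0.1, mu=0.9, steps=10):
    """Momentum-iterative sign-gradient ascent: g <- mu*g + grad/||grad||_1,
    then x <- x + alpha * sign(g), to maximise the attacked loss."""
    g = [0.0] * len(x)
    x = list(x)
    for _ in range(steps):
        grad = grad_fn(x)
        norm = sum(abs(v) for v in grad) or 1.0  # L1 normalisation
        g = [mu * gi + v / norm for gi, v in zip(g, grad)]
        x = [xi + alpha * (1 if gi > 0 else -1 if gi < 0 else 0)
             for xi, gi in zip(x, g)]
    return x

# Toy loss L(x) = sum(x_i^2), gradient 2x: the attack pushes x away from 0.
adv = momentum_sign_attack([0.5, -0.5], lambda x: [2 * xi for xi in x])
```

The momentum term is what stabilises the update direction across iterations, which is the property credited with improving cross-model transferability.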
Funding: Supported by the Ministry of Education Humanities and Social Science Research Project (grant number 23YJAZH034), the Postgraduate Research and Practice Innovation Program of Jiangsu Province (grant number SJCX25_17), and the National Computer Basic Education Research Project in Higher Education Institutions (grant numbers 2024-AFCEC-056 and 2024-AFCEC-057).
Abstract: To address the false and missed detections that arise in steel surface defect detection from defects of various types and sizes, from similarity between defects and background features, and from similarities among different defects, this paper proposes a lightweight detection model, the multiscale edge and squeeze-and-excitation attention detection network (MSESE), built upon You Only Look Once version 11 nano (YOLOv11n). To address the difficulty of locating defect edges, we first propose an edge enhancement module (EEM), apply it to multiscale feature extraction, and thereby obtain a multiscale edge enhancement module (MSEEM). By extracting defect features at different scales and enhancing their edge contours, the module uses a dual-domain selection mechanism to focus on the important regions of the image, so that the feature maps carry richer information and clearer contours. By fusing the squeeze-and-excitation attention mechanism with the EEM, we obtain a lighter module that strengthens the representation of edge features, named the edge enhancement module with squeeze-and-excitation attention (EEMSE). This module is integrated into the detection head, which then achieves improved edge-feature enhancement at reduced computational overhead while effectively adjusting channel-wise importance and further refining feature representations. Experiments on the NEU-DET dataset show that, compared with the original YOLOv11n, the improved model gains 4.1% and 2.2% in mAP@0.5 and mAP@0.5:0.95, respectively, while the GFLOPs decrease from 6.4 to 6.2. Furthermore, compared with the current mainstream models Mamba-YOLOT and RTDETR-R34, our method achieves 6.5% and 8.9% higher mAP@0.5, respectively, with a more compact parameter footprint. These results collectively validate the effectiveness and efficiency of the proposed approach.
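The squeeze-and-excitation mechanism fused into the EEMSE module is a standard channel-attention block: features are globally average-pooled per channel ("squeeze"), passed through a two-layer bottleneck ("excitation"), and the resulting weights rescale each channel. A minimal NumPy sketch follows; the (C, H, W) layout, weight shapes, and function name are illustrative assumptions, not the MSESE code.

```python
import numpy as np

def squeeze_excite(feature_map, w1, w2):
    """Squeeze-and-excitation channel attention.
    feature_map: (C, H, W); w1: (C // r, C); w2: (C, C // r) for reduction ratio r."""
    squeezed = feature_map.mean(axis=(1, 2))          # squeeze: (C,) channel descriptors
    hidden = np.maximum(0.0, w1 @ squeezed)           # excitation layer 1 (ReLU)
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))      # excitation layer 2 (sigmoid gates)
    return feature_map * scale[:, None, None]         # channel-wise reweighting
```

With zero bottleneck weights every gate is sigmoid(0) = 0.5, so the block halves every channel; trained weights instead learn to amplify informative channels and suppress the rest.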