Visible and infrared (RGB-IR) fusion object detection plays an important role in security, disaster relief, etc. In recent years, deep-learning-based RGB-IR fusion detection methods have been developing rapidly, but still struggle to deal with the complex and changing scenarios captured by drones, mainly due to two reasons: (A) RGB-IR fusion detectors are susceptible to inferior inputs that degrade performance and stability. (B) RGB-IR fusion detectors are susceptible to redundant features that reduce accuracy and efficiency. In this paper, an innovative RGB-IR fusion detection framework based on global-local feature optimization, named GLFDet, is proposed to improve the detection performance and efficiency of drone-captured objects. The key components of GLFDet include a Global Feature Optimization (GFO) module, a Local Feature Optimization (LFO) module and a Channel Separation Fusion (CSF) module. Specifically, GFO calculates the information content of the input image from the frequency domain and optimizes the features holistically. Then, LFO dynamically selects high-value features and filters out low-value features before fusion, which significantly improves the efficiency of fusion. Finally, CSF fuses the RGB and IR features across the corresponding channels, which avoids the rearrangement of the channel relationships and enhances the model stability. Extensive experimental results show that the proposed method achieves the best performance on three popular RGB-IR datasets: DroneVehicle, VEDAI, and LLVIP. In addition, GLFDet is more lightweight than other comparable models, making it more appealing to edge devices such as drones. The code is available at https://github.com/laochen330/GLFDet.
Desert shrubs are indispensable in maintaining ecological stability by reducing soil erosion, enhancing water retention, and boosting soil fertility, which are critical factors in mitigating desertification processes. To cope with the complex topography, variable climate, and challenges of field surveys in desert regions, this paper proposes YOLO-Desert-Shrub (YOLO-DS), a detection method for identifying desert shrubs in UAV remote sensing images based on an enhanced YOLOv8n framework. This method accurately identifies shrub species, locations, and coverage. To address the issue of small individual plants dominating the dataset, the SPDconv convolution module is introduced in the Backbone and Neck layers of the YOLOv8n model, replacing conventional convolutions. This structural optimization mitigates information degradation in fine-grained data while strengthening discriminative feature capture across spatial scales within desert shrub datasets. Furthermore, a structured state-space model is integrated into the main network, and the MambaLayer is designed to dynamically extract and refine shrub-specific features from remote sensing images, effectively filtering out background noise and irrelevant interference to enhance feature representation. Benchmark evaluations reveal the YOLO-DS framework attains 79.56% mAP40weight, demonstrating a 2.2% absolute gain versus the baseline YOLOv8n architecture, with statistically significant advantages over contemporary detectors in cross-validation trials. The predicted plant coverage exhibits strong consistency with manually measured coverage, with a coefficient of determination (R^2) of 0.9148 and a Root Mean Square Error (RMSE) of 1.8266%. The proposed UAV-based remote sensing method utilizing YOLO-DS can effectively identify and locate desert shrubs, monitor canopy sizes and distribution, and provide technical support for automated desert shrub monitoring.
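The agreement between predicted and manually measured coverage reported above is summarized with R^2 and RMSE. As a minimal sketch of how these two figures are commonly computed (the abstract does not state the exact procedure, so the definitions below are the standard ones and the sample values are hypothetical, not data from the paper):

```python
import math

def r2_and_rmse(measured, predicted):
    """Return (R^2, RMSE) for paired measured and predicted values."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(measured)
    mean_measured = sum(measured) / n
    # Residual sum of squares: error left after prediction.
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    # Total sum of squares: spread of the measured values around their mean.
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are equal")
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    return r2, rmse

# Hypothetical coverage percentages, for illustration only.
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
r2, rmse = r2_and_rmse(measured, predicted)
print(f"R^2 = {r2:.4f}, RMSE = {rmse:.4f}")
```

Because the inputs here are coverage percentages, the RMSE is expressed in percentage points, which matches the unit used in the abstract.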
Ensuring the reliability of power transmission networks depends heavily on the early detection of faults in key components such as insulators, which serve both mechanical and electrical functions. Even a single defective insulator can lead to equipment breakdown, costly service interruptions, and increased maintenance demands. While unmanned aerial vehicles (UAVs) enable rapid and cost-effective collection of high-resolution imagery, accurate defect identification remains challenging due to cluttered backgrounds, variable lighting, and the diverse appearance of faults. To address these issues, we introduce a real-time inspection framework that integrates an enhanced YOLOv10 detector with a Hybrid Quantum-Enhanced Graph Neural Network (HQGNN). The YOLOv10 module, fine-tuned on domain-specific UAV datasets, improves detection precision, while the HQGNN ensures multi-object tracking and temporal consistency across video frames. This synergy enables reliable and efficient identification of faulty insulators under complex environmental conditions. Experimental results show that the proposed YOLOv10-HQGNN model surpasses existing methods across all metrics, achieving Recall of 0.85 and Average Precision (AP) of 0.83, with clear gains in both accuracy and throughput. These advancements support automated, proactive maintenance strategies that minimize downtime and contribute to a safer, smarter energy infrastructure.
Salient object detection (SOD) models struggle to simultaneously preserve global structure, maintain sharp object boundaries, and sustain computational efficiency in complex scenes. In this study, we propose SPSALNet, a task-driven two-stage (macro–micro) architecture that restructures the SOD process around superpixel representations. In the proposed approach, a “split-and-enhance” principle, introduced to our knowledge for the first time in the SOD literature, hierarchically classifies superpixels and then applies targeted refinement only to ambiguous or error-prone regions. At the macro stage, the image is partitioned into content-adaptive superpixel regions, and each superpixel is represented by a high-dimensional region-level feature vector. These representations define a regional decomposition problem in which superpixels are assigned to three classes: background, object interior, and transition regions. Superpixel tokens interact with a global feature vector from a deep network backbone through a cross-attention module and are projected into an enriched embedding space that jointly encodes local topology and global context. At the micro stage, the model employs a U-Net-based refinement process that allocates computational resources only to ambiguous transition regions. The image and distance–similarity maps derived from superpixels are processed through a dual-encoder pathway. Subsequently, channel-aware fusion blocks adaptively combine information from these two sources, producing sharper and more stable object boundaries. Experimental results show that SPSALNet achieves high accuracy with lower computational cost compared to recent competing methods. On the PASCAL-S and DUT-OMRON datasets, SPSALNet exhibits a clear performance advantage across all key metrics, and it ranks first on accuracy-oriented measures on HKU-IS. On the challenging DUT-OMRON benchmark, SPSALNet reaches an MAE of 0.034. Across all datasets, it preserves object boundaries and regional structure in a stable and competitive manner.
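The MAE figure quoted above is, in the SOD literature, usually the mean absolute difference between a predicted saliency map and its ground-truth mask, both scaled to [0, 1]. The following is a minimal sketch under that assumption; the function name and the tiny 2x2 maps are hypothetical and are not taken from the paper:

```python
def saliency_mae(pred, gt):
    """Mean absolute error between two equally sized 2D maps with values in [0, 1]."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("maps must be non-empty and have the same number of rows")
    total = 0.0
    count = 0
    for pred_row, gt_row in zip(pred, gt):
        if len(pred_row) != len(gt_row):
            raise ValueError("rows must have the same length")
        for p, g in zip(pred_row, gt_row):
            total += abs(p - g)
            count += 1
    if count == 0:
        raise ValueError("maps must contain at least one pixel")
    return total / count

# Hypothetical 2x2 prediction and binary ground-truth mask.
pred = [[0.9, 0.1], [0.2, 0.8]]
gt = [[1.0, 0.0], [0.0, 1.0]]
print(f"MAE = {saliency_mae(pred, gt):.3f}")
```

A dataset-level MAE is then typically the average of this per-image value over all test images; lower is better.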
Traffic sign detection is an important part of autonomous driving, and its recognition accuracy and speed are directly related to road traffic safety. Although convolutional neural networks (CNNs) have made certain breakthroughs in this field, in the face of complex scenes, such as image blur and target occlusion, traffic sign detection continues to exhibit limited accuracy, accompanied by false positives and missed detections. To address the above problems, a traffic sign detection algorithm, You Only Look Once-based Skip Dynamic Way (YOLO-SDW), based on You Only Look Once version 8 small (YOLOv8s), is proposed. Firstly, a Skip Connection Reconstruction (SCR) module is introduced to efficiently integrate fine-grained feature information and enhance the detection accuracy of the algorithm in complex scenes. Secondly, a C2f module based on Dynamic Snake Convolution (C2f-DySnake) is proposed to dynamically adjust the receptive field information, improve the algorithm’s feature extraction ability for blurred or occluded targets, and reduce the occurrence of false detections and missed detections. Finally, the Wise Powerful IoU v2 (WPIoUv2) loss function is proposed to further improve the detection accuracy of the algorithm. Experimental results show that the average precision mAP@0.5 of YOLO-SDW on the TT100K dataset is 89.2%, and mAP@0.5:0.95 is 68.5%, which is 4% and 3.3% higher than the YOLOv8s baseline, respectively. YOLO-SDW ensures real-time performance while having higher accuracy.
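Both the mAP@0.5 metric and IoU-based losses such as the one named above build on the plain intersection-over-union of two boxes. The sketch below shows only that basic quantity and the 0.5 matching threshold implied by mAP@0.5; it is not the WPIoUv2 loss, whose formulation the abstract does not give, and the helper names and boxes are hypothetical:

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_match(pred_box, gt_box, threshold=0.5):
    """A prediction counts as a hit at mAP@0.5 when IoU >= 0.5 (class match assumed)."""
    return box_iou(pred_box, gt_box) >= threshold

# Hypothetical ground-truth box and two predictions.
gt_box = (0.0, 0.0, 10.0, 10.0)
print(is_match((2.0, 0.0, 12.0, 10.0), gt_box))  # IoU = 80 / 120
print(is_match((5.0, 0.0, 15.0, 10.0), gt_box))  # IoU = 50 / 150
```

mAP@0.5:0.95 repeats the same matching at thresholds from 0.5 to 0.95 in steps of 0.05 and averages the resulting AP values, which is why it is always the stricter of the two numbers.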
In modern industrial production, foreign object detection in complex environments is crucial to ensure product quality and production safety. Detection systems based on deep-learning image processing algorithms often face challenges with handling high-resolution images and achieving accurate detection against complex backgrounds. To address these issues, this study employs the PatchCore unsupervised anomaly detection algorithm combined with data augmentation techniques to enhance the system’s generalization capability across varying lighting conditions, viewing angles, and object scales. The proposed method is evaluated in a complex industrial detection scenario involving the bogie of an electric multiple unit (EMU). A dataset consisting of complex backgrounds, diverse lighting conditions, and multiple viewing angles is constructed to validate the performance of the detection system in real industrial environments. Experimental results show that the proposed model achieves an average area under the receiver operating characteristic curve (AUROC) of 0.92 and an average F1 score of 0.85. Combined with data augmentation, the proposed model exhibits improvements in AUROC by 0.06 and F1 score by 0.03, demonstrating enhanced accuracy and robustness for foreign object detection in complex industrial settings. In addition, the effects of key factors on detection performance are systematically analyzed, providing practical guidance for parameter selection in real industrial applications.
With the rapid expansion of drone applications, accurate detection of objects in aerial imagery has become crucial for intelligent transportation, urban management, and emergency rescue missions. However, existing methods face numerous challenges in practical deployment, including scale variation handling, feature degradation, and complex backgrounds. To address these issues, we propose Edge-enhanced and Detail-Capturing You Only Look Once (EHDC-YOLO), a novel framework for object detection in Unmanned Aerial Vehicle (UAV) imagery. Based on the You Only Look Once version 11 nano (YOLOv11n) baseline, EHDC-YOLO systematically introduces several architectural enhancements: (1) a Multi-Scale Edge Enhancement (MSEE) module that leverages multi-scale pooling and edge information to enhance boundary feature extraction; (2) an Enhanced Feature Pyramid Network (EFPN) that integrates P2-level features with Cross Stage Partial (CSP) structures and OmniKernel convolutions for better fine-grained representation; and (3) Dynamic Head (DyHead) with multi-dimensional attention mechanisms for enhanced cross-scale modeling and perspective adaptability. Comprehensive experiments on the Vision meets Drones for Detection (VisDrone-DET) 2019 dataset demonstrate that EHDC-YOLO achieves significant improvements, increasing mean Average Precision (mAP)@0.5 from 33.2% to 46.1% (an absolute improvement of 12.9 percentage points) and mAP@0.5:0.95 from 19.5% to 28.0% (an absolute improvement of 8.5 percentage points) compared with the YOLOv11n baseline, while maintaining a reasonable parameter count (2.81 M vs the baseline’s 2.58 M). Further ablation studies confirm the effectiveness of each proposed component, while visualization results highlight EHDC-YOLO’s superior performance in detecting objects and handling occlusions in complex drone scenarios.
Modern manufacturing processes have become more reliant on automation because of the accelerated transition from Industry 3.0 to Industry 4.0. Manual inspection of products on assembly lines remains inefficient, prone to errors and lacks consistency, emphasizing the need for a reliable and automated inspection system. Leveraging both object detection and image segmentation approaches, this research proposes a vision-based solution for the detection of various kinds of tools in the toolkit using deep learning (DL) models. Two Intel RealSense D455f depth cameras were arranged in a top-down configuration to capture both RGB and depth images of the toolkits. After applying multiple constraints and enhancing them through preprocessing and augmentation, a dataset consisting of 3300 annotated RGB-D images was generated. Several DL models were selected through a comprehensive assessment of mean Average Precision (mAP), precision-recall equilibrium, inference latency (target ≥30 FPS), and computational burden, resulting in a preference for YOLO and Region-based Convolutional Neural Networks (R-CNN) variants over ViT-based models due to the latter’s increased latency and resource requirements. YOLOv5, YOLOv8, YOLOv11, Faster R-CNN, and Mask R-CNN were trained on the annotated dataset and evaluated using key performance metrics (Recall, Accuracy, F1-score, and Precision). YOLOv11 demonstrated balanced excellence with 93.0% precision, 89.9% recall, and a 90.6% F1-score in object detection, as well as 96.9% precision, 95.3% recall, and a 96.5% F1-score in instance segmentation with an average inference time of 25 ms per frame (≈40 FPS), demonstrating real-time performance. Leveraging these results, a YOLOv11-based Windows application was successfully deployed in a real-time assembly line environment, where it accurately processed live video streams to detect and segment tools within toolkits, demonstrating its practical effectiveness in industrial automation. In addition to detection and segmentation, the application can precisely measure socket dimensions by applying edge detection techniques to YOLOv11 segmentation masks. This enables specification-level quality control directly on the assembly line and improves real-time inspection capability. The implementation is a significant step forward for intelligent manufacturing in the Industry 4.0 paradigm, providing a scalable, efficient, and accurate approach to automated inspection and dimensional verification.
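The relation between the reported 25 ms per-frame inference time and the approximate 40 FPS figure, and the check against the ≥30 FPS target, is simple arithmetic. A minimal sketch follows; the function names are hypothetical, and it assumes frames are processed one at a time with no other pipeline overhead:

```python
def latency_ms_to_fps(latency_ms):
    """Convert per-frame latency in milliseconds to frames per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms

def meets_realtime_target(latency_ms, target_fps=30.0):
    """True when the throughput implied by the latency reaches the target FPS."""
    return latency_ms_to_fps(latency_ms) >= target_fps

fps = latency_ms_to_fps(25.0)
print(f"{fps:.1f} FPS, meets 30 FPS target: {meets_realtime_target(25.0)}")
```

In a deployed application, capture, preprocessing, and display time add to the per-frame budget, so end-to-end throughput can be lower than the model-only figure.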
Traffic sign detection is a critical component of driving systems. Single-stage network-based traffic sign detection algorithms, renowned for their fast detection speeds and high accuracy, have become the dominant approach in current practices. However, in complex and dynamic traffic scenes, particularly with smaller traffic sign objects, challenges such as missed and false detections can lead to reduced overall detection accuracy. To address this issue, this paper proposes a detection algorithm that integrates edge and shape information. Recognizing that traffic signs have specific shapes and distinct edge contours, this paper introduces an edge feature extraction branch within the backbone network, enabling adaptive fusion with features of the same hierarchical level. Additionally, a shape prior convolution module is designed to replace the first two convolutional modules of the backbone network, aimed at enhancing the model's perception ability for specific shape objects and reducing its sensitivity to background noise. The algorithm was evaluated on the CCTSDB and TT100K datasets, and compared to YOLOv8s, the mAP50 values increased by 3.0% and 10.4%, respectively, demonstrating the effectiveness of the proposed method in improving the accuracy of traffic sign detection.
Small object detection has been a focus of attention since the emergence of deep learning-based object detection. Although classical object detection frameworks have made significant contributions to the development of object detection, there are still many issues to be resolved in detecting small objects due to the inherent complexity and diversity of real-world visual scenes. In particular, the YOLO (You Only Look Once) series of detection models, renowned for their real-time performance, have undergone numerous adaptations aimed at improving the detection of small targets. In this survey, we summarize the state-of-the-art YOLO-based small object detection methods. This review presents a systematic categorization of YOLO-based approaches for small-object detection, organized into four methodological avenues, namely attention-based feature enhancement, detection-head optimization, loss function, and multi-scale feature fusion strategies. We then examine the principal challenges addressed by each category. Finally, we analyze the performance of these methods on public benchmarks and, by comparing current approaches, identify limitations and outline directions for future research.
To solve the false detection and missed detection problems caused by various types and sizes of defects in the detection of steel surface defects, similar defects and background features, and similarities between different defects, this paper proposes a lightweight detection model named multiscale edge and squeeze-and-excitation attention detection network (MSESE), which is built upon the You Only Look Once version 11 nano (YOLOv11n). To address the difficulty of locating defect edges, we first propose an edge enhancement module (EEM), apply it to the process of multiscale feature extraction, and then propose a multiscale edge enhancement module (MSEEM). By obtaining defect features from different scales and enhancing their edge contours, the module uses the dual-domain selection mechanism to effectively focus on the important areas in the image to ensure that the feature images have richer information and clearer contour features. By fusing the squeeze-and-excitation attention mechanism with the EEM, we obtain a lighter module that can enhance the representation of edge features, which is named the edge enhancement module with squeeze-and-excitation attention (EEMSE). This module was subsequently integrated into the detection head. The enhanced detection head achieves improved edge feature enhancement with reduced computational overhead, while effectively adjusting channel-wise importance and further refining feature representation. Experiments on the NEU-DET dataset show that, compared with the original YOLOv11n, the improved model achieves improvements of 4.1% and 2.2% in terms of mAP@0.5 and mAP@0.5:0.95, respectively, and the GFLOPs value decreases from the original value of 6.4 to 6.2. Furthermore, when compared to current mainstream models, Mamba-YOLOT and RTDETR-R34, our method achieves superior performance with 6.5% and 8.9% higher mAP@0.5, respectively, while maintaining a more compact parameter footprint. These results collectively validate the effectiveness and efficiency of our proposed approach.
In recent years, with the rapid advancement of artificial intelligence, object detection algorithms have made significant strides in accuracy and computational efficiency. Notably, research and applications of Anchor-Free models have opened new avenues for real-time target detection in optical remote sensing images (ORSIs). However, in the realm of adversarial attacks, developing adversarial techniques tailored to Anchor-Free models remains challenging. Adversarial examples generated based on Anchor-Based models often exhibit poor transferability to these new model architectures. Furthermore, the growing diversity of Anchor-Free models poses additional hurdles to achieving robust transferability of adversarial attacks. This study presents an improved cross-conv-block feature fusion You Only Look Once (YOLO) architecture, meticulously engineered to facilitate the extraction of more comprehensive semantic features during the backpropagation process. To address the asymmetry between densely distributed objects in ORSIs and the corresponding detector outputs, a novel dense bounding box attack strategy is proposed. This approach leverages dense target bounding boxes loss in the calculation of adversarial loss functions. Furthermore, by integrating translation-invariant (TI) and momentum-iteration (MI) adversarial methodologies, the proposed framework significantly improves the transferability of adversarial attacks. Experimental results demonstrate that our method achieves superior adversarial attack performance, with adversarial transferability rates (ATR) of 67.53% on the NWPU VHR-10 dataset and 90.71% on the HRSC2016 dataset. Compared to ensemble adversarial attack and cascaded adversarial attack approaches, our method generates adversarial examples in an average of 0.64 s, representing an approximately 14.5% improvement in efficiency under equivalent conditions.
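The momentum-iteration (MI) component mentioned above is commonly realized as the MI-FGSM update: the gradient is normalized by its L1 norm, accumulated into a momentum term, and the input moves by a fixed step in the sign direction while staying inside an epsilon-ball around the original. The sketch below illustrates one such step on a flat list of pixel values. It is an assumption about the general technique, not the paper's implementation; the function name, the hyperparameter values, and the gradient are hypothetical, and the translation-invariant (TI) part, which smooths the gradient with a kernel before this step, is omitted:

```python
def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)

def mi_fgsm_step(x, x_orig, grad, momentum, mu=1.0, alpha=0.01, eps=0.03):
    """One momentum-iterative sign-gradient step on flat lists of floats in [0, 1]."""
    # Normalize the gradient by its L1 norm before accumulating (avoid division by zero).
    l1_norm = sum(abs(g) for g in grad) or 1.0
    new_momentum = [mu * m + g / l1_norm for m, g in zip(momentum, grad)]
    x_new = []
    for xi, x0, m in zip(x, x_orig, new_momentum):
        stepped = xi + alpha * sign(m)
        # Project back into the eps-ball around the original input, then into [0, 1].
        stepped = min(max(stepped, x0 - eps), x0 + eps)
        x_new.append(min(max(stepped, 0.0), 1.0))
    return x_new, new_momentum

# Hypothetical two-pixel input and loss gradient, for illustration only.
x_orig = [0.5, 0.5]
x_adv, momentum = mi_fgsm_step(x_orig, x_orig, grad=[2.0, -6.0], momentum=[0.0, 0.0])
print(x_adv, momentum)
```

In a real attack, `grad` would come from backpropagating the adversarial loss through the detector at each iteration, and the step would be repeated with the returned momentum.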
Deep learning has made significant progress in the field of oriented object detection for remote sensing images. However, existing methods still face challenges when dealing with difficult tasks such as multi-scale targets, complex backgrounds, and small objects in remote sensing. Keeping the model lightweight to address resource constraints in remote sensing scenarios while improving task performance remains a research hotspot. Therefore, we propose an enhanced multi-scale feature extraction lightweight network, EM-YOLO, based on the YOLOv8s architecture, specifically optimized for the characteristics of large target scale variations, diverse orientations, and numerous small objects in remote sensing images. Our innovations lie in two main aspects: First, a dynamic snake convolution (DSC) is introduced into the backbone network to enhance the model’s feature extraction capability for oriented targets. Second, an innovative focusing-diffusion module is designed in the feature fusion neck to effectively integrate multi-scale feature information. Finally, we introduce the Layer-Adaptive Sparsity for magnitude-based Pruning (LASP) method to perform lightweight network pruning to better complete tasks in resource-constrained scenarios. Experimental results on the lightweight platform Orin demonstrate that the proposed method significantly outperforms the original YOLOv8s model in oriented remote sensing object detection tasks, and achieves comparable or superior performance to state-of-the-art methods on three authoritative remote sensing datasets (DOTA v1.0, DOTA v1.5, and HRSC2016).
Breast cancer screening programs rely heavily on mammography for early detection; however, diagnostic performance is strongly affected by inter-reader variability, breast density, and the limitations of conventional computer-aided detection systems. Recent advances in deep learning have enabled more robust and scalable solutions for large-scale screening, yet a systematic comparison of modern object detection architectures on nationally representative datasets remains limited. This study presents a comprehensive quantitative comparison of prominent deep learning–based object detection architectures for Artificial Intelligence-assisted mammography analysis using the MammosighTR dataset, developed within the Turkish National Breast Cancer Screening Program. The dataset comprises 12,740 patient cases collected between 2016 and 2022, annotated with BI-RADS categories, breast density levels, and lesion localization labels. A total of 31 models were evaluated, including One-Stage, Two-Stage, and Transformer-based architectures, under a unified experimental framework at both patient and breast levels. The results demonstrate that Two-Stage architectures consistently outperform One-Stage models, achieving approximately 2%–4% higher Macro F1-Scores and more balanced precision–recall trade-offs, with Double-Head R-CNN and Dynamic R-CNN yielding the highest overall performance (Macro F1 ≈ 0.84–0.86). This advantage is primarily attributed to the region proposal mechanism and improved class balance inherent to Two-Stage designs. One-Stage detectors exhibited higher sensitivity and faster inference, reaching Recall values above 0.88, but experienced minor reductions in Precision and overall accuracy (≈1%–2%) compared with Two-Stage models. Among Transformer-based architectures, Deformable DEtection TRansformer demonstrated strong robustness and consistency across datasets, achieving Macro F1-Scores comparable to CNN-based detectors (≈0.83–0.85) while exhibiting minimal performance degradation under distributional shifts. Breast density–based analysis revealed increased misclassification rates in medium-density categories (types B and C), whereas Transformer-based architectures maintained more stable performance in high-density type D tissue. These findings quantitatively confirm that both architectural design and tissue characteristics play a decisive role in diagnostic accuracy. Overall, the study provides a reproducible benchmark and highlights the potential of hybrid approaches that combine the accuracy of Two-Stage detectors with the contextual modeling capability of Transformer architectures for clinically reliable breast cancer screening systems.
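The Macro F1-Score used throughout the comparison above is, in its standard form, the unweighted mean of per-class F1 values, so that rare classes count as much as common ones. A minimal sketch of that standard definition follows; the class labels and predictions are hypothetical placeholders, not BI-RADS data from the study:

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)

# Hypothetical labels for four cases.
y_true = ["A", "A", "B", "B"]
y_pred = ["A", "B", "B", "B"]
print(f"Macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

In this toy case class A has F1 = 2/3 and class B has F1 = 4/5, so the macro average sits between the two regardless of how many samples each class has.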
Online examinations have become a dominant assessment mode, increasing concerns over academic integrity. To address the critical challenge of detecting cheating behaviours, this study proposes a hybrid deep learning approach that combines visual detection and temporal behaviour classification. The methodology utilises three object detection models, namely You Only Look Once (YOLOv12), Faster Region-based Convolutional Neural Network (RCNN), and Single Shot Detector (SSD) MobileNet, integrated with classification models such as Convolutional Neural Networks (CNN), Bidirectional Gated Recurrent Unit (Bi-GRU), and CNN-LSTM (Long Short-Term Memory). Two distinct datasets were used: the Online Exam Proctoring (EOP) dataset from Michigan State University and the School of Computer Science, Duy Tan University (SCS-DTU) dataset collected in a controlled classroom setting. A diverse set of cheating behaviours, including book usage, unauthorised interaction, internet access, and mobile phone use, was categorised. Comprehensive experiments evaluated the models based on accuracy, precision, recall, training time, inference speed, and memory usage. We evaluate nine detector-classifier pairings under a unified budget and score them via a calibrated harmonic mean of detection and classification accuracies, enabling deployment-oriented selection under latency and memory constraints. Macro-Precision/Recall/F1 and Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) are reported for the top configurations, revealing consistent advantages of object-centric pipelines for fine-grained cheating cues. The highest overall score is achieved by YOLOv12+CNN (97.15% accuracy), while SSD-MobileNet+CNN provides the best speed-efficiency trade-off for edge devices. This research provides valuable insights into selecting and deploying appropriate deep learning models for maintaining exam integrity under varying resource constraints.
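Scoring a detector-classifier pairing by a harmonic mean of its two accuracies penalizes pipelines in which one stage is much weaker than the other. The abstract does not specify the calibration step, so the sketch below shows only the plain harmonic mean as an assumed starting point; the function name and the accuracy values are hypothetical:

```python
def harmonic_mean_score(detection_acc, classification_acc):
    """Plain harmonic mean of two accuracies in [0, 1]; 0 if either is 0."""
    for value in (detection_acc, classification_acc):
        if not 0.0 <= value <= 1.0:
            raise ValueError("accuracies must lie in [0, 1]")
    total = detection_acc + classification_acc
    if total == 0:
        return 0.0
    return 2.0 * detection_acc * classification_acc / total

# Hypothetical pairings: a balanced pipeline versus an unbalanced one
# with the same arithmetic mean (0.75).
print(f"{harmonic_mean_score(0.75, 0.75):.3f}")
print(f"{harmonic_mean_score(0.90, 0.60):.3f}")
```

The two pairings share an arithmetic mean of 0.75, yet the unbalanced one scores lower under the harmonic mean, which is the property that makes it suitable for ranking two-stage pipelines.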
Defect detection in printed circuit boards (PCB) remains challenging due to the difficulty of identifying small-scale defects, the inefficiency of conventional approaches, and the interference from complex backgrounds. To address these issues, this paper proposes SIM-Net, an enhanced detection framework derived from YOLOv11. The model integrates SPDConv to preserve fine-grained features for small object detection, introduces a novel convolutional partial attention module (C2PAM) to suppress redundant background information and highlight salient regions, and employs a multi-scale fusion network (MFN) with a multi-grain contextual module (MGCT) to strengthen contextual representation and accelerate inference. Experimental evaluations demonstrate that SIM-Net achieves 92.4% mAP, 92% accuracy, and 89.4% recall with an inference speed of 75.1 FPS, outperforming existing state-of-the-art methods. These results confirm the robustness and real-time applicability of SIM-Net for PCB defect inspection.
Human object detection and recognition is essential for elderly monitoring and assisted living; however, models relying solely on pose or scene context often struggle in cluttered or visually ambiguous settings. To address this, we present SCENET-3D, a transformer-driven multimodal framework that unifies human-centric skeleton features with scene-object semantics for intelligent robotic vision through a three-stage pipeline. In the first stage, scene analysis, rich geometric and texture descriptors are extracted from RGB frames, including surface-normal histograms, angles between neighboring normals, Zernike moments, directional standard deviation, and Gabor-filter responses. In the second stage, scene-object analysis, non-human objects are segmented and represented using local feature descriptors and complementary surface-normal information. In the third stage, human-pose estimation, silhouettes are processed through an enhanced MoveNet to obtain 2D anatomical keypoints, which are fused with depth information and converted into RGB-based point clouds to construct pseudo-3D skeletons. Features from all three stages are fused and fed into a transformer encoder with multi-head attention to resolve visually similar activities. Experiments on UCLA (95.8%), ETRI-Activity3D (89.4%), and CAD-120 (91.2%) demonstrate that combining pseudo-3D skeletons with rich scene-object fusion significantly improves generalizable activity recognition, enabling safer elderly care, natural human–robot interaction, and robust context-aware robotic perception in real-world environments.
Recognising human-object interactions (HOI) is a challenging task for traditional machine learning models, including convolutional neural networks (CNNs). Existing models show limited transferability across complex datasets such as D3D-HOI and SYSU 3D HOI. The conventional architecture of CNNs restricts their ability to handle HOI scenarios with high complexity. HOI recognition requires improved feature extraction methods to overcome the current limitations in accuracy and scalability. This work proposes a novel quantum gate-enabled hybrid CNN (QEH-CNN) for effective HOI recognition. The model enhances CNN performance by integrating quantum computing components. The framework begins with bilateral image filtering, followed by multi-object tracking (MOT) and Felzenszwalb superpixel segmentation. A watershed algorithm refines object boundaries by cleaning merged superpixels. Feature extraction combines a histogram of oriented gradients (HOG), Global Image Statistics for Texture (GIST) descriptors, and a novel 23-joint keypoint extraction method using relative joint angles and joint proximity measures. A fuzzy optimization process refines the extracted features before feeding them into the QEH-CNN model. The proposed model achieves 95.06% accuracy on the 3D-D3D-HOI dataset and 97.29% on the SYSU 3D HOI dataset. The integration of quantum computing enhances feature optimization, leading to improved accuracy and overall model efficiency.
In industrial manufacturing, efficient surface defect detection is crucial for ensuring product quality and production safety. Traditional inspection methods are often slow, subjective, and prone to errors, while classical machine vision techniques struggle with complex backgrounds and small defects. To address these challenges, this study proposes an improved YOLOv11 model for detecting defects on hot-rolled steel strips using the NEU-DET dataset. Three key improvements are introduced in the proposed model. First, a lightweight Guided Attention Feature Module (GAFM) is incorporated to enhance multi-scale feature fusion, allowing the model to better capture and integrate semantic and spatial information across different layers, which improves its ability to detect defects of varying sizes. Second, an Aggregated Attention (AA) mechanism is employed to strengthen the representation of critical defect features while effectively suppressing irrelevant background information, particularly enhancing the detection of small, low-contrast, or complex defects. Third, Ghost Dynamic Convolution (GDC) is applied to reduce computational cost by generating low-cost ghost features and dynamically reweighting convolutional kernels, enabling faster inference without sacrificing feature quality or detection accuracy. Extensive experiments demonstrate that the proposed model achieves a mean Average Precision (mAP) of 87.2%, compared to 81.5% for the baseline, while lowering computational cost from 6.3 giga floating-point operations (GFLOPs) to 5.1 GFLOPs. These results indicate that the improved YOLOv11 is both accurate and computationally efficient, making it suitable for real-time industrial surface defect detection and contributing to the development of practical, high-performance inspection systems.
This study proposes a lightweight rice disease detection model optimized for edge computing environments. The goal is to enhance the You Only Look Once (YOLO) v5 architecture to achieve a balance between real-time diagnostic performance and computational efficiency. To this end, a total of 3234 high-resolution images (2400×1080) were collected of three major rice diseases (Rice Blast, Bacterial Blight, and Brown Spot) that are frequently found in actual rice cultivation fields. These images served as the training dataset. The proposed YOLOv5-V2 model removes the Focus layer from the original YOLOv5s and integrates ShuffleNet V2 into the backbone, resulting in both model compression and improved inference speed. Additionally, YOLOv5-P, based on PP-PicoDet, was configured as a comparative model to quantitatively evaluate performance. Experimental results demonstrated that YOLOv5-V2 achieved excellent detection performance, with an mAP 0.5 of 89.6%, mAP 0.5–0.95 of 66.7%, precision of 91.3%, and recall of 85.6%, while maintaining a lightweight model size of 6.45 MB. In contrast, YOLOv5-P exhibited a smaller model size of 4.03 MB, but showed lower performance with an mAP 0.5 of 70.3%, mAP 0.5–0.95 of 35.2%, precision of 62.3%, and recall of 74.1%. This study lays a technical foundation for the implementation of smart agriculture and real-time disease diagnosis systems by proposing a model that satisfies both accuracy and lightweight requirements.
Funding: Supported by the National Natural Science Foundation of China (No. 62276204), the Fundamental Research Funds for the Central Universities, China (No. YJSJ24011), the Natural Science Basic Research Program of Shaanxi, China (Nos. 2022JM-340 and 2023-JC-QN-0710), and the China Postdoctoral Science Foundation (Nos. 2020T130494 and 2018M633470).
Abstract: Visible and infrared (RGB-IR) fusion object detection plays an important role in security, disaster relief, etc. In recent years, deep-learning-based RGB-IR fusion detection methods have been developing rapidly, but still struggle to deal with the complex and changing scenarios captured by drones, mainly due to two reasons: (A) RGB-IR fusion detectors are susceptible to inferior inputs that degrade performance and stability. (B) RGB-IR fusion detectors are susceptible to redundant features that reduce accuracy and efficiency. In this paper, an innovative RGB-IR fusion detection framework based on global-local feature optimization, named GLFDet, is proposed to improve the detection performance and efficiency of drone-captured objects. The key components of GLFDet include a Global Feature Optimization (GFO) module, a Local Feature Optimization (LFO) module and a Channel Separation Fusion (CSF) module. Specifically, GFO calculates the information content of the input image from the frequency domain and optimizes the features holistically. Then, LFO dynamically selects high-value features and filters out low-value features before fusion, which significantly improves the efficiency of fusion. Finally, CSF fuses the RGB and IR features across the corresponding channels, which avoids the rearrangement of the channel relationships and enhances the model stability. Extensive experimental results show that the proposed method achieves the best performance on three popular RGB-IR datasets: DroneVehicle, VEDAI, and LLVIP. In addition, GLFDet is more lightweight than other comparable models, making it more appealing to edge devices such as drones. The code is available at https://github.com/lao chen330/GLFDet.
Funding: Supported by the National Public Welfare Forest Desert Shrubbery Monitoring Project.
Abstract: Desert shrubs are indispensable in maintaining ecological stability by reducing soil erosion, enhancing water retention, and boosting soil fertility, which are critical factors in mitigating desertification processes. Due to the complex topography, variable climate, and challenges in field surveys in desert regions, this paper proposes YOLO-Desert-Shrub (YOLO-DS), a detection method for identifying desert shrubs in UAV remote sensing images based on an enhanced YOLOv8n framework. This method accurately identifies shrub species, locations, and coverage. To address the issue of small individual plants dominating the dataset, the SPDconv convolution module is introduced in the Backbone and Neck layers of the YOLOv8n model, replacing conventional convolutions. This structural optimization mitigates information degradation in fine-grained data while strengthening discriminative feature capture across spatial scales within desert shrub datasets. Furthermore, a structured state-space model is integrated into the main network, and the MambaLayer is designed to dynamically extract and refine shrub-specific features from remote sensing images, effectively filtering out background noise and irrelevant interference to enhance feature representation. Benchmark evaluations reveal that the YOLO-DS framework attains 79.56% mAP40weight, a 2.2% absolute gain over the baseline YOLOv8n architecture, with statistically significant advantages over contemporary detectors in cross-validation trials. The predicted plant coverage exhibits strong consistency with manually measured coverage, with a coefficient of determination (R^(2)) of 0.9148 and a Root Mean Square Error (RMSE) of 1.8266%. The proposed UAV-based remote sensing method utilizing YOLO-DS can effectively identify and locate desert shrubs, monitor canopy sizes and distribution, and provide technical support for automated desert shrub monitoring.
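The two agreement statistics quoted in this abstract, R^(2) and RMSE between predicted and manually measured coverage, can be illustrated with a minimal sketch. The coverage values below are made-up numbers, not data from the study, the function names are hypothetical, and the abstract does not say which definition of R^(2) the authors used (the sketch assumes 1 minus the ratio of residual to total sum of squares).

```python
import math

def r_squared(measured, predicted):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

def rmse(measured, predicted):
    """Root mean square error, in the same unit as the inputs (here: % coverage)."""
    squared_errors = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical plot-level coverage values in percent (illustration only).
measured_coverage = [10.0, 20.0, 30.0, 40.0]
predicted_coverage = [12.0, 18.0, 33.0, 39.0]

r2_value = r_squared(measured_coverage, predicted_coverage)
rmse_value = rmse(measured_coverage, predicted_coverage)
```

An R^(2) close to 1 together with a small RMSE indicates that the predicted coverage tracks the manual measurements closely, which is the kind of consistency the abstract reports.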
Funding: Supported by Ho Chi Minh City Open University, Vietnam and Suan Sunandha Rajabhat University, Thailand.
Abstract: Ensuring the reliability of power transmission networks depends heavily on the early detection of faults in key components such as insulators, which serve both mechanical and electrical functions. Even a single defective insulator can lead to equipment breakdown, costly service interruptions, and increased maintenance demands. While unmanned aerial vehicles (UAVs) enable rapid and cost-effective collection of high-resolution imagery, accurate defect identification remains challenging due to cluttered backgrounds, variable lighting, and the diverse appearance of faults. To address these issues, we introduce a real-time inspection framework that integrates an enhanced YOLOv10 detector with a Hybrid Quantum-Enhanced Graph Neural Network (HQGNN). The YOLOv10 module, fine-tuned on domain-specific UAV datasets, improves detection precision, while the HQGNN ensures multi-object tracking and temporal consistency across video frames. This synergy enables reliable and efficient identification of faulty insulators under complex environmental conditions. Experimental results show that the proposed YOLOv10-HQGNN model surpasses existing methods across all metrics, achieving a recall of 0.85 and an Average Precision (AP) of 0.83, with clear gains in both accuracy and throughput. These advancements support automated, proactive maintenance strategies that minimize downtime and contribute to a safer, smarter energy infrastructure.
Abstract: Salient object detection (SOD) models struggle to simultaneously preserve global structure, maintain sharp object boundaries, and sustain computational efficiency in complex scenes. In this study, we propose SPSALNet, a task-driven two-stage (macro–micro) architecture that restructures the SOD process around superpixel representations. In the proposed approach, a "split-and-enhance" principle, introduced to our knowledge for the first time in the SOD literature, hierarchically classifies superpixels and then applies targeted refinement only to ambiguous or error-prone regions. At the macro stage, the image is partitioned into content-adaptive superpixel regions, and each superpixel is represented by a high-dimensional region-level feature vector. These representations define a regional decomposition problem in which superpixels are assigned to three classes: background, object interior, and transition regions. Superpixel tokens interact with a global feature vector from a deep network backbone through a cross-attention module and are projected into an enriched embedding space that jointly encodes local topology and global context. At the micro stage, the model employs a U-Net-based refinement process that allocates computational resources only to ambiguous transition regions. The image and distance–similarity maps derived from superpixels are processed through a dual-encoder pathway. Subsequently, channel-aware fusion blocks adaptively combine information from these two sources, producing sharper and more stable object boundaries. Experimental results show that SPSALNet achieves high accuracy with lower computational cost compared to recent competing methods. On the PASCAL-S and DUT-OMRON datasets, SPSALNet exhibits a clear performance advantage across all key metrics, and it ranks first on accuracy-oriented measures on HKU-IS. On the challenging DUT-OMRON benchmark, SPSALNet reaches an MAE of 0.034. Across all datasets, it preserves object boundaries and regional structure in a stable and competitive manner.
Funding: Funded by the Key Research and Development Program of Henan Province (No. 251111211200) and the National Natural Science Foundation of China (Grant No. U2004163).
Abstract: Traffic sign detection is an important part of autonomous driving, and its recognition accuracy and speed are directly related to road traffic safety. Although convolutional neural networks (CNNs) have made certain breakthroughs in this field, in complex scenes with image blur and target occlusion, traffic sign detection continues to exhibit limited accuracy, accompanied by false positives and missed detections. To address these problems, a traffic sign detection algorithm, You Only Look Once-based Skip Dynamic Way (YOLO-SDW), built on You Only Look Once version 8 small (YOLOv8s), is proposed. Firstly, a Skip Connection Reconstruction (SCR) module is introduced to efficiently integrate fine-grained feature information and enhance the detection accuracy of the algorithm in complex scenes. Secondly, a C2f module based on Dynamic Snake Convolution (C2f-DySnake) is proposed to dynamically adjust the receptive field information, improve the algorithm's feature extraction ability for blurred or occluded targets, and reduce the occurrence of false detections and missed detections. Finally, the Wise Powerful IoU v2 (WPIoUv2) loss function is proposed to further improve the detection accuracy of the algorithm. Experimental results show that the mAP@0.5 of YOLO-SDW on the TT100K dataset is 89.2%, and mAP@0.5:0.95 is 68.5%, which are 4% and 3.3% higher than the YOLOv8s baseline, respectively. YOLO-SDW ensures real-time performance while achieving higher accuracy.
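Both the mAP@0.5 metric and IoU-based losses such as the one named in this abstract build on the plain intersection-over-union of two boxes. The sketch below shows only that basic quantity for axis-aligned boxes in `(x1, y1, x2, y2)` form. It is not the WPIoUv2 loss, whose formulation the abstract does not give, and the function name and example boxes are hypothetical.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if any.
    inter_x1 = max(box_a[0], box_b[0])
    inter_y1 = max(box_a[1], box_b[1])
    inter_x2 = min(box_a[2], box_b[2])
    inter_y2 = min(box_a[3], box_b[3])

    inter_w = max(0.0, inter_x2 - inter_x1)
    inter_h = max(0.0, inter_y2 - inter_y1)
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

# Hypothetical prediction that covers half of a ground-truth sign.
ground_truth = (0.0, 0.0, 10.0, 10.0)
prediction = (5.0, 0.0, 15.0, 10.0)
overlap = box_iou(ground_truth, prediction)

# Under mAP@0.5, a prediction counts as a match only if IoU >= 0.5.
is_match_at_05 = overlap >= 0.5
```

Here the overlap is one third, so the prediction would not count as a true positive at the 0.5 threshold, which is why small localization errors on small signs weigh heavily on mAP.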
Abstract: In modern industrial production, foreign object detection in complex environments is crucial to ensure product quality and production safety. Detection systems based on deep-learning image processing algorithms often face challenges with handling high-resolution images and achieving accurate detection against complex backgrounds. To address these issues, this study employs the PatchCore unsupervised anomaly detection algorithm combined with data augmentation techniques to enhance the system's generalization capability across varying lighting conditions, viewing angles, and object scales. The proposed method is evaluated in a complex industrial detection scenario involving the bogie of an electric multiple unit (EMU). A dataset consisting of complex backgrounds, diverse lighting conditions, and multiple viewing angles is constructed to validate the performance of the detection system in real industrial environments. Experimental results show that the proposed model achieves an average area under the receiver operating characteristic curve (AUROC) of 0.92 and an average F1 score of 0.85. Combined with data augmentation, the proposed model exhibits improvements in AUROC by 0.06 and F1 score by 0.03, demonstrating enhanced accuracy and robustness for foreign object detection in complex industrial settings. In addition, the effects of key factors on detection performance are systematically analyzed, providing practical guidance for parameter selection in real industrial applications.
Abstract: With the rapid expansion of drone applications, accurate detection of objects in aerial imagery has become crucial for intelligent transportation, urban management, and emergency rescue missions. However, existing methods face numerous challenges in practical deployment, including scale variation handling, feature degradation, and complex backgrounds. To address these issues, we propose Edge-enhanced and Detail-Capturing You Only Look Once (EHDC-YOLO), a novel framework for object detection in Unmanned Aerial Vehicle (UAV) imagery. Based on the You Only Look Once version 11 nano (YOLOv11n) baseline, EHDC-YOLO systematically introduces several architectural enhancements: (1) a Multi-Scale Edge Enhancement (MSEE) module that leverages multi-scale pooling and edge information to enhance boundary feature extraction; (2) an Enhanced Feature Pyramid Network (EFPN) that integrates P2-level features with Cross Stage Partial (CSP) structures and OmniKernel convolutions for better fine-grained representation; and (3) Dynamic Head (DyHead) with multi-dimensional attention mechanisms for enhanced cross-scale modeling and perspective adaptability. Comprehensive experiments on the Vision meets Drones for Detection (VisDrone-DET) 2019 dataset demonstrate that EHDC-YOLO achieves significant improvements, increasing mean Average Precision (mAP)@0.5 from 33.2% to 46.1% (an absolute improvement of 12.9 percentage points) and mAP@0.5:0.95 from 19.5% to 28.0% (an absolute improvement of 8.5 percentage points) compared with the YOLOv11n baseline, while maintaining a reasonable parameter count (2.81 M vs the baseline's 2.58 M). Further ablation studies confirm the effectiveness of each proposed component, while visualization results highlight EHDC-YOLO's superior performance in detecting objects and handling occlusions in complex drone scenarios.
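The abstract reports gains in percentage points, which is easy to confuse with relative improvement. The following sketch recomputes both from the figures quoted above. The helper names are hypothetical, and the relative figures are derived here for illustration; they are not stated by the authors.

```python
def absolute_gain(baseline, improved):
    """Difference between two percentages, in percentage points."""
    return round(improved - baseline, 2)

def relative_gain(baseline, improved):
    """Change expressed as a percentage of the baseline value."""
    return round((improved - baseline) / baseline * 100.0, 1)

# Figures quoted in the abstract (YOLOv11n baseline vs EHDC-YOLO).
map50_points = absolute_gain(33.2, 46.1)      # mAP@0.5, percentage points
map5095_points = absolute_gain(19.5, 28.0)    # mAP@0.5:0.95, percentage points

# Derived for illustration only: the same gains relative to the baseline.
map50_relative = relative_gain(33.2, 46.1)
param_growth = relative_gain(2.58, 2.81)      # parameter count, in millions
```

Read this way, the mAP@0.5 gain of 12.9 points corresponds to roughly a 39% relative improvement, obtained with about 9% more parameters.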
Abstract: Modern manufacturing processes have become more reliant on automation because of the accelerated transition from Industry 3.0 to Industry 4.0. Manual inspection of products on assembly lines remains inefficient, error-prone, and inconsistent, emphasizing the need for a reliable and automated inspection system. Leveraging both object detection and image segmentation approaches, this research proposes a vision-based solution for the detection of various kinds of tools in a toolkit using deep learning (DL) models. Two Intel RealSense D455f depth cameras were arranged in a top-down configuration to capture both RGB and depth images of the toolkits. After applying multiple constraints and enhancing the images through preprocessing and augmentation, a dataset consisting of 3300 annotated RGB-D images was generated. Several DL models were selected through a comprehensive assessment of mean Average Precision (mAP), precision-recall equilibrium, inference latency (target ≥ 30 FPS), and computational burden, resulting in a preference for YOLO and Region-based Convolutional Neural Network (R-CNN) variants over ViT-based models due to the latter's increased latency and resource requirements. YOLOv5, YOLOv8, YOLOv11, Faster R-CNN, and Mask R-CNN were trained on the annotated dataset and evaluated using key performance metrics (Recall, Accuracy, F1-score, and Precision). YOLOv11 demonstrated balanced excellence with 93.0% precision, 89.9% recall, and a 90.6% F1-score in object detection, as well as 96.9% precision, 95.3% recall, and a 96.5% F1-score in instance segmentation, with an average inference time of 25 ms per frame (≈40 FPS), demonstrating real-time performance. Leveraging these results, a YOLOv11-based Windows application was successfully deployed in a real-time assembly line environment, where it accurately processed live video streams to detect and segment tools within toolkits, demonstrating its practical effectiveness in industrial automation. In addition to detection and segmentation, the application can precisely measure socket dimensions by applying edge detection techniques to YOLOv11 segmentation masks. This enables specification-level quality control directly on the assembly line and improves real-time inspection capability. The implementation is a significant step forward for intelligent manufacturing in the Industry 4.0 paradigm, providing a scalable, efficient, and accurate way to perform automated inspection and dimensional verification.
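The real-time claim in this abstract rests on a simple conversion between per-frame latency and frame rate. A minimal sketch of that arithmetic follows, using the 25 ms latency and the 30 FPS target quoted above. The function names are hypothetical, and the sketch assumes frames are processed one at a time with no batching or pipelining.

```python
def fps_from_latency_ms(latency_ms):
    """Frames per second when each frame takes latency_ms to process."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms

def meets_realtime_target(latency_ms, target_fps=30.0):
    """True if the achieved frame rate reaches the target frame rate."""
    return fps_from_latency_ms(latency_ms) >= target_fps

reported_fps = fps_from_latency_ms(25.0)       # 25 ms per frame, from the abstract
within_target = meets_realtime_target(25.0)    # compared against the 30 FPS target

# The slowest latency that still satisfies a 30 FPS target.
latency_budget_ms = 1000.0 / 30.0
```

A 30 FPS target leaves a budget of about 33.3 ms per frame, so the reported 25 ms leaves some headroom for capture and display overhead.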
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62572057, 62272049, U24A20331), the Beijing Natural Science Foundation (Grant Nos. 4232026, 4242020), and the Academic Research Projects of Beijing Union University (Grant No. ZK10202404).
Abstract: Traffic sign detection is a critical component of driving systems. Single-stage network-based traffic sign detection algorithms, renowned for their fast detection speeds and high accuracy, have become the dominant approach in current practice. However, in complex and dynamic traffic scenes, particularly with smaller traffic sign objects, missed and false detections can reduce overall detection accuracy. To address this issue, this paper proposes a detection algorithm that integrates edge and shape information. Recognizing that traffic signs have specific shapes and distinct edge contours, this paper introduces an edge feature extraction branch within the backbone network, enabling adaptive fusion with features of the same hierarchical level. Additionally, a shape prior convolution module is designed to replace the first two convolutional modules of the backbone network, with the aim of enhancing the model's perception of objects with specific shapes and reducing its sensitivity to background noise. The algorithm was evaluated on the CCTSDB and TT100k datasets; compared to YOLOv8s, the mAP50 values increased by 3.0% and 10.4%, respectively, demonstrating the effectiveness of the proposed method in improving the accuracy of traffic sign detection.
Funding: Supported in part by the Chongqing Research Program of Basic Research and Frontier Technology under Grant CSTB2025NSCQ-GPX1309.
Abstract: Small object detection has been a focus of attention since the emergence of deep learning-based object detection. Although classical object detection frameworks have made significant contributions to the development of object detection, there are still many issues to be resolved in detecting small objects due to the inherent complexity and diversity of real-world visual scenes. In particular, the YOLO (You Only Look Once) series of detection models, renowned for their real-time performance, have undergone numerous adaptations aimed at improving the detection of small targets. In this survey, we summarize the state-of-the-art YOLO-based small object detection methods. This review presents a systematic categorization of YOLO-based approaches for small-object detection, organized into four methodological avenues, namely attention-based feature enhancement, detection-head optimization, loss function, and multi-scale feature fusion strategies. We then examine the principal challenges addressed by each category. Finally, we analyze the performance of these methods on public benchmarks and, by comparing current approaches, identify limitations and outline directions for future research.
Funding: Funded by the Ministry of Education Humanities and Social Science Research Project, grant number 23YJAZH034; the Postgraduate Research and Practice Innovation Program of Jiangsu Province, grant number SJCX25_17; and the National Computer Basic Education Research Project in Higher Education Institutions, grant numbers 2024-AFCEC-056 and 2024-AFCEC-057.
Abstract: To reduce the false and missed detections in steel surface defect detection that arise from the variety of defect types and sizes, the similarity between defects and background features, and the similarity among different defects, this paper proposes a lightweight detection model named multiscale edge and squeeze-and-excitation attention detection network (MSESE), which is built upon You Only Look Once version 11 nano (YOLOv11n). To address the difficulty of locating defect edges, we first propose an edge enhancement module (EEM), apply it to the process of multiscale feature extraction, and then propose a multiscale edge enhancement module (MSEEM). By obtaining defect features from different scales and enhancing their edge contours, the module uses a dual-domain selection mechanism to focus on the important areas in the image, ensuring that the feature maps carry richer information and clearer contour features. By fusing the squeeze-and-excitation attention mechanism with the EEM, we obtain a lighter module that can enhance the representation of edge features, which is named the edge enhancement module with squeeze-and-excitation attention (EEMSE). This module was subsequently integrated into the detection head. The enhanced detection head achieves improved edge feature enhancement with reduced computational overhead, while effectively adjusting channel-wise importance and further refining feature representation. Experiments on the NEU-DET dataset show that, compared with the original YOLOv11n, the improved model achieves improvements of 4.1% and 2.2% in terms of mAP@0.5 and mAP@0.5:0.95, respectively, and the GFLOPs value decreases from 6.4 to 6.2. Furthermore, when compared to the current mainstream models Mamba-YOLOT and RTDETR-R34, our method achieves superior performance with 6.5% and 8.9% higher mAP@0.5, respectively, while maintaining a more compact parameter footprint. These results collectively validate the effectiveness and efficiency of our proposed approach.
Abstract: In recent years, with the rapid advancement of artificial intelligence, object detection algorithms have made significant strides in accuracy and computational efficiency. Notably, research and applications of Anchor-Free models have opened new avenues for real-time target detection in optical remote sensing images (ORSIs). However, in the realm of adversarial attacks, developing adversarial techniques tailored to Anchor-Free models remains challenging. Adversarial examples generated based on Anchor-Based models often exhibit poor transferability to these new model architectures. Furthermore, the growing diversity of Anchor-Free models poses additional hurdles to achieving robust transferability of adversarial attacks. This study presents an improved cross-conv-block feature fusion You Only Look Once (YOLO) architecture, engineered to facilitate the extraction of more comprehensive semantic features during the backpropagation process. To address the asymmetry between densely distributed objects in ORSIs and the corresponding detector outputs, a novel dense bounding box attack strategy is proposed. This approach leverages a dense target bounding box loss in the calculation of adversarial loss functions. Furthermore, by integrating translation-invariant (TI) and momentum-iteration (MI) adversarial methodologies, the proposed framework significantly improves the transferability of adversarial attacks. Experimental results demonstrate that our method achieves superior adversarial attack performance, with adversarial transferability rates (ATR) of 67.53% on the NWPU VHR-10 dataset and 90.71% on the HRSC2016 dataset. Compared to ensemble adversarial attack and cascaded adversarial attack approaches, our method generates adversarial examples in an average of 0.64 s, representing an approximately 14.5% improvement in efficiency under equivalent conditions.
Funding: Funded by the Hainan Province Science and Technology Special Fund under Grant ZDYF2024GXJS292.
Abstract: Deep learning has made significant progress in the field of oriented object detection for remote sensing images. However, existing methods still face challenges when dealing with difficult tasks such as multi-scale targets, complex backgrounds, and small objects in remote sensing. Keeping models lightweight to meet the resource constraints of remote sensing scenarios while improving task performance remains a research hotspot. Therefore, we propose EM-YOLO, an enhanced multi-scale feature extraction lightweight network based on the YOLOv8s architecture, specifically optimized for the large target scale variations, diverse orientations, and numerous small objects found in remote sensing images. Our contributions are as follows. First, a dynamic snake convolution (DSC) is introduced into the backbone network to enhance the model's feature extraction capability for oriented targets. Second, an innovative focusing-diffusion module is designed in the feature fusion neck to effectively integrate multi-scale feature information. Finally, we introduce the Layer-Adaptive Sparsity for magnitude-based Pruning (LASP) method to perform lightweight network pruning so that tasks can be completed in resource-constrained scenarios. Experimental results on the lightweight platform Orin demonstrate that the proposed method significantly outperforms the original YOLOv8s model in oriented remote sensing object detection tasks, and achieves comparable or superior performance to state-of-the-art methods on three authoritative remote sensing datasets (DOTA v1.0, DOTA v1.5, and HRSC2016).
Abstract: Breast cancer screening programs rely heavily on mammography for early detection; however, diagnostic performance is strongly affected by inter-reader variability, breast density, and the limitations of conventional computer-aided detection systems. Recent advances in deep learning have enabled more robust and scalable solutions for large-scale screening, yet a systematic comparison of modern object detection architectures on nationally representative datasets remains limited. This study presents a comprehensive quantitative comparison of prominent deep learning-based object detection architectures for Artificial Intelligence-assisted mammography analysis using the MammosighTR dataset, developed within the Turkish National Breast Cancer Screening Program. The dataset comprises 12,740 patient cases collected between 2016 and 2022, annotated with BI-RADS categories, breast density levels, and lesion localization labels. A total of 31 models were evaluated, including One-Stage, Two-Stage, and Transformer-based architectures, under a unified experimental framework at both patient and breast levels. The results demonstrate that Two-Stage architectures consistently outperform One-Stage models, achieving approximately 2%–4% higher Macro F1-Scores and more balanced precision-recall trade-offs, with Double-Head R-CNN and Dynamic R-CNN yielding the highest overall performance (Macro F1 ≈ 0.84–0.86). This advantage is primarily attributed to the region proposal mechanism and improved class balance inherent to Two-Stage designs. One-Stage detectors exhibited higher sensitivity and faster inference, reaching Recall values above 0.88, but experienced minor reductions in Precision and overall accuracy (≈1%–2%) compared with Two-Stage models. Among Transformer-based architectures, Deformable DEtection TRansformer demonstrated strong robustness and consistency across datasets, achieving Macro F1-Scores comparable to CNN-based detectors (≈0.83–0.85) while exhibiting minimal performance degradation under distributional shifts. Breast density-based analysis revealed increased misclassification rates in medium-density categories (types B and C), whereas Transformer-based architectures maintained more stable performance in high-density type D tissue. These findings quantitatively confirm that both architectural design and tissue characteristics play a decisive role in diagnostic accuracy. Overall, the study provides a reproducible benchmark and highlights the potential of hybrid approaches that combine the accuracy of Two-Stage detectors with the contextual modeling capability of Transformer architectures for clinically reliable breast cancer screening systems.
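The comparison above is stated in terms of Macro F1, which averages per-class F1 scores so that rare classes count as much as common ones. The sketch below shows that calculation under the usual definition. The per-class precision and recall values are invented for illustration and are not results from the study, and the function names are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall for a single class."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

def macro_f1(per_class_pr):
    """Unweighted mean of per-class F1 scores.

    per_class_pr is a list of (precision, recall) pairs, one per class.
    """
    scores = [f1_score(p, r) for p, r in per_class_pr]
    return sum(scores) / len(scores)

# Hypothetical (precision, recall) pairs for three classes.
example_classes = [(0.90, 0.80), (0.80, 0.90), (0.85, 0.85)]
example_macro_f1 = macro_f1(example_classes)
```

Because every class contributes equally, a model that trades precision for recall on one class, as the abstract describes for One-Stage detectors, can end up with a lower Macro F1 even when its overall recall is higher.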
Abstract: Online examinations have become a dominant assessment mode, increasing concerns over academic integrity. To address the critical challenge of detecting cheating behaviours, this study proposes a hybrid deep learning approach that combines visual detection and temporal behaviour classification. The methodology utilises object detection models, namely You Only Look Once (YOLOv12), Faster Region-based Convolutional Neural Network (RCNN), and Single Shot Detector (SSD) MobileNet, integrated with classification models such as Convolutional Neural Networks (CNN), Bidirectional Gated Recurrent Unit (Bi-GRU), and CNN-LSTM (Long Short-Term Memory). Two distinct datasets were used: the Online Exam Proctoring (EOP) dataset from Michigan State University and the School of Computer Science, Duy Tan University (SCS-DTU) dataset collected in a controlled classroom setting. A diverse set of cheating behaviours, including book usage, unauthorised interaction, internet access, and mobile phone use, was categorised. Comprehensive experiments evaluated the models based on accuracy, precision, recall, training time, inference speed, and memory usage. We evaluate nine detector-classifier pairings under a unified budget and score them via a calibrated harmonic mean of detection and classification accuracies, enabling deployment-oriented selection under latency and memory constraints. Macro-Precision/Recall/F1 and Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) are reported for the top configurations, revealing consistent advantages of object-centric pipelines for fine-grained cheating cues. The highest overall score is achieved by YOLOv12+CNN (97.15% accuracy), while SSD-MobileNet+CNN provides the best speed-efficiency trade-off for edge devices. This research provides valuable insights into selecting and deploying appropriate deep learning models for maintaining exam integrity under varying resource constraints.
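The selection procedure described above, scoring detector-classifier pairings by a harmonic mean and choosing under latency and memory limits, can be sketched as follows. This is an illustration under stated assumptions: the abstract does not define its calibration step, so the sketch uses a plain, uncalibrated harmonic mean, and all pairing names, accuracies, latencies, and memory figures are hypothetical.

```python
def harmonic_mean(detection_acc, classification_acc):
    """Plain harmonic mean; low if either stage is weak."""
    if detection_acc <= 0 or classification_acc <= 0:
        return 0.0
    return 2.0 * detection_acc * classification_acc / (detection_acc + classification_acc)

def best_pairing(candidates, max_latency_ms, max_memory_mb):
    """Highest-scoring pairing that fits the latency and memory budget, or None."""
    feasible = [
        c for c in candidates
        if c["latency_ms"] <= max_latency_ms and c["memory_mb"] <= max_memory_mb
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda c: harmonic_mean(c["det"], c["cls"]))

# Hypothetical pairings (all numbers invented for illustration).
candidates = [
    {"name": "detector_a+classifier_x", "det": 0.95, "cls": 0.90, "latency_ms": 40.0, "memory_mb": 900.0},
    {"name": "detector_b+classifier_x", "det": 0.85, "cls": 0.88, "latency_ms": 12.0, "memory_mb": 200.0},
    {"name": "detector_c+classifier_y", "det": 0.97, "cls": 0.60, "latency_ms": 20.0, "memory_mb": 400.0},
]

server_choice = best_pairing(candidates, max_latency_ms=50.0, max_memory_mb=1000.0)
edge_choice = best_pairing(candidates, max_latency_ms=25.0, max_memory_mb=500.0)
```

With a generous budget the most accurate pairing wins, while a tighter edge budget selects a lighter pairing. The harmonic mean also penalizes the third pairing, whose strong detector is paired with a weak classifier, more than an arithmetic mean would.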
Abstract: Defect detection in printed circuit boards (PCB) remains challenging due to the difficulty of identifying small-scale defects, the inefficiency of conventional approaches, and the interference from complex backgrounds. To address these issues, this paper proposes SIM-Net, an enhanced detection framework derived from YOLOv11. The model integrates SPDConv to preserve fine-grained features for small object detection, introduces a novel convolutional partial attention module (C2PAM) to suppress redundant background information and highlight salient regions, and employs a multi-scale fusion network (MFN) with a multi-grain contextual module (MGCT) to strengthen contextual representation and accelerate inference. Experimental evaluations demonstrate that SIM-Net achieves 92.4% mAP, 92% accuracy, and 89.4% recall with an inference speed of 75.1 FPS, outperforming existing state-of-the-art methods. These results confirm the robustness and real-time applicability of SIM-Net for PCB defect inspection.
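The abstract credits SPDConv with preserving fine-grained features. In the general SPD-Conv design, a space-to-depth rearrangement replaces strided downsampling, so every input pixel survives in one of the output channels before a non-strided convolution is applied. The pure-Python sketch below shows only that rearrangement on a single-channel map. The function name `space_to_depth`, the channel ordering, and the toy 4x4 input are illustrative assumptions; SIM-Net's actual layer configuration is not given in the abstract.

```python
from typing import List

Grid = List[List[float]]


def space_to_depth(feature_map: Grid, scale: int = 2) -> List[Grid]:
    """Split an H x W map into scale*scale sub-maps of size (H/scale) x (W/scale).

    Sub-map (dy, dx) keeps the pixels at rows dy, dy+scale, ... and
    columns dx, dx+scale, ... so no pixel is discarded.
    """
    height = len(feature_map)
    width = len(feature_map[0]) if height else 0
    if height == 0 or height % scale or width % scale:
        raise ValueError("height and width must be non-zero multiples of scale")
    sub_maps: List[Grid] = []
    for dy in range(scale):
        for dx in range(scale):
            sub_maps.append([row[dx::scale] for row in feature_map[dy::scale]])
    return sub_maps


# Toy 4x4 single-channel input (illustrative only).
toy = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
channels = space_to_depth(toy, scale=2)

if __name__ == "__main__":
    for index, channel in enumerate(channels):
        print(index, channel)
```

With `scale=2` the spatial resolution halves while the channel count quadruples, which is how the layer downsamples without dropping the few pixels that a small defect occupies.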
Funding: Funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R410), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Human object detection and recognition is essential for elderly monitoring and assisted living; however, models relying solely on pose or scene context often struggle in cluttered or visually ambiguous settings. To address this, we present SCENET-3D, a transformer-driven multimodal framework that unifies human-centric skeleton features with scene-object semantics for intelligent robotic vision through a three-stage pipeline. In the first stage, scene analysis, rich geometric and texture descriptors are extracted from RGB frames, including surface-normal histograms, angles between neighboring normals, Zernike moments, directional standard deviation, and Gabor-filter responses. In the second stage, scene-object analysis, non-human objects are segmented and represented using local feature descriptors and complementary surface-normal information. In the third stage, human-pose estimation, silhouettes are processed through an enhanced MoveNet to obtain 2D anatomical keypoints, which are fused with depth information and converted into RGB-based point clouds to construct pseudo-3D skeletons. Features from all three stages are fused and fed into a transformer encoder with multi-head attention to resolve visually similar activities. Experiments on UCLA (95.8%), ETRI-Activity3D (89.4%), and CAD-120 (91.2%) demonstrate that combining pseudo-3D skeletons with rich scene-object fusion significantly improves generalizable activity recognition, enabling safer elderly care, natural human–robot interaction, and robust context-aware robotic perception in real-world environments.
Funding: Supported and funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R410), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Recognising human-object interactions (HOI) is a challenging task for traditional machine learning models, including convolutional neural networks (CNNs). Existing models show limited transferability across complex datasets such as D3D-HOI and SYSU 3D HOI. The conventional architecture of CNNs restricts their ability to handle HOI scenarios with high complexity. HOI recognition requires improved feature extraction methods to overcome the current limitations in accuracy and scalability. This work proposes a novel quantum gate-enabled hybrid CNN (QEH-CNN) for effective HOI recognition. The model enhances CNN performance by integrating quantum computing components. The framework begins with bilateral image filtering, followed by multi-object tracking (MOT) and Felzenszwalb superpixel segmentation. A watershed algorithm refines object boundaries by cleaning merged superpixels. Feature extraction combines a histogram of oriented gradients (HOG), Global Image Statistics for Texture (GIST) descriptors, and a novel 23-joint keypoint extraction method using relative joint angles and joint proximity measures. A fuzzy optimization process refines the extracted features before feeding them into the QEH-CNN model. The proposed model achieves 95.06% accuracy on the D3D-HOI dataset and 97.29% on the SYSU 3D HOI dataset. The integration of quantum computing enhances feature optimization, leading to improved accuracy and overall model efficiency.
Funding: Supported in part by the National Natural Science Foundation of China (Grant No. 62071123), in part by the Natural Science Foundation of Fujian Province (Grant Nos. 2024J01971, 2022J05202), and in part by the Young and Middle-Aged Teacher Education Research Project of Fujian Province (Grant No. JAT210370).
Abstract: In industrial manufacturing, efficient surface defect detection is crucial for ensuring product quality and production safety. Traditional inspection methods are often slow, subjective, and prone to errors, while classical machine vision techniques struggle with complex backgrounds and small defects. To address these challenges, this study proposes an improved YOLOv11 model for detecting defects on hot-rolled steel strips using the NEU-DET dataset. Three key improvements are introduced in the proposed model. First, a lightweight Guided Attention Feature Module (GAFM) is incorporated to enhance multi-scale feature fusion, allowing the model to better capture and integrate semantic and spatial information across different layers, which improves its ability to detect defects of varying sizes. Second, an Aggregated Attention (AA) mechanism is employed to strengthen the representation of critical defect features while effectively suppressing irrelevant background information, particularly enhancing the detection of small, low-contrast, or complex defects. Third, Ghost Dynamic Convolution (GDC) is applied to reduce computational cost by generating low-cost ghost features and dynamically reweighting convolutional kernels, enabling faster inference without sacrificing feature quality or detection accuracy. Extensive experiments demonstrate that the proposed model achieves a mean Average Precision (mAP) of 87.2%, compared to 81.5% for the baseline, while lowering computational cost from 6.3 Giga Floating-point Operations Per Second (GFLOPs) to 5.1 GFLOPs. These results indicate that the improved YOLOv11 is both accurate and computationally efficient, making it suitable for real-time industrial surface defect detection and contributing to the development of practical, high-performance inspection systems.
Abstract: This study proposes a lightweight rice disease detection model optimized for edge computing environments. The goal is to enhance the You Only Look Once (YOLO) v5 architecture to achieve a balance between real-time diagnostic performance and computational efficiency. To this end, a total of 3234 high-resolution images (2400×1080) were collected from three major rice diseases (Rice Blast, Bacterial Blight, and Brown Spot) frequently found in actual rice cultivation fields. These images served as the training dataset. The proposed YOLOv5-V2 model removes the Focus layer from the original YOLOv5s and integrates ShuffleNet V2 into the backbone, resulting in both model compression and improved inference speed. Additionally, YOLOv5-P, based on PP-PicoDet, was configured as a comparative model to quantitatively evaluate performance. Experimental results demonstrated that YOLOv5-V2 achieved excellent detection performance, with an mAP@0.5 of 89.6%, mAP@0.5:0.95 of 66.7%, precision of 91.3%, and recall of 85.6%, while maintaining a lightweight model size of 6.45 MB. In contrast, YOLOv5-P exhibited a smaller model size of 4.03 MB, but showed lower performance with an mAP@0.5 of 70.3%, mAP@0.5:0.95 of 35.2%, precision of 62.3%, and recall of 74.1%. This study lays a technical foundation for the implementation of smart agriculture and real-time disease diagnosis systems by proposing a model that satisfies both accuracy and lightweight requirements.
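Because precision and recall are reported separately for the two models, a single F1 value can make the comparison easier to read. The sketch below derives F1 from the precision and recall figures quoted in the abstract above. F1 itself is not reported there, so the derived values are simple arithmetic on the published numbers rather than results from the study, and the helper name `f1_score` is an assumption.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) as fractions, taken from the abstract.
reported = {
    "YOLOv5-V2": (0.913, 0.856),
    "YOLOv5-P": (0.623, 0.741),
}

# Derived values: not stated in the abstract, computed here for comparison.
derived_f1 = {name: f1_score(p, r) for name, (p, r) in reported.items()}

if __name__ == "__main__":
    for name, value in derived_f1.items():
        print(f"{name}: F1 = {value:.3f}")
```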