Rice kernel chalkiness is an important quality trait. Chalkiness, the opaque portion of the grain endosperm, is usually measured by subjective visual assessment methods, both domestically and internationally. Results measured by such methods are subjective, inaccurate, and unstable. This research is in-
Visible and infrared (RGB-IR) fusion object detection plays an important role in security, disaster relief, and similar applications. In recent years, deep-learning-based RGB-IR fusion detection methods have developed rapidly, but they still struggle with the complex and changing scenarios captured by drones, mainly for two reasons: (A) RGB-IR fusion detectors are susceptible to inferior inputs that degrade performance and stability; (B) RGB-IR fusion detectors are susceptible to redundant features that reduce accuracy and efficiency. In this paper, an RGB-IR fusion detection framework based on global-local feature optimization, named GLFDet, is proposed to improve detection performance and efficiency for drone-captured objects. The key components of GLFDet are a Global Feature Optimization (GFO) module, a Local Feature Optimization (LFO) module, and a Channel Separation Fusion (CSF) module. Specifically, GFO calculates the information content of the input image in the frequency domain and optimizes the features holistically. Then, LFO dynamically selects high-value features and filters out low-value features before fusion, which significantly improves the efficiency of fusion. Finally, CSF fuses the RGB and IR features across corresponding channels, which avoids rearranging the channel relationships and enhances model stability. Extensive experimental results show that the proposed method achieves the best performance on three popular RGB-IR datasets: DroneVehicle, VEDAI, and LLVIP. In addition, GLFDet is more lightweight than other comparable models, making it more appealing for edge devices such as drones. The code is available at https://github.com/laochen330/GLFDet.
Human object detection and recognition is essential for elderly monitoring and assisted living; however, models relying solely on pose or scene context often struggle in cluttered or visually ambiguous settings. To address this, we present SCENET-3D, a transformer-driven multimodal framework that unifies human-centric skeleton features with scene-object semantics for intelligent robotic vision through a three-stage pipeline. In the first stage, scene analysis, rich geometric and texture descriptors are extracted from RGB frames, including surface-normal histograms, angles between neighboring normals, Zernike moments, directional standard deviation, and Gabor-filter responses. In the second stage, scene-object analysis, non-human objects are segmented and represented using local feature descriptors and complementary surface-normal information. In the third stage, human-pose estimation, silhouettes are processed through an enhanced MoveNet to obtain 2D anatomical keypoints, which are fused with depth information and converted into RGB-based point clouds to construct pseudo-3D skeletons. Features from all three stages are fused and fed into a transformer encoder with multi-head attention to resolve visually similar activities. Experiments on UCLA (95.8%), ETRI-Activity3D (89.4%), and CAD-120 (91.2%) demonstrate that combining pseudo-3D skeletons with rich scene-object fusion significantly improves generalizable activity recognition, enabling safer elderly care, natural human-robot interaction, and robust context-aware robotic perception in real-world environments.
Ensuring the reliability of power transmission networks depends heavily on the early detection of faults in key components such as insulators, which serve both mechanical and electrical functions. Even a single defective insulator can lead to equipment breakdown, costly service interruptions, and increased maintenance demands. While unmanned aerial vehicles (UAVs) enable rapid and cost-effective collection of high-resolution imagery, accurate defect identification remains challenging due to cluttered backgrounds, variable lighting, and the diverse appearance of faults. To address these issues, we introduce a real-time inspection framework that integrates an enhanced YOLOv10 detector with a Hybrid Quantum-Enhanced Graph Neural Network (HQGNN). The YOLOv10 module, fine-tuned on domain-specific UAV datasets, improves detection precision, while the HQGNN provides multi-object tracking and temporal consistency across video frames. This synergy enables reliable and efficient identification of faulty insulators under complex environmental conditions. Experimental results show that the proposed YOLOv10-HQGNN model surpasses existing methods across all metrics, achieving a recall of 0.85 and an Average Precision (AP) of 0.83, with clear gains in both accuracy and throughput. These advancements support automated, proactive maintenance strategies that minimize downtime and contribute to a safer, smarter energy infrastructure.
This paper presents an intelligent patrol and security robot integrating 2D LiDAR and RGB-D vision sensors to achieve semantic simultaneous localization and mapping (SLAM), real-time object recognition, and dynamic obstacle avoidance. The system employs the YOLOv7 deep-learning framework for semantic detection and SLAM for localization and mapping, fusing geometric and visual data to build a high-fidelity 2D semantic map. This map enables the robot to identify and project object information for improved situational awareness. Experimental results show that object recognition reached 95.4% mAP@0.5. Semantic completeness increased from 68.7% (single view) to 94.1% (multi-view) with an average position error of 3.1 cm. During navigation, the robot achieved 98.0% reliability, avoided moving obstacles in 90.0% of encounters, and replanned paths in 0.42 s on average. The integration of LiDAR-based SLAM with deep-learning-driven semantic perception establishes a robust foundation for intelligent, adaptive, and safe robotic navigation in dynamic environments.
Traffic sign detection is a critical component of driving systems. Single-stage network-based traffic sign detection algorithms, known for their fast detection speeds and high accuracy, have become the dominant approach in current practice. However, in complex and dynamic traffic scenes, particularly with smaller traffic sign objects, missed and false detections can reduce overall detection accuracy. To address this issue, this paper proposes a detection algorithm that integrates edge and shape information. Recognizing that traffic signs have specific shapes and distinct edge contours, this paper introduces an edge feature extraction branch within the backbone network, enabling adaptive fusion with features of the same hierarchical level. Additionally, a shape prior convolution module is designed to replace the first two convolutional modules of the backbone network, with the aim of enhancing the model's perception of objects with specific shapes and reducing its sensitivity to background noise. The algorithm was evaluated on the CCTSDB and TT100k datasets; compared to YOLOv8s, the mAP50 values increased by 3.0% and 10.4%, respectively, demonstrating the effectiveness of the proposed method in improving the accuracy of traffic sign detection.
Noise in depth images obtained with RGB-D sensors arises from a combination of hardware limitations and environmental factors, and it degrades downstream computer vision results. Common image denoising techniques based on spatial and frequency filtering tend to remove significant image details along with the noise. This paper presents a denoising model that combines Boruta-driven feature selection with a Long Short-Term Memory Autoencoder (LSTMAE). The Boruta algorithm identifies the most useful depth features, which are used to preserve spatial structure integrity and reduce redundancy. An LSTMAE then processes these selected features and models depth pixel sequences to generate robust, noise-resistant representations. The encoder compresses the input data into a latent space, and the decoder reconstructs the clean image from it. Experiments on a benchmark dataset show that the proposed technique attains a PSNR of 45 dB and an SSIM of 0.90, which is 10 dB higher than the performance of conventional convolutional autoencoders and 15 times higher than that of wavelet-based models. Moreover, the feature selection step decreases the input dimensionality by 40%, resulting in a 37.5% reduction in training time and a real-time inference rate of 200 FPS. The Boruta-LSTMAE framework therefore offers a highly efficient and scalable system for depth image denoising, with high potential for close-range 3D systems such as robotic manipulation and gesture-based interfaces.
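The PSNR figure quoted in the abstract above follows the standard definition, PSNR = 10 log10(MAX^2 / MSE). The sketch below is only an illustration of that metric, not the authors' evaluation code; the function name `psnr`, the 8-bit peak value of 255, and the tiny 2x2 sample images are assumptions made for the example.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D images.

    Images are given as lists of rows; max_value is the assumed peak intensity.
    """
    count = 0
    squared_error = 0.0
    for ref_row, est_row in zip(reference, estimate):
        for r, e in zip(ref_row, est_row):
            squared_error += (r - e) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 2x2 depth patch and a version with +/-1 intensity noise.
clean = [[100, 120], [140, 160]]
noisy = [[101, 119], [141, 159]]
print(round(psnr(clean, noisy), 2))  # MSE is 1, so this is 20*log10(255)
```

Because PSNR is logarithmic, a 10 dB gain corresponds to a tenfold reduction in mean squared error.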
Regular detection of pavement cracks is essential for infrastructure maintenance. However, existing methods often ignore challenges such as the continuous evolution of crack features between video frames and the difficulty of defect quantification. To this end, this paper proposes an integrated Transformer-based framework for pavement crack detection, segmentation, tracking, and counting. First, we design the VitSeg-Det network, an integrated detection and segmentation network that can accurately locate and segment tiny cracks in complex scenes. Second, the TransTra-Count system is developed to automatically count the number of defects by combining defect tracking with width estimation. Finally, we conduct experimental verification on three datasets. The results show that the proposed method is superior to existing deep learning methods in detection accuracy. In addition, a test on real-scene video shows that the framework can accurately label defect locations and output the number of defects in real time.
Deep learning has made significant progress in oriented object detection for remote sensing images. However, existing methods still face challenges with multi-scale targets, complex backgrounds, and small objects. Keeping the model lightweight to meet resource constraints in remote sensing scenarios while improving task performance remains an active research topic. Therefore, we propose EM-YOLO, an enhanced multi-scale feature extraction lightweight network based on the YOLOv8s architecture, specifically optimized for the large target scale variations, diverse orientations, and numerous small objects found in remote sensing images. Our innovations are as follows. First, a dynamic snake convolution (DSC) is introduced into the backbone network to enhance the model's feature extraction capability for oriented targets. Second, a focusing-diffusion module is designed in the feature fusion neck to effectively integrate multi-scale feature information. Finally, we introduce the Layer-Adaptive Sparsity for magnitude-based Pruning (LASP) method to prune the network so that it can better complete tasks in resource-constrained scenarios. Experimental results on the lightweight Orin platform demonstrate that the proposed method significantly outperforms the original YOLOv8s model in oriented remote sensing object detection tasks, and achieves comparable or superior performance to state-of-the-art methods on three authoritative remote sensing datasets (DOTA v1.0, DOTA v1.5, and HRSC2016).
In modern industrial production, foreign object detection in complex environments is crucial to ensure product quality and production safety. Detection systems based on deep-learning image processing algorithms often face challenges with handling high-resolution images and achieving accurate detection against complex backgrounds. To address these issues, this study employs the PatchCore unsupervised anomaly detection algorithm combined with data augmentation techniques to enhance the system's generalization capability across varying lighting conditions, viewing angles, and object scales. The proposed method is evaluated in a complex industrial detection scenario involving the bogie of an electric multiple unit (EMU). A dataset consisting of complex backgrounds, diverse lighting conditions, and multiple viewing angles is constructed to validate the performance of the detection system in real industrial environments. Experimental results show that the proposed model achieves an average area under the receiver operating characteristic curve (AUROC) of 0.92 and an average F1 score of 0.85. Combined with data augmentation, the proposed model exhibits improvements in AUROC by 0.06 and F1 score by 0.03, demonstrating enhanced accuracy and robustness for foreign object detection in complex industrial settings. In addition, the effects of key factors on detection performance are systematically analyzed, providing practical guidance for parameter selection in real industrial applications.
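The two metrics reported in the abstract above can be computed from per-image anomaly scores as in the sketch below. It is a minimal illustration under stated assumptions, not the study's evaluation code: the scores, labels, and the 0.5 decision threshold are hypothetical, and AUROC is computed with the pairwise (Mann-Whitney) formulation rather than by tracing the ROC curve.

```python
def auroc(scores, labels):
    """AUROC as the probability that a random anomalous sample (label 1)
    scores higher than a random normal sample (label 0); ties count half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def f1_score(scores, labels, threshold):
    """F1 score after thresholding anomaly scores (score >= threshold -> anomalous)."""
    tp = fp = fn = 0
    for s, y in zip(scores, labels):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical anomaly scores for three anomalous and three normal images.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(auroc(scores, labels), f1_score(scores, labels, threshold=0.5))
```

AUROC is threshold-free, while F1 depends on the chosen operating point, which is one reason the two are usually reported together.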
Small object detection has been a focus of attention since the emergence of deep learning-based object detection. Although classical object detection frameworks have made significant contributions to the development of object detection, there are still many issues to be resolved in detecting small objects due to the inherent complexity and diversity of real-world visual scenes. In particular, the YOLO (You Only Look Once) series of detection models, renowned for their real-time performance, have undergone numerous adaptations aimed at improving the detection of small targets. In this survey, we summarize the state-of-the-art YOLO-based small object detection methods. This review presents a systematic categorization of YOLO-based approaches for small-object detection, organized into four methodological avenues, namely attention-based feature enhancement, detection-head optimization, loss function, and multi-scale feature fusion strategies. We then examine the principal challenges addressed by each category. Finally, we analyze the performance of these methods on public benchmarks and, by comparing current approaches, identify limitations and outline directions for future research.
In recent years, with the rapid advancement of artificial intelligence, object detection algorithms have made significant strides in accuracy and computational efficiency. Notably, research and applications of Anchor-Free models have opened new avenues for real-time target detection in optical remote sensing images (ORSIs). However, in the realm of adversarial attacks, developing adversarial techniques tailored to Anchor-Free models remains challenging. Adversarial examples generated from Anchor-Based models often transfer poorly to these new model architectures. Furthermore, the growing diversity of Anchor-Free models poses additional hurdles to achieving robust transferability of adversarial attacks. This study presents an improved cross-conv-block feature fusion You Only Look Once (YOLO) architecture, engineered to facilitate the extraction of more comprehensive semantic features during backpropagation. To address the asymmetry between densely distributed objects in ORSIs and the corresponding detector outputs, a novel dense bounding box attack strategy is proposed. This approach uses a dense target bounding-box loss in the calculation of the adversarial loss function. Furthermore, by integrating translation-invariant (TI) and momentum-iteration (MI) adversarial methods, the proposed framework significantly improves the transferability of adversarial attacks. Experimental results demonstrate that our method achieves superior adversarial attack performance, with adversarial transferability rates (ATR) of 67.53% on the NWPU VHR-10 dataset and 90.71% on the HRSC2016 dataset. Compared to ensemble adversarial attack and cascaded adversarial attack approaches, our method generates adversarial examples in an average of 0.64 s, representing an approximately 14.5% improvement in efficiency under equivalent conditions.
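The momentum-iteration (MI) component mentioned in the abstract above can be sketched as follows. This is an illustration of the generic momentum iterative sign-gradient update on a toy loss, not the paper's attack: the quadratic loss, the target vector, and all parameter values are assumptions, and the translation-invariant (TI) gradient smoothing and the dense bounding-box loss are deliberately omitted.

```python
def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def momentum_iterative_attack(x, grad_fn, eps=0.1, steps=10, mu=1.0):
    """Momentum iterative sign-gradient ascent inside an L-infinity ball.

    x       : original input as a list of floats
    grad_fn : function returning the loss gradient at a given input
    eps     : maximum per-coordinate perturbation
    mu      : momentum decay factor
    """
    alpha = eps / steps                 # per-step size
    x_adv = list(x)
    momentum = [0.0] * len(x)
    for _ in range(steps):
        grad = grad_fn(x_adv)
        l1_norm = sum(abs(g) for g in grad) or 1.0  # avoid division by zero
        # Accumulate the L1-normalized gradient into the momentum term.
        momentum = [mu * m + g / l1_norm for m, g in zip(momentum, grad)]
        # Step along the sign of the accumulated direction.
        x_adv = [xa + alpha * sign(m) for xa, m in zip(x_adv, momentum)]
        # Project back into the eps-ball around the original input.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv


# Toy stand-in for a detector loss: squared distance to a hypothetical target.
target = [0.0, 1.0]


def toy_loss(x):
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))


def toy_grad(x):
    return [2.0 * (xi - ti) for xi, ti in zip(x, target)]


x0 = [0.5, 0.5]
x_adv = momentum_iterative_attack(x0, toy_grad)
print(toy_loss(x0), toy_loss(x_adv))
```

Accumulating gradients across iterations stabilizes the update direction, which is the usual argument for why momentum helps adversarial examples transfer between models.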
The use of Unmanned Aerial Vehicles (UAVs) for defect detection on railway slopes is becoming increasingly widespread due to their ability to capture high-resolution images over large, inaccessible, and topographically complex areas. However, current UAV-based detection methods face several critical limitations, including constrained deployment frequency, limited availability of annotated defect data, and the lack of mature risk assessment frameworks. To address these challenges, this study introduces a novel approach that integrates diffusion models with Large Language Models (LLMs) to generate high-quality synthetic defect images tailored to railway slope scenarios. Furthermore, an improved transformer-based architecture is proposed, incorporating attention mechanisms and LLM-guided diffusion-generated imagery to enhance defect recognition performance under complex environmental conditions. Experimental evaluations conducted on a dataset of 300 field-collected images from high-risk railway slopes demonstrate that the proposed method significantly outperforms existing baselines in terms of precision, recall, and robustness, indicating strong applicability for real-world railway infrastructure monitoring and disaster prevention.
Breast cancer screening programs rely heavily on mammography for early detection; however, diagnostic performance is strongly affected by inter-reader variability, breast density, and the limitations of conventional computer-aided detection systems. Recent advances in deep learning have enabled more robust and scalable solutions for large-scale screening, yet a systematic comparison of modern object detection architectures on nationally representative datasets remains limited. This study presents a comprehensive quantitative comparison of prominent deep learning-based object detection architectures for Artificial Intelligence-assisted mammography analysis using the MammosighTR dataset, developed within the Turkish National Breast Cancer Screening Program. The dataset comprises 12,740 patient cases collected between 2016 and 2022, annotated with BI-RADS categories, breast density levels, and lesion localization labels. A total of 31 models were evaluated, including One-Stage, Two-Stage, and Transformer-based architectures, under a unified experimental framework at both patient and breast levels. The results demonstrate that Two-Stage architectures consistently outperform One-Stage models, achieving approximately 2%–4% higher Macro F1-Scores and more balanced precision-recall trade-offs, with Double-Head R-CNN and Dynamic R-CNN yielding the highest overall performance (Macro F1 ≈ 0.84–0.86). This advantage is primarily attributed to the region proposal mechanism and improved class balance inherent to Two-Stage designs. One-Stage detectors exhibited higher sensitivity and faster inference, reaching Recall values above 0.88, but experienced minor reductions in Precision and overall accuracy (≈ 1%–2%) compared with Two-Stage models. Among Transformer-based architectures, Deformable DEtection TRansformer demonstrated strong robustness and consistency across datasets, achieving Macro F1-Scores comparable to CNN-based detectors (≈ 0.83–0.85) while exhibiting minimal performance degradation under distributional shifts. Breast density-based analysis revealed increased misclassification rates in medium-density categories (types B and C), whereas Transformer-based architectures maintained more stable performance in high-density type D tissue. These findings quantitatively confirm that both architectural design and tissue characteristics play a decisive role in diagnostic accuracy. Overall, the study provides a reproducible benchmark and highlights the potential of hybrid approaches that combine the accuracy of Two-Stage detectors with the contextual modeling capability of Transformer architectures for clinically reliable breast cancer screening systems.
To address the false and missed detections in steel surface defect detection that are caused by the variety of defect types and sizes, the similarity between defects and background features, and the similarity between different defects, this paper proposes a lightweight detection model named the multiscale edge and squeeze-and-excitation attention detection network (MSESE), which is built upon You Only Look Once version 11 nano (YOLOv11n). To address the difficulty of locating defect edges, we first propose an edge enhancement module (EEM), apply it to the process of multiscale feature extraction, and then propose a multiscale edge enhancement module (MSEEM). By obtaining defect features at different scales and enhancing their edge contours, the module uses a dual-domain selection mechanism to focus on the important areas in the image, so that the feature maps carry richer information and clearer contour features. By fusing the squeeze-and-excitation attention mechanism with the EEM, we obtain a lighter module that can enhance the representation of edge features, named the edge enhancement module with squeeze-and-excitation attention (EEMSE). This module is then integrated into the detection head. The enhanced detection head achieves improved edge feature enhancement with reduced computational overhead, while effectively adjusting channel-wise importance and further refining feature representation. Experiments on the NEU-DET dataset show that, compared with the original YOLOv11n, the improved model achieves improvements of 4.1% and 2.2% in mAP@0.5 and mAP@0.5:0.95, respectively, and the GFLOPs value decreases from 6.4 to 6.2. Furthermore, compared with the current mainstream models Mamba-YOLOT and RTDETR-R34, our method achieves 6.5% and 8.9% higher mAP@0.5, respectively, while maintaining a more compact parameter footprint. These results collectively validate the effectiveness and efficiency of the proposed approach.
Recognising human-object interactions (HOI) is a challenging task for traditional machine learning models, including convolutional neural networks (CNNs). Existing models show limited transferability across complex datasets such as D3D-HOI and SYSU 3D HOI. The conventional architecture of CNNs restricts their ability to handle HOI scenarios with high complexity. HOI recognition requires improved feature extraction methods to overcome the current limitations in accuracy and scalability. This work proposes a novel quantum gate-enabled hybrid CNN (QEH-CNN) for effective HOI recognition. The model enhances CNN performance by integrating quantum computing components. The framework begins with bilateral image filtering, followed by multi-object tracking (MOT) and Felzenszwalb superpixel segmentation. A watershed algorithm refines object boundaries by cleaning merged superpixels. Feature extraction combines a histogram of oriented gradients (HOG), Global Image Statistics for Texture (GIST) descriptors, and a novel 23-joint keypoint extraction method using relative joint angles and joint proximity measures. A fuzzy optimization process refines the extracted features before feeding them into the QEH-CNN model. The proposed model achieves 95.06% accuracy on the D3D-HOI dataset and 97.29% on the SYSU 3D HOI dataset. The integration of quantum computing enhances feature optimization, leading to improved accuracy and overall model efficiency.
Inspections of power transmission lines (PTLs) conducted using unmanned aerial vehicles (UAVs) are complicated by the fine structure of the lines and complex backgrounds, making accurate and efficient segmentation challenging. This study presents the Wavelet-Guided Transformer U-Net (WGT-UNet) model, a new hybrid network that combines Convolutional Neural Networks (CNNs), the Discrete Wavelet Transform (DWT), and Transformer architectures. The model's primary contribution is the use of spatial and channel attention mechanisms derived from wavelet subbands to guide the Transformer's self-attention structure. Low- and high-frequency components are separated at each stage using the DWT, suppressing structural noise and making linear objects more prominent. The design is supported by multi-component hybrid cost functions that simultaneously address class imbalance, edge sharpness, structural integrity, and spatial regularity. Furthermore, the DWT-guided attention mechanism produces sharp boundaries and continuous line structures, yielding high segmentation performance. Experiments conducted on the TTPLA dataset reveal that the version using the ConvNeXt backbone outperforms current state-of-the-art approaches with an F1-Score of 79.33% and an Intersection over Union (IoU) value of 68.38%. The models and visual outputs of the developed method and all compared models can be accessed at https://github.com/burhanbarakli/WGT-UNET.
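The separation of low- and high-frequency components described in the abstract above can be illustrated with a single-level 2D wavelet decomposition. The sketch below uses the orthonormal Haar wavelet, which is an assumption (the abstract does not name the wavelet family), and it is not the WGT-UNet implementation; the function name `haar_dwt2`, the subband names, and the 4x4 sample image with a thin vertical line are hypothetical.

```python
def haar_dwt2(image):
    """One-level 2D orthonormal Haar transform of an image given as a list of rows.

    Height and width must be even. Returns four half-resolution subbands:
    approx (low-pass), row_detail (change between rows),
    col_detail (change between columns), and diag_detail.
    """
    rows, cols = len(image), len(image[0])
    approx, row_detail, col_detail, diag_detail = [], [], [], []
    for i in range(0, rows, 2):
        a_row, r_row, c_row, d_row = [], [], [], []
        for j in range(0, cols, 2):
            a, b = image[i][j], image[i][j + 1]
            c, d = image[i + 1][j], image[i + 1][j + 1]
            a_row.append((a + b + c + d) / 2.0)   # local average (low frequency)
            r_row.append((a + b - c - d) / 2.0)   # top minus bottom
            c_row.append((a - b + c - d) / 2.0)   # left minus right
            d_row.append((a - b - c + d) / 2.0)   # diagonal difference
        approx.append(a_row)
        row_detail.append(r_row)
        col_detail.append(c_row)
        diag_detail.append(d_row)
    return approx, row_detail, col_detail, diag_detail


# Hypothetical 4x4 patch containing a thin vertical line, like a cable in an aerial image.
patch = [
    [0, 9, 0, 0],
    [0, 9, 0, 0],
    [0, 9, 0, 0],
    [0, 9, 0, 0],
]
approx, row_detail, col_detail, diag_detail = haar_dwt2(patch)
print(approx)      # coarse brightness of each 2x2 block
print(col_detail)  # non-zero only where intensity changes across columns
```

For this patch, the vertical line appears only in the low-pass band and the column-detail band, while the other two detail bands stay at zero; this kind of orientation-specific response is what subband-derived attention could exploit for thin linear structures.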
Salient object detection (SOD) models struggle to simultaneously preserve global structure, maintain sharp object boundaries, and sustain computational efficiency in complex scenes. In this study, we propose SPSALNet, a task-driven two-stage (macro-micro) architecture that restructures the SOD process around superpixel representations. In the proposed approach, a "split-and-enhance" principle, introduced to our knowledge for the first time in the SOD literature, hierarchically classifies superpixels and then applies targeted refinement only to ambiguous or error-prone regions. At the macro stage, the image is partitioned into content-adaptive superpixel regions, and each superpixel is represented by a high-dimensional region-level feature vector. These representations define a regional decomposition problem in which superpixels are assigned to three classes: background, object interior, and transition regions. Superpixel tokens interact with a global feature vector from a deep network backbone through a cross-attention module and are projected into an enriched embedding space that jointly encodes local topology and global context. At the micro stage, the model employs a U-Net-based refinement process that allocates computational resources only to ambiguous transition regions. The image and distance-similarity maps derived from superpixels are processed through a dual-encoder pathway. Subsequently, channel-aware fusion blocks adaptively combine information from these two sources, producing sharper and more stable object boundaries. Experimental results show that SPSALNet achieves high accuracy with lower computational cost compared to recent competing methods. On the PASCAL-S and DUT-OMRON datasets, SPSALNet exhibits a clear performance advantage across all key metrics, and it ranks first on accuracy-oriented measures on HKU-IS. On the challenging DUT-OMRON benchmark, SPSALNet reaches an MAE of 0.034. Across all datasets, it preserves object boundaries and regional structure in a stable and competitive manner.
In industrial manufacturing, efficient surface defect detection is crucial for ensuring product quality and production safety. Traditional inspection methods are often slow, subjective, and prone to errors, while classical machine vision techniques struggle with complex backgrounds and small defects. To address these challenges, this study proposes an improved YOLOv11 model for detecting defects on hot-rolled steel strips using the NEU-DET dataset. Three key improvements are introduced in the proposed model. First, a lightweight Guided Attention Feature Module (GAFM) is incorporated to enhance multi-scale feature fusion, allowing the model to better capture and integrate semantic and spatial information across different layers, which improves its ability to detect defects of varying sizes. Second, an Aggregated Attention (AA) mechanism is employed to strengthen the representation of critical defect features while suppressing irrelevant background information, particularly enhancing the detection of small, low-contrast, or complex defects. Third, Ghost Dynamic Convolution (GDC) is applied to reduce computational cost by generating low-cost ghost features and dynamically reweighting convolutional kernels, enabling faster inference without sacrificing feature quality or detection accuracy. Extensive experiments demonstrate that the proposed model achieves a mean Average Precision (mAP) of 87.2%, compared to 81.5% for the baseline, while lowering computational cost from 6.3 giga floating-point operations (GFLOPs) to 5.1 GFLOPs. These results indicate that the improved YOLOv11 is both accurate and computationally efficient, making it suitable for real-time industrial surface defect detection and contributing to the development of practical, high-performance inspection systems.
Advanced traffic monitoring systems encounter substantial challenges in vehicle detection and classification due to the limitations of conventional methods,which often demand extensive computational resources and stru...Advanced traffic monitoring systems encounter substantial challenges in vehicle detection and classification due to the limitations of conventional methods,which often demand extensive computational resources and struggle with diverse data acquisition techniques.This research presents a novel approach for vehicle classification and recognition in aerial image sequences,integrating multiple advanced techniques to enhance detection accuracy.The proposed model begins with preprocessing using Multiscale Retinex(MSR)to enhance image quality,followed by Expectation-Maximization(EM)Segmentation for precise foreground object identification.Vehicle detection is performed using the state-of-the-art YOLOv10 framework,while feature extraction incorporates Maximally Stable Extremal Regions(MSER),Dense Scale-Invariant Feature Transform(Dense SIFT),and Zernike Moments Features to capture distinct object characteristics.Feature optimization is further refined through a Hybrid Swarm-based Optimization algorithm,ensuring optimal feature selection for improved classification performance.The final classification is conducted using a Vision Transformer,leveraging its robust learning capabilities for enhanced accuracy.Experimental evaluations on benchmark datasets,including UAVDT and the Unmanned Aerial Vehicle Intruder Dataset(UAVID),demonstrate the superiority of the proposed approach,achieving an accuracy of 94.40%on UAVDT and 93.57%on UAVID.The results highlight the efficacy of the model in significantly enhancing vehicle detection and classification in aerial imagery,outperforming existing methodologies and offering a statistically validated improvement for intelligent traffic monitoring systems compared to existing approaches.展开更多
Abstract: Rice kernel chalkiness is an important quality trait. Chalkiness, the opaque portion of the grain endosperm, is usually measured by subjective visual judgment, both domestically and internationally. Results measured by such methods are subjective, inaccurate, and unstable. This research is in-
Funding: Supported by the National Natural Science Foundation of China (No. 62276204), the Fundamental Research Funds for the Central Universities, China (No. YJSJ24011), the Natural Science Basic Research Program of Shaanxi, China (Nos. 2022JM-340 and 2023-JC-QN-0710), and the China Postdoctoral Science Foundation (Nos. 2020T130494 and 2018M633470).
Abstract: Visible and infrared (RGB-IR) fusion object detection plays an important role in security, disaster relief, and related applications. In recent years, deep-learning-based RGB-IR fusion detection methods have developed rapidly, but they still struggle with the complex and changing scenarios captured by drones, mainly for two reasons: (A) RGB-IR fusion detectors are susceptible to inferior inputs that degrade performance and stability. (B) RGB-IR fusion detectors are susceptible to redundant features that reduce accuracy and efficiency. In this paper, an RGB-IR fusion detection framework based on global-local feature optimization, named GLFDet, is proposed to improve the detection performance and efficiency for drone-captured objects. The key components of GLFDet are a Global Feature Optimization (GFO) module, a Local Feature Optimization (LFO) module, and a Channel Separation Fusion (CSF) module. Specifically, GFO calculates the information content of the input image in the frequency domain and optimizes the features holistically. Then, LFO dynamically selects high-value features and filters out low-value features before fusion, which significantly improves the efficiency of fusion. Finally, CSF fuses the RGB and IR features across corresponding channels, which avoids rearranging the channel relationships and enhances model stability. Extensive experimental results show that the proposed method achieves the best performance on three popular RGB-IR datasets: DroneVehicle, VEDAI, and LLVIP. In addition, GLFDet is more lightweight than other comparable models, making it more suitable for edge devices such as drones. The code is available at https://github.com/lao chen330/GLFDet.
Funding: Funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R410), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Human object detection and recognition is essential for elderly monitoring and assisted living; however, models relying solely on pose or scene context often struggle in cluttered or visually ambiguous settings. To address this, we present SCENET-3D, a transformer-driven multimodal framework that unifies human-centric skeleton features with scene-object semantics for intelligent robotic vision through a three-stage pipeline. In the first stage, scene analysis, rich geometric and texture descriptors are extracted from RGB frames, including surface-normal histograms, angles between neighboring normals, Zernike moments, directional standard deviation, and Gabor-filter responses. In the second stage, scene-object analysis, non-human objects are segmented and represented using local feature descriptors and complementary surface-normal information. In the third stage, human-pose estimation, silhouettes are processed through an enhanced MoveNet to obtain 2D anatomical keypoints, which are fused with depth information and converted into RGB-based point clouds to construct pseudo-3D skeletons. Features from all three stages are fused and fed into a transformer encoder with multi-head attention to resolve visually similar activities. Experiments on UCLA (95.8%), ETRI-Activity3D (89.4%), and CAD-120 (91.2%) demonstrate that combining pseudo-3D skeletons with rich scene-object fusion significantly improves generalizable activity recognition, enabling safer elderly care, natural human–robot interaction, and robust context-aware robotic perception in real-world environments.
Funding: Supported by Ho Chi Minh City Open University, Vietnam and Suan Sunandha Rajabhat University, Thailand.
Abstract: Ensuring the reliability of power transmission networks depends heavily on the early detection of faults in key components such as insulators, which serve both mechanical and electrical functions. Even a single defective insulator can lead to equipment breakdown, costly service interruptions, and increased maintenance demands. While unmanned aerial vehicles (UAVs) enable rapid and cost-effective collection of high-resolution imagery, accurate defect identification remains challenging due to cluttered backgrounds, variable lighting, and the diverse appearance of faults. To address these issues, we introduce a real-time inspection framework that integrates an enhanced YOLOv10 detector with a Hybrid Quantum-Enhanced Graph Neural Network (HQGNN). The YOLOv10 module, fine-tuned on domain-specific UAV datasets, improves detection precision, while the HQGNN ensures multi-object tracking and temporal consistency across video frames. This synergy enables reliable and efficient identification of faulty insulators under complex environmental conditions. Experimental results show that the proposed YOLOv10-HQGNN model surpasses existing methods across all metrics, achieving a Recall of 0.85 and an Average Precision (AP) of 0.83, with clear gains in both accuracy and throughput. These advancements support automated, proactive maintenance strategies that minimize downtime and contribute to a safer, smarter energy infrastructure.
Funding: Supported by the National Science and Technology Council under Grant NSTC 114-2221-E-130-007.
Abstract: This paper presents an intelligent patrol and security robot integrating 2D LiDAR and RGB-D vision sensors to achieve semantic simultaneous localization and mapping (SLAM), real-time object recognition, and dynamic obstacle avoidance. The system employs the YOLOv7 deep-learning framework for semantic detection and SLAM for localization and mapping, fusing geometric and visual data to build a high-fidelity 2D semantic map. This map enables the robot to identify and project object information for improved situational awareness. Experimental results show that object recognition reached 95.4% mAP@0.5. Semantic completeness increased from 68.7% (single view) to 94.1% (multi-view) with an average position error of 3.1 cm. During navigation, the robot achieved 98.0% reliability, avoided moving obstacles in 90.0% of encounters, and replanned paths in 0.42 s on average. The integration of LiDAR-based SLAM with deep-learning–driven semantic perception establishes a robust foundation for intelligent, adaptive, and safe robotic navigation in dynamic environments.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62572057, 62272049, U24A20331), the Beijing Natural Science Foundation (Grant Nos. 4232026, 4242020), and the Academic Research Projects of Beijing Union University (Grant No. ZK10202404).
Abstract: Traffic sign detection is a critical component of driving systems. Single-stage traffic sign detection algorithms, known for their fast detection speed and high accuracy, have become the dominant approach in current practice. However, in complex and dynamic traffic scenes, particularly with small traffic signs, missed and false detections reduce overall detection accuracy. To address this issue, this paper proposes a detection algorithm that integrates edge and shape information. Because traffic signs have specific shapes and distinct edge contours, this paper introduces an edge feature extraction branch within the backbone network, enabling adaptive fusion with features of the same hierarchical level. Additionally, a shape prior convolution module is designed to replace the first two convolutional modules of the backbone network, with the aim of enhancing the model's perception of objects with specific shapes and reducing its sensitivity to background noise. The algorithm was evaluated on the CCTSDB and TT100k datasets. Compared to YOLOv8s, the mAP50 values increased by 3.0% and 10.4%, respectively, demonstrating the effectiveness of the proposed method in improving the accuracy of traffic sign detection.
Abstract: Noise in depth images obtained with RGB-D sensors arises from hardware limitations combined with environmental factors, and it degrades downstream computer vision results. Common image denoising techniques based on spatial and frequency filtering tend to remove significant image detail along with the noise. This paper presents a denoising model that combines Boruta-driven feature selection with a Long Short-Term Memory Autoencoder (LSTMAE). The Boruta algorithm identifies the most useful depth features, preserving spatial structure integrity and reducing redundancy. An LSTMAE then processes these selected features and models depth pixel sequences to generate robust, noise-resistant representations. The encoder compresses the input data into a latent space, which is then decoded to recover the clean image. Experiments on a benchmark dataset show that the proposed technique attains a PSNR of 45 dB and an SSIM of 0.90, which is 10 dB higher than the performance of conventional convolutional autoencoders and 15 times higher than that of the wavelet-based models. Moreover, the feature selection step decreases the input dimensionality by 40%, resulting in a 37.5% reduction in training time and a real-time inference rate of 200 FPS. The Boruta-LSTMAE framework therefore offers a highly efficient and scalable system for depth image denoising, with high potential for close-range 3D systems such as robotic manipulation and gesture-based interfaces.
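To make the PSNR figures in this abstract easier to interpret, the following is a minimal sketch of how PSNR is conventionally computed from the mean squared error. It is not the authors' evaluation code; the function names and the 8-bit peak value of 255 are assumptions made for illustration.

```python
import math


def psnr(clean, noisy, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-sized images.

    Images are given as lists of rows; max_value is the assumed peak intensity.
    """
    flat_clean = [p for row in clean for p in row]
    flat_noisy = [p for row in noisy for p in row]
    if len(flat_clean) != len(flat_noisy):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(flat_clean, flat_noisy)) / len(flat_clean)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which the MSE shrinks for a given PSNR gain in dB."""
    return 10 ** (gain_db / 10)


# Hypothetical 2x2 images in which every pixel is off by one grey level.
clean = [[10, 20], [30, 40]]
noisy = [[11, 19], [31, 39]]
print(round(psnr(clean, noisy), 2))
print(mse_ratio_for_gain(10.0))
```

Because PSNR is logarithmic, a gain of 10 dB corresponds to a tenfold reduction in mean squared error at the same peak value.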
Funding: Supported in part by the Natural Science Foundation of Shaanxi Province of China under Grant 2024JC-YBQN-0695.
Abstract: Regular detection of pavement cracks is essential for infrastructure maintenance. However, existing methods often ignore challenges such as the continuous evolution of crack features between video frames and the difficulty of defect quantification. To this end, this paper proposes an integrated Transformer-based framework for pavement crack detection, segmentation, tracking, and counting. First, we design the VitSeg-Det network, an integrated detection and segmentation network that can accurately locate and segment tiny cracks in complex scenes. Second, the TransTra-Count system is developed to automatically count the number of defects by combining defect tracking with width estimation. Finally, we conduct experimental verification on three datasets. The results show that the proposed method is superior to existing deep learning methods in detection accuracy. In addition, tests on video of real scenes show that the framework can accurately label the defect location and output the number of defects in real time.
Funding: Funded by the Hainan Province Science and Technology Special Fund under Grant ZDYF2024GXJS292.
Abstract: Deep learning has made significant progress in oriented object detection for remote sensing images. However, existing methods still face challenges with multi-scale targets, complex backgrounds, and small objects. Keeping the model lightweight to meet resource constraints in remote sensing scenarios while improving task performance remains a research hotspot. Therefore, we propose EM-YOLO, a lightweight network with enhanced multi-scale feature extraction based on the YOLOv8s architecture, specifically optimized for the large variations in target scale, diverse orientations, and numerous small objects found in remote sensing images. Our innovations are as follows. First, a dynamic snake convolution (DSC) is introduced into the backbone network to enhance the model’s feature extraction capability for oriented targets. Second, a focusing-diffusion module is designed in the feature fusion neck to effectively integrate multi-scale feature information. Finally, we introduce the Layer-Adaptive Sparsity for magnitude-based Pruning (LASP) method to prune the network so that it performs better in resource-constrained scenarios. Experimental results on the lightweight Orin platform demonstrate that the proposed method significantly outperforms the original YOLOv8s model in oriented remote sensing object detection tasks, and achieves comparable or superior performance to state-of-the-art methods on three authoritative remote sensing datasets (DOTA v1.0, DOTA v1.5, and HRSC2016).
Abstract: In modern industrial production, foreign object detection in complex environments is crucial to ensure product quality and production safety. Detection systems based on deep-learning image processing algorithms often face challenges with handling high-resolution images and achieving accurate detection against complex backgrounds. To address these issues, this study employs the PatchCore unsupervised anomaly detection algorithm combined with data augmentation techniques to enhance the system’s generalization capability across varying lighting conditions, viewing angles, and object scales. The proposed method is evaluated in a complex industrial detection scenario involving the bogie of an electric multiple unit (EMU). A dataset consisting of complex backgrounds, diverse lighting conditions, and multiple viewing angles is constructed to validate the performance of the detection system in real industrial environments. Experimental results show that the proposed model achieves an average area under the receiver operating characteristic curve (AUROC) of 0.92 and an average F1 score of 0.85. Combined with data augmentation, the proposed model improves AUROC by 0.06 and F1 score by 0.03, demonstrating enhanced accuracy and robustness for foreign object detection in complex industrial settings. In addition, the effects of key factors on detection performance are systematically analyzed, providing practical guidance for parameter selection in real industrial applications.
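As a reading aid for the two metrics reported in this abstract, here is a minimal sketch of AUROC and F1 for a binary anomaly detector. It is not the study's evaluation code: the function names, the toy labels, the scores, and the 0.5 threshold are all assumptions made for illustration.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels, where 1 marks the anomalous (foreign object) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auroc(y_true, scores):
    """AUROC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Count pairwise wins; ties contribute one half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical anomaly scores for four images (1 = contains a foreign object).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5
print(auroc(labels, scores), f1_score(labels, predictions))
```

AUROC is threshold-free, while F1 depends on the chosen decision threshold, which is one reason the two are commonly reported together.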
Funding: Supported in part by the Chongqing Research Program of Basic Research and Frontier Technology under Grant CSTB2025NSCQ-GPX1309.
Abstract: Small object detection has been a focus of attention since the emergence of deep learning-based object detection. Although classical object detection frameworks have made significant contributions to the field, many issues in detecting small objects remain unresolved because of the inherent complexity and diversity of real-world visual scenes. In particular, the YOLO (You Only Look Once) series of detection models, renowned for their real-time performance, have undergone numerous adaptations aimed at improving the detection of small targets. In this survey, we summarize the state-of-the-art YOLO-based small object detection methods. This review presents a systematic categorization of YOLO-based approaches for small-object detection, organized into four methodological avenues, namely attention-based feature enhancement, detection-head optimization, loss function, and multi-scale feature fusion strategies. We then examine the principal challenges addressed by each category. Finally, we analyze the performance of these methods on public benchmarks and, by comparing current approaches, identify limitations and outline directions for future research.
Abstract: In recent years, with the rapid advancement of artificial intelligence, object detection algorithms have made significant strides in accuracy and computational efficiency. Notably, research and applications of Anchor-Free models have opened new avenues for real-time target detection in optical remote sensing images (ORSIs). However, in the realm of adversarial attacks, developing adversarial techniques tailored to Anchor-Free models remains challenging. Adversarial examples generated based on Anchor-Based models often exhibit poor transferability to these new model architectures. Furthermore, the growing diversity of Anchor-Free models poses additional hurdles to achieving robust transferability of adversarial attacks. This study presents an improved cross-conv-block feature fusion You Only Look Once (YOLO) architecture, engineered to facilitate the extraction of more comprehensive semantic features during the backpropagation process. To address the asymmetry between densely distributed objects in ORSIs and the corresponding detector outputs, a novel dense bounding box attack strategy is proposed. This approach uses a dense target bounding box loss in the calculation of the adversarial loss function. Furthermore, by integrating translation-invariant (TI) and momentum-iteration (MI) adversarial methodologies, the proposed framework significantly improves the transferability of adversarial attacks. Experimental results demonstrate that our method achieves superior adversarial attack performance, with adversarial transferability rates (ATR) of 67.53% on the NWPU VHR-10 dataset and 90.71% on the HRSC2016 dataset. Compared to ensemble adversarial attack and cascaded adversarial attack approaches, our method generates adversarial examples in an average of 0.64 s, representing an approximately 14.5% improvement in efficiency under equivalent conditions.
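The momentum-iteration (MI) component mentioned in this abstract can be illustrated with the standard MI-FGSM update rule. The sketch below is a generic toy version under stated assumptions, not the paper's attack: `grad_fn` is a hypothetical stand-in for the gradient of the dense bounding box loss, inputs are flat lists of pixel values in [0, 1], and the translation-invariant gradient smoothing step is omitted.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def mi_fgsm(x, grad_fn, eps=8 / 255, steps=10, mu=1.0):
    """Momentum iterative sign-gradient attack within an L-infinity ball of radius eps.

    x       : clean input as a flat list of values in [0, 1]
    grad_fn : hypothetical callable returning the loss gradient at a given input
    mu      : momentum decay factor
    """
    alpha = eps / steps  # per-step size so the total budget is eps
    adv = list(x)
    momentum = [0.0] * len(x)
    for _ in range(steps):
        grad = grad_fn(adv)
        l1 = sum(abs(g) for g in grad) or 1.0  # guard against an all-zero gradient
        # Accumulate the L1-normalized gradient into the momentum term.
        momentum = [mu * m + g / l1 for m, g in zip(momentum, grad)]
        # Ascend along the sign of the accumulated direction.
        adv = [a + alpha * sign(m) for a, m in zip(adv, momentum)]
        # Project back into the eps-ball around x and the valid pixel range.
        adv = [min(max(a, xi - eps, 0.0), xi + eps, 1.0) for a, xi in zip(adv, x)]
    return adv


# Toy example: a fixed gradient that stands in for a real detector loss.
clean = [0.5, 0.5, 0.5]
adversarial = mi_fgsm(clean, lambda v: [1.0, -2.0, 0.0], eps=0.1, steps=10)
print(adversarial)
```

Accumulating gradients across iterations stabilizes the update direction, which is the usual explanation for why momentum improves transfer to unseen models.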
Funding: Supported in part by the National Natural Science Foundation of China under Grant 52432012 and in part by the Shanghai Science and Technology Project under Grant 25ZR1402508.
Abstract: The use of Unmanned Aerial Vehicles (UAVs) for defect detection on railway slopes is becoming increasingly widespread due to their ability to capture high-resolution images over large, inaccessible, and topographically complex areas. However, current UAV-based detection methods face several critical limitations, including constrained deployment frequency, limited availability of annotated defect data, and the lack of mature risk assessment frameworks. To address these challenges, this study introduces a novel approach that integrates diffusion models with Large Language Models (LLMs) to generate high-quality synthetic defect images tailored to railway slope scenarios. Furthermore, an improved transformer-based architecture is proposed, incorporating attention mechanisms and LLM-guided diffusion-generated imagery to enhance defect recognition performance under complex environmental conditions. Experimental evaluations conducted on a dataset of 300 field-collected images from high-risk railway slopes demonstrate that the proposed method significantly outperforms existing baselines in terms of precision, recall, and robustness, indicating strong applicability for real-world railway infrastructure monitoring and disaster prevention.
Abstract: Breast cancer screening programs rely heavily on mammography for early detection; however, diagnostic performance is strongly affected by inter-reader variability, breast density, and the limitations of conventional computer-aided detection systems. Recent advances in deep learning have enabled more robust and scalable solutions for large-scale screening, yet systematic comparisons of modern object detection architectures on nationally representative datasets remain limited. This study presents a comprehensive quantitative comparison of prominent deep learning–based object detection architectures for Artificial Intelligence-assisted mammography analysis using the MammosighTR dataset, developed within the Turkish National Breast Cancer Screening Program. The dataset comprises 12,740 patient cases collected between 2016 and 2022, annotated with BI-RADS categories, breast density levels, and lesion localization labels. A total of 31 models were evaluated, including One-Stage, Two-Stage, and Transformer-based architectures, under a unified experimental framework at both patient and breast levels. The results demonstrate that Two-Stage architectures consistently outperform One-Stage models, achieving approximately 2%–4% higher Macro F1-Scores and more balanced precision–recall trade-offs, with Double-Head R-CNN and Dynamic R-CNN yielding the highest overall performance (Macro F1 ≈ 0.84–0.86). This advantage is primarily attributed to the region proposal mechanism and improved class balance inherent to Two-Stage designs. One-Stage detectors exhibited higher sensitivity and faster inference, reaching Recall values above 0.88, but experienced minor reductions in Precision and overall accuracy (≈ 1%–2%) compared with Two-Stage models. Among Transformer-based architectures, Deformable DEtection TRansformer demonstrated strong robustness and consistency across datasets, achieving Macro F1-Scores comparable to CNN-based detectors (≈ 0.83–0.85) while exhibiting minimal performance degradation under distributional shifts. Breast density–based analysis revealed increased misclassification rates in medium-density categories (types B and C), whereas Transformer-based architectures maintained more stable performance in high-density type D tissue. These findings quantitatively confirm that both architectural design and tissue characteristics play a decisive role in diagnostic accuracy. Overall, the study provides a reproducible benchmark and highlights the potential of hybrid approaches that combine the accuracy of Two-Stage detectors with the contextual modeling capability of Transformer architectures for clinically reliable breast cancer screening systems.
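Since the comparison above is reported in Macro F1, a minimal sketch of that metric may help. It is an illustrative implementation rather than the study's code; the function name and the toy two-class labels are assumptions.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical imbalanced example: four negatives (0) and two positives (1).
truth = [0, 0, 0, 0, 1, 1]
preds = [0, 0, 0, 1, 1, 0]
accuracy = sum(t == p for t, p in zip(truth, preds)) / len(truth)
print(macro_f1(truth, preds), accuracy)
```

Because every class contributes equally regardless of its frequency, Macro F1 penalizes poor performance on rare categories more than plain accuracy does, which matters for imbalanced screening data.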
Funding: Funded by the Ministry of Education Humanities and Social Science Research Project (grant number 23YJAZH034), the Postgraduate Research and Practice Innovation Program of Jiangsu Province (grant number SJCX25_17), and the National Computer Basic Education Research Project in Higher Education Institutions (grant numbers 2024-AFCEC-056 and 2024-AFCEC-057).
Abstract: False and missed detections in steel surface defect detection are caused by the variety of defect types and sizes, the similarity between defects and background features, and the similarity between different defect types. To address these problems, this paper proposes a lightweight detection model named multiscale edge and squeeze-and-excitation attention detection network (MSESE), which is built upon You Only Look Once version 11 nano (YOLOv11n). To address the difficulty of locating defect edges, we first propose an edge enhancement module (EEM), apply it to the process of multiscale feature extraction, and then propose a multiscale edge enhancement module (MSEEM). By obtaining defect features at different scales and enhancing their edge contours, the module uses a dual-domain selection mechanism to focus on the important areas in the image, so that the feature maps carry richer information and clearer contour features. By fusing the squeeze-and-excitation attention mechanism with the EEM, we obtain a lighter module that enhances the representation of edge features, named the edge enhancement module with squeeze-and-excitation attention (EEMSE). This module is then integrated into the detection head. The enhanced detection head achieves improved edge feature enhancement with reduced computational overhead, while effectively adjusting channel-wise importance and further refining feature representation. Experiments on the NEU-DET dataset show that, compared with the original YOLOv11n, the improved model achieves improvements of 4.1% and 2.2% in mAP@0.5 and mAP@0.5:0.95, respectively, and the GFLOPs value decreases from 6.4 to 6.2. Furthermore, compared with the current mainstream models Mamba-YOLOT and RTDETR-R34, our method achieves 6.5% and 8.9% higher mAP@0.5, respectively, while maintaining a more compact parameter footprint. These results collectively validate the effectiveness and efficiency of our proposed approach.
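The channel-wise reweighting that squeeze-and-excitation attention performs can be sketched in a few lines. This is a generic illustration of the standard SE block, not the EEMSE module itself: the function name, the tiny two-channel feature map, and the hand-picked weight matrices are all hypothetical, and a real model would learn the weights.

```python
import math


def squeeze_excite(feature_maps, w_reduce, w_expand):
    """Apply squeeze-and-excitation channel attention.

    feature_maps : list of C channels, each a list of rows of floats
    w_reduce     : R x C weights of the bottleneck layer (followed by ReLU)
    w_expand     : C x R weights of the expansion layer (followed by sigmoid)
    Returns the rescaled feature maps and the per-channel gates.
    """
    # Squeeze: global average pooling reduces each channel to one number.
    squeezed = [
        sum(sum(row) for row in channel) / sum(len(row) for row in channel)
        for channel in feature_maps
    ]
    # Excitation: bottleneck with ReLU, then expansion with sigmoid gates.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w_reduce]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w_expand
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_maps, gates)
    ]
    return scaled, gates


# Hypothetical input: one active channel and one empty channel, 2x2 each.
features = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
scaled, gates = squeeze_excite(features, w_reduce=[[1.0, 0.0]], w_expand=[[1.0], [-1.0]])
print(gates)
```

The gates lie strictly between 0 and 1, so the block can only attenuate channels relative to one another; it adds few parameters because the bottleneck dimension R is small compared with C.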
Funding: Supported and funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R410), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Recognising human-object interactions (HOI) is a challenging task for traditional machine learning models, including convolutional neural networks (CNNs). Existing models show limited transferability across complex datasets such as D3D-HOI and SYSU 3D HOI. The conventional architecture of CNNs restricts their ability to handle HOI scenarios with high complexity. HOI recognition requires improved feature extraction methods to overcome the current limitations in accuracy and scalability. This work proposes a novel quantum gate-enabled hybrid CNN (QEH-CNN) for effective HOI recognition. The model enhances CNN performance by integrating quantum computing components. The framework begins with bilateral image filtering, followed by multi-object tracking (MOT) and Felzenszwalb superpixel segmentation. A watershed algorithm refines object boundaries by cleaning merged superpixels. Feature extraction combines a histogram of oriented gradients (HOG), Global Image Statistics for Texture (GIST) descriptors, and a novel 23-joint keypoint extraction method using relative joint angles and joint proximity measures. A fuzzy optimization process refines the extracted features before feeding them into the QEH-CNN model. The proposed model achieves 95.06% accuracy on the D3D-HOI dataset and 97.29% on the SYSU 3D HOI dataset. The integration of quantum computing enhances feature optimization, leading to improved accuracy and overall model efficiency.
Abstract: Inspections of power transmission lines (PTLs) conducted using unmanned aerial vehicles (UAVs) are complicated by the fine structure of the lines and complex backgrounds, making accurate and efficient segmentation challenging. This study presents the Wavelet-Guided Transformer U-Net (WGT-UNet) model, a new hybrid network that combines Convolutional Neural Networks (CNNs), Discrete Wavelet Transform (DWT), and Transformer architectures. The model’s primary contribution is the use of spatial and channel attention mechanisms derived from wavelet subbands to guide the Transformer’s self-attention structure. Low- and high-frequency components are separated at each stage using DWT, suppressing structural noise and making linear objects more prominent. The design is supported by multi-component hybrid cost functions that simultaneously address class imbalance, edge sharpness, structural integrity, and spatial regularity. Furthermore, the DWT-guided attention mechanism produces sharp boundaries and continuous line structures, yielding high segmentation performance. Experiments conducted on the TTPLA dataset reveal that the version using the ConvNeXt backbone outperforms the current state-of-the-art approaches with an F1-Score of 79.33% and an Intersection over Union (IoU) value of 68.38%. The models and visual outputs of the developed method and all compared models can be accessed at https://github.com/burhanbarakli/WGT-UNET.
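To illustrate how a DWT separates low- and high-frequency components, the sketch below applies one level of the 2D Haar transform to a tiny image containing a thin vertical line. The abstract does not state which wavelet WGT-UNet uses, so Haar is an assumption chosen for simplicity, and the function and subband names are hypothetical.

```python
def haar_dwt2(image):
    """One-level orthonormal 2D Haar transform of an image with even dimensions.

    Returns four half-resolution subbands:
      approx   : local averages (low frequency)
      row_diff : differences between the two rows of each 2x2 block
      col_diff : differences between the two columns of each 2x2 block
      diag     : diagonal differences
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image dimensions must be even")
    approx, row_diff, col_diff, diag = [], [], [], []
    for r in range(0, rows, 2):
        a_row, r_row, c_row, d_row = [], [], [], []
        for c in range(0, cols, 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            a_row.append((a + b + d + e) / 2.0)
            r_row.append((a + b - d - e) / 2.0)
            c_row.append((a - b + d - e) / 2.0)
            d_row.append((a - b - d + e) / 2.0)
        approx.append(a_row)
        row_diff.append(r_row)
        col_diff.append(c_row)
        diag.append(d_row)
    return approx, row_diff, col_diff, diag


# Hypothetical 4x4 image with a one-pixel-wide vertical line in column 1.
line_image = [[0, 9, 0, 0] for _ in range(4)]
approx, row_diff, col_diff, diag = haar_dwt2(line_image)
print(approx, col_diff)
```

In this example the thin line shows up in the column-difference subband while the row-difference and diagonal subbands stay at zero, which is the kind of directional cue an attention mechanism could exploit for line-like objects.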
Abstract: Salient object detection (SOD) models struggle to simultaneously preserve global structure, maintain sharp object boundaries, and sustain computational efficiency in complex scenes. In this study, we propose SPSALNet, a task-driven two-stage (macro–micro) architecture that restructures the SOD process around superpixel representations. In the proposed approach, a “split-and-enhance” principle, introduced to our knowledge for the first time in the SOD literature, hierarchically classifies superpixels and then applies targeted refinement only to ambiguous or error-prone regions. At the macro stage, the image is partitioned into content-adaptive superpixel regions, and each superpixel is represented by a high-dimensional region-level feature vector. These representations define a regional decomposition problem in which superpixels are assigned to three classes: background, object interior, and transition regions. Superpixel tokens interact with a global feature vector from a deep network backbone through a cross-attention module and are projected into an enriched embedding space that jointly encodes local topology and global context. At the micro stage, the model employs a U-Net-based refinement process that allocates computational resources only to ambiguous transition regions. The image and distance–similarity maps derived from superpixels are processed through a dual-encoder pathway. Subsequently, channel-aware fusion blocks adaptively combine information from these two sources, producing sharper and more stable object boundaries. Experimental results show that SPSALNet achieves high accuracy with lower computational cost compared to recent competing methods. On the PASCAL-S and DUT-OMRON datasets, SPSALNet exhibits a clear performance advantage across all key metrics, and it ranks first on accuracy-oriented measures on HKU-IS. On the challenging DUT-OMRON benchmark, SPSALNet reaches a MAE of 0.034. Across all datasets, it preserves object boundaries and regional structure in a stable and competitive manner.
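The MAE reported for saliency maps is the mean absolute difference between the predicted map and the binary ground truth, both scaled to [0, 1]. The following minimal sketch shows that computation; the function name and the 2x2 example values are assumptions for illustration only.

```python
def saliency_mae(prediction, ground_truth):
    """Mean absolute error between a saliency map and its ground-truth mask.

    Both inputs are lists of rows with values in [0, 1].
    """
    flat_pred = [p for row in prediction for p in row]
    flat_gt = [g for row in ground_truth for g in row]
    if len(flat_pred) != len(flat_gt):
        raise ValueError("maps must have the same number of pixels")
    return sum(abs(p - g) for p, g in zip(flat_pred, flat_gt)) / len(flat_pred)


# Hypothetical 2x2 prediction compared against a binary mask.
predicted = [[0.9, 0.1], [0.2, 0.8]]
mask = [[1.0, 0.0], [0.0, 1.0]]
print(saliency_mae(predicted, mask))
```

Lower values are better, and because the error is averaged over all pixels, large background regions predicted correctly pull the score toward zero.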
Funding: Supported in part by the National Natural Science Foundation of China (Grant No. 62071123), in part by the Natural Science Foundation of Fujian Province (Grant Nos. 2024J01971, 2022J05202), and in part by the Young and Middle-Aged Teacher Education Research Project of Fujian Province (Grant No. JAT210370).
Abstract: In industrial manufacturing, efficient surface defect detection is crucial for ensuring product quality and production safety. Traditional inspection methods are often slow, subjective, and prone to errors, while classical machine vision techniques struggle with complex backgrounds and small defects. To address these challenges, this study proposes an improved YOLOv11 model for detecting defects on hot-rolled steel strips using the NEU-DET dataset. Three key improvements are introduced in the proposed model. First, a lightweight Guided Attention Feature Module (GAFM) is incorporated to enhance multi-scale feature fusion, allowing the model to better capture and integrate semantic and spatial information across different layers, which improves its ability to detect defects of varying sizes. Second, an Aggregated Attention (AA) mechanism is employed to strengthen the representation of critical defect features while effectively suppressing irrelevant background information, particularly enhancing the detection of small, low-contrast, or complex defects. Third, Ghost Dynamic Convolution (GDC) is applied to reduce computational cost by generating low-cost ghost features and dynamically reweighting convolutional kernels, enabling faster inference without sacrificing feature quality or detection accuracy. Extensive experiments demonstrate that the proposed model achieves a mean Average Precision (mAP) of 87.2%, compared to 81.5% for the baseline, while lowering computational cost from 6.3 giga floating-point operations (GFLOPs) to 5.1 GFLOPs. These results indicate that the improved YOLOv11 is both accurate and computationally efficient, making it suitable for real-time industrial surface defect detection and contributing to the development of practical, high-performance inspection systems.
Abstract: Advanced traffic monitoring systems encounter substantial challenges in vehicle detection and classification due to the limitations of conventional methods, which often demand extensive computational resources and struggle with diverse data acquisition techniques. This research presents a novel approach for vehicle classification and recognition in aerial image sequences, integrating multiple advanced techniques to enhance detection accuracy. The proposed model begins with preprocessing using Multiscale Retinex (MSR) to enhance image quality, followed by Expectation-Maximization (EM) Segmentation for precise foreground object identification. Vehicle detection is performed using the state-of-the-art YOLOv10 framework, while feature extraction incorporates Maximally Stable Extremal Regions (MSER), Dense Scale-Invariant Feature Transform (Dense SIFT), and Zernike Moments Features to capture distinct object characteristics. The features are further refined through a Hybrid Swarm-based Optimization algorithm, which selects features to improve classification performance. The final classification is conducted using a Vision Transformer, leveraging its robust learning capabilities for enhanced accuracy. Experimental evaluations on benchmark datasets, including UAVDT and the Unmanned Aerial Vehicle Intruder Dataset (UAVID), demonstrate the superiority of the proposed approach, achieving an accuracy of 94.40% on UAVDT and 93.57% on UAVID. The results highlight the efficacy of the model in significantly enhancing vehicle detection and classification in aerial imagery, offering a statistically validated improvement over existing approaches for intelligent traffic monitoring systems.