Camouflaged Object Detection (COD) aims to identify objects that share highly similar patterns (such as texture, intensity, and color) with their surrounding environment. Because they intrinsically resemble the background, camouflaged objects often exhibit vague boundaries and varying scales, which makes it challenging to locate targets accurately and delineate their indistinct edges. To address this, we propose a novel camouflaged object detection network called the Edge-Guided and Multi-scale Fusion Network (EGMFNet), which leverages edge-guided multi-scale integration for enhanced performance. The model incorporates two innovative components: a Multi-scale Fusion Module (MSFM) and an Edge-Guided Attention Module (EGA). These designs exploit multi-scale features to uncover subtle cues between candidate objects and the background while emphasizing camouflaged object boundaries. Moreover, recognizing the rich contextual information in fused features, we introduce a Dual-Branch Global Context Module (DGCM) to refine features using extensive global context, thereby generating more informative representations. Experimental results on four benchmark datasets demonstrate that EGMFNet outperforms state-of-the-art methods across five evaluation metrics. Specifically, on COD10K, our EGMFNet-P improves F_β by 4.8 points and reduces mean absolute error (MAE) by 0.006 compared with ZoomNeXt; on NC4K, it achieves a 3.6-point increase in F_β. On CAMO and CHAMELEON, it obtains 4.5-point increases in F_β. These consistent gains substantiate the superiority and robustness of EGMFNet.
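The two metrics quoted above, F_β and MAE, can be sketched as follows. This is a minimal illustration, not the evaluation code used by the authors: the function names are hypothetical, the fixed 0.5 binarization threshold is an assumption, and β² = 0.3 is the value conventionally used in salient and camouflaged object detection rather than something the abstract states.

```python
def f_beta(pred, gt, beta_sq=0.3, threshold=0.5):
    """F-measure between a predicted probability map and a binary mask.

    pred and gt are 2D lists of equal shape; beta_sq=0.3 and the fixed
    threshold are assumptions made for this sketch.
    """
    tp = fp = fn = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            is_fg = p >= threshold
            if is_fg and g:
                tp += 1
            elif is_fg and not g:
                fp += 1
            elif not is_fg and g:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


def mae(pred, gt):
    """Mean absolute error between a probability map and a binary mask."""
    total = 0.0
    count = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            total += abs(p - g)
            count += 1
    return total / count


prediction = [[0.9, 0.2], [0.8, 0.1]]
mask = [[1, 0], [0, 0]]
score = f_beta(prediction, mask)
error = mae(prediction, mask)
```

With β² below 1 the measure weights precision more heavily than recall, so a spurious foreground pixel costs more than a missed one.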
The continuous decrease in global fishery resources has increased the importance of precise and efficient underwater fish monitoring technology. This study proposes an improved underwater target detection framework based on YOLOv8, with the aim of enhancing detection accuracy and the ability to recognize multi-scale targets in blurry and complex underwater environments. A streamlined Vision Transformer (ViT) model is used as the feature extraction backbone, which retains global self-attention feature extraction and improves training efficiency. In addition, a detection head named Dynamic Head (DyHead) is introduced, which handles targets of various sizes more efficiently through multi-scale feature fusion and adaptive attention modules. Furthermore, a dynamic loss function adjustment method called SlideLoss is employed. This method uses a sliding-window technique to adjust parameters adaptively, which improves the detection of challenging targets. The experimental results on the RUOD dataset show that the proposed model significantly improves both the accuracy and the efficiency of target detection.
Defect detection in printed circuit boards (PCBs) remains challenging due to the difficulty of identifying small-scale defects, the inefficiency of conventional approaches, and the interference from complex backgrounds. To address these issues, this paper proposes SIM-Net, an enhanced detection framework derived from YOLOv11. The model integrates SPDConv to preserve fine-grained features for small object detection, introduces a novel convolutional partial attention module (C2PAM) to suppress redundant background information and highlight salient regions, and employs a multi-scale fusion network (MFN) with a multi-grain contextual module (MGCT) to strengthen contextual representation and accelerate inference. Experimental evaluations demonstrate that SIM-Net achieves 92.4% mAP, 92% accuracy, and 89.4% recall with an inference speed of 75.1 FPS, outperforming existing state-of-the-art methods. These results confirm the robustness and real-time applicability of SIM-Net for PCB defect inspection.
Accurate and efficient detection of building changes in remote sensing imagery is crucial for urban planning, disaster emergency response, and resource management. However, existing methods face challenges such as spectral similarity between buildings and backgrounds, sensor variations, and insufficient computational efficiency. To address these challenges, this paper proposes a novel Multi-scale Efficient Wavelet-based Change Detection Network (MewCDNet), which integrates the advantages of Convolutional Neural Networks and Transformers, balances computational costs, and achieves high-performance building change detection. The network employs EfficientNet-B4 as the backbone for hierarchical feature extraction, integrates multi-level feature maps through a multi-scale fusion strategy, and incorporates two key modules: Cross-temporal Difference Detection (CTDD) and Cross-scale Wavelet Refinement (CSWR). CTDD adopts a dual-branch architecture that combines pixel-wise differencing with semantic-aware Euclidean distance weighting to enhance the distinction between true changes and background noise. CSWR integrates a Haar-based Discrete Wavelet Transform with multi-head cross-attention mechanisms, enabling cross-scale feature fusion while significantly improving edge localization and suppressing spurious changes. Extensive experiments on four benchmark datasets demonstrate MewCDNet's superiority over comparison methods: it achieves F1 scores of 91.54% on LEVIR, 93.70% on WHUCD, and 64.96% on S2Looking for building change detection. Furthermore, MewCDNet achieves the best performance on the multi-class SYSU dataset (F1: 82.71%), highlighting its strong generalization capability.
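The Haar-based Discrete Wavelet Transform mentioned for CSWR can be illustrated in one dimension. The sketch below shows only the standard orthonormal single-level Haar transform and its inverse; how MewCDNet applies it to 2D feature maps and combines it with cross-attention is not described in the abstract, and the function names are hypothetical.

```python
import math


def haar_dwt_1d(signal):
    """Single-level orthonormal Haar DWT of an even-length sequence.

    Returns (approximation, detail): pairwise scaled sums (low-pass)
    and pairwise scaled differences (high-pass, where edges show up).
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt_1d(approx, detail):
    """Inverse of haar_dwt_1d: rebuilds the original sequence."""
    scale = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * scale)
        out.append((a - d) * scale)
    return out


low, high = haar_dwt_1d([4.0, 4.0, 2.0, 6.0])
restored = haar_idwt_1d(low, high)
```

The detail coefficient is zero over the flat pair and nonzero over the pair containing a step, which is why wavelet sub-bands are a natural cue for edge localization.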
Tomato is a major economic crop worldwide, and diseases on tomato leaves can significantly reduce both yield and quality. Traditional manual inspection is inefficient and highly subjective, making it difficult to meet the requirements of early disease identification in complex natural environments. To address this issue, this study proposes an improved YOLO11-based model, YOLO-SPDNet (Scale Sequence Fusion, Position-Channel Attention, and Dual Enhancement Network). The model integrates the SEAM (Self-Ensembling Attention Mechanism) semantic enhancement module, the MLCA (Mixed Local Channel Attention) lightweight attention mechanism, and the SPA (Scale-Position-Detail Awareness) module composed of SSFF (Scale Sequence Feature Fusion), TFE (Triple Feature Encoding), and CPAM (Channel and Position Attention Mechanism). These enhancements strengthen fine-grained lesion detection while keeping the model lightweight. Experimental results show that YOLO-SPDNet achieves an accuracy of 91.8%, a recall of 86.5%, and an mAP@0.5 of 90.6% on the test set, with a computational complexity of 12.5 GFLOPs. Furthermore, the model reaches a real-time inference speed of 987 FPS, making it suitable for deployment on mobile agricultural terminals and online monitoring systems. Comparative analysis and ablation studies further validate the reliability and practical applicability of the proposed model in complex natural scenes.
Distributed Denial of Service (DDoS) attacks are among the most severe threats to network infrastructure, and their evolving complexity sometimes lets them bypass traditional detection algorithms. Current Machine Learning (ML) techniques for DDoS attack detection normally rely on statistical features of network traffic, such as packet sizes and inter-arrival times. However, such techniques sometimes fail to capture complicated relations among traffic flows. In this paper, we present a new multi-scale ensemble strategy based on Graph Neural Networks (GNNs) for improving DDoS detection. Our technique divides traffic into macro-level and micro-level elements, allowing different GNN models to capture both coarse-scale anomalies and subtle, stealthy attack patterns. By modeling network traffic as graph-structured data, GNNs efficiently learn intricate relations among network entities. The proposed ensemble learning algorithm combines the results of several GNNs to improve generalization, robustness, and scalability. Extensive experiments on three benchmark datasets (UNSW-NB15, CICIDS2017, and CICDDoS2019) show that our approach outperforms traditional machine learning and deep learning models in detecting both high-rate and low-rate (stealthy) DDoS attacks, with significant improvements in accuracy and recall. These findings demonstrate the proposed method's applicability and robustness for real-world deployment in contexts where several DDoS patterns coexist.
With the rapid expansion of drone applications, accurate detection of objects in aerial imagery has become crucial for intelligent transportation, urban management, and emergency rescue missions. However, existing methods face numerous challenges in practical deployment, including scale variation handling, feature degradation, and complex backgrounds. To address these issues, we propose Edge-enhanced and Detail-Capturing You Only Look Once (EHDC-YOLO), a novel framework for object detection in Unmanned Aerial Vehicle (UAV) imagery. Based on the You Only Look Once version 11 nano (YOLOv11n) baseline, EHDC-YOLO systematically introduces several architectural enhancements: (1) a Multi-Scale Edge Enhancement (MSEE) module that leverages multi-scale pooling and edge information to enhance boundary feature extraction; (2) an Enhanced Feature Pyramid Network (EFPN) that integrates P2-level features with Cross Stage Partial (CSP) structures and OmniKernel convolutions for better fine-grained representation; and (3) Dynamic Head (DyHead) with multi-dimensional attention mechanisms for enhanced cross-scale modeling and perspective adaptability. Comprehensive experiments on the Vision meets Drones for Detection (VisDrone-DET) 2019 dataset demonstrate that EHDC-YOLO achieves significant improvements, increasing mean Average Precision (mAP)@0.5 from 33.2% to 46.1% (an absolute improvement of 12.9 percentage points) and mAP@0.5:0.95 from 19.5% to 28.0% (an absolute improvement of 8.5 percentage points) compared with the YOLOv11n baseline, while maintaining a reasonable parameter count (2.81 M vs. the baseline's 2.58 M). Further ablation studies confirm the effectiveness of each proposed component, while visualization results highlight EHDC-YOLO's superior performance in detecting objects and handling occlusions in complex drone scenarios.
Impact craters are important for understanding lunar geologic evolution and surface erosion rates, among other purposes. However, micro impact craters are numerous and their morphological characteristics are not obvious, resulting in low detection accuracy by deep learning models. Therefore, we propose a new multi-scale fusion crater detection algorithm (MSF-CDA) based on YOLO11 to improve the accuracy of lunar impact crater detection, especially for small craters with a diameter of <1 km. Using images taken by the LROC (Lunar Reconnaissance Orbiter Camera) at the Chang'e-4 (CE-4) landing area, we constructed three separate datasets for craters with diameters of 0-70 m, 70-140 m, and >140 m. We then trained three submodels separately on these three datasets. Additionally, we designed a slicing-amplifying-slicing strategy to enhance the ability to extract features from small craters. To handle redundant predictions, we proposed a new Non-Maximum Suppression with Area Filtering method to fuse the results for overlapping targets across the multi-scale submodels. Our MSF-CDA method achieved high detection performance, with Precision, Recall, and F1 score values of 0.991, 0.987, and 0.989, respectively, addressing the problems caused by the sparse features and sample imbalance of small craters. MSF-CDA can provide strong data support for more in-depth study of the geological evolution of the lunar surface and for finer geological age estimates. This strategy can also be used to detect other small objects with sparse features and sample imbalance problems. We detected approximately 500,000 impact craters in an area of approximately 214 km² around the CE-4 landing area. By statistically analyzing the new data, we updated the distribution function of the number and diameter of impact craters. Finally, we identified the most suitable lighting conditions for detecting impact crater targets by analyzing the effect of different lighting conditions on detection accuracy.
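The abstract does not spell out how Non-Maximum Suppression with Area Filtering works, so the following is only a plausible sketch under stated assumptions: each detection is first checked against a hypothetical area range (`min_area`, `max_area`, standing in for the size band a submodel is responsible for), and the survivors then go through standard greedy IoU-based NMS. The names and thresholds are illustrative, not the authors' values.

```python
def box_area(box):
    """Area of an (x1, y1, x2, y2) box."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def nms_with_area_filter(detections, iou_thr=0.5, min_area=0.0, max_area=float("inf")):
    """Greedy NMS over (box, score) pairs, preceded by an area filter.

    The area bounds are an assumption of this sketch: boxes outside
    [min_area, max_area] are dropped before suppression.
    """
    candidates = [d for d in detections if min_area <= box_area(d[0]) <= max_area]
    candidates.sort(key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in candidates:
        # Keep a box only if it does not overlap a higher-scoring kept box.
        if all(iou(box, kept_box) < iou_thr for kept_box, _ in kept):
            kept.append((box, score))
    return kept


merged = nms_with_area_filter(
    [((0, 0, 10, 10), 0.9), ((1, 1, 11, 11), 0.8),
     ((50, 50, 52, 52), 0.95), ((100, 100, 120, 120), 0.7)],
    min_area=9,
)
```

In this toy input the tiny 2×2 box is removed by the area filter despite its high score, and the second box is suppressed because it overlaps the first with an IoU of about 0.68.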
An improved model based on You Only Look Once version 8 (YOLOv8) is proposed to solve the problem of low detection accuracy caused by the diversity of object sizes in optical remote sensing images. First, the feature pyramid network (FPN) structure of the original YOLOv8 model is replaced by the generalized-FPN (GFPN) structure from GiraffeDet to realize "cross-layer" and "cross-scale" adaptive feature fusion, enriching the semantic and spatial information on the feature map and improving the target detection ability of the model. Second, a multi atrous spatial pyramid pooling (MASPP) module is designed, using the ideas of atrous convolution and the feature pyramid structure to extract multi-scale features and thereby improve the model's handling of multi-scale objects. The experimental results show that the improved YOLOv8 model achieves a detection accuracy of 92% and a mean average precision (mAP) of 87.9% on the DIOR dataset, which are 3.5% and 1.7% higher, respectively, than those of the original model. This demonstrates that the detection and classification ability of the proposed model on multi-scale optical remote sensing targets has been improved.
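The idea of atrous (dilated) convolution behind MASPP can be shown in one dimension: the same kernel is applied with gaps between its taps, so the receptive field grows without adding weights. This is a generic sketch; the dilation rates, the kernel, and the function names are hypothetical, and the actual MASPP layout is not given in the abstract.

```python
def atrous_conv1d(signal, kernel, dilation=1):
    """Valid 1D dilated cross-correlation (the 'convolution' used in CNNs).

    A kernel with k taps and dilation d covers (k - 1) * d + 1 input samples.
    """
    span = (len(kernel) - 1) * dilation + 1
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(w * signal[start + tap * dilation]
                       for tap, w in enumerate(kernel)))
    return out


def multi_rate_features(signal, kernel, rates=(1, 2, 4)):
    """Apply the same kernel at several dilation rates (pyramid-style)."""
    return {rate: atrous_conv1d(signal, kernel, rate) for rate in rates}


ramp = list(range(1, 10))  # 1, 2, ..., 9
features = multi_rate_features(ramp, [1, 0, -1])
```

On the ramp input, the difference kernel responds with -2, -4, and -8 at rates 1, 2, and 4, showing that each branch measures change over a progressively wider span.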
Underwater target detection is extensively applied in domains such as underwater search and rescue, environmental monitoring, and marine resource surveys. It is crucial for enabling autonomous underwater robot operations and promoting ocean exploration. Nevertheless, low imaging quality, harsh underwater environments, and obscured objects considerably increase the difficulty of detecting underwater targets, making it hard for current detection methods to achieve optimal performance. To enhance underwater object perception and improve target detection precision, we propose a lightweight underwater target detection method using You Only Look Once (YOLO) v8 with multi-scale cross-channel attention (MSCCA), named YOLOv8-UOD. In the proposed multi-scale cross-channel attention module, multi-scale attention (MSA) increases the variety of attentional perception by extracting information from inherently diverse receptive fields. The cross-channel strategy uses RepVGG-based channel shuffling (RCS) and one-shot aggregation (OSA) to rearrange feature map channels according to specific rules. It aggregates all features only once in the final feature mapping, resulting in the extraction of more comprehensive and valuable feature information. The experimental results show that the proposed YOLOv8-UOD achieves a mAP50 of 95.67% and FLOPs of 23.8 G on the Underwater Robot Picking Contest 2017 (URPC2017) dataset, outperforming other methods in terms of detection precision and computational cost-efficiency.
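Rearranging channels "according to specific rules" is commonly done with a ShuffleNet-style channel shuffle; the sketch below assumes that rule, which the abstract does not confirm for RCS. Channels are viewed as a (groups, channels-per-group) grid, transposed, and flattened, so each output group mixes channels from every input group. The function name is hypothetical.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups (ShuffleNet-style shuffle).

    channels: list of per-channel items (indices, feature maps, ...).
    Equivalent to reshape (groups, n // groups) -> transpose -> flatten.
    """
    n = len(channels)
    if n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


shuffled = channel_shuffle([0, 1, 2, 3, 4, 5], groups=2)
```

With two groups of three channels, channels 0-2 and 3-5 end up interleaved as 0, 3, 1, 4, 2, 5, so a subsequent grouped operation sees information from both original groups.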
To address the challenges of small target detection and significant scale variations in unmanned aerial vehicle (UAV) aerial imagery, which often lead to missed and false detections, we propose Multi-scale Feature Fusion YOLO (MFF-YOLO), an enhanced algorithm based on YOLOv8s. Our approach introduces a Multi-scale Feature Fusion Strategy (MFFS), comprising the Multiple Features C2f (MFC) module and the Scale Sequence Feature Fusion (SSFF) module, to improve feature integration across different network levels. This enables more effective capture of fine-grained details and sequential multi-scale features. Furthermore, we incorporate Inner-CIoU, an improved loss function that uses auxiliary bounding boxes to enhance the regression quality of small object boxes. To ensure practicality for UAV deployment, we apply the Layer-adaptive Magnitude-based Pruning (LAMP) method to significantly reduce model size and computational cost. Experiments on the VisDrone2019 dataset show that MFF-YOLO achieves a 5.7% increase in mean average precision (mAP) over the baseline, while reducing parameters by 8.5 million and computation by 17.5%. The results demonstrate that our method effectively improves detection performance in UAV aerial scenarios.
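The auxiliary-box idea behind Inner-CIoU can be sketched as an "inner IoU": both the predicted and ground-truth boxes are rescaled about their own centers by a ratio, and IoU is computed on the rescaled boxes. Only that overlap term is shown here; how it is combined with the CIoU penalty terms, and the ratio MFF-YOLO uses, are not stated in the abstract, so `ratio` and the function names are assumptions.

```python
def scale_box(box, ratio):
    """Rescale an (x1, y1, x2, y2) box about its center by `ratio`."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w, half_h = (x2 - x1) * ratio / 2.0, (y2 - y1) * ratio / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def inner_iou(pred, gt, ratio=0.75):
    """IoU of the auxiliary boxes obtained by scaling pred and gt."""
    return iou(scale_box(pred, ratio), scale_box(gt, ratio))


pred_box, gt_box = (0, 0, 4, 4), (2, 0, 6, 4)
plain = iou(pred_box, gt_box)
strict = inner_iou(pred_box, gt_box, ratio=0.5)
lenient = inner_iou(pred_box, gt_box, ratio=1.5)
```

A ratio below 1 shrinks the auxiliary boxes and makes the overlap measure stricter, while a ratio above 1 enlarges them and makes it more forgiving, which is the lever this family of losses uses to adjust the regression signal.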
Within the domain of Intelligent Group Systems (IGSs), this paper develops a resource-aware multitarget Constant False Alarm Rate (CFAR) detection framework for multisite MIMO radar systems. It underscores the need to manage finite transmit and receive antennas and transmit power systematically in order to enhance detection performance. To tackle the multidimensional resource optimization challenge, we introduce a Cooperative Transmit-Receive Antenna Selection and Power Allocation (CTRSPA) strategy. It employs a perception-action cycle that incorporates uncertain external support information to optimize worst-case detection performance with multiple targets. First, we derive a closed-form expression, incorporating uncertainty, for the detection probability of noncoherent integration with square-law detection under the Neyman-Pearson criterion. Subsequently, a joint optimization model for antenna selection and power allocation in CFAR detection is formulated, incorporating practical radar resource constraints. Mathematically, this is an NP-hard problem involving coupled continuous and Boolean variables. We propose a three-stage method (Reformulation, Node Picker, and Convex Power Allocation) that exploits the convexity of the optimization model in each variable separately, ensuring a near-optimal result. Simulations confirm the approach's effectiveness, efficiency, and timeliness, particularly for large-scale radar networks, and reveal the impact of threat levels, system layout, and detection parameters on resource allocation.
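For readers unfamiliar with CFAR detection, the sketch below shows the textbook single-sensor cell-averaging CFAR (CA-CFAR) on square-law detector output. It is not the paper's multisite framework or its CTRSPA strategy; the window sizes, the false-alarm probability, and the function name are illustrative assumptions. The threshold factor α = N(P_fa^(-1/N) − 1) holds for exponentially distributed noise power with N training cells.

```python
def ca_cfar(power, num_train=8, num_guard=2, pfa=1e-3):
    """Cell-averaging CFAR over a 1D list of square-law power samples.

    num_train and num_guard are total counts split evenly on both sides
    of the cell under test (both assumed even). Returns detected indices.
    """
    half_train = num_train // 2
    half_guard = num_guard // 2
    # Threshold multiplier that keeps the false-alarm rate at pfa
    # for exponentially distributed noise power.
    alpha = num_train * (pfa ** (-1.0 / num_train) - 1.0)
    margin = half_train + half_guard
    detections = []
    for i in range(margin, len(power) - margin):
        leading = power[i - margin: i - half_guard]
        lagging = power[i + half_guard + 1: i + margin + 1]
        noise_level = (sum(leading) + sum(lagging)) / num_train
        if power[i] > alpha * noise_level:
            detections.append(i)
    return detections


samples = [1.0] * 30
samples[15] = 100.0  # a single strong target in flat noise
hits = ca_cfar(samples)
```

Because the threshold scales with the locally estimated noise level, the false-alarm rate stays constant as the noise floor changes, which is the property the resource allocation problem in the abstract has to preserve.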
Transportation systems are experiencing a significant transformation due to the integration of advanced technologies, including artificial intelligence and machine learning. In the context of intelligent transportation systems (ITS) and Advanced Driver Assistance Systems (ADAS), the development of efficient and reliable traffic light detection mechanisms is crucial for enhancing road safety and traffic management. This paper presents an optimized convolutional neural network (CNN) framework designed to detect traffic lights in real-time within complex urban environments. Leveraging multi-scale pyramid feature maps, the proposed model addresses key challenges such as the detection of small, occluded, and low-resolution traffic lights amidst complex backgrounds. The integration of dilated convolutions, Region of Interest (ROI) alignment, and Soft Non-Maximum Suppression (Soft-NMS) further improves detection accuracy and reduces false positives. By optimizing computational efficiency and parameter complexity, the framework is designed to operate seamlessly on embedded systems, ensuring robust performance in real-world applications. Extensive experiments using real-world datasets demonstrate that our model significantly outperforms existing methods, providing a scalable solution for ITS and ADAS applications. This research contributes to the advancement of Artificial Intelligence-driven (AI-driven) pattern recognition in transportation systems and offers a mathematical approach to improving efficiency and safety in logistics and transportation networks.
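Soft-NMS, one of the components named above, can be sketched in its Gaussian form: instead of discarding boxes that overlap the current best detection, their scores are decayed by exp(−IoU²/σ). The values of `sigma` and `score_thr` and the function names are illustrative assumptions, not settings reported in the abstract.

```python
import math


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def soft_nms(detections, sigma=0.5, score_thr=0.001):
    """Gaussian Soft-NMS over (box, score) pairs.

    Returns detections in the order they were selected, with decayed
    scores; boxes whose score falls below score_thr are dropped.
    """
    remaining = list(detections)
    kept = []
    while remaining:
        best_index = max(range(len(remaining)), key=lambda i: remaining[i][1])
        best_box, best_score = remaining.pop(best_index)
        kept.append((best_box, best_score))
        rescored = []
        for box, score in remaining:
            # Decay the score smoothly with overlap instead of removing the box.
            decayed = score * math.exp(-(iou(best_box, box) ** 2) / sigma)
            if decayed >= score_thr:
                rescored.append((box, decayed))
        remaining = rescored
    return kept


results = soft_nms([((0, 0, 10, 10), 0.9),
                    ((1, 1, 11, 11), 0.8),
                    ((20, 20, 30, 30), 0.7)])
```

The heavily overlapping second box survives with a reduced score rather than being deleted, which helps when two real objects, such as adjacent traffic lights, genuinely overlap.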
Detecting abnormal cervical cells is crucial for early identification and timely treatment of cervical cancer. However, this task is challenging due to the morphological similarities between abnormal and normal cells and the significant variations in cell size. Pathologists often refer to surrounding cells to identify abnormalities. To emulate this slide examination behavior, this study proposes a Multi-Scale Feature Fusion Network (MSFF-Net) for detecting abnormal cervical cells. MSFF-Net employs a Cross-Scale Pooling Model (CSPM) to effectively capture diverse features and contextual information, ranging from local details to the overall structure. Additionally, a Multi-Scale Fusion Attention (MSFA) module is introduced to mitigate the impact of cell size variations by adaptively fusing local and global information at different scales. To handle the complex conditions in cervical cell images, such as cell adhesion and overlapping, the Inner-CIoU loss function is used to measure the overlap between bounding boxes more precisely, thereby improving detection accuracy in such scenarios. Experimental results on the Comparison detector dataset demonstrate that MSFF-Net achieves a mean average precision (mAP) of 63.2%, outperforming state-of-the-art methods while maintaining a relatively small number of parameters (26.8 M). This study highlights the effectiveness of multi-scale feature fusion in enhancing the detection of abnormal cervical cells, contributing to more accurate and efficient cervical cancer screening.
With the rapid growth of social media, the spread of fake news has become a growing problem, misleading the public and causing significant harm. As social media content is often composed of both images and text, multimodal approaches to fake news detection have gained significant attention. To solve the problems of previous multimodal fake news detection algorithms, such as insufficient feature extraction and insufficient use of semantic relations between modalities, this paper proposes the MFFFND-Co (Multimodal Feature Fusion Fake News Detection with Co-Attention Block) model. First, the model deeply explores the textual content, image content, and frequency domain features. Then, it employs a Co-Attention mechanism for cross-modal fusion. Additionally, a semantic consistency detection module is designed to quantify semantic deviations, thereby enhancing the performance of fake news detection. In experiments on two commonly used datasets, Twitter and Weibo, the model achieved F1 scores of 90.0% and 94.0%, respectively, significantly outperforming the pre-modified MFFFND (Multimodal Feature Fusion Fake News Detection with Attention Block) model and surpassing other baseline models. This improves the accuracy of detecting fake information in artificial intelligence detection and engineering software detection.
Focusing on the task of fast and accurate armored target detection on the ground battlefield, a detection method based on a multi-scale representation network (MS-RN) and a shape-fixed Guided Anchor (SF-GA) scheme is proposed. First, considering the large scale variation and camouflage of armored targets, a new MS-RN that integrates contextual information from the battlefield environment is designed. The MS-RN extracts deep features from templates with different scales and strengthens the detection of small targets. Armored targets of different sizes are detected on different representation features. Second, to meet the accuracy and real-time detection requirements, an improved shape-fixed Guided Anchor is used on feature maps of different scales to recommend regions of interest (ROIs). Unlike sliding or random anchors, the SF-GA can filter out 80% of the regions while still improving recall. A dedicated detection dataset for armored targets, named Armored Target Dataset (ARTD), is constructed, on which comparative experiments with state-of-the-art detection methods are conducted. Experimental results show that the proposed method achieves outstanding performance in detection accuracy and efficiency, especially when small armored targets are involved.
In the field of smart agriculture, accurate and efficient object detection technology is crucial for automated crop management. A particularly challenging task in this domain is small object detection, such as the identification of immature fruits or early-stage disease spots. These objects pose significant difficulties due to their small pixel coverage, limited feature information, substantial scale variations, and high susceptibility to complex background interference. These challenges frequently result in inadequate accuracy and robustness in current detection models. This study addresses two critical needs in the cashew cultivation industry, fruit maturity detection and anthracnose detection, by proposing an improved YOLOv11-NSDDil model. The method introduces three key technological innovations: (1) The SDDil module is designed and integrated into the backbone network. This module combines depthwise separable convolution with the SimAM attention mechanism to expand the receptive field and enhance contextual semantic capture at a low computational cost, effectively alleviating the feature deficiency problem caused by the limited pixel coverage of small objects. Simultaneously, the SD module dynamically enhances discriminative features and suppresses background noise, significantly improving the model's feature discrimination capability in complex environments; (2) The introduction of the DynamicScalSeq-Zoom_cat neck network, which significantly improves multi-scale feature fusion; and (3) The optimization of the Minimum Point Distance Intersection over Union (MPDIoU) loss function, which enhances bounding box localization accuracy by minimizing vertex distance. Experimental results on a self-constructed cashew dataset containing 1123 images demonstrate significant performance improvements in the enhanced model: mAP50 reaches 0.825, a 4.6% increase compared to the original YOLOv11; mAP50-95 improves to 0.624, a 6.5% increase; and recall rises to 0.777, a 2.4% increase. This provides a reliable technical solution for intelligent quality inspection of agricultural products and holds broad application prospects.
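The MPDIoU criterion mentioned in item (3) is commonly written as IoU minus the squared distances between the two boxes' top-left corners and between their bottom-right corners, each normalized by the squared image diagonal. The sketch below follows that formulation as an assumption; the abstract only says the loss minimizes vertex distance, and the authors' "optimization" of it is not described. Function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def mpdiou(pred, gt, img_w, img_h):
    """IoU penalized by normalized squared corner distances."""
    norm = img_w ** 2 + img_h ** 2
    top_left = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    bottom_right = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2
    return iou(pred, gt) - top_left / norm - bottom_right / norm


def mpdiou_loss(pred, gt, img_w, img_h):
    """Loss form: zero when the boxes coincide, larger as corners drift."""
    return 1.0 - mpdiou(pred, gt, img_w, img_h)


loss = mpdiou_loss((0, 0, 4, 4), (2, 0, 6, 4), img_w=10, img_h=10)
```

Because the corner-distance terms stay nonzero even when the boxes do not overlap, the loss still provides a useful gradient in cases where plain IoU is flat at zero.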
The application of deep learning to target detection in aerial images captured by Unmanned Aerial Vehicles (UAVs) has emerged as a prominent research focus. Due to the considerable distance between UAVs and the photographed objects, coupled with complex shooting environments, existing models often struggle to achieve accurate real-time target detection. In this paper, the You Only Look Once v8 (YOLOv8) model is modified in four respects: the detection head, the up-sampling module, the feature extraction module, and the parameter optimization of positive sample screening. The resulting YOLO-S3DT model is proposed to improve the detection of small targets in aerial images. Experimental results show that all detection metrics of the proposed model improve significantly without any increase in the number of model parameters and with only limited growth in computation. Moreover, the model also performs best among the detection models compared, demonstrating its advantage for this category of tasks.
To address the limited scale adaptation of autonomous driving target detection algorithms in low-illumination environments and their shortcomings in handling target occlusion, this paper proposes the YOLO-LKSDS autonomous driving detection model. First, the Contrast-Limited Adaptive Histogram Equalisation (CLAHE) image enhancement algorithm is improved to increase image contrast and enhance the detailed features of the target. Then, on the basis of the YOLOv5 model, the K-means++ clustering algorithm is introduced to obtain suitable anchor boxes, and SPPELAN spatial pyramid pooling is improved to enhance the accuracy and robustness of the model for multi-scale target detection. Finally, an improved SEAM (Separated and Enhancement Attention Module) attention mechanism is combined with the DIoU-NMS algorithm to optimize the model's performance in occluded and dense scenes. Compared with the original model, the improved YOLO-LKSDS model achieves a 13.3% improvement in accuracy, a 1.7% improvement in mAP, and 240,000 fewer parameters on the BDD100K dataset. To validate the generalization of the improved algorithm, we also ran experiments on the KITTI dataset, where accuracy, recall, and mAP50 improve by 21.1%, 36.6%, and 29.5%, respectively, relative to YOLOv5. The deployment of the algorithm is verified on an edge computing platform, where the average detection speed reaches 24.4 FPS while power consumption remains below 9 W, demonstrating high real-time capability and energy efficiency.
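The K-means++ step used to obtain anchor boxes can be illustrated by its seeding stage alone. The sketch assumes the distance 1 − IoU between (width, height) pairs aligned at a common corner, a convention widely used for YOLO anchor clustering but not confirmed by the abstract; the subsequent k-means refinement is omitted, the input needs at least k distinct sizes, and the function names are hypothetical.

```python
import random


def wh_iou(a, b):
    """IoU of two (width, height) boxes aligned at a common corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeanspp_seed(sizes, k, seed=0):
    """Pick k initial anchor sizes with k-means++ seeding.

    Each new center is sampled with probability proportional to the
    squared distance (1 - IoU) to its nearest already-chosen center,
    so the seeds spread out over small, medium, and large boxes.
    """
    rng = random.Random(seed)
    centers = [rng.choice(sizes)]
    while len(centers) < k:
        weights = [min(1.0 - wh_iou(size, center) for center in centers) ** 2
                   for size in sizes]
        centers.append(rng.choices(sizes, weights=weights, k=1)[0])
    return centers


box_sizes = [(10, 10), (12, 11), (50, 40), (48, 45), (200, 180)]
anchors = kmeanspp_seed(box_sizes, k=3)
```

An already-chosen size has distance zero to itself and therefore zero sampling weight, so the seeds are always distinct; this spread is what makes the subsequent clustering less sensitive to initialization than uniformly random seeds.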
To solve the false detection and missed detection problems caused by various types and sizes of defects in the detection of steel surface defects,similar defects and background features,and similarities between differ...To solve the false detection and missed detection problems caused by various types and sizes of defects in the detection of steel surface defects,similar defects and background features,and similarities between different defects,this paper proposes a lightweight detection model named multiscale edge and squeeze-and-excitation attention detection network(MSESE),which is built upon the You Only Look Once version 11 nano(YOLOv11n).To address the difficulty of locating defect edges,we first propose an edge enhancement module(EEM),apply it to the process of multiscale feature extraction,and then propose a multiscale edge enhancement module(MSEEM).By obtaining defect features from different scales and enhancing their edge contours,the module uses the dual-domain selection mechanism to effectively focus on the important areas in the image to ensure that the feature images have richer information and clearer contour features.By fusing the squeeze-and-excitation attention mechanism with the EEM,we obtain a lighter module that can enhance the representation of edge features,which is named the edge enhancement module with squeeze-and-excitation attention(EEMSE).This module was subsequently integrated into the detection head.The enhanced detection head achieves improved edge feature enhancement with reduced computational overhead,while effectively adjusting channel-wise importance and further refining feature representation.Experiments on the NEU-DET dataset show that,compared with the original YOLOv11n,the improved model achieves improvements of 4.1%and 2.2%in terms of mAP@0.5 and mAP@0.5:0.95,respectively,and the GFLOPs value decreases from the original value of 6.4 to 6.2.Furthermore,when compared to current mainstream models,Mamba-YOLOT and RTDETR-R34,our 
method achieves superior performance with 6.5%and 8.9%higher mAP@0.5,respectively,while maintaining a more compact parameter footprint.These results collectively validate the effectiveness and efficiency of our proposed approach.展开更多
Funding: Financially supported by the Chongqing University of Technology Graduate Innovation Foundation (Grant No. gzlcx20253267).
Abstract: Camouflaged Object Detection (COD) aims to identify objects that share highly similar patterns (such as texture, intensity, and color) with their surrounding environment. Due to their intrinsic resemblance to the background, camouflaged objects often exhibit vague boundaries and varying scales, making it challenging to accurately locate targets and delineate their indistinct edges. To address this, we propose a novel camouflaged object detection network called Edge-Guided and Multi-scale Fusion Network (EGMFNet), which leverages edge-guided multi-scale integration for enhanced performance. The model incorporates two innovative components: a Multi-scale Fusion Module (MSFM) and an Edge-Guided Attention Module (EGA). These designs exploit multi-scale features to uncover subtle cues between candidate objects and the background while emphasizing camouflaged object boundaries. Moreover, recognizing the rich contextual information in fused features, we introduce a Dual-Branch Global Context Module (DGCM) to refine features using extensive global context, thereby generating more informative representations. Experimental results on four benchmark datasets demonstrate that EGMFNet outperforms state-of-the-art methods across five evaluation metrics. Specifically, on COD10K, our EGMFNet-P improves F_β by 4.8 points and reduces mean absolute error (MAE) by 0.006 compared with ZoomNeXt; on NC4K, it achieves a 3.6-point increase in F_β. On CAMO and CHAMELEON, it obtains 4.5-point increases in F_β. These consistent gains substantiate the superiority and robustness of EGMFNet.
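For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how F_β and MAE are commonly computed for a predicted probability map against a binary ground-truth mask. The function names, the fixed 0.5 threshold, and the weighting β² = 0.3 (a frequent convention in segmentation-style benchmarks) are assumptions for illustration and are not taken from the abstract.

```python
def f_beta(pred, gt, beta2=0.3, threshold=0.5):
    """F-measure of a probability map `pred` against a binary mask `gt`.

    Both arguments are lists of rows. `beta2` is beta squared; the 0.3
    default and the fixed threshold are illustrative assumptions.
    """
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            predicted_positive = p >= threshold
            if predicted_positive and g:
                tp += 1
            elif predicted_positive:
                fp += 1
            elif g:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta2 * precision + recall
    return (1 + beta2) * precision * recall / denom if denom else 0.0


def mae(pred, gt):
    """Mean absolute error between a probability map and a binary mask."""
    total = sum(abs(p - g) for pr, gr in zip(pred, gt) for p, g in zip(pr, gr))
    count = sum(len(row) for row in pred)
    return total / count


pred = [[0.9, 0.2], [0.6, 0.1]]
gt = [[1, 0], [0, 1]]
print(f_beta(pred, gt), mae(pred, gt))
```

A higher F_β and a lower MAE both indicate a better match, which is why the abstract reports an increase in the former and a reduction in the latter.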
Funding: Supported by the National Natural Science Foundation of China (No. 52106080), the Jilin City Science and Technology Innovation Development Plan Project (No. 20240302014), the Jilin Provincial Department of Education Science and Technology Research Project (No. JJKH20230135K), the Jilin Province Science and Technology Development Plan Project (No. YDZJ202401640ZYTS), and the Northeast Electric Power University Teaching Reform Research Project (No. J2427).
Abstract: The continuous decrease in global fishery resources has increased the importance of precise and efficient underwater fish monitoring technology. This study proposes an improved underwater target detection framework based on YOLOv8, with the aim of enhancing detection accuracy and the ability to recognize multi-scale targets in blurry and complex underwater environments. A streamlined Vision Transformer (ViT) model is used as the feature extraction backbone, which retains global self-attention feature extraction and accelerates training. In addition, a detection head named Dynamic Head (DyHead) is introduced, which enhances the efficiency of processing various target sizes through multi-scale feature fusion and adaptive attention modules. Furthermore, a dynamic loss function adjustment method called SlideLoss is employed. This method utilizes sliding window technology to adaptively adjust parameters, which optimizes the detection of challenging targets. The experimental results on the RUOD dataset show that the proposed model improves both the accuracy and the efficiency of target detection.
Abstract: Defect detection in printed circuit boards (PCB) remains challenging due to the difficulty of identifying small-scale defects, the inefficiency of conventional approaches, and the interference from complex backgrounds. To address these issues, this paper proposes SIM-Net, an enhanced detection framework derived from YOLOv11. The model integrates SPDConv to preserve fine-grained features for small object detection, introduces a novel convolutional partial attention module (C2PAM) to suppress redundant background information and highlight salient regions, and employs a multi-scale fusion network (MFN) with a multi-grain contextual module (MGCT) to strengthen contextual representation and accelerate inference. Experimental evaluations demonstrate that SIM-Net achieves 92.4% mAP, 92% accuracy, and 89.4% recall with an inference speed of 75.1 FPS, outperforming existing state-of-the-art methods. These results confirm the robustness and real-time applicability of SIM-Net for PCB defect inspection.
Funding: Supported by the Henan Province Key R&D Project under Grant 241111210400, the Henan Provincial Science and Technology Research Project under Grants 252102211047, 252102211062, 252102211055, and 232102210069, the Jiangsu Provincial Scheme Double Initiative Plan JSS-CBS20230474, the XJTLU RDF-21-02-008, the Science and Technology Innovation Project of Zhengzhou University of Light Industry under Grant 23XNKJTD0205, and the Higher Education Teaching Reform Research and Practice Project of Henan Province under Grant 2024SJGLX0126.
Abstract: Accurate and efficient detection of building changes in remote sensing imagery is crucial for urban planning, disaster emergency response, and resource management. However, existing methods face challenges such as spectral similarity between buildings and backgrounds, sensor variations, and insufficient computational efficiency. To address these challenges, this paper proposes a novel Multi-scale Efficient Wavelet-based Change Detection Network (MewCDNet), which integrates the advantages of Convolutional Neural Networks and Transformers, balances computational costs, and achieves high-performance building change detection. The network employs EfficientNet-B4 as the backbone for hierarchical feature extraction, integrates multi-level feature maps through a multi-scale fusion strategy, and incorporates two key modules: Cross-temporal Difference Detection (CTDD) and Cross-scale Wavelet Refinement (CSWR). CTDD adopts a dual-branch architecture that combines pixel-wise differencing with semantic-aware Euclidean distance weighting to enhance the distinction between true changes and background noise. CSWR integrates a Haar-based Discrete Wavelet Transform with multi-head cross-attention mechanisms, enabling cross-scale feature fusion while significantly improving edge localization and suppressing spurious changes. Extensive experiments on four benchmark datasets demonstrate MewCDNet's superiority over comparison methods, with F1 scores of 91.54% on LEVIR, 93.70% on WHUCD, and 64.96% on S2Looking for building change detection. Furthermore, MewCDNet achieves the best performance on the multi-class SYSU dataset (F1: 82.71%), highlighting its generalization capability.
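CSWR is described as building on a Haar-based Discrete Wavelet Transform. As a rough illustration of that building block only (not of the CSWR module itself), one level of an orthonormal 2D Haar transform can be sketched as follows. The function name and the sub-band names are assumptions chosen for this example.

```python
def haar_dwt2(image):
    """One level of an orthonormal 2D Haar transform.

    `image` is a list of equal-length rows with even height and width.
    Returns four half-resolution sub-bands:
      approx  - local average (low-pass in both directions)
      d_cols  - difference between the left and right column of each 2x2 block
      d_rows  - difference between the top and bottom row of each 2x2 block
      d_diag  - diagonal difference
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must both be even")
    approx, d_cols, d_rows, d_diag = [], [], [], []
    for r in range(0, rows, 2):
        a_row, c_row, r_row, g_row = [], [], [], []
        for c in range(0, cols, 2):
            tl, tr = image[r][c], image[r][c + 1]
            bl, br = image[r + 1][c], image[r + 1][c + 1]
            a_row.append((tl + tr + bl + br) / 2)
            c_row.append((tl - tr + bl - br) / 2)
            r_row.append((tl + tr - bl - br) / 2)
            g_row.append((tl - tr - bl + br) / 2)
        approx.append(a_row)
        d_cols.append(c_row)
        d_rows.append(r_row)
        d_diag.append(g_row)
    return approx, d_cols, d_rows, d_diag


# A 2x2 patch containing a sharp left/right edge.
print(haar_dwt2([[0, 4], [0, 4]]))
```

The detail sub-bands respond only where intensity changes, which is consistent with the abstract's claim that the wavelet branch helps edge localization.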
Funding: Tianmin Tianyuan Boutique Vegetable Industry Technology Service Station (Grant No. 2024120011003081) and Development of Environmental Monitoring and Traceability System for Wuqing Agricultural Production Areas (Grant No. 2024120011001866).
Abstract: Tomato is a major economic crop worldwide, and diseases on tomato leaves can significantly reduce both yield and quality. Traditional manual inspection is inefficient and highly subjective, making it difficult to meet the requirements of early disease identification in complex natural environments. To address this issue, this study proposes an improved YOLO11-based model, YOLO-SPDNet (Scale Sequence Fusion, Position-Channel Attention, and Dual Enhancement Network). The model integrates the SEAM (Self-Ensembling Attention Mechanism) semantic enhancement module, the MLCA (Mixed Local Channel Attention) lightweight attention mechanism, and the SPA (Scale-Position-Detail Awareness) module composed of SSFF (Scale Sequence Feature Fusion), TFE (Triple Feature Encoding), and CPAM (Channel and Position Attention Mechanism). These enhancements strengthen fine-grained lesion detection while keeping the model lightweight. Experimental results show that YOLO-SPDNet achieves an accuracy of 91.8%, a recall of 86.5%, and an mAP@0.5 of 90.6% on the test set, with a computational complexity of 12.5 GFLOPs. Furthermore, the model reaches a real-time inference speed of 987 FPS, making it suitable for deployment on mobile agricultural terminals and online monitoring systems. Comparative analysis and ablation studies further validate the reliability and practical applicability of the proposed model in complex natural scenes.
Abstract: Distributed Denial of Service (DDoS) attacks are among the most severe threats to network infrastructure, and their evolving complexity sometimes allows them to bypass traditional detection algorithms. Present Machine Learning (ML) techniques for DDoS attack detection normally use statistical features of network traffic such as packet sizes and inter-arrival times. However, such techniques sometimes fail to capture complicated relations among traffic flows. In this paper, we present a new multi-scale ensemble strategy based on Graph Neural Networks (GNNs) for improving DDoS detection. Our technique divides traffic into macro- and micro-level elements, allowing different GNN models to capture both coarse-scale anomalies and subtle, stealthy attack patterns. By modeling network traffic as graph-structured data, GNNs efficiently learn intricate relations among network entities. The proposed ensemble learning algorithm combines the results of several GNNs to improve generalization, robustness, and scalability. Extensive experiments on three benchmark datasets (UNSW-NB15, CICIDS2017, and CICDDoS2019) show that our approach outperforms traditional machine learning and deep learning models in detecting both high-rate and low-rate (stealthy) DDoS attacks, with significant improvements in accuracy and recall. These findings demonstrate the proposed method's applicability and robustness for real-world deployment in contexts where several DDoS patterns coexist.
Abstract: With the rapid expansion of drone applications, accurate detection of objects in aerial imagery has become crucial for intelligent transportation, urban management, and emergency rescue missions. However, existing methods face numerous challenges in practical deployment, including scale variation handling, feature degradation, and complex backgrounds. To address these issues, we propose Edge-enhanced and Detail-Capturing You Only Look Once (EHDC-YOLO), a novel framework for object detection in Unmanned Aerial Vehicle (UAV) imagery. Based on the You Only Look Once version 11 nano (YOLOv11n) baseline, EHDC-YOLO systematically introduces several architectural enhancements: (1) a Multi-Scale Edge Enhancement (MSEE) module that leverages multi-scale pooling and edge information to enhance boundary feature extraction; (2) an Enhanced Feature Pyramid Network (EFPN) that integrates P2-level features with Cross Stage Partial (CSP) structures and OmniKernel convolutions for better fine-grained representation; and (3) Dynamic Head (DyHead) with multi-dimensional attention mechanisms for enhanced cross-scale modeling and perspective adaptability. Comprehensive experiments on the Vision meets Drones for Detection (VisDrone-DET) 2019 dataset demonstrate that EHDC-YOLO achieves significant improvements, increasing mean Average Precision (mAP)@0.5 from 33.2% to 46.1% (an absolute improvement of 12.9 percentage points) and mAP@0.5:0.95 from 19.5% to 28.0% (an absolute improvement of 8.5 percentage points) compared with the YOLOv11n baseline, while maintaining a reasonable parameter count (2.81 M vs. the baseline's 2.58 M). Further ablation studies confirm the effectiveness of each proposed component, while visualization results highlight EHDC-YOLO's superior performance in detecting objects and handling occlusions in complex drone scenarios.
Funding: The National Key Research and Development Program of China (Grant No. 2022YFF0711400) and the National Space Science Data Center Youth Open Project (Grant No. NSSDC2302001).
Abstract: Impact craters are important for understanding lunar geologic evolution and surface erosion rates, among other functions. However, the morphological characteristics of micro impact craters are not obvious and the craters are numerous, resulting in low detection accuracy by deep learning models. Therefore, we propose a new multi-scale fusion crater detection algorithm (MSF-CDA) based on YOLO11 to improve the accuracy of lunar impact crater detection, especially for small craters with a diameter of <1 km. Using the images taken by the LROC (Lunar Reconnaissance Orbiter Camera) at the Chang'e-4 (CE-4) landing area, we constructed three separate datasets for craters with diameters of 0-70 m, 70-140 m, and >140 m. We then trained three submodels separately with these three datasets. Additionally, we designed a slicing-amplifying-slicing strategy to enhance the ability to extract features from small craters. To handle redundant predictions, we proposed a new Non-Maximum Suppression with Area Filtering method to fuse the results for overlapping targets across the multi-scale submodels. Our new MSF-CDA method achieved high detection performance, with Precision, Recall, and F1 score values of 0.991, 0.987, and 0.989, respectively, effectively addressing the problems caused by the sparse features and sample imbalance of small craters. MSF-CDA can provide strong data support for more in-depth study of the geological evolution of the lunar surface and finer geological age estimations. This strategy can also be used to detect other small objects with sparse features and sample imbalance problems. We detected approximately 500,000 impact craters in an area of approximately 214 km² around the CE-4 landing area. By statistically analyzing the new data, we updated the distribution function of the number and diameter of impact craters. Finally, we identified the most suitable lighting conditions for detecting impact crater targets by analyzing the effect of different lighting conditions on the detection accuracy.
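The abstract does not spell out how Non-Maximum Suppression with Area Filtering works, so the following is only one plausible reading, offered as a sketch: each submodel is trusted only for boxes whose area falls inside its own size range, and the surviving boxes from all submodels are then merged with ordinary greedy NMS. The function names, the `area_ranges` mapping, and the IoU threshold are all hypothetical.

```python
def box_area(box):
    """Area of an (x1, y1, x2, y2) box."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def nms_with_area_filter(detections, area_ranges, iou_threshold=0.5):
    """Fuse detections from several scale-specific submodels.

    detections:  list of (box, score, submodel_name) tuples
    area_ranges: submodel_name -> (min_area, max_area); hypothetical ranges
                 describing the box sizes each submodel was trained for
    """
    # Step 1 (assumed): drop boxes outside the size range of their submodel.
    candidates = [
        det for det in detections
        if area_ranges[det[2]][0] <= box_area(det[0]) < area_ranges[det[2]][1]
    ]
    # Step 2: standard greedy NMS over the merged candidates.
    candidates.sort(key=lambda det: det[1], reverse=True)
    kept = []
    for det in candidates:
        if all(iou(det[0], other[0]) <= iou_threshold for other in kept):
            kept.append(det)
    return kept


detections = [
    ((0, 0, 10, 10), 0.90, "small"),
    ((1, 1, 11, 11), 0.80, "small"),     # overlaps the first box
    ((0, 0, 100, 100), 0.95, "small"),   # too large for the "small" submodel
    ((50, 50, 90, 90), 0.70, "large"),
]
ranges = {"small": (0, 400), "large": (400, float("inf"))}
print(nms_with_area_filter(detections, ranges))
```

In this toy run the oversized box from the small-crater submodel is discarded before suppression, so it cannot suppress correct detections from the other submodels.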
Funding: Supported by the National Natural Science Foundation of China (No. 62241109) and the Tianjin Science and Technology Commissioner Project (No. 20YDTPJC01110).
Abstract: An improved model based on You Only Look Once version 8 (YOLOv8) is proposed to solve the problem of low detection accuracy caused by the diversity of object sizes in optical remote sensing images. Firstly, the feature pyramid network (FPN) structure of the original YOLOv8 model is replaced by the generalized-FPN (GFPN) structure from GiraffeDet to realize "cross-layer" and "cross-scale" adaptive feature fusion, enriching the semantic and spatial information in the feature map and improving the target detection ability of the model. Secondly, a multi atrous spatial pyramid pooling (MASPP) module is designed using the ideas of atrous convolution and the feature pyramid structure to extract multi-scale features, improving the model's handling of multi-scale objects. The experimental results show that the improved YOLOv8 model achieves a detection accuracy of 92% and a mean average precision (mAP) of 87.9% on the DIOR dataset, which are 3.5% and 1.7% higher, respectively, than those of the original model. This demonstrates that the detection and classification ability of the proposed model on multi-dimensional optical remote sensing targets has been improved.
Funding: Supported in part by the National Natural Science Foundation of China Grants 62402085, 61972062, and 62306060, the Liaoning Doctoral Research Start-Up Fund 2023-BS-078, the Dalian Youth Science and Technology Star Project 2023RQ023, and the Liaoning Basic Research Project 2023JH2/101300191.
Abstract: Underwater target detection is extensively applied in domains such as underwater search and rescue, environmental monitoring, and marine resource surveys. It is crucial in enabling autonomous underwater robot operations and promoting ocean exploration. Nevertheless, low imaging quality, harsh underwater environments, and obscured objects considerably increase the difficulty of detecting underwater targets, making it difficult for current detection methods to achieve optimal performance. To enhance underwater object perception and improve target detection precision, we propose a lightweight underwater target detection method using You Only Look Once (YOLO) v8 with multi-scale cross-channel attention (MSCCA), named YOLOv8-UOD. In the proposed multi-scale cross-channel attention module, multi-scale attention (MSA) augments the variety of attentional perception by extracting information from inherently diverse receptive fields. The cross-channel strategy utilizes RepVGG-based channel shuffling (RCS) and one-shot aggregation (OSA) to rearrange feature map channels according to specific rules. It aggregates all features only once in the final feature mapping, resulting in the extraction of more comprehensive and valuable feature information. The experimental results show that the proposed YOLOv8-UOD achieves a mAP50 of 95.67% and FLOPs of 23.8 G on the Underwater Robot Picking Contest 2017 (URPC2017) dataset, outperforming other methods in terms of detection precision and computational cost-efficiency.
Funding: Supported by the National Natural Science Foundation of China (No. 61976028).
Abstract: To address the challenges of small target detection and significant scale variations in unmanned aerial vehicle (UAV) aerial imagery, which often lead to missed and false detections, we propose Multi-scale Feature Fusion YOLO (MFF-YOLO), an enhanced algorithm based on YOLOv8s. Our approach introduces a Multi-scale Feature Fusion Strategy (MFFS), comprising the Multiple Features C2f (MFC) module and the Scale Sequence Feature Fusion (SSFF) module, to improve feature integration across different network levels. This enables more effective capture of fine-grained details and sequential multi-scale features. Furthermore, we incorporate Inner-CIoU, an improved loss function that uses auxiliary bounding boxes to enhance the regression quality of small object boxes. To ensure practicality for UAV deployment, we apply the Layer-adaptive Magnitude-based Pruning (LAMP) method to significantly reduce model size and computational cost. Experiments on the VisDrone2019 dataset show that MFF-YOLO achieves a 5.7% increase in mean average precision (mAP) over the baseline, while reducing parameters by 8.5 million and computation by 17.5%. The results demonstrate that our method effectively improves detection performance in UAV aerial scenarios.
Funding: Supported by the National Natural Science Foundation of China (Nos. 62071482 and 62471348), the Shaanxi Association of Science and Technology Youth Talent Support Program Project, China (No. 20230137), the Innovative Talents Cultivate Program for Technology Innovation Team of Shaanxi Province, China (No. 2024RS-CXTD-08), and the Youth Innovation Team of Shaanxi Universities, China.
Abstract: Within the domain of Intelligent Group Systems (IGSs), this paper develops a resource-aware multitarget Constant False Alarm Rate (CFAR) detection framework for multisite MIMO radar systems. It underscores the necessity of systematically managing finite transmit and receive antennas and transmit power to enhance detection performance. To tackle the multidimensional resource optimization challenge, we introduce a Cooperative Transmit-Receive Antenna Selection and Power Allocation (CTRSPA) strategy. It employs a perception-action cycle that incorporates uncertain external support information to optimize worst-case detection performance with multiple targets. First, we derive a closed-form expression that incorporates uncertainty for the noncoherent integration square-law detection probability using the Neyman-Pearson criterion. Subsequently, a joint optimization model for antenna selection and power allocation in CFAR detection is formulated, incorporating practical radar resource constraints. Mathematically, this represents an NP-hard problem involving coupled continuous and Boolean variables. We propose a three-stage method (Reformulation, Node Picker, and Convex Power Allocation) that capitalizes on the independent convexity of the optimization model for each variable, ensuring a near-optimal result. Simulations confirm the approach's effectiveness, efficiency, and timeliness, particularly for large-scale radar networks, and reveal the impact of threat levels, system layout, and detection parameters on resource allocation.
Funding: Funded by the Deanship of Scientific Research at Northern Border University, Arar, Saudi Arabia, through research group No. (RG-NBU-2022-1234).
Abstract: Transportation systems are experiencing a significant transformation due to the integration of advanced technologies, including artificial intelligence and machine learning. In the context of intelligent transportation systems (ITS) and Advanced Driver Assistance Systems (ADAS), the development of efficient and reliable traffic light detection mechanisms is crucial for enhancing road safety and traffic management. This paper presents an optimized convolutional neural network (CNN) framework designed to detect traffic lights in real time within complex urban environments. Leveraging multi-scale pyramid feature maps, the proposed model addresses key challenges such as the detection of small, occluded, and low-resolution traffic lights amidst complex backgrounds. The integration of dilated convolutions, Region of Interest (ROI) alignment, and Soft Non-Maximum Suppression (Soft-NMS) further improves detection accuracy and reduces false positives. By optimizing computational efficiency and parameter complexity, the framework is designed to operate seamlessly on embedded systems, ensuring robust performance in real-world applications. Extensive experiments using real-world datasets demonstrate that our model significantly outperforms existing methods, providing a scalable solution for ITS and ADAS applications. This research contributes to the advancement of Artificial Intelligence-driven (AI-driven) pattern recognition in transportation systems and offers a mathematical approach to improving efficiency and safety in logistics and transportation networks.
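Soft-NMS, mentioned above, lowers the scores of boxes that overlap a stronger detection instead of deleting them outright. A minimal sketch of the Gaussian variant is given below; the values of `sigma` and `score_threshold` are illustrative assumptions, and nothing here is taken from the paper's own implementation.

```python
import math


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS: returns (box, rescored_score) pairs, best first."""
    remaining = list(zip(boxes, scores))
    kept = []
    while remaining:
        # Pick the currently highest-scoring box.
        best = max(range(len(remaining)), key=lambda i: remaining[i][1])
        best_box, best_score = remaining.pop(best)
        if best_score < score_threshold:
            break  # everything left scores even lower
        kept.append((best_box, best_score))
        # Decay the others by exp(-IoU^2 / sigma) rather than removing them.
        remaining = [
            (box, score * math.exp(-(iou(best_box, box) ** 2) / sigma))
            for box, score in remaining
        ]
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
for box, score in soft_nms(boxes, scores):
    print(box, round(score, 3))
```

In the example the heavily overlapping second box is kept with a reduced score and drops below the non-overlapping third box, which is the behavior that can help with partially occluded, closely spaced targets.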
Funding: Funded by the China Chongqing Municipal Science and Technology Bureau, grant numbers 2024TIAD-CYKJCXX0121 and 2024NSCQ-LZX0135; the Chongqing Municipal Commission of Housing and Urban-Rural Development, grant number CKZ2024-87; the Chongqing University of Technology graduate education high-quality development project, grant number gzlsz202401; the Chongqing University of Technology-Chongqing LINGLUE Technology Co., Ltd., Electronic Information (Artificial Intelligence) graduate joint training base; the Postgraduate Education and Teaching Reform Research Project in Chongqing, grant number yjg213116; and the Chongqing University of Technology-CISDI Chongqing Information Technology Co., Ltd., Computer Technology graduate joint training base.
Abstract: Detecting abnormal cervical cells is crucial for early identification and timely treatment of cervical cancer. However, this task is challenging due to the morphological similarities between abnormal and normal cells and the significant variations in cell size. Pathologists often refer to surrounding cells to identify abnormalities. To emulate this slide examination behavior, this study proposes a Multi-Scale Feature Fusion Network (MSFF-Net) for detecting abnormal cervical cells. MSFF-Net employs a Cross-Scale Pooling Model (CSPM) to effectively capture diverse features and contextual information, ranging from local details to the overall structure. Additionally, a Multi-Scale Fusion Attention (MSFA) module is introduced to mitigate the impact of cell size variations by adaptively fusing local and global information at different scales. To handle the complex conditions found in cervical cell images, such as cell adhesion and overlapping, the Inner-CIoU loss function is utilized to more precisely measure the overlap between bounding boxes, thereby improving detection accuracy in such scenarios. Experimental results on the Comparison detector dataset demonstrate that MSFF-Net achieves a mean average precision (mAP) of 63.2%, outperforming state-of-the-art methods while maintaining a relatively small number of parameters (26.8 M). This study highlights the effectiveness of multi-scale feature fusion in enhancing the detection of abnormal cervical cells, contributing to more accurate and efficient cervical cancer screening.
Funding: Supported by Communication University of China (HG23035) and partly supported by the Fundamental Research Funds for the Central Universities (CUC230A013).
Abstract: With the rapid growth of social media, the spread of fake news has become a growing problem, misleading the public and causing significant harm. As social media content is often composed of both images and text, the use of multimodal approaches for fake news detection has gained significant attention. To solve the problems of previous multimodal fake news detection algorithms, such as insufficient feature extraction and insufficient use of semantic relations between modalities, this paper proposes the MFFFND-Co (Multimodal Feature Fusion Fake News Detection with Co-Attention Block) model. First, the model deeply explores the textual content, image content, and frequency domain features. Then, it employs a Co-Attention mechanism for cross-modal fusion. Additionally, a semantic consistency detection module is designed to quantify semantic deviations, thereby enhancing the performance of fake news detection. Evaluated on two commonly used datasets, Twitter and Weibo, the model achieved F1 scores of 90.0% and 94.0%, respectively, significantly outperforming the pre-modified MFFFND (Multimodal Feature Fusion Fake News Detection with Attention Block) model and surpassing other baseline models. This improves the accuracy of detecting fake information in artificial intelligence detection and engineering software detection.
Funding: Supported by the National Key Research and Development Program of China under grant 2016YFC0802904, the National Natural Science Foundation of China under grant 61671470, and the Postdoctoral Science Foundation Funded Project of China under grant 2017M623423.
Abstract: Focusing on the task of fast and accurate armored target detection on the ground battlefield, a detection method based on a multi-scale representation network (MS-RN) and a shape-fixed Guided Anchor (SF-GA) scheme is proposed. Firstly, considering the large scale variation and camouflage of armored targets, a new MS-RN integrating contextual information from the battlefield environment is designed. The MS-RN extracts deep features from templates with different scales and strengthens the detection of small targets. Armored targets of different sizes are detected on different representation features. Secondly, to meet the accuracy and real-time detection requirements, an improved shape-fixed Guided Anchor is used on feature maps of different scales to recommend regions of interest (ROIs). Unlike sliding or random anchors, the SF-GA can filter out 80% of the regions while still improving the recall. A dedicated detection dataset for armored targets, named Armored Target Dataset (ARTD), is constructed, on which comparative experiments with state-of-the-art detection methods are conducted. Experimental results show that the proposed method achieves outstanding performance in detection accuracy and efficiency, especially when small armored targets are involved.
Funding: Supported by the Hebei North University Doctoral Research Fund Project (No. BSJJ202315) and the Youth Research Fund Project of Higher Education Institutions in Hebei Province (No. QN2024146).
Abstract: In the field of smart agriculture, accurate and efficient object detection technology is crucial for automated crop management. A particularly challenging task in this domain is small object detection, such as the identification of immature fruits or early-stage disease spots. These objects pose significant difficulties due to their small pixel coverage, limited feature information, substantial scale variations, and high susceptibility to complex background interference. These challenges frequently result in inadequate accuracy and robustness in current detection models. This study addresses two critical needs in the cashew cultivation industry, fruit maturity and anthracnose detection, by proposing an improved YOLOv11-NSDDil model. The method introduces three key technological innovations: (1) The SDDil module is designed and integrated into the backbone network. This module combines depthwise separable convolution with the SimAM attention mechanism to expand the receptive field and enhance contextual semantic capture at a low computational cost, effectively alleviating the feature deficiency problem caused by the limited pixel coverage of small objects. Simultaneously, the SD module dynamically enhances discriminative features and suppresses background noise, significantly improving the model's feature discrimination capability in complex environments; (2) the DynamicScalSeq-Zoom_cat neck network is introduced, significantly improving multi-scale feature fusion; and (3) the Minimum Point Distance Intersection over Union (MPDIoU) loss function is optimized, enhancing bounding box localization accuracy by minimizing vertex distance. Experimental results on a self-constructed cashew dataset containing 1123 images demonstrate significant performance improvements in the enhanced model: mAP50 reaches 0.825, a 4.6% increase compared to the original YOLOv11; mAP50-95 improves to 0.624, a 6.5% increase; and recall rises to 0.777, a 2.4% increase. This provides a reliable technical solution for intelligent quality inspection of agricultural products and holds broad application prospects.
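The phrase "minimizing vertex distance" can be made concrete with a small sketch of an MPDIoU-style loss. It assumes the commonly cited formulation, in which the squared distances between the two top-left corners and between the two bottom-right corners, each normalized by the squared image diagonal, are subtracted from the IoU. The paper's own "optimized" variant is not described in the abstract, so this shows only the baseline idea, with a hypothetical function name.

```python
def mpdiou_loss(pred, target, img_w, img_h):
    """1 - MPDIoU for two (x1, y1, x2, y2) boxes in an img_w x img_h image.

    Assumed formulation: MPDIoU = IoU - d1^2 / (w^2 + h^2) - d2^2 / (w^2 + h^2),
    where d1 and d2 are the distances between matching top-left and
    bottom-right corners of the two boxes.
    """
    iw = max(0.0, min(pred[2], target[2]) - max(pred[0], target[0]))
    ih = max(0.0, min(pred[3], target[3]) - max(pred[1], target[1]))
    inter = iw * ih
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    union = area_p + area_t - inter
    iou = inter / union if union > 0 else 0.0

    norm = img_w ** 2 + img_h ** 2
    d1_sq = (pred[0] - target[0]) ** 2 + (pred[1] - target[1]) ** 2
    d2_sq = (pred[2] - target[2]) ** 2 + (pred[3] - target[3]) ** 2
    return 1.0 - (iou - d1_sq / norm - d2_sq / norm)


print(mpdiou_loss((0, 0, 10, 10), (0, 0, 10, 10), 100, 100))      # perfect match
print(mpdiou_loss((10, 10, 20, 20), (20, 20, 30, 30), 100, 100))  # no overlap
```

Unlike plain IoU, the corner-distance terms still vary when the boxes do not overlap at all, which is useful for small objects where a slightly misplaced prediction can easily have zero IoU.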
Abstract: The application of deep learning for target detection in aerial images captured by Unmanned Aerial Vehicles (UAVs) has emerged as a prominent research focus. Due to the considerable distance between UAVs and the photographed objects, coupled with complex shooting environments, existing models often struggle to achieve accurate real-time target detection. In this paper, a You Only Look Once v8 (YOLOv8) model is modified in four aspects: the detection head, the up-sampling module, the feature extraction module, and the parameter optimization of positive sample screening. The resulting YOLO-S3DT model is proposed to improve the detection of small targets in aerial images. Experimental results show that all detection metrics of the proposed model are significantly improved without increasing the number of model parameters and with limited growth in computation. Moreover, the model achieves the best performance among the compared detection models, demonstrating its advancement within this category of tasks.
Funding: Supported by the Key R&D Program of Shaanxi Province (No. 2025CYYBXM-078).
Abstract: To address the limited scale adaptation of autonomous driving target detection algorithms in low-illumination environments and their shortcomings in handling target occlusion, this paper proposes the YOLO-LKSDS autonomous driving detection model. Firstly, the Contrast-Limited Adaptive Histogram Equalisation (CLAHE) image enhancement algorithm is improved to increase image contrast and enhance the detailed features of the target; then, on the basis of the YOLOv5 model, the K-means++ clustering algorithm is introduced to obtain suitable anchor boxes, and SPPELAN spatial pyramid pooling is improved to enhance the accuracy and robustness of the model for multi-scale target detection. Finally, an improved SEAM (Separated and Enhancement Attention Module) attention mechanism is combined with the DIoU-NMS algorithm to optimize the model's performance in occluded and dense scenes. Compared with the original model, the improved YOLO-LKSDS model achieves a 13.3% improvement in accuracy, a 1.7% improvement in mAP, and 240,000 fewer parameters on the BDD100K dataset. To validate the generalization of the improved algorithm, experiments were also conducted on the KITTI dataset, where, relative to YOLOv5, accuracy improves by 21.1%, recall by 36.6%, and mAP50 by 29.5%. The deployment of the algorithm is verified on an edge computing platform, where the average detection speed reaches 24.4 FPS while power consumption remains below 9 W, demonstrating high real-time capability and energy efficiency.
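As a rough illustration of the DIoU-NMS step mentioned in this abstract, the sketch below implements plain greedy DIoU-NMS: a lower-scoring box is suppressed only when its IoU with a kept box, minus the normalized squared distance between their centers, exceeds a threshold. The function names, the `(x1, y1, x2, y2)` box layout, and the 0.5 threshold are assumptions for illustration, and the paper's combination with the improved SEAM module is not reproduced here.

```python
def diou(a, b):
    """Distance-IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    iou = inter / union if union > 0 else 0.0
    # Squared distance between the two box centers.
    d_sq = (((a[0] + a[2]) - (b[0] + b[2])) / 2) ** 2 \
         + (((a[1] + a[3]) - (b[1] + b[3])) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both.
    c_sq = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 \
         + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return iou - d_sq / c_sq if c_sq > 0 else iou


def diou_nms(boxes, scores, threshold=0.5):
    """Greedy NMS that uses DIoU instead of IoU as the suppression criterion."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest-scoring remaining box
        keep.append(best)
        # Drop boxes that overlap the kept box and share a nearby center.
        order = [i for i in order if diou(boxes[best], boxes[i]) <= threshold]
    return keep
```

Because the center-distance penalty lowers the score of overlapping boxes whose centers are far apart, two partially occluded objects are less likely to be merged into one detection than under IoU-based NMS.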
Funding: Funded by the Ministry of Education Humanities and Social Science Research Project, grant number 23YJAZH034; the Postgraduate Research and Practice Innovation Program of Jiangsu Province, grant number SJCX25_17; and the National Computer Basic Education Research Project in Higher Education Institutions, grant numbers 2024-AFCEC-056 and 2024-AFCEC-057.
Abstract: To reduce the false and missed detections in steel surface defect detection that arise from the variety of defect types and sizes, the similarity between defects and background features, and the similarity between different defects, this paper proposes a lightweight detection model named multiscale edge and squeeze-and-excitation attention detection network (MSESE), which is built upon the You Only Look Once version 11 nano (YOLOv11n). To address the difficulty of locating defect edges, we first propose an edge enhancement module (EEM), apply it to the process of multiscale feature extraction, and then propose a multiscale edge enhancement module (MSEEM). By obtaining defect features at different scales and enhancing their edge contours, the module uses a dual-domain selection mechanism to focus on the important areas of the image, so that the feature maps carry richer information and clearer contour features. By fusing the squeeze-and-excitation attention mechanism with the EEM, we obtain a lighter module that enhances the representation of edge features, named the edge enhancement module with squeeze-and-excitation attention (EEMSE). This module is then integrated into the detection head. The enhanced detection head achieves improved edge feature enhancement with reduced computational overhead, while effectively adjusting channel-wise importance and further refining feature representation. Experiments on the NEU-DET dataset show that, compared with the original YOLOv11n, the improved model achieves improvements of 4.1% and 2.2% in mAP@0.5 and mAP@0.5:0.95, respectively, and the GFLOPs value decreases from 6.4 to 6.2. Furthermore, compared with the current mainstream models Mamba-YOLOT and RTDETR-R34, our method achieves 6.5% and 8.9% higher mAP@0.5, respectively, while maintaining a more compact parameter footprint. These results collectively validate the effectiveness and efficiency of our proposed approach.
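The channel re-weighting that squeeze-and-excitation attention performs can be sketched in a few lines of plain Python. This is only the standard squeeze-and-excitation computation (global average pooling, a two-layer bottleneck with ReLU and sigmoid, then channel-wise scaling); the fusion with the edge enhancement module described in the abstract is not shown, and the function name, the nested-list tensor layout, and the weight matrices `w1` and `w2` are hypothetical.

```python
import math


def se_attention(feature_map, w1, w2):
    """Plain squeeze-and-excitation over a C x H x W nested-list feature map.

    w1 has shape (C/r) x C and w2 has shape C x (C/r); both are
    hypothetical learned weights, with biases omitted for brevity.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in feature_map]
    # Excitation, step 1: reduce dimensionality, then apply ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)))
              for row in w1]
    # Excitation, step 2: expand back to C channels, then apply sigmoid.
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[[v * g for v in row] for row in ch]
              for ch, g in zip(feature_map, gates)]
    return scaled, gates
```

Each gate lies between 0 and 1, so the operation can only attenuate channels relative to one another; this is the "adjusting channel-wise importance" effect the abstract refers to.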