Funding: Supported by the National Natural Science Foundation of China (Nos. 62276204 and 62203343), the Fundamental Research Funds for the Central Universities (No. YJSJ24011), the Natural Science Basic Research Program of Shanxi, China (Nos. 2022JM-340 and 2023-JC-QN-0710), and the China Postdoctoral Science Foundation (Nos. 2020T130494 and 2018M633470).
Abstract: Drone-based small object detection is of great significance in practical applications such as military operations, disaster rescue, and transportation. However, the severe scale differences among objects captured by drones and the lack of detail in small-scale objects make drone-based small object detection a formidable challenge. To address these issues, we first develop a mathematical model to explore how changing receptive fields affects polynomial fitting results. Based on the conclusions obtained, we then propose a simple but effective Hybrid Receptive Field Network (HRFNet), whose modules include Hybrid Feature Augmentation (HFA), Hybrid Feature Pyramid (HFP), and Dual Scale Head (DSH). Specifically, HFA employs parallel dilated convolution kernels of different sizes to extend shallow features with different receptive fields, improving the multi-scale adaptability of the network; HFP enhances the perception of small objects by capturing contextual information across layers; and DSH reconstructs the original prediction head using a set of high-resolution and ultrahigh-resolution features. In addition, a corresponding dual-scale loss function is designed to train HRFNet. Comprehensive evaluations on public benchmarks such as VisDrone-DET and TinyPerson demonstrate the robustness of the proposed method. Most impressively, HRFNet achieves an mAP of 51.0 on VisDrone-DET with 29.3 M parameters, outperforming existing state-of-the-art detectors. HRFNet also performs excellently in complex scenarios captured by drones, achieving the best performance on the CS-Drone dataset we built.
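As a rough illustration of the HFA idea, the PyTorch sketch below runs the same shallow feature map through parallel 3x3 convolutions with different dilation rates and fuses the branches with a 1x1 convolution. The branch count, dilation rates, and fusion scheme are our assumptions for illustration, not the authors' released code.

```python
import torch
import torch.nn as nn

class HybridFeatureAugmentation(nn.Module):
    """Illustrative sketch of HFA-style parallel dilated convolutions.

    Each branch sees the same shallow feature map through a different
    receptive field (dilation rate); the branches are fused by a 1x1 conv.
    Branch count and dilation rates are assumptions, not the paper's values.
    """

    def __init__(self, channels: int, dilations=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3,
                      padding=d, dilation=d, bias=False)
            for d in dilations
        )
        self.fuse = nn.Conv2d(channels * len(dilations), channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenate the multi-receptive-field responses, then fuse.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

# Example: augment a shallow 64-channel feature map.
feat = torch.randn(1, 64, 160, 160)
print(HybridFeatureAugmentation(64)(feat).shape)  # torch.Size([1, 64, 160, 160])
```

Padding each 3x3 branch by its dilation rate keeps the spatial size unchanged, so the branches can be concatenated directly.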
Funding: Supported by the National Natural Science Foundation of China (No. 62103298).
Abstract: To address the low detection accuracy and large model size of existing object detection algorithms in complex road scenes, an improved you only look once version 8 (YOLOv8) object detection algorithm for infrared images, F-YOLOv8, is proposed. First, a space-to-depth network replaces the traditional backbone's strided convolution or pooling layers and is combined with a channel attention mechanism, so that the network focuses on channels with large weights and better extracts feature information from low-resolution images. Then, an improved lightweight bidirectional feature pyramid network (L-BiFPN) is proposed, which efficiently fuses features of different scales. In addition, an intersection-over-union loss based on the minimum point distance (MPDIoU) is introduced for bounding box regression, yielding faster convergence and more accurate regression results. Experimental results on the FLIR dataset show that the improved algorithm can accurately detect infrared road targets in real time, with 3% and 2.2% gains in mean average precision at 50% IoU (mAP50) and at 50%-95% IoU (mAP50-95), respectively, and 38.1%, 37.3%, and 16.9% reductions in the number of model parameters, the model weight, and floating-point operations (FLOPs), respectively. To further demonstrate its detection capability, the improved algorithm is tested on the public PASCAL VOC dataset, and the results show that F-YOLOv8 has excellent generalized detection performance.
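For reference, a minimal sketch of an MPDIoU-style bounding-box loss is given below, following the published MPDIoU formulation (IoU minus the squared top-left and bottom-right corner distances, normalized by the image size). Whether F-YOLOv8 modifies this formulation is not stated in the abstract, and the function and variable names are ours.

```python
import torch

def mpdiou_loss(pred, target, img_w: int, img_h: int, eps: float = 1e-7):
    """Sketch of an MPDIoU-style loss; boxes are (x1, y1, x2, y2) tensors [N, 4].

    MPDIoU = IoU - d_tl^2 / (w^2 + h^2) - d_br^2 / (w^2 + h^2),
    where d_tl / d_br are distances between the top-left / bottom-right
    corners of the predicted and ground-truth boxes. Loss = 1 - MPDIoU.
    """
    # Intersection area of each predicted/ground-truth box pair.
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Squared corner distances, normalized by the squared image diagonal.
    norm = img_w ** 2 + img_h ** 2
    d_tl = ((pred[:, :2] - target[:, :2]) ** 2).sum(dim=1)
    d_br = ((pred[:, 2:] - target[:, 2:]) ** 2).sum(dim=1)
    return 1 - (iou - d_tl / norm - d_br / norm)  # per-box loss, shape [N]
```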
Abstract: Object detection plays a critical role in drone imagery analysis, especially in remote sensing applications where accurate and efficient detection of small objects is essential. Despite significant advancements in drone imagery detection, most models still struggle with small objects due to challenges such as small object size and complex backgrounds. To address these issues, we propose a robust detection model based on You Only Look Once (YOLO) that balances accuracy and efficiency. The model contains several major innovations: a feature selection pyramid network, an Inner-Shape Intersection over Union (ISIoU) loss function, and a small object detection head. To overcome the limitations of traditional fusion methods in handling multi-level features, we introduce a Feature Selection Pyramid Network into the Neck component, which preserves the shallow feature details critical for detecting small objects. Additionally, recognizing that deep network structures often neglect or degrade small object features, we design a specialized small object detection head in the shallow layers to enhance detection accuracy for these challenging targets. To model both local and global dependencies, we introduce a Conv-Former module that simulates Transformer mechanisms with a convolutional structure, thereby improving feature enhancement. Furthermore, we employ ISIoU to address object imbalance and scale variation; this accelerates model convergence and improves regression accuracy. Experimental results show that, compared with the baseline model, the proposed method significantly improves small object detection on the VisDrone2019 dataset, with mAP@50 increasing by 4.9% and mAP@50-95 by 6.7%. The model also outperforms other state-of-the-art algorithms, demonstrating its reliability and effectiveness in both small object detection and remote sensing image fusion tasks.
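The shallow small-object head can be pictured as an extra prediction branch attached to a high-resolution (e.g., stride-4) feature map, as in the illustrative sketch below; the channel counts and the (box + objectness + classes) output layout are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class ShallowDetectionHead(nn.Module):
    """Illustrative extra head on a shallow, high-resolution feature map.

    Small objects keep more pixels at stride 4 than at stride 8/16/32, so an
    additional head there can recover detections the deeper heads miss.
    """

    def __init__(self, in_ch: int, num_classes: int):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1),
            nn.SiLU(),
        )
        # 4 box offsets + 1 objectness score + class scores per location.
        self.pred = nn.Conv2d(in_ch, 4 + 1 + num_classes, kernel_size=1)

    def forward(self, p2: torch.Tensor) -> torch.Tensor:
        return self.pred(self.stem(p2))

# Example: predictions on a stride-4 feature map for 10 classes.
p2 = torch.randn(1, 64, 160, 160)
print(ShallowDetectionHead(64, 10)(p2).shape)  # torch.Size([1, 15, 160, 160])
```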
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62302086), the Natural Science Foundation of Liaoning Province (Grant No. 2023-MSBA-070), and the Fundamental Research Funds for the Central Universities (Grant No. N2317005).
Abstract: Visible-infrared object detection leverages the day-night stable object perception of infrared images to enhance detection robustness in low-light environments by fusing the complementary information of visible and infrared images. However, the inherent differences in the imaging mechanisms of the visible and infrared modalities make effective cross-modal fusion challenging. Furthermore, constrained by the physical characteristics of sensors and by thermal diffusion effects, infrared images generally suffer from blurred object contours and missing details, making it difficult to extract object features effectively. To address these issues, we propose an infrared-visible image fusion network that realizes multimodal information fusion through a carefully designed multiscale fusion strategy. First, we design an adaptive gray-radiance enhancement (AGRE) module to strengthen the detail representation of infrared images, improving their usability in complex lighting scenarios. Next, we introduce a channel-spatial feature interaction (CSFI) module, which achieves efficient complementarity between the RGB and infrared (IR) modalities via dynamic channel switching and a spatial attention mechanism. Finally, we propose a multi-scale enhanced cross-attention fusion (MSECA) module, which optimizes the fusion of multi-level features through dynamic convolution and gating mechanisms and captures long-range complementary relationships of cross-modal features at a global scale, thereby enhancing the expressiveness of the fused features. Experiments on the KAIST, M3FD, and FLIR datasets demonstrate that our method delivers outstanding performance in both daytime and nighttime scenarios. On the KAIST dataset, the miss rate drops to 5.99%, and further to 4.26% in night scenes. On the FLIR and M3FD datasets, it achieves AP50 scores of 79.4% and 88.9%, respectively.
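A minimal sketch of the dynamic channel-switching idea in CSFI is shown below: channels whose learned gate score is weak are replaced by the corresponding channel of the other modality. The gating design and threshold are our assumptions, and the paper's spatial attention branch is omitted.

```python
import torch
import torch.nn as nn

class ChannelSwitch(nn.Module):
    """Illustrative CSFI-style channel switching between RGB and IR features.

    Channels whose gating score falls below a threshold are replaced by the
    corresponding channel from the other modality, letting weak channels
    borrow complementary information.
    """

    def __init__(self, channels: int, threshold: float = 0.5):
        super().__init__()
        gate = lambda: nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.gate_rgb, self.gate_ir = gate(), gate()
        self.threshold = threshold

    def forward(self, rgb: torch.Tensor, ir: torch.Tensor):
        w_rgb, w_ir = self.gate_rgb(rgb), self.gate_ir(ir)  # [N, C, 1, 1] scores
        # Keep strong channels; swap weak ones with the other modality.
        rgb_out = torch.where(w_rgb >= self.threshold, rgb, ir)
        ir_out = torch.where(w_ir >= self.threshold, ir, rgb)
        return rgb_out, ir_out
```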
Abstract: Accurate detection of small objects is critically important in high-stakes applications such as military reconnaissance and emergency rescue. However, low resolution, occlusion, and background interference make small object detection a complex and demanding task. One effective way to overcome these issues is to integrate multimodal image data to enhance detection. This paper proposes a novel small object detection method that utilizes three multimodal image combinations: Hyperspectral-Multispectral (HSMS), Hyperspectral-Synthetic Aperture Radar (HS-SAR), and HS-SAR-Digital Surface Model (HS-SAR-DSM). Detection is performed by the proposed Jaccard Deep Q-Net (JDQN), which integrates the Jaccard similarity measure with a Deep Q-Network (DQN) using regression modeling. To produce the final output, a Deep Maxout Network (DMN) fuses the detection results obtained from each modality. The effectiveness of the proposed JDQN is validated using performance metrics such as accuracy, Mean Squared Error (MSE), precision, and Root Mean Squared Error (RMSE). Experimental results demonstrate that JDQN outperforms existing approaches, achieving the highest accuracy of 0.907, a precision of 0.904, the lowest normalized MSE of 0.279, and a normalized RMSE of 0.528.
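The Jaccard similarity at the core of JDQN is the standard index |A ∩ B| / |A ∪ B|; for axis-aligned boxes it reduces to the familiar IoU, as in the sketch below. How JDQN combines this score with the DQN's regression modeling is not reproduced here.

```python
def jaccard(box_a, box_b):
    """Jaccard similarity (IoU) of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(jaccard((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```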
Funding: Supported by the Chongqing Municipal Commission of Housing and Urban-Rural Development (Grant No. CKZ2024-87) and the China Chongqing Municipal Science and Technology Bureau (Grant No. 2024TIAD-CYKJCXX0121).
Abstract: Currently, challenges such as small object size and occlusion lead to a lack of accuracy and robustness in small object detection. Since small objects occupy only a few pixels in an image, the extracted features are limited, and mainstream downsampling convolution operations further exacerbate feature loss. Additionally, because small objects are occlusion-prone and highly sensitive to localization deviations, conventional Intersection over Union (IoU) loss functions struggle to achieve stable convergence. To address these limitations, LR-Net is proposed for small object detection. Specifically, the proposed Lossless Feature Fusion (LFF) method transfers spatial features into the channel domain while leveraging a hybrid attention mechanism to focus on critical features, mitigating the feature loss caused by downsampling. Furthermore, RSIoU is proposed to improve the convergence of IoU-based losses for small objects. RSIoU corrects the inherent convergence-direction issues in SIoU and introduces a penalty term as a Dynamic Focusing Mechanism parameter, enabling it to dynamically emphasize the loss contribution of small object samples. Ultimately, RSIoU significantly improves the convergence of the loss function for small objects, particularly under occlusion. Experiments demonstrate that LR-Net achieves significant improvements across various metrics on multiple datasets compared with YOLOv8n, including a 3.7% increase in mean Average Precision (mAP) on the VisDrone2019 dataset, along with improvements of 3.3% on the AI-TOD dataset and 1.2% on the COCO dataset.
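The space-to-channel transfer in LFF can be approximated with a space-to-depth (pixel unshuffle) operation, as sketched below: resolution is halved without discarding any activations, unlike strided convolution or pooling. The 1x1 projection is an assumption, and the hybrid attention mechanism is omitted.

```python
import torch
import torch.nn as nn

class LosslessDownsample(nn.Module):
    """Sketch of LFF-style lossless downsampling via space-to-depth.

    PixelUnshuffle moves every 2x2 spatial block into 4 channels, halving
    resolution while keeping all activations; a 1x1 conv then remixes them.
    """

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.s2d = nn.PixelUnshuffle(downscale_factor=2)
        self.proj = nn.Conv2d(in_ch * 4, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(self.s2d(x))

x = torch.randn(1, 32, 80, 80)
print(LosslessDownsample(32, 64)(x).shape)  # torch.Size([1, 64, 40, 40])
```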
Funding: Supported by the National Natural Science Foundation of China under Grant 62306128, the Basic Science Research Project of Jiangsu Provincial Department of Education under Grant 23KJD520003, and the Leading Innovation Project of Changzhou Science and Technology Bureau under Grant CQ20230072.
Abstract: Real-time detection of surface defects on cables is crucial for ensuring the safe operation of power systems. However, existing methods struggle with small target sizes, complex backgrounds, low-quality image acquisition, and interference from contamination. To address these challenges, this paper proposes the Real-time Cable Defect Detection Network (RC2DNet), which achieves an optimal balance between detection accuracy and computational efficiency. Unlike conventional approaches, RC2DNet introduces a small object feature extraction module that enhances the semantic representation of small targets through feature pyramids, multi-level feature fusion, and an adaptive weighting mechanism. Additionally, a boundary feature enhancement module is designed, incorporating boundary-aware convolution, a novel boundary attention mechanism, and an improved loss function to significantly enhance boundary localization accuracy. Experimental results demonstrate that RC2DNet outperforms state-of-the-art methods in precision, recall, F1-score, mean Intersection over Union (mIoU), and frame rate, enabling real-time and highly accurate cable defect detection against complex backgrounds.
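As one way to picture boundary attention, the hedged sketch below gates features with an edge map obtained from fixed depthwise Sobel filters; RC2DNet's actual boundary-aware convolution and attention design are not given in this abstract, so this shows only the general idea.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BoundaryAttention(nn.Module):
    """Illustrative boundary attention: fixed Sobel filters estimate an edge
    map that reweights features toward defect boundaries."""

    def __init__(self, channels: int):
        super().__init__()
        gx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        # One Sobel kernel per channel for depthwise filtering.
        self.register_buffer("kx", gx.view(1, 1, 3, 3).repeat(channels, 1, 1, 1))
        self.register_buffer("ky", gx.t().view(1, 1, 3, 3).repeat(channels, 1, 1, 1))
        self.channels = channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Depthwise Sobel gradients -> edge magnitude -> sigmoid gate.
        ex = F.conv2d(x, self.kx, padding=1, groups=self.channels)
        ey = F.conv2d(x, self.ky, padding=1, groups=self.channels)
        edge = torch.sigmoid(torch.sqrt(ex ** 2 + ey ** 2 + 1e-6))
        return x * (1 + edge)  # amplify responses near boundaries
```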
Funding: Co-supported by the National Natural Science Foundation of China (No. 62103190) and the Natural Science Foundation of Jiangsu Province, China (No. BK20230923).
Abstract: When detecting objects in images taken by Unmanned Aerial Vehicles (UAVs), the large number of objects and the high proportion of small objects pose huge challenges for detection algorithms based on the You Only Look Once (YOLO) framework, making it hard for them to handle tasks that demand high precision. To address these problems, this paper proposes a high-precision object detection algorithm based on YOLOv10s. First, a Multi-branch Enhancement Coordinate Attention (MECA) module is proposed to enhance feature extraction. Second, a Multilayer Feature Reconstruction (MFR) mechanism is designed to fully exploit multilayer features, enriching object information while removing redundant information. Finally, an MFR Path Aggregation Network (MFR-Neck) is constructed, which integrates multi-scale features to improve the network's ability to perceive objects of varying sizes. Experimental results demonstrate that the proposed algorithm increases average detection accuracy by 14.15% on the VisDrone dataset compared with YOLOv10s, effectively enhancing object detection precision in UAV-taken images.
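MECA builds on coordinate attention, which factorizes spatial attention into height-wise and width-wise pooled descriptors. The sketch below implements plain coordinate attention (Hou et al., CVPR 2021) as a reference point; the multi-branch enhancement that MECA adds is not reproduced.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Plain coordinate attention: direction-aware pooling, a shared
    bottleneck conv, then separate height and width attention maps."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool over width
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool over height
        self.conv1 = nn.Conv2d(channels, mid, 1)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        xh = self.pool_h(x)                       # [n, c, h, 1]
        xw = self.pool_w(x).permute(0, 1, 3, 2)   # [n, c, w, 1]
        y = self.act(self.conv1(torch.cat([xh, xw], dim=2)))
        yh, yw = torch.split(y, [h, w], dim=2)
        ah = torch.sigmoid(self.conv_h(yh))                      # [n, c, h, 1]
        aw = torch.sigmoid(self.conv_w(yw.permute(0, 1, 3, 2)))  # [n, c, 1, w]
        return x * ah * aw
```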
Funding: The Industry-University-Research Cooperation Fund Project of the Eighth Research Institute of China Aerospace Science and Technology Corporation (No. USCAST2021-5).
Abstract: To overcome poor feature extraction and the scarcity of prior information on the appearance of infrared dim small targets, we propose a multi-domain attention-guided pyramid network (MAGPNet). Specifically, we design three modules to ensure that salient features of small targets are acquired and retained in the multi-scale feature maps. To improve the network's adaptability to targets of different sizes, we design a kernel aggregation attention block with a receptive-field attention branch, which weights the feature maps under different perceptual fields via an attention mechanism. Drawing on research into the human visual system, we further propose an adaptive local contrast measure module to enhance the local features of infrared small targets; with this parameterized component, we can aggregate information from multi-scale contrast saliency maps. Finally, to fully utilize the spatial- and channel-domain information in feature maps of different scales, we propose a mixed spatial-channel attention-guided fusion module that achieves high-quality fusion while ensuring that small target features are preserved in deep layers. Experiments on public datasets demonstrate that MAGPNet outperforms other state-of-the-art methods in terms of Intersection over Union, Precision, Recall, and F-measure. In addition, detailed ablation studies verify the effectiveness of each component of our network.
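A simple center-minus-surround approximation conveys what a local contrast measure computes: a small target should respond more strongly than its neighborhood. MAGPNet's module is parameterized and learned; the fixed-pooling version below, with assumed window sizes, is only illustrative.

```python
import torch
import torch.nn.functional as F

def local_contrast(feat: torch.Tensor, cell: int = 3, ring: int = 9) -> torch.Tensor:
    """Center-minus-surround sketch of a local contrast map.

    `cell` sets the scale of the candidate target, `ring` the scale of its
    surroundings; positive responses mark small bright blobs that stand out.
    """
    center = F.avg_pool2d(feat, cell, stride=1, padding=cell // 2)
    surround = F.avg_pool2d(feat, ring, stride=1, padding=ring // 2)
    return torch.relu(center - surround)

# Example: a contrast saliency map for a single-channel infrared feature map.
ir = torch.randn(1, 1, 128, 128)
print(local_contrast(ir).shape)  # torch.Size([1, 1, 128, 128])
```

Computing this map at several `cell`/`ring` scales and aggregating the results mirrors the multi-scale contrast saliency aggregation described above.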
Funding: Supported in part by the National Key R&D Program of China (Grant No. 2023YFB3307604), the Shanxi Province Basic Research Program Youth Science Research Project (Grant Nos. 202303021212054 and 202303021212046), the Key Projects Supported by Hebei Natural Science Foundation (Grant No. E2024203125), the National Science Foundation of China (Grant No. 52105391), the Hebei Provincial Science and Technology Major Project (Grant No. 23280101Z), and the National Key Laboratory of Metal Forming Technology and Heavy Equipment Open Fund (Grant No. S2308100.W17).
Abstract: A novel dual-branch decoding fusion convolutional neural network model (DDFNet), specifically designed for real-time salient object detection (SOD) on steel surfaces, is proposed. DDFNet is based on a standard encoder–decoder architecture and integrates three key innovations. First, we introduce a novel, lightweight multi-scale progressive aggregation residual network that effectively suppresses background interference and refines defect details, enabling efficient salient feature extraction. Then, we propose an innovative dual-branch decoding fusion structure, comprising a refined defect representation branch and an enhanced defect representation branch, which improves accuracy in defect region identification and feature representation. Additionally, to further improve the detection of small and complex defects, we incorporate a multi-scale attention fusion module. Experimental results on the public ESDIs-SOD dataset show that DDFNet, with only 3.69 million parameters, achieves detection performance comparable to current state-of-the-art models, demonstrating its potential for real-time industrial applications. Furthermore, our DDFNet-L variant consistently outperforms leading methods in detection performance. The code is available at https://github.com/13140W/DDFNet.
Abstract: The increasing prevalence of violent incidents in public spaces has created an urgent need for intelligent surveillance systems capable of detecting dangerous objects in real time. Traditional video surveillance relies on human monitoring, which suffers from limitations such as fatigue and delayed response times. This study addresses these challenges by developing an automated detection system that uses advanced deep learning techniques to enhance public safety. Our approach leverages state-of-the-art convolutional neural networks (CNNs), specifically You Only Look Once version 4 (YOLOv4) and EfficientDet, for real-time object detection. The system was trained on a comprehensive dataset of over 50,000 images, enhanced through data augmentation to improve robustness across varying lighting conditions and viewing angles. Cloud-based deployment on Amazon Web Services (AWS) ensured scalability and efficient processing. Experimental evaluations demonstrated high performance, with YOLOv4 achieving 92% accuracy and processing images in 0.45 s, while EfficientDet reached 93% accuracy with a slightly longer processing time of 0.55 s per image. Field tests in high-traffic environments such as train stations and shopping malls confirmed the system's reliability, with a false alarm rate of only 4.5%. The integration of automatic alerts enabled rapid security responses to potential threats. The proposed CNN-based system provides an effective solution for real-time detection of dangerous objects in video surveillance, significantly improving response times and public safety. While YOLOv4 proved more suitable for speed-critical applications, EfficientDet offered marginally better accuracy. Future work will focus on optimizing the system for low-light conditions and further reducing false positives. This research contributes to the advancement of AI-driven surveillance technologies, offering a scalable framework adaptable to various security scenarios.