In foggy traffic scenarios, existing object detection algorithms face challenges such as low detection accuracy, poor robustness, occlusion, missed detections, and false detections. To address this issue, a multi-scal...In foggy traffic scenarios, existing object detection algorithms face challenges such as low detection accuracy, poor robustness, occlusion, missed detections, and false detections. To address this issue, a multi-scale object detection algorithm based on an improved YOLOv8 has been proposed. Firstly, a lightweight attention mechanism, Triplet Attention, is introduced to enhance the algorithm’s ability to extract multi-dimensional and multi-scale features, thereby improving the receptive capability of the feature maps. Secondly, the Diverse Branch Block (DBB) is integrated into the CSP Bottleneck with two Convolutions (C2F) module to strengthen the fusion of semantic information across different layers. Thirdly, a new decoupled detection head is proposed by redesigning the original network head based on the Diverse Branch Block module to improve detection accuracy and reduce missed and false detections. Finally, the Minimum Point Distance based Intersection-over-Union (MPDIoU) is used to replace the original YOLOv8 Complete Intersection-over-Union (CIoU) to accelerate the network’s training convergence. Comparative experiments and dehazing pre-processing tests were conducted on the RTTS and VOC-Fog datasets. Compared to the baseline YOLOv8 model, the improved algorithm achieved mean Average Precision (mAP) improvements of 4.6% and 3.8%, respectively. After defogging pre-processing, the mAP increased by 5.3% and 4.4%, respectively. The experimental results demonstrate that the improved algorithm exhibits high practicality and effectiveness in foggy traffic scenarios.展开更多
While moving ahead with the object detection technology, especially deep neural networks, many related tasks, such as medical application and industrial automation, have achieved great success. However, the detection ...While moving ahead with the object detection technology, especially deep neural networks, many related tasks, such as medical application and industrial automation, have achieved great success. However, the detection of objects with multiple aspect ratios and scales is still a key problem. This paper proposes a top-down and bottom-up feature pyramid network(TDBU-FPN),which combines multi-scale feature representation and anchor generation at multiple aspect ratios. First, in order to build the multi-scale feature map, this paper puts a number of fully convolutional layers after the backbone. Second, to link neighboring feature maps, top-down and bottom-up flows are adopted to introduce context information via top-down flow and supplement suboriginal information via bottom-up flow. The top-down flow refers to the deconvolution procedure, and the bottom-up flow refers to the pooling procedure. Third, the problem of adapting different object aspect ratios is tackled via many anchor shapes with different aspect ratios on each multi-scale feature map. The proposed method is evaluated on the pattern analysis, statistical modeling and computational learning visual object classes(PASCAL VOC)dataset and reaches an accuracy of 79%, which exhibits a 1.8% improvement with a detection speed of 23 fps.展开更多
Multi-scale object detection is a research hotspot,and it has critical applications in many secure systems.Although the object detection algorithms have constantly been progressing recently,how to perform highly accur...Multi-scale object detection is a research hotspot,and it has critical applications in many secure systems.Although the object detection algorithms have constantly been progressing recently,how to perform highly accurate and reliable multi-class object detection is still a challenging task due to the influence of many factors,such as the deformation and occlusion of the object in the actual scene.The more interference factors,the more complicated the semantic information,so we need a deeper network to extract deep information.However,deep neural networks often suffer from network degradation.To prevent the occurrence of degradation on deep neural networks,we put forth a new model using a newly-designed Pre-ReLU,which inserts a ReLU layer before the convolution layer for the sake of preventing network degradation and ensuring the performance of deep networks.This structure can transfer the semantic information more smoothly from the shallow to the deep layer.However,the deep networks will encounter not only degradation,but also a decline in efficiency.Therefore,to speed up the two-stage detector,we divide the feature map into many groups so as to diminish the number of parameters.Correspondingly,calculation speed has been enhanced,achieving a balance between speed and accuracy.Through mathematical demonstration,a Balanced Loss(BL)is proposed by a balance factor to decrease the weight of the negative sample during the training phase to balance the positives and negatives.Finally,our detector demonstrates rosy results in a range of experiments and gains an mAP of 73.38 on PASCAL VOC2007,which approaches the requirement of many security systems.展开更多
Computer vision-based traffic object detection plays a critical role in road traffic safety.Under hazy weather conditions,images captured by road monitoring systems exhibit three main challenges:significant scale vari...Computer vision-based traffic object detection plays a critical role in road traffic safety.Under hazy weather conditions,images captured by road monitoring systems exhibit three main challenges:significant scale variations,abundant background noise,and diverse perspectives.These factors lead to insufficient detection accuracy and limited real-time performance in object detection algorithms.We propose AMC-YOLO an improved YOLOv11-based traffic detection algorithm to address these challenges.In this work,we replace the C3k block's bottleneck module with our novel attention-gate convolution(AGConv),which improves contextual information capture,enhances feature extraction,and reduces computational redundancy.Additionally,we introduce the multi-dilation sharing convolution(MDSC)module to prevent feature information loss during pooling operations,enhancing the model's sensitivity to multi-scale features.We design a lightweight and efficient cross-channel feature fusion module(CCFM)for the path aggregation neck to adaptively adjust feature weights and optimize the model's overall performance.Experimental results demonstrate that AMC-YOLO achieves a 1.1%improvement in mAP@0.5 and a 2.7%increase in mAP@0.5:0.95 compared to YOLOv11n.On graphics processing unit(GPU)hardware,it achieves real-time performance at 376(FPS)with only 2.6 million parameters,ensuring high-precision traffic detection while meeting deployment requirements on resource-constrained devices.展开更多
The ubiquity of mobile devices has driven advancements in mobile object detection.However,challenges in multi-scale object detection in open,complex environments persist due to limited computational resources.Traditio...The ubiquity of mobile devices has driven advancements in mobile object detection.However,challenges in multi-scale object detection in open,complex environments persist due to limited computational resources.Traditional approaches like network compression,quantization,and lightweight design often sacrifice accuracy or feature representation robustness.This article introduces the Fast Multi-scale Channel Shuffling Network(FMCSNet),a novel lightweight detection model optimized for mobile devices.FMCSNet integrates a fully convolutional Multilayer Perceptron(MLP)module,offering global perception without significantly increasing parameters,effectively bridging the gap between CNNs and Vision Transformers.FMCSNet achieves a delicate balance between computation and accuracy mainly by two key modules:the ShiftMLP module,including a shift operation and an MLP module,and a Partial group Convolutional(PGConv)module,reducing computation while enhancing information exchange between channels.With a computational complexity of 1.4G FLOPs and 1.3M parameters,FMCSNet outperforms CNN-based and DWConv-based ShuffleNetv2 by 1%and 4.5%mAP on the Pascal VOC 2007 dataset,respectively.Additionally,FMCSNet achieves a mAP of 30.0(0.5:0.95 IoU threshold)with only 2.5G FLOPs and 2.0M parameters.It achieves 32 FPS on low-performance i5-series CPUs,meeting real-time detection requirements.The versatility of the PGConv module’s adaptability across scenarios further highlights FMCSNet as a promising solution for real-time mobile object detection.展开更多
Accurate and real-time road defect detection is essential for ensuring traffic safety and infrastructure maintenance.However,existing vision-based methods often struggle with small,sparse,and low-resolution defects un...Accurate and real-time road defect detection is essential for ensuring traffic safety and infrastructure maintenance.However,existing vision-based methods often struggle with small,sparse,and low-resolution defects under complex road conditions.To address these limitations,we propose Multi-Scale Guided Detection YOLO(MGD-YOLO),a novel lightweight and high-performance object detector built upon You Only Look Once Version 5(YOLOv5).The proposed model integrates three key components:(1)a Multi-Scale Dilated Attention(MSDA)module to enhance semantic feature extraction across varying receptive fields;(2)Depthwise Separable Convolution(DSC)to reduce computational cost and improve model generalization;and(3)a Visual Global Attention Upsampling(VGAU)module that leverages high-level contextual information to refine low-level features for precise localization.Extensive experiments on three public road defect benchmarks demonstrate that MGD-YOLO outperforms state-of-the-art models in both detection accuracy and efficiency.Notably,our model achieves 87.9%accuracy in crack detection,88.3%overall precision on TD-RD dataset,while maintaining fast inference speed and a compact architecture.These results highlight the potential of MGD-YOLO for deployment in real-time,resource-constrained scenarios,paving the way for practical and scalable intelligent road maintenance systems.展开更多
Detecting abnormal cervical cells is crucial for early identification and timely treatment of cervical cancer.However,this task is challenging due to the morphological similarities between abnormal and normal cells an...Detecting abnormal cervical cells is crucial for early identification and timely treatment of cervical cancer.However,this task is challenging due to the morphological similarities between abnormal and normal cells and the significant variations in cell size.Pathologists often refer to surrounding cells to identify abnormalities.To emulate this slide examination behavior,this study proposes a Multi-Scale Feature Fusion Network(MSFF-Net)for detecting cervical abnormal cells.MSFF-Net employs a Cross-Scale Pooling Model(CSPM)to effectively capture diverse features and contextual information,ranging from local details to the overall structure.Additionally,a Multi-Scale Fusion Attention(MSFA)module is introduced to mitigate the impact of cell size variations by adaptively fusing local and global information at different scales.To handle the complex environment of cervical cell images,such as cell adhesion and overlapping,the Inner-CIoU loss function is utilized to more precisely measure the overlap between bounding boxes,thereby improving detection accuracy in such scenarios.Experimental results on the Comparison detector dataset demonstrate that MSFF-Net achieves a mean average precision(mAP)of 63.2%,outperforming state-of-the-art methods while maintaining a relatively small number of parameters(26.8 M).This study highlights the effectiveness of multi-scale feature fusion in enhancing the detection of cervical abnormal cells,contributing to more accurate and efficient cervical cancer screening.展开更多
With the rapid growth of socialmedia,the spread of fake news has become a growing problem,misleading the public and causing significant harm.As social media content is often composed of both images and text,the use of...With the rapid growth of socialmedia,the spread of fake news has become a growing problem,misleading the public and causing significant harm.As social media content is often composed of both images and text,the use of multimodal approaches for fake news detection has gained significant attention.To solve the problems existing in previous multi-modal fake news detection algorithms,such as insufficient feature extraction and insufficient use of semantic relations between modes,this paper proposes the MFFFND-Co(Multimodal Feature Fusion Fake News Detection with Co-Attention Block)model.First,the model deeply explores the textual content,image content,and frequency domain features.Then,it employs a Co-Attention mechanism for cross-modal fusion.Additionally,a semantic consistency detectionmodule is designed to quantify semantic deviations,thereby enhancing the performance of fake news detection.Experimentally verified on two commonly used datasets,Twitter and Weibo,the model achieved F1 scores of 90.0% and 94.0%,respectively,significantly outperforming the pre-modified MFFFND(Multimodal Feature Fusion Fake News Detection with Attention Block)model and surpassing other baseline models.This improves the accuracy of detecting fake information in artificial intelligence detection and engineering software detection.展开更多
This paper proposes an automated detection framework for transmission facilities using a featureattention multi-scale robustness network(FAMSR-Net)with high-fidelity virtual images.The proposed framework exhibits thre...This paper proposes an automated detection framework for transmission facilities using a featureattention multi-scale robustness network(FAMSR-Net)with high-fidelity virtual images.The proposed framework exhibits three key characteristics.First,virtual images of the transmission facilities generated using StyleGAN2-ADA are co-trained with real images.This enables the neural network to learn various features of transmission facilities to improve the detection performance.Second,the convolutional block attention module is deployed in FAMSR-Net to effectively extract features from images and construct multi-dimensional feature maps,enabling the neural network to perform precise object detection in various environments.Third,an effective bounding box optimization method called Scylla-IoU is deployed on FAMSR-Net,considering the intersection over union,center point distance,angle,and shape of the bounding box.This enables the detection of power facilities of various sizes accurately.Extensive experiments demonstrated that FAMSRNet outperforms other neural networks in detecting power facilities.FAMSR-Net also achieved the highest detection accuracy when virtual images of the transmission facilities were co-trained in the training phase.The proposed framework is effective for the scheduled operation and maintenance of transmission facilities because an optical camera is currently the most promising tool for unmanned aerial vehicles.This ultimately contributes to improved inspection efficiency,reduced maintenance risks,and more reliable power delivery across extensive transmission facilities.展开更多
Transportation systems are experiencing a significant transformation due to the integration of advanced technologies, including artificial intelligence and machine learning. In the context of intelligent transportatio...Transportation systems are experiencing a significant transformation due to the integration of advanced technologies, including artificial intelligence and machine learning. In the context of intelligent transportation systems (ITS) and Advanced Driver Assistance Systems (ADAS), the development of efficient and reliable traffic light detection mechanisms is crucial for enhancing road safety and traffic management. This paper presents an optimized convolutional neural network (CNN) framework designed to detect traffic lights in real-time within complex urban environments. Leveraging multi-scale pyramid feature maps, the proposed model addresses key challenges such as the detection of small, occluded, and low-resolution traffic lights amidst complex backgrounds. The integration of dilated convolutions, Region of Interest (ROI) alignment, and Soft Non-Maximum Suppression (Soft-NMS) further improves detection accuracy and reduces false positives. By optimizing computational efficiency and parameter complexity, the framework is designed to operate seamlessly on embedded systems, ensuring robust performance in real-world applications. Extensive experiments using real-world datasets demonstrate that our model significantly outperforms existing methods, providing a scalable solution for ITS and ADAS applications. This research contributes to the advancement of Artificial Intelligence-driven (AI-driven) pattern recognition in transportation systems and offers a mathematical approach to improving efficiency and safety in logistics and transportation networks.展开更多
Drone-based small object detection is of great significance in practical applications such as military actions, disaster rescue, transportation, etc. However, the severe scale differences in objects captured by drones...Drone-based small object detection is of great significance in practical applications such as military actions, disaster rescue, transportation, etc. However, the severe scale differences in objects captured by drones and lack of detail information for small-scale objects make drone-based small object detection a formidable challenge. To address these issues, we first develop a mathematical model to explore how changing receptive fields impacts the polynomial fitting results. Subsequently, based on the obtained conclusions, we propose a simple but effective Hybrid Receptive Field Network (HRFNet), whose modules include Hybrid Feature Augmentation (HFA), Hybrid Feature Pyramid (HFP) and Dual Scale Head (DSH). Specifically, HFA employs parallel dilated convolution kernels of different sizes to extend shallow features with different receptive fields, committed to improving the multi-scale adaptability of the network;HFP enhances the perception of small objects by capturing contextual information across layers, while DSH reconstructs the original prediction head utilizing a set of high-resolution features and ultrahigh-resolution features. In addition, in order to train HRFNet, the corresponding dual-scale loss function is designed. Finally, comprehensive evaluation results on public benchmarks such as VisDrone-DET and TinyPerson demonstrate the robustness of the proposed method. Most impressively, the proposed HRFNet achieves a mAP of 51.0 on VisDrone-DET with 29.3 M parameters, which outperforms the extant state-of-the-art detectors. HRFNet also performs excellently in complex scenarios captured by drones, achieving the best performance on the CS-Drone dataset we built.展开更多
Recent years have seen a surge in interest in object detection on remote sensing images for applications such as surveillance andmanagement.However,challenges like small object detection,scale variation,and the presen...Recent years have seen a surge in interest in object detection on remote sensing images for applications such as surveillance andmanagement.However,challenges like small object detection,scale variation,and the presence of closely packed objects in these images hinder accurate detection.Additionally,the motion blur effect further complicates the identification of such objects.To address these issues,we propose enhanced YOLOv9 with a transformer head(YOLOv9-TH).The model introduces an additional prediction head for detecting objects of varying sizes and swaps the original prediction heads for transformer heads to leverage self-attention mechanisms.We further improve YOLOv9-TH using several strategies,including data augmentation,multi-scale testing,multi-model integration,and the introduction of an additional classifier.The cross-stage partial(CSP)method and the ghost convolution hierarchical graph(GCHG)are combined to improve detection accuracy by better utilizing feature maps,widening the receptive field,and precisely extracting multi-scale objects.Additionally,we incorporate the E-SimAM attention mechanism to address low-resolution feature loss.Extensive experiments on the VisDrone2021 and DIOR datasets demonstrate the effectiveness of YOLOv9-TH,showing good improvement in mAP compared to the best existing methods.The YOLOv9-TH-e achieved 54.2% of mAP50 on the VisDrone2021 dataset and 92.3% of mAP on the DIOR dataset.The results confirmthemodel’s robustness and suitability for real-world applications,particularly for small object detection in remote sensing images.展开更多
Remote sensing imagery,due to its high altitude,presents inherent challenges characterized by multiple scales,limited target areas,and intricate backgrounds.These inherent traits often lead to increased miss and false...Remote sensing imagery,due to its high altitude,presents inherent challenges characterized by multiple scales,limited target areas,and intricate backgrounds.These inherent traits often lead to increased miss and false detection rates when applying object recognition algorithms tailored for remote sensing imagery.Additionally,these complexities contribute to inaccuracies in target localization and hinder precise target categorization.This paper addresses these challenges by proposing a solution:The YOLO-MFD model(YOLO-MFD:Remote Sensing Image Object Detection withMulti-scale Fusion Dynamic Head).Before presenting our method,we delve into the prevalent issues faced in remote sensing imagery analysis.Specifically,we emphasize the struggles of existing object recognition algorithms in comprehensively capturing critical image features amidst varying scales and complex backgrounds.To resolve these issues,we introduce a novel approach.First,we propose the implementation of a lightweight multi-scale module called CEF.This module significantly improves the model’s ability to comprehensively capture important image features by merging multi-scale feature information.It effectively addresses the issues of missed detection and mistaken alarms that are common in remote sensing imagery.Second,an additional layer of small target detection heads is added,and a residual link is established with the higher-level feature extraction module in the backbone section.This allows the model to incorporate shallower information,significantly improving the accuracy of target localization in remotely sensed images.Finally,a dynamic head attentionmechanism is introduced.This allows themodel to exhibit greater flexibility and accuracy in recognizing shapes and targets of different sizes.Consequently,the precision of object detection is significantly improved.The trial results show that the YOLO-MFD model shows improvements of 6.3%,3.5%,and 2.5%over the original YOLOv8 model in Precision,map@0.5 and map@0.5:0.95,separately.These results illustrate the clear advantages of the method.展开更多
Accurately identifying small objects in high-resolution aerial images presents a complex and crucial task in thefield of small object detection on unmanned aerial vehicles(UAVs).This task is challenging due to variati...Accurately identifying small objects in high-resolution aerial images presents a complex and crucial task in thefield of small object detection on unmanned aerial vehicles(UAVs).This task is challenging due to variations inUAV flight altitude,differences in object scales,as well as factors like flight speed and motion blur.To enhancethe detection efficacy of small targets in drone aerial imagery,we propose an enhanced You Only Look Onceversion 7(YOLOv7)algorithm based on multi-scale spatial context.We build the MSC-YOLO model,whichincorporates an additional prediction head,denoted as P2,to improve adaptability for small objects.We replaceconventional downsampling with a Spatial-to-Depth Convolutional Combination(CSPDC)module to mitigatethe loss of intricate feature details related to small objects.Furthermore,we propose a Spatial Context Pyramidwith Multi-Scale Attention(SCPMA)module,which captures spatial and channel-dependent features of smalltargets acrossmultiple scales.This module enhances the perception of spatial contextual features and the utilizationof multiscale feature information.On the Visdrone2023 and UAVDT datasets,MSC-YOLO achieves remarkableresults,outperforming the baseline method YOLOv7 by 3.0%in terms ofmean average precision(mAP).The MSCYOLOalgorithm proposed in this paper has demonstrated satisfactory performance in detecting small targets inUAV aerial photography,providing strong support for practical applications.展开更多
To maintain the reliability of power systems,routine inspections using drones equipped with advanced object detection algorithms are essential for preempting power-related issues.The increasing resolution of drone-cap...To maintain the reliability of power systems,routine inspections using drones equipped with advanced object detection algorithms are essential for preempting power-related issues.The increasing resolution of drone-captured images has posed a challenge for traditional target detection methods,especially in identifying small objects in high-resolution images.This study presents an enhanced object detection algorithm based on the Faster Regionbased Convolutional Neural Network(Faster R-CNN)framework,specifically tailored for detecting small-scale electrical components like insulators,shock hammers,and screws in transmission line.The algorithm features an improved backbone network for Faster R-CNN,which significantly boosts the feature extraction network’s ability to detect fine details.The Region Proposal Network is optimized using a method of guided feature refinement(GFR),which achieves a balance between accuracy and speed.The incorporation of Generalized Intersection over Union(GIOU)and Region of Interest(ROI)Align further refines themodel’s accuracy.Experimental results demonstrate a notable improvement in mean Average Precision,reaching 89.3%,an 11.1%increase compared to the standard Faster R-CNN.This highlights the effectiveness of the proposed algorithm in identifying electrical components in high-resolution aerial images.展开更多
A novel dual-branch decoding fusion convolutional neural network model(DDFNet)specifically designed for real-time salient object detection(SOD)on steel surfaces is proposed.DDFNet is based on a standard encoder–decod...A novel dual-branch decoding fusion convolutional neural network model(DDFNet)specifically designed for real-time salient object detection(SOD)on steel surfaces is proposed.DDFNet is based on a standard encoder–decoder architecture.DDFNet integrates three key innovations:first,we introduce a novel,lightweight multi-scale progressive aggregation residual network that effectively suppresses background interference and refines defect details,enabling efficient salient feature extraction.Then,we propose an innovative dual-branch decoding fusion structure,comprising the refined defect representation branch and the enhanced defect representation branch,which enhance accuracy in defect region identification and feature representation.Additionally,to further improve the detection of small and complex defects,we incorporate a multi-scale attention fusion module.Experimental results on the public ESDIs-SOD dataset show that DDFNet,with only 3.69 million parameters,achieves detection performance comparable to current state-of-the-art models,demonstrating its potential for real-time industrial applications.Furthermore,our DDFNet-L variant consistently outperforms leading methods in detection performance.The code is available at https://github.com/13140W/DDFNet.展开更多
Dear Editor,This letter focuses on the fact that small objects with few pixels disappear in feature maps with large receptive fields, as the network deepens, in object detection tasks. Therefore, the detection of dens...Dear Editor,This letter focuses on the fact that small objects with few pixels disappear in feature maps with large receptive fields, as the network deepens, in object detection tasks. Therefore, the detection of dense small objects is challenging.展开更多
Top-view fisheye cameras are widely used in personnel surveillance for their broad field of view,but their unique imaging characteristics pose challenges like distortion,complex scenes,scale variations,and small objec...Top-view fisheye cameras are widely used in personnel surveillance for their broad field of view,but their unique imaging characteristics pose challenges like distortion,complex scenes,scale variations,and small objects near image edges.To tackle these,we proposed peripheral focus you only look once(PF-YOLO),an enhanced YOLOv8n-based method.Firstly,we introduced a cutting-patch data augmentation strategy to mitigate the problem of insufficient small-object samples in various scenes.Secondly,to enhance the model's focus on small objects near the edges,we designed the peripheral focus loss,which uses dynamic focus coefficients to provide greater gradient gains for these objects,improving their regression accuracy.Finally,we designed the three dimensional(3D)spatial-channel coordinate attention C2f module,enhancing spatial and channel perception,suppressing noise,and improving personnel detection.Experimental results demonstrate that PF-YOLO achieves strong performance on the challenging events for person detection from overhead fisheye images(CEPDTOF)and in-the-wild events for people detection and tracking from overhead fisheye cameras(WEPDTOF)datasets.Compared to the original YOLOv8n model,PFYOLO achieves improvements on CEPDTOF with increases of 2.1%,1.7%and 2.9%in mean average precision 50(mAP 50),mAP 50-95,and tively.On WEPDTOF,PF-YOLO achieves substantial improvements with increases of 31.4%,14.9%,61.1%and 21.0%in 91.2%and 57.2%,respectively.展开更多
In this paper,a two-stage light detection and ranging(LiDAR) three-dimensional(3D) object detection framework is presented,namely point-voxel dual transformer(PV-DT3D),which is a transformer-based method.In the propos...In this paper,a two-stage light detection and ranging(LiDAR) three-dimensional(3D) object detection framework is presented,namely point-voxel dual transformer(PV-DT3D),which is a transformer-based method.In the proposed PV-DT3D,point-voxel fusion features are used for proposal refinement.Specifically,keypoints are sampled from entire point cloud scene and used to encode representative scene features via a proposal-aware voxel set abstraction module.Subsequently,following the generation of proposals by the region proposal networks(RPN),the internal encoded keypoints are fed into the dual transformer encoder-decoder architecture.In 3D object detection,the proposed PV-DT3D takes advantage of both point-wise transformer and channel-wise architecture to capture contextual information from the spatial and channel dimensions.Experiments conducted on the highly competitive KITTI 3D car detection leaderboard show that the PV-DT3D achieves superior detection accuracy among state-of-the-art point-voxel-based methods.展开更多
Flooding and heavy rainfall under extreme weather conditions pose significant challenges to target detection algorithms.Traditional methods often struggle to address issues such as image blurring,dynamic noise interfe...Flooding and heavy rainfall under extreme weather conditions pose significant challenges to target detection algorithms.Traditional methods often struggle to address issues such as image blurring,dynamic noise interference,and variations in target scale.Conventional neural network(CNN)-based target detection approaches face notable limitations in such adverse weather scenarios,primarily due to the fixed geometric sampling structures that hinder adaptability to complex backgrounds and dynamically changing object appearances.To address these challenges,this paper proposes an optimized YOLOv9 model incorporating an improved deformable convolutional network(DCN)enhanced with a multi-scale dilated attention(MSDA)mechanism.Specifically,the DCN module enhances themodel’s adaptability to target deformation and noise interference by adaptively adjusting the sampling grid positions,while also integrating feature amplitude modulation to further improve robustness.Additionally,theMSDA module is introduced to capture contextual features acrossmultiple scales,effectively addressing issues related to target occlusion and scale variation commonly encountered in flood-affected environments.Experimental evaluations are conducted on the ISE-UFDS and UA-DETRAC datasets.The results demonstrate that the proposedmodel significantly outperforms state-of-the-art methods in key evaluation metrics,including precision,recall,F1-score,and mAP(Mean Average Precision).Notably,the model exhibits superior robustness and generalization performance under simulated severe weather conditions,offering reliable technical support for disaster emergency response systems.This study contributes to enhancing the accuracy and real-time capabilities of flood early warning systems,thereby supporting more effective disaster mitigation strategies.展开更多
基金supported by the National Natural Science Foundation of China(Grant Nos.62101275 and 62101274).
文摘In foggy traffic scenarios, existing object detection algorithms face challenges such as low detection accuracy, poor robustness, occlusion, missed detections, and false detections. To address this issue, a multi-scale object detection algorithm based on an improved YOLOv8 has been proposed. Firstly, a lightweight attention mechanism, Triplet Attention, is introduced to enhance the algorithm’s ability to extract multi-dimensional and multi-scale features, thereby improving the receptive capability of the feature maps. Secondly, the Diverse Branch Block (DBB) is integrated into the CSP Bottleneck with two Convolutions (C2F) module to strengthen the fusion of semantic information across different layers. Thirdly, a new decoupled detection head is proposed by redesigning the original network head based on the Diverse Branch Block module to improve detection accuracy and reduce missed and false detections. Finally, the Minimum Point Distance based Intersection-over-Union (MPDIoU) is used to replace the original YOLOv8 Complete Intersection-over-Union (CIoU) to accelerate the network’s training convergence. Comparative experiments and dehazing pre-processing tests were conducted on the RTTS and VOC-Fog datasets. Compared to the baseline YOLOv8 model, the improved algorithm achieved mean Average Precision (mAP) improvements of 4.6% and 3.8%, respectively. After defogging pre-processing, the mAP increased by 5.3% and 4.4%, respectively. The experimental results demonstrate that the improved algorithm exhibits high practicality and effectiveness in foggy traffic scenarios.
基金supported by the Program of Introducing Talents of Discipline to Universities(111 Plan)of China(B14010)the National Natural Science Foundation of China(31727901)
文摘While moving ahead with the object detection technology, especially deep neural networks, many related tasks, such as medical application and industrial automation, have achieved great success. However, the detection of objects with multiple aspect ratios and scales is still a key problem. This paper proposes a top-down and bottom-up feature pyramid network(TDBU-FPN),which combines multi-scale feature representation and anchor generation at multiple aspect ratios. First, in order to build the multi-scale feature map, this paper puts a number of fully convolutional layers after the backbone. Second, to link neighboring feature maps, top-down and bottom-up flows are adopted to introduce context information via top-down flow and supplement suboriginal information via bottom-up flow. The top-down flow refers to the deconvolution procedure, and the bottom-up flow refers to the pooling procedure. Third, the problem of adapting different object aspect ratios is tackled via many anchor shapes with different aspect ratios on each multi-scale feature map. The proposed method is evaluated on the pattern analysis, statistical modeling and computational learning visual object classes(PASCAL VOC)dataset and reaches an accuracy of 79%, which exhibits a 1.8% improvement with a detection speed of 23 fps.
基金supported by the Science and Technology Project of Sichuan(Nos.2019YFG0504,2021YFG0314,2020YFG0459)the National Natural Science Foundation of China(Grant Nos.61872066 and U19A2078).
文摘Multi-scale object detection is a research hotspot,and it has critical applications in many secure systems.Although the object detection algorithms have constantly been progressing recently,how to perform highly accurate and reliable multi-class object detection is still a challenging task due to the influence of many factors,such as the deformation and occlusion of the object in the actual scene.The more interference factors,the more complicated the semantic information,so we need a deeper network to extract deep information.However,deep neural networks often suffer from network degradation.To prevent the occurrence of degradation on deep neural networks,we put forth a new model using a newly-designed Pre-ReLU,which inserts a ReLU layer before the convolution layer for the sake of preventing network degradation and ensuring the performance of deep networks.This structure can transfer the semantic information more smoothly from the shallow to the deep layer.However,the deep networks will encounter not only degradation,but also a decline in efficiency.Therefore,to speed up the two-stage detector,we divide the feature map into many groups so as to diminish the number of parameters.Correspondingly,calculation speed has been enhanced,achieving a balance between speed and accuracy.Through mathematical demonstration,a Balanced Loss(BL)is proposed by a balance factor to decrease the weight of the negative sample during the training phase to balance the positives and negatives.Finally,our detector demonstrates rosy results in a range of experiments and gains an mAP of 73.38 on PASCAL VOC2007,which approaches the requirement of many security systems.
基金supported by the Wuhan Pilot construction of a strong Transportation Country Science and Technology Joint Research Project(No.2024-1-10).
文摘Computer vision-based traffic object detection plays a critical role in road traffic safety.Under hazy weather conditions,images captured by road monitoring systems exhibit three main challenges:significant scale variations,abundant background noise,and diverse perspectives.These factors lead to insufficient detection accuracy and limited real-time performance in object detection algorithms.We propose AMC-YOLO an improved YOLOv11-based traffic detection algorithm to address these challenges.In this work,we replace the C3k block's bottleneck module with our novel attention-gate convolution(AGConv),which improves contextual information capture,enhances feature extraction,and reduces computational redundancy.Additionally,we introduce the multi-dilation sharing convolution(MDSC)module to prevent feature information loss during pooling operations,enhancing the model's sensitivity to multi-scale features.We design a lightweight and efficient cross-channel feature fusion module(CCFM)for the path aggregation neck to adaptively adjust feature weights and optimize the model's overall performance.Experimental results demonstrate that AMC-YOLO achieves a 1.1%improvement in mAP@0.5 and a 2.7%increase in mAP@0.5:0.95 compared to YOLOv11n.On graphics processing unit(GPU)hardware,it achieves real-time performance at 376(FPS)with only 2.6 million parameters,ensuring high-precision traffic detection while meeting deployment requirements on resource-constrained devices.
基金funded by the National Natural Science Foundation of China under Grant No.62371187the Open Program of Hunan Intelligent Rehabilitation Robot and Auxiliary Equipment Engineering Technology Research Center under Grant No.2024JS101.
文摘The ubiquity of mobile devices has driven advancements in mobile object detection.However,challenges in multi-scale object detection in open,complex environments persist due to limited computational resources.Traditional approaches like network compression,quantization,and lightweight design often sacrifice accuracy or feature representation robustness.This article introduces the Fast Multi-scale Channel Shuffling Network(FMCSNet),a novel lightweight detection model optimized for mobile devices.FMCSNet integrates a fully convolutional Multilayer Perceptron(MLP)module,offering global perception without significantly increasing parameters,effectively bridging the gap between CNNs and Vision Transformers.FMCSNet achieves a delicate balance between computation and accuracy mainly by two key modules:the ShiftMLP module,including a shift operation and an MLP module,and a Partial group Convolutional(PGConv)module,reducing computation while enhancing information exchange between channels.With a computational complexity of 1.4G FLOPs and 1.3M parameters,FMCSNet outperforms CNN-based and DWConv-based ShuffleNetv2 by 1%and 4.5%mAP on the Pascal VOC 2007 dataset,respectively.Additionally,FMCSNet achieves a mAP of 30.0(0.5:0.95 IoU threshold)with only 2.5G FLOPs and 2.0M parameters.It achieves 32 FPS on low-performance i5-series CPUs,meeting real-time detection requirements.The versatility of the PGConv module’s adaptability across scenarios further highlights FMCSNet as a promising solution for real-time mobile object detection.
基金supported by Chengdu Jincheng College under the General Research Project Program(Project No.JG2024-1199)titled“Research on the Training Mechanism of Undergraduate Innovation Ability Based on Deep Integration of AI Industry-Education Collaboration”.
文摘Accurate and real-time road defect detection is essential for ensuring traffic safety and infrastructure maintenance.However,existing vision-based methods often struggle with small,sparse,and low-resolution defects under complex road conditions.To address these limitations,we propose Multi-Scale Guided Detection YOLO(MGD-YOLO),a novel lightweight and high-performance object detector built upon You Only Look Once Version 5(YOLOv5).The proposed model integrates three key components:(1)a Multi-Scale Dilated Attention(MSDA)module to enhance semantic feature extraction across varying receptive fields;(2)Depthwise Separable Convolution(DSC)to reduce computational cost and improve model generalization;and(3)a Visual Global Attention Upsampling(VGAU)module that leverages high-level contextual information to refine low-level features for precise localization.Extensive experiments on three public road defect benchmarks demonstrate that MGD-YOLO outperforms state-of-the-art models in both detection accuracy and efficiency.Notably,our model achieves 87.9%accuracy in crack detection,88.3%overall precision on TD-RD dataset,while maintaining fast inference speed and a compact architecture.These results highlight the potential of MGD-YOLO for deployment in real-time,resource-constrained scenarios,paving the way for practical and scalable intelligent road maintenance systems.
基金funded by the China Chongqing Municipal Science and Technology Bureau,grant numbers 2024TIAD-CYKJCXX0121,2024NSCQ-LZX0135Chongqing Municipal Commission of Housing and Urban-Rural Development,grant number CKZ2024-87+3 种基金the Chongqing University of Technology graduate education high-quality development project,grant number gzlsz202401the Chongqing University of Technology-Chongqing LINGLUE Technology Co.,Ltd.,Electronic Information(Artificial Intelligence)graduate joint training basethe Postgraduate Education and Teaching Reform Research Project in Chongqing,grant number yjg213116the Chongqing University of Technology-CISDI Chongqing Information Technology Co.,Ltd.,Computer Technology graduate joint training base.
文摘Detecting abnormal cervical cells is crucial for early identification and timely treatment of cervical cancer.However,this task is challenging due to the morphological similarities between abnormal and normal cells and the significant variations in cell size.Pathologists often refer to surrounding cells to identify abnormalities.To emulate this slide examination behavior,this study proposes a Multi-Scale Feature Fusion Network(MSFF-Net)for detecting cervical abnormal cells.MSFF-Net employs a Cross-Scale Pooling Model(CSPM)to effectively capture diverse features and contextual information,ranging from local details to the overall structure.Additionally,a Multi-Scale Fusion Attention(MSFA)module is introduced to mitigate the impact of cell size variations by adaptively fusing local and global information at different scales.To handle the complex environment of cervical cell images,such as cell adhesion and overlapping,the Inner-CIoU loss function is utilized to more precisely measure the overlap between bounding boxes,thereby improving detection accuracy in such scenarios.Experimental results on the Comparison detector dataset demonstrate that MSFF-Net achieves a mean average precision(mAP)of 63.2%,outperforming state-of-the-art methods while maintaining a relatively small number of parameters(26.8 M).This study highlights the effectiveness of multi-scale feature fusion in enhancing the detection of cervical abnormal cells,contributing to more accurate and efficient cervical cancer screening.
基金supported by Communication University of China(HG23035)partly supported by the Fundamental Research Funds for the Central Universities(CUC230A013).
文摘With the rapid growth of socialmedia,the spread of fake news has become a growing problem,misleading the public and causing significant harm.As social media content is often composed of both images and text,the use of multimodal approaches for fake news detection has gained significant attention.To solve the problems existing in previous multi-modal fake news detection algorithms,such as insufficient feature extraction and insufficient use of semantic relations between modes,this paper proposes the MFFFND-Co(Multimodal Feature Fusion Fake News Detection with Co-Attention Block)model.First,the model deeply explores the textual content,image content,and frequency domain features.Then,it employs a Co-Attention mechanism for cross-modal fusion.Additionally,a semantic consistency detectionmodule is designed to quantify semantic deviations,thereby enhancing the performance of fake news detection.Experimentally verified on two commonly used datasets,Twitter and Weibo,the model achieved F1 scores of 90.0% and 94.0%,respectively,significantly outperforming the pre-modified MFFFND(Multimodal Feature Fusion Fake News Detection with Attention Block)model and surpassing other baseline models.This improves the accuracy of detecting fake information in artificial intelligence detection and engineering software detection.
基金supported by the Korea Electric Power Corporation(R22TA14,Development of Drone Systemfor Diagnosis of Porcelain Insulators in Overhead Transmission Lines)the National Fire Agency of Korea(RS-2024-00408270,Fire Hazard Analysis and Fire Safety Standards Development for Transportation and Storage Stage of Reuse Battery)the Ministry of the Interior and Safety of Korea(RS-2024-00408982,Development of Intelligent Fire Detection and Sprinkler Facility Technology Reflecting the Characteristics of Logistics Facilities).
文摘This paper proposes an automated detection framework for transmission facilities using a featureattention multi-scale robustness network(FAMSR-Net)with high-fidelity virtual images.The proposed framework exhibits three key characteristics.First,virtual images of the transmission facilities generated using StyleGAN2-ADA are co-trained with real images.This enables the neural network to learn various features of transmission facilities to improve the detection performance.Second,the convolutional block attention module is deployed in FAMSR-Net to effectively extract features from images and construct multi-dimensional feature maps,enabling the neural network to perform precise object detection in various environments.Third,an effective bounding box optimization method called Scylla-IoU is deployed on FAMSR-Net,considering the intersection over union,center point distance,angle,and shape of the bounding box.This enables the detection of power facilities of various sizes accurately.Extensive experiments demonstrated that FAMSRNet outperforms other neural networks in detecting power facilities.FAMSR-Net also achieved the highest detection accuracy when virtual images of the transmission facilities were co-trained in the training phase.The proposed framework is effective for the scheduled operation and maintenance of transmission facilities because an optical camera is currently the most promising tool for unmanned aerial vehicles.This ultimately contributes to improved inspection efficiency,reduced maintenance risks,and more reliable power delivery across extensive transmission facilities.
基金funded by the Deanship of Scientific Research at Northern Border University,Arar,Saudi Arabia through research group No.(RG-NBU-2022-1234).
文摘Transportation systems are experiencing a significant transformation due to the integration of advanced technologies, including artificial intelligence and machine learning. In the context of intelligent transportation systems (ITS) and Advanced Driver Assistance Systems (ADAS), the development of efficient and reliable traffic light detection mechanisms is crucial for enhancing road safety and traffic management. This paper presents an optimized convolutional neural network (CNN) framework designed to detect traffic lights in real-time within complex urban environments. Leveraging multi-scale pyramid feature maps, the proposed model addresses key challenges such as the detection of small, occluded, and low-resolution traffic lights amidst complex backgrounds. The integration of dilated convolutions, Region of Interest (ROI) alignment, and Soft Non-Maximum Suppression (Soft-NMS) further improves detection accuracy and reduces false positives. By optimizing computational efficiency and parameter complexity, the framework is designed to operate seamlessly on embedded systems, ensuring robust performance in real-world applications. Extensive experiments using real-world datasets demonstrate that our model significantly outperforms existing methods, providing a scalable solution for ITS and ADAS applications. This research contributes to the advancement of Artificial Intelligence-driven (AI-driven) pattern recognition in transportation systems and offers a mathematical approach to improving efficiency and safety in logistics and transportation networks.
基金supported by the National Natural Science Foundation of China(Nos.62276204 and 62203343)the Fundamental Research Funds for the Central Universities(No.YJSJ24011)+1 种基金the Natural Science Basic Research Program of Shanxi,China(Nos.2022JM-340 and 2023-JC-QN-0710)the China Postdoctoral Science Foundation(Nos.2020T130494 and 2018M633470).
文摘Drone-based small object detection is of great significance in practical applications such as military actions, disaster rescue, transportation, etc. However, the severe scale differences in objects captured by drones and lack of detail information for small-scale objects make drone-based small object detection a formidable challenge. To address these issues, we first develop a mathematical model to explore how changing receptive fields impacts the polynomial fitting results. Subsequently, based on the obtained conclusions, we propose a simple but effective Hybrid Receptive Field Network (HRFNet), whose modules include Hybrid Feature Augmentation (HFA), Hybrid Feature Pyramid (HFP) and Dual Scale Head (DSH). Specifically, HFA employs parallel dilated convolution kernels of different sizes to extend shallow features with different receptive fields, committed to improving the multi-scale adaptability of the network;HFP enhances the perception of small objects by capturing contextual information across layers, while DSH reconstructs the original prediction head utilizing a set of high-resolution features and ultrahigh-resolution features. In addition, in order to train HRFNet, the corresponding dual-scale loss function is designed. Finally, comprehensive evaluation results on public benchmarks such as VisDrone-DET and TinyPerson demonstrate the robustness of the proposed method. Most impressively, the proposed HRFNet achieves a mAP of 51.0 on VisDrone-DET with 29.3 M parameters, which outperforms the extant state-of-the-art detectors. HRFNet also performs excellently in complex scenarios captured by drones, achieving the best performance on the CS-Drone dataset we built.
文摘Recent years have seen a surge in interest in object detection on remote sensing images for applications such as surveillance andmanagement.However,challenges like small object detection,scale variation,and the presence of closely packed objects in these images hinder accurate detection.Additionally,the motion blur effect further complicates the identification of such objects.To address these issues,we propose enhanced YOLOv9 with a transformer head(YOLOv9-TH).The model introduces an additional prediction head for detecting objects of varying sizes and swaps the original prediction heads for transformer heads to leverage self-attention mechanisms.We further improve YOLOv9-TH using several strategies,including data augmentation,multi-scale testing,multi-model integration,and the introduction of an additional classifier.The cross-stage partial(CSP)method and the ghost convolution hierarchical graph(GCHG)are combined to improve detection accuracy by better utilizing feature maps,widening the receptive field,and precisely extracting multi-scale objects.Additionally,we incorporate the E-SimAM attention mechanism to address low-resolution feature loss.Extensive experiments on the VisDrone2021 and DIOR datasets demonstrate the effectiveness of YOLOv9-TH,showing good improvement in mAP compared to the best existing methods.The YOLOv9-TH-e achieved 54.2% of mAP50 on the VisDrone2021 dataset and 92.3% of mAP on the DIOR dataset.The results confirmthemodel’s robustness and suitability for real-world applications,particularly for small object detection in remote sensing images.
基金the Scientific Research Fund of Hunan Provincial Education Department(23A0423).
文摘Remote sensing imagery,due to its high altitude,presents inherent challenges characterized by multiple scales,limited target areas,and intricate backgrounds.These inherent traits often lead to increased miss and false detection rates when applying object recognition algorithms tailored for remote sensing imagery.Additionally,these complexities contribute to inaccuracies in target localization and hinder precise target categorization.This paper addresses these challenges by proposing a solution:The YOLO-MFD model(YOLO-MFD:Remote Sensing Image Object Detection withMulti-scale Fusion Dynamic Head).Before presenting our method,we delve into the prevalent issues faced in remote sensing imagery analysis.Specifically,we emphasize the struggles of existing object recognition algorithms in comprehensively capturing critical image features amidst varying scales and complex backgrounds.To resolve these issues,we introduce a novel approach.First,we propose the implementation of a lightweight multi-scale module called CEF.This module significantly improves the model’s ability to comprehensively capture important image features by merging multi-scale feature information.It effectively addresses the issues of missed detection and mistaken alarms that are common in remote sensing imagery.Second,an additional layer of small target detection heads is added,and a residual link is established with the higher-level feature extraction module in the backbone section.This allows the model to incorporate shallower information,significantly improving the accuracy of target localization in remotely sensed images.Finally,a dynamic head attentionmechanism is introduced.This allows themodel to exhibit greater flexibility and accuracy in recognizing shapes and targets of different sizes.Consequently,the precision of object detection is significantly improved.The trial results show that the YOLO-MFD model shows improvements of 6.3%,3.5%,and 2.5%over the original YOLOv8 model in Precision,map@0.5 and map@0.5:0.95,separately.These results illustrate the clear advantages of the method.
基金the Key Research and Development Program of Hainan Province(Grant Nos.ZDYF2023GXJS163,ZDYF2024GXJS014)National Natural Science Foundation of China(NSFC)(Grant Nos.62162022,62162024)+2 种基金the Major Science and Technology Project of Hainan Province(Grant No.ZDKJ2020012)Hainan Provincial Natural Science Foundation of China(Grant No.620MS021)Youth Foundation Project of Hainan Natural Science Foundation(621QN211).
文摘Accurately identifying small objects in high-resolution aerial images presents a complex and crucial task in thefield of small object detection on unmanned aerial vehicles(UAVs).This task is challenging due to variations inUAV flight altitude,differences in object scales,as well as factors like flight speed and motion blur.To enhancethe detection efficacy of small targets in drone aerial imagery,we propose an enhanced You Only Look Onceversion 7(YOLOv7)algorithm based on multi-scale spatial context.We build the MSC-YOLO model,whichincorporates an additional prediction head,denoted as P2,to improve adaptability for small objects.We replaceconventional downsampling with a Spatial-to-Depth Convolutional Combination(CSPDC)module to mitigatethe loss of intricate feature details related to small objects.Furthermore,we propose a Spatial Context Pyramidwith Multi-Scale Attention(SCPMA)module,which captures spatial and channel-dependent features of smalltargets acrossmultiple scales.This module enhances the perception of spatial contextual features and the utilizationof multiscale feature information.On the Visdrone2023 and UAVDT datasets,MSC-YOLO achieves remarkableresults,outperforming the baseline method YOLOv7 by 3.0%in terms ofmean average precision(mAP).The MSCYOLOalgorithm proposed in this paper has demonstrated satisfactory performance in detecting small targets inUAV aerial photography,providing strong support for practical applications.
基金supported by the Shanghai Science and Technology Innovation Action Plan High-Tech Field Project(Grant No.22511100601)for the year 2022 and Technology Development Fund for People’s Livelihood Research(Research on Transmission Line Deep Foundation Pit Environmental Situation Awareness System Based on Multi-Source Data).
文摘To maintain the reliability of power systems,routine inspections using drones equipped with advanced object detection algorithms are essential for preempting power-related issues.The increasing resolution of drone-captured images has posed a challenge for traditional target detection methods,especially in identifying small objects in high-resolution images.This study presents an enhanced object detection algorithm based on the Faster Regionbased Convolutional Neural Network(Faster R-CNN)framework,specifically tailored for detecting small-scale electrical components like insulators,shock hammers,and screws in transmission line.The algorithm features an improved backbone network for Faster R-CNN,which significantly boosts the feature extraction network’s ability to detect fine details.The Region Proposal Network is optimized using a method of guided feature refinement(GFR),which achieves a balance between accuracy and speed.The incorporation of Generalized Intersection over Union(GIOU)and Region of Interest(ROI)Align further refines themodel’s accuracy.Experimental results demonstrate a notable improvement in mean Average Precision,reaching 89.3%,an 11.1%increase compared to the standard Faster R-CNN.This highlights the effectiveness of the proposed algorithm in identifying electrical components in high-resolution aerial images.
基金supported in part by the National Key R&D Program of China(Grant No.2023YFB3307604)the Shanxi Province Basic Research Program Youth Science Research Project(Grant Nos.202303021212054 and 202303021212046)+3 种基金the Key Projects Supported by Hebei Natural Science Foundation(Grant No.E2024203125)the National Science Foundation of China(Grant No.52105391)the Hebei Provincial Science and Technology Major Project(Grant No.23280101Z)the National Key Laboratory of Metal Forming Technology and Heavy Equipment Open Fund(Grant No.S2308100.W17).
文摘A novel dual-branch decoding fusion convolutional neural network model(DDFNet)specifically designed for real-time salient object detection(SOD)on steel surfaces is proposed.DDFNet is based on a standard encoder–decoder architecture.DDFNet integrates three key innovations:first,we introduce a novel,lightweight multi-scale progressive aggregation residual network that effectively suppresses background interference and refines defect details,enabling efficient salient feature extraction.Then,we propose an innovative dual-branch decoding fusion structure,comprising the refined defect representation branch and the enhanced defect representation branch,which enhance accuracy in defect region identification and feature representation.Additionally,to further improve the detection of small and complex defects,we incorporate a multi-scale attention fusion module.Experimental results on the public ESDIs-SOD dataset show that DDFNet,with only 3.69 million parameters,achieves detection performance comparable to current state-of-the-art models,demonstrating its potential for real-time industrial applications.Furthermore,our DDFNet-L variant consistently outperforms leading methods in detection performance.The code is available at https://github.com/13140W/DDFNet.
基金supported in part by the National Science Foundation of China(52371372)the Project of Science and Technology Commission of Shanghai Municipality,China(22JC1401400,21190780300)the 111 Project,China(D18003)
文摘Dear Editor,This letter focuses on the fact that small objects with few pixels disappear in feature maps with large receptive fields, as the network deepens, in object detection tasks. Therefore, the detection of dense small objects is challenging.
基金supported by National Natural Science Foundation of China(Nos.62171042,62102033,U24A20331)the R&D Program of Beijing Municipal Education Commission(No.KZ202211417048)+2 种基金the Project of Construction and Support for High-Level Innovative Teams of Beijing Municipal Institutions(No.BPHR20220121)Beijing Natural Science Foundation(Nos.4232026,4242020)the Academic Research Projects of Beijing Union University(Nos.ZKZD202302,ZK20202403)。
文摘Top-view fisheye cameras are widely used in personnel surveillance for their broad field of view,but their unique imaging characteristics pose challenges like distortion,complex scenes,scale variations,and small objects near image edges.To tackle these,we proposed peripheral focus you only look once(PF-YOLO),an enhanced YOLOv8n-based method.Firstly,we introduced a cutting-patch data augmentation strategy to mitigate the problem of insufficient small-object samples in various scenes.Secondly,to enhance the model's focus on small objects near the edges,we designed the peripheral focus loss,which uses dynamic focus coefficients to provide greater gradient gains for these objects,improving their regression accuracy.Finally,we designed the three dimensional(3D)spatial-channel coordinate attention C2f module,enhancing spatial and channel perception,suppressing noise,and improving personnel detection.Experimental results demonstrate that PF-YOLO achieves strong performance on the challenging events for person detection from overhead fisheye images(CEPDTOF)and in-the-wild events for people detection and tracking from overhead fisheye cameras(WEPDTOF)datasets.Compared to the original YOLOv8n model,PFYOLO achieves improvements on CEPDTOF with increases of 2.1%,1.7%and 2.9%in mean average precision 50(mAP 50),mAP 50-95,and tively.On WEPDTOF,PF-YOLO achieves substantial improvements with increases of 31.4%,14.9%,61.1%and 21.0%in 91.2%and 57.2%,respectively.
基金supported by the Natural Science Foundation of China (No.62103298)the South African National Research Foundation (Nos.132797 and 137951)。
文摘In this paper,a two-stage light detection and ranging(LiDAR) three-dimensional(3D) object detection framework is presented,namely point-voxel dual transformer(PV-DT3D),which is a transformer-based method.In the proposed PV-DT3D,point-voxel fusion features are used for proposal refinement.Specifically,keypoints are sampled from entire point cloud scene and used to encode representative scene features via a proposal-aware voxel set abstraction module.Subsequently,following the generation of proposals by the region proposal networks(RPN),the internal encoded keypoints are fed into the dual transformer encoder-decoder architecture.In 3D object detection,the proposed PV-DT3D takes advantage of both point-wise transformer and channel-wise architecture to capture contextual information from the spatial and channel dimensions.Experiments conducted on the highly competitive KITTI 3D car detection leaderboard show that the PV-DT3D achieves superior detection accuracy among state-of-the-art point-voxel-based methods.
基金financially supported by the National Key R&D Program of China(No.2022YFC3090603)R&DProgramof BeijingMunicipal EducationCommission(No.KZ202211417049)。
文摘Flooding and heavy rainfall under extreme weather conditions pose significant challenges to target detection algorithms.Traditional methods often struggle to address issues such as image blurring,dynamic noise interference,and variations in target scale.Conventional neural network(CNN)-based target detection approaches face notable limitations in such adverse weather scenarios,primarily due to the fixed geometric sampling structures that hinder adaptability to complex backgrounds and dynamically changing object appearances.To address these challenges,this paper proposes an optimized YOLOv9 model incorporating an improved deformable convolutional network(DCN)enhanced with a multi-scale dilated attention(MSDA)mechanism.Specifically,the DCN module enhances themodel’s adaptability to target deformation and noise interference by adaptively adjusting the sampling grid positions,while also integrating feature amplitude modulation to further improve robustness.Additionally,theMSDA module is introduced to capture contextual features acrossmultiple scales,effectively addressing issues related to target occlusion and scale variation commonly encountered in flood-affected environments.Experimental evaluations are conducted on the ISE-UFDS and UA-DETRAC datasets.The results demonstrate that the proposedmodel significantly outperforms state-of-the-art methods in key evaluation metrics,including precision,recall,F1-score,and mAP(Mean Average Precision).Notably,the model exhibits superior robustness and generalization performance under simulated severe weather conditions,offering reliable technical support for disaster emergency response systems.This study contributes to enhancing the accuracy and real-time capabilities of flood early warning systems,thereby supporting more effective disaster mitigation strategies.