An improved model based on You Only Look Once version 8 (YOLOv8) is proposed to address the low detection accuracy caused by the diversity of object sizes in optical remote sensing images. First, the feature pyramid network (FPN) structure of the original YOLOv8 model is replaced by the generalized FPN (GFPN) structure from GiraffeDet to achieve "cross-layer" and "cross-scale" adaptive feature fusion, enriching the semantic and spatial information in the feature maps and improving the model's detection ability. Second, a pyramid pooling module, multi atrous spatial pyramid pooling (MASPP), is designed by combining atrous convolution with a feature pyramid structure to extract multi-scale features, improving the model's handling of multi-scale objects. Experimental results show that the improved YOLOv8 model achieves a detection accuracy of 92% and a mean average precision (mAP) of 87.9% on the DIOR dataset, which are 3.5% and 1.7% higher, respectively, than those of the original model. These results indicate that the proposed model improves the detection and classification of multi-dimensional optical remote sensing targets.
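The abstract does not describe the internals of MASPP, so the following is only a minimal sketch of the underlying idea it names: parallel atrous (dilated) convolutions whose dilation rate widens the receptive field without adding kernel weights. The function names, the 1-D setting, and the rates (1, 2, 4) are illustrative assumptions, not the authors' module.

```python
def atrous_conv1d(signal, kernel, rate):
    """1-D atrous (dilated) convolution with zero padding; output length equals input length."""
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            # Kernel taps are spaced `rate` samples apart.
            idx = i + (j - center) * rate
            if 0 <= idx < len(signal):
                acc += weight * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, rate):
    """Span of input samples covered by one dilated kernel."""
    return (kernel_size - 1) * rate + 1


def pyramid_features(signal, kernel, rates=(1, 2, 4)):
    """Hypothetical pyramid: one feature map per dilation rate (the rates are assumed)."""
    return [atrous_conv1d(signal, kernel, r) for r in rates]


impulse = [0, 0, 0, 1, 0, 0, 0]
branches = pyramid_features(impulse, [1, 1, 1])
```

With a 3-tap kernel, the three branches cover 3, 5, and 9 input samples respectively, which is how such a module can respond to objects of different sizes at the same layer.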
The application of deep learning to target detection in aerial images captured by unmanned aerial vehicles (UAVs) has emerged as a prominent research focus. Because of the considerable distance between UAVs and the photographed objects, coupled with complex shooting environments, existing models often struggle to achieve accurate real-time target detection. In this paper, a You Only Look Once v8 (YOLOv8) model is modified in four respects: the detection head, the up-sampling module, the feature extraction module, and the parameter optimization of positive sample screening. The resulting YOLO-S3DT model is proposed to improve the detection of small targets in aerial images. Experimental results show that all detection metrics of the proposed model improve significantly without increasing the number of model parameters and with only limited growth in computation. Moreover, the model performs best among the detection models compared, demonstrating its advancement within this category of tasks.
To address the issues of unknown target size, blurred edges, background interference, and low contrast in infrared small target detection, this paper proposes a method based on density peaks searching and weighted multi-feature local difference. First, an improved high-boost filter is used for preprocessing to eliminate background clutter and high-brightness interference, thereby increasing the probability of capturing real targets in the density peak search. Second, a triple-layer window is used to extract features from the area surrounding candidate targets, addressing the uncertainty of small target sizes. Calculating multi-feature local differences between the triple-layer windows addresses the problems of blurred target edges and low contrast. To balance the contribution of different features, intra-class distance is used to calculate weights, and the multi-feature local differences are fused by weighting to obtain the weighted multi-feature local difference of each candidate target. The real targets are then extracted using the interquartile range. Experiments on datasets such as SIRST and IRSTD-1K show that the proposed method is suitable for various complex scene types and demonstrates good robustness and detection performance.
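The abstract states that real targets are extracted using the interquartile range but does not give the rule. A minimal sketch, assuming the common Tukey upper fence `Q3 + k * IQR` with `k = 1.5` applied to the candidates' weighted local-difference scores (the function name, the fence form, and the sample scores are all assumptions):

```python
import statistics


def iqr_upper_outliers(scores, k=1.5):
    """Return indices of scores above Q3 + k * IQR, together with that threshold.

    The fence multiplier k = 1.5 is an assumed default, not a value from the paper.
    """
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    threshold = q3 + k * (q3 - q1)
    selected = [i for i, s in enumerate(scores) if s > threshold]
    return selected, threshold


# Hypothetical scores for eight candidates; only one stands out from the clutter.
candidate_scores = [1.0, 1.2, 0.9, 1.1, 1.0, 9.5, 1.05, 0.95]
targets, fence = iqr_upper_outliers(candidate_scores)
```

Because the fence is derived from the score distribution of each image, no fixed global threshold has to be tuned per scene.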
Unmanned aerial vehicle (UAV) imagery poses significant challenges for object detection because of extreme scale variations, high-density small targets (68% in the VisDrone dataset), and complex backgrounds. While YOLO-series models achieve speed-accuracy trade-offs via fixed convolution kernels and manual feature fusion, their rigid architectures struggle with multi-scale adaptability, as exemplified by YOLOv8n's 36.4% mAP and 13.9% small-object AP on VisDrone2019. This paper presents YOLO-LE, a lightweight framework that addresses these limitations through three designs: (1) the C2f-Dy and LDown modules are introduced to enhance the backbone's sensitivity to small-object features while reducing backbone parameters, thereby improving model efficiency; (2) an adaptive feature fusion module is designed to dynamically integrate multi-scale feature maps, optimizing the neck structure, reducing neck complexity, and enhancing overall model performance; (3) the original loss function is replaced with a distributed focal loss, and a lightweight self-attention mechanism is incorporated to improve small-object recognition and bounding box regression accuracy. Experimental results demonstrate that YOLO-LE achieves 39.9% mAP@0.5 on VisDrone2019, a 9.6% improvement over YOLOv8n, while maintaining a computational cost of 8.5 GFLOPs. This provides an efficient solution for UAV object detection in complex scenarios.
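The reported 9.6% gain appears to be a relative improvement rather than a difference in percentage points. The short check below works this out, assuming the 36.4% baseline quoted for YOLOv8n refers to the same mAP@0.5 metric as the 39.9% result (the abstract does not state this explicitly).

```python
def relative_gain_percent(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


baseline_map = 36.4   # YOLOv8n mAP on VisDrone2019, as quoted in the abstract
improved_map = 39.9   # YOLO-LE mAP@0.5 on VisDrone2019, as quoted in the abstract

absolute_gain = improved_map - baseline_map                        # percentage points
relative_gain = relative_gain_percent(baseline_map, improved_map)  # percent of baseline
```

Under that assumption the absolute gain is about 3.5 percentage points, and the relative gain rounds to the 9.6% stated in the abstract.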
Underwater target detection is widely applied in domains such as underwater search and rescue, environmental monitoring, and marine resource surveys. It is crucial for enabling autonomous underwater robot operations and promoting ocean exploration. Nevertheless, low imaging quality, harsh underwater environments, and obscured objects considerably increase the difficulty of detecting underwater targets, making it hard for current detection methods to achieve optimal performance. To enhance underwater object perception and improve target detection precision, we propose a lightweight underwater target detection method using You Only Look Once (YOLO) v8 with multi-scale cross-channel attention (MSCCA), named YOLOv8-UOD. In the proposed multi-scale cross-channel attention module, multi-scale attention (MSA) augments the variety of attentional perception by extracting information from inherently diverse receptive fields. The cross-channel strategy uses RepVGG-based channel shuffling (RCS) and one-shot aggregation (OSA) to rearrange feature map channels according to specific rules. It aggregates all features only once, in the final feature mapping, which yields more comprehensive and valuable feature information. The experimental results show that the proposed YOLOv8-UOD achieves a mAP50 of 95.67% at 23.8 GFLOPs on the Underwater Robot Picking Contest 2017 (URPC2017) dataset, outperforming other methods in terms of detection precision and computational cost-efficiency.
To address the low contrast between background and target and the insufficient noise suppression in infrared small target detection under complex cloud backgrounds, an infrared small target detection method based on the tensor nuclear norm and direction residual weighting is proposed. The infrared image is first converted into an infrared patch tensor model. Then, exploiting the low-rank nature of the background tensor and the difference in contrast between background and target in different directions, we design a double-neighborhood local contrast method based on direction residual weighting (DNLCDRW) and combine it with the partial sum of tensor nuclear norm (PSTNN) to achieve effective background suppression and recovery of infrared small targets. Experiments show that the algorithm is effective in suppressing the background and improving target detection.
In this paper, a reasoning enhancement method based on the Relational Graph Convolutional Network (RGCN) is proposed to improve the ability of unmanned aerial vehicles (UAVs) to detect fast-moving military targets in urban battlefield environments. By combining military images with the publicly available VisDrone2019 dataset, a new dataset called VisMilitary was built, and multiple You Only Look Once (YOLO) models were tested on it. Because blurred targets lead to low confidence scores, the performance of traditional YOLO models on real battlefield images decreases significantly. Therefore, we propose an improved RGCN inference model, which improves performance in complex environments by optimizing the data processing and the graph network architecture. Experimental results show that the proposed method achieves an improvement of 0.4% to 1.7% in mAP@0.50, supporting the effectiveness of the model in military target detection. This research provides a new technical path for UAV target detection in urban battlefields and offers useful insight for the application of deep learning in the military field.
Infrared small-target detection has important applications in many fields because of its high penetration capability and detection distance. This study introduces a detector called “YOLO-SDLUWD”, based on the YOLOv7 network, for small target detection in complex infrared backgrounds. “SDLUWD” refers to the combination of a Spatial Depth layer followed by a Convolutional layer structure (SD-Conv), a Linear Up-sampling fusion Path Aggregation Feature Pyramid Network (LU-PAFPN), and a training strategy based on the normalized Gaussian Wasserstein Distance loss (WD-loss) function. “YOLO-SDLUWD” aims to reduce the loss of detection accuracy that occurs when the max-pooling downsampling layer in the backbone network discards important feature information, to support the interaction and fusion of high-dimensional and low-dimensional feature information, and to overcome the false alarm predictions induced by noise in small target images. The detector achieved a mAP@0.5 of 90.4% and a mAP@0.5:0.95 of 48.5% on IRIS-AG, an increase of 9%-11% over YOLOv7-tiny, outperforming other state-of-the-art target detectors in terms of accuracy and speed.
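The abstract names a normalized Gaussian Wasserstein distance loss without giving its form. A minimal sketch of the commonly used formulation follows, in which each box `(cx, cy, w, h)` is modeled as a 2-D Gaussian and the squared 2-Wasserstein distance reduces to a Euclidean distance between `(cx, cy, w/2, h/2)` vectors. The normalizing constant `c = 12.8` is a placeholder assumption; in practice it is tied to the typical object size in the dataset, and the paper's exact variant may differ.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Gaussian Wasserstein distance between two (cx, cy, w, h) boxes.

    Returns a similarity in (0, 1]; 1.0 means identical boxes.
    The constant c is an assumed, dataset-dependent normalizer.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    w2_squared = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    return math.exp(-math.sqrt(w2_squared) / c)


def nwd_loss(box_a, box_b, c=12.8):
    """Loss form: zero for identical boxes, approaching one as they move apart."""
    return 1.0 - nwd(box_a, box_b, c)


# Two 4x4 boxes that do not overlap at all (IoU would be 0) still get a graded score.
similarity = nwd((10, 10, 4, 4), (16, 10, 4, 4))
```

Unlike IoU, this measure changes smoothly even when two tiny boxes are disjoint, which is the usual motivation for using it on small targets.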
Infrared images typically exhibit diverse backgrounds, each potentially containing noise and target-like interference elements. In complex backgrounds, infrared small targets are prone to being submerged by background noise because of their low pixel proportion and limited available features, leading to detection failure. To address this problem, this paper proposes an Attention Shift-Invariant Cross-Evolutionary Feature Fusion Network (ASCFNet) tailored for the detection of infrared weak and small targets. First, a Multidimensional Lightweight Pixel-level Attention Module (MLPA) is designed, which alleviates the suppression of small-target features during deep network propagation by combining channel reshaping, multi-scale parallel subnet architectures, and local cross-channel interactions. Then, a Multidimensional Shift-Invariant Recall Module (MSIR) is designed to keep the network unaffected by minor input perturbations when processing infrared images by focusing on the model's shift invariance. Subsequently, a Cross-Evolutionary Feature Fusion structure (CEFF) is designed to allow flexible and efficient integration of multidimensional feature information from different network hierarchies, thereby achieving complementarity and enhancement among features. Experimental results on three public datasets, SIRST, NUDT-SIRST, and IRST640, demonstrate that the proposed network outperforms advanced algorithms in the field. Specifically, on the NUDT-SIRST dataset, the mAP50, mAP50-95, and metrics reached 99.26%, 85.22%, and 99.31%, respectively. Visual evaluations of detection results in diverse scenarios indicate that the algorithm exhibits an increased detection rate and a reduced false alarm rate. The method balances accuracy and real-time performance and achieves efficient and stable detection of infrared weak and small targets.
Underwater target detection in forward-looking sonar (FLS) images is a challenging but promising endeavor. Existing neural-network-based methods have made notable progress, but there remains room for improvement because they overlook the unique characteristics of underwater environments. Considering the low imaging resolution, complex background environment, and large changes in target appearance in underwater sonar images, this paper designs a sonar image target detection network based on progressive sensitivity capture, named ProNet. It progressively captures the sensitive regions of the current image where potential targets may exist. Guided by this idea, the primary technical innovation of this paper is a foundational module structure for constructing a sonar target detection backbone network. This structure employs a multi-subspace mixed convolution module that first maps sonar images into different subspaces and extracts local contextual features using varying convolutional receptive fields within these heterogeneous subspaces. Subsequently, a scale-aware aggregation module aggregates the heterogeneous features extracted from the different subspaces. Finally, a multi-scale attention structure further enhances the relational perception of the aggregated features. We evaluated ProNet on three FLS datasets of varying scenes, and the experimental results indicate that ProNet outperforms current state-of-the-art sonar image and general target detectors.
Underwater imaging is frequently influenced by factors such as illumination, scattering, and refraction, which can result in low image contrast and blurriness. Moreover, the presence of numerous small, overlapping targets reduces detection accuracy. To address these challenges, first, green channel images are preprocessed to rectify color bias while improving contrast and clarity. Second, the YOLO-DBS network, which employs deformable convolution, is proposed to enhance feature learning from blurry underwater images. The ECA attention mechanism is also introduced to strengthen feature focus. Moreover, a bidirectional feature pyramid network is used for efficient multilayer feature fusion while removing nodes that contribute minimally to detection performance. In addition, the SIoU loss function, which considers factors such as angular error and distance deviation, is incorporated into the network. Validation on the RUOD dataset demonstrates that YOLO-DBS achieves an improvement of approximately 3.1% in mAP@0.5 over YOLOv8n and surpasses YOLOv9-tiny by 1.3%. YOLO-DBS reduces the parameter count by 32% relative to YOLOv8n, demonstrating superior performance in real-time detection on underwater observation platforms.
To address the challenges of detecting surface floating litter in artificial lakes, including complex environments, uneven illumination, and susceptibility to noise and weather, this paper proposes an efficient and lightweight Ghost-YOLO (You Only Look Once) v8 algorithm. The algorithm integrates advanced attention mechanisms and a small-target detection head to significantly enhance detection performance and efficiency. First, an SE (Squeeze-and-Excitation) mechanism is incorporated into the backbone network to strengthen the extraction of resilient features and precise target localization. This mechanism models feature channel dependencies, enabling adaptive adjustment of channel importance and thereby improving the recognition of floating litter targets. Second, a 160×160 small-target detection layer is designed in the feature fusion neck to mitigate the loss of semantic information caused by varying target scales. This design enhances the fusion of deep and shallow semantic information, improving small target feature representation and enabling better capture and identification of tiny floating litter. Third, to balance performance and efficiency, the GhostConv module replaces part of the conventional convolutions in the feature fusion neck. Additionally, a novel C2fGhost (CSPDarknet53 to 2-Stage Feature Pyramid Networks Ghost) module is introduced to further reduce network parameters. Lastly, to address the challenge of occlusion, a new loss function, WIoU (Wise Intersection over Union) v3, which incorporates a flexible and non-monotonic focusing approach, is adopted to improve detection rates for surface floating litter. The experimental results demonstrate that the proposed Ghost-YOLO v8 model performs well on the Marine dataset: compared with the base model, precision and recall improve by 3.3 and 7.6 percentage points, respectively, mAP@0.5 and mAP@0.5:0.95 improve by 5.3 and 4.4 percentage points, the computational volume is reduced by 1.88 MB, and the FPS value hardly decreases, so that efficient real-time identification of floating debris on the water's surface can be achieved cost-effectively.
Under the influence of air humidity, dust, aerosols, and similar factors in real scenes, haze is unevenly distributed, which lowers image quality and contrast. In such cases, it is difficult for a general-purpose detection network to detect targets in the image. Thus, a dual subnet based on multi-task collaborative training (DSMCT) is proposed in this paper. First, in the training phase, the Gated Context Aggregation Network (GCANet) is used as the supervisory network of YOLOX to promote the extraction of clean information in foggy scenes. In the test phase, only the YOLOX branch needs to be activated, which preserves the detection speed of the model. Second, a deformable convolution module is used to improve GCANet and enhance the model's ability to capture details of non-homogeneous fog. Finally, the Coordinate Attention mechanism is introduced into the Vision Transformer, and the backbone network of YOLOX is redesigned, enhancing the network's ability to extract deep-level features. The experimental results on the artificial fog dataset FOG_VOC and the real fog dataset RTTS show that the mAP of DSMCT reached 86.56% and 62.39%, respectively, which is 2.27% and 4.41% higher than the current most advanced detection model. The DSMCT network is practical and effective for target detection in real foggy scenes.
In the field of remote sensing, the rapid and accurate acquisition of the category and location of airplanes has emerged as a prominent research topic. However, blurred remote sensing imaging and complex environmental interference affect airplane detection. In addition, the inconsistent size of remote sensing images and the low accuracy of small target detection are crucial challenges that need to be addressed. To tackle these issues, we propose a novel network, SDaDCS (SAHI-data augmentation-dilation-channel and spatial attention), based on the YOLOX model, the slicing aided hyper inference (SAHI) framework, a new data augmentation technique, and a dilation-channel and spatial (DCS) attention mechanism. First, we create a remote sensing dataset for airplane targets and introduce a new data augmentation technique based on Rotate-Mixup and mixed data augmentation to enhance data diversity. The DCS attention mechanism, which comprises a dilated convolution block, channel attention, and spatial attention, is designed to strengthen the feature extraction and discrimination of the network. To address the difficulty of detecting small targets, we integrate the YOLOX model with the SAHI framework. Experimental results show that, compared with the original YOLOX model, the proposed SDaDCS remote sensing target detection algorithm improves overall accuracy by 13.6%, validating the effectiveness of the proposed algorithm.
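The slicing idea behind SAHI-style inference can be illustrated with a small, library-independent sketch: a large image is cut into overlapping tiles, the detector runs on each tile, and tile-local boxes are shifted back into full-image coordinates. The helper names, the 512-pixel tile size, and the 20% overlap below are illustrative assumptions; this is not the SAHI library's API, nor the authors' configuration.

```python
def slice_windows(img_w, img_h, tile=512, overlap=0.2):
    """Return (x1, y1, x2, y2) tiles that cover the image with the given overlap ratio."""
    step = max(1, int(tile * (1 - overlap)))

    def starts(length):
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, step))
        positions.append(length - tile)  # final tile sits flush with the border
        return positions

    return [
        (x, y, min(x + tile, img_w), min(y + tile, img_h))
        for y in starts(img_h)
        for x in starts(img_w)
    ]


def to_full_image(box, window):
    """Shift a tile-local (x1, y1, x2, y2) box into full-image coordinates."""
    ox, oy = window[0], window[1]
    return (box[0] + ox, box[1] + oy, box[2] + ox, box[3] + oy)


windows = slice_windows(1024, 1024)
shifted = to_full_image((10, 20, 30, 40), windows[-1])
```

Small airplanes occupy a larger fraction of each tile than of the full image, which is why slicing tends to help small-target recall; overlapping detections from neighboring tiles would still need to be merged, for example with non-maximum suppression.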
Target detection is an important task in computer vision research, and within it anomaly detection and small target detection have attracted growing attention. However, problems remain in this line of research: small target detection in complex environments is susceptible to background interference and yields poor detection results. To address these issues, this study proposes a method that introduces the attention mechanism into the You Only Look Once (YOLO) network. In addition, a self-made mask dataset was created and experiments were conducted on it. The results show that the detection performance of the proposed method is much better.
Infrared detection technology offers all-weather detection and good concealment and is widely used in long-distance target detection and tracking systems. However, the complex background, strong noise, and the small scale and weak intensity of targets make the detection of infrared small targets very difficult. A multi-channel attention-based network is proposed in this paper to address the high missed detection and false alarm rates of traditional algorithms and the large model size, high complexity, and poor detection performance of deep learning algorithms. First, given the difficulty of extracting the features of multi-scale, small, and dim infrared targets, multiple channels are designed based on dilated convolution to capture multi-scale target features. Second, a coordinate attention block is incorporated in each channel to adaptively suppress background clutter and enhance target features. In addition, shallow detail features and deep abstract semantic features are fused by a contextual attention fusion block. Finally, experiments on the SIRST and MDFA datasets verify that, compared with other state-of-the-art methods, the proposed algorithm further improves detection performance with a smaller model size and lower computational complexity.
To address the high false alarm rate and low detection probability of infrared small target detection in complex low-altitude backgrounds, an infrared small target detection method based on improved weighted local contrast is proposed in this paper. First, the ratio information between the target and the local background is used as an enhancement factor. The local contrast is calculated by incorporating the heterogeneity between the target and the local background. Then, a local product weighting method is designed based on the spatial dissimilarity between target and background to further enhance the target while suppressing the background. Finally, the location of the target is obtained by adaptive threshold segmentation. Experimental results demonstrate that the method outperforms six existing algorithms on several evaluation metrics across different datasets containing targets such as unmanned aerial vehicles (UAVs).
This paper presents a novel target detection methodology with high discriminative power, specifically tailored for environments with markedly low luminance levels. Conventional methodologies struggle with the challenges posed by luminosity fluctuations, especially in settings with diminished radiance, further exacerbated by the use of suboptimal imaging instrumentation. The proposed approach departs from the conventional YOLOX model, which is inadequate in mitigating these challenges. To enhance the efficacy of this approach in low-light conditions, the dehazing algorithm is refined to regulate the transmission rate at the pixel level, reducing it to values below 0.5 and thereby increasing image contrast. Subsequently, the coiflet wavelet transform is employed to isolate highly discriminative attributes by decomposing low-frequency image attributes and extracting high-frequency attributes along different axes. CycleGAN is used to enhance the features of low-light imagery across an array of stylistic variations. Advanced computational methods are then employed to fuse the intricate attributes originating from images with distinct styles, thereby augmenting the model's learning potential. Empirical validation conducted on the PASCAL VOC and MS COCO 2017 datasets substantiates pronounced advancements. The refined low-light enhancement algorithm yields a 5.9% improvement in the target detection evaluation index compared with the original imagery. Mean Average Precision (mAP) improves by 9.45% and 0.052% on low-light images relative to conventional YOLOX results. The proposed approach offers numerous advantages over prevailing benchmark methodologies for target detection in environments with an acute scarcity of light.
To address missed detections in water surface target detection when unmanned surface vehicle (USV) perception relies solely on visual algorithms, this paper proposes a method based on the fusion of vision and LiDAR point-cloud projection for water surface target detection. First, the visual recognition component employs an improved YOLOv7 algorithm, trained on a self-built dataset, for the detection of water surface targets. This algorithm modifies the original YOLOv7 architecture to a Slim-Neck structure, addressing the problem of excessive redundant information during feature extraction in the original YOLOv7 network model. At the same time, this modification reduces the computational burden of the detector and the inference time while maintaining accuracy. Second, to tackle sample imbalance in the self-built dataset, the slide loss function is introduced. Finally, this paper replaces the original Complete Intersection over Union (CIoU) loss function with the Minimum Point Distance Intersection over Union (MPDIoU) loss function in the YOLOv7 algorithm, which accelerates model learning and enhances robustness. To mitigate the missed recognitions caused by complex water surface conditions in purely visual algorithms, this paper further fuses LiDAR and camera data, projecting the three-dimensional point-cloud data from LiDAR onto a two-dimensional pixel plane. This significantly reduces the rate of missed detections for water surface targets.
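The projection of LiDAR points onto the image plane is not detailed in the abstract; a minimal sketch of the standard approach is given here, assuming a pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`), a known LiDAR-to-camera rotation and translation, and a camera frame whose z-axis points forward. All parameter names and the example calibration values are hypothetical.

```python
def project_points(points_xyz, rotation, translation, fx, fy, cx, cy, width, height):
    """Project 3-D LiDAR points to pixel coordinates with a pinhole camera model.

    rotation is a 3x3 nested list and translation a length-3 list (LiDAR -> camera).
    Returns (u, v, depth) for points in front of the camera and inside the image.
    """
    pixels = []
    for point in points_xyz:
        # Transform into the camera frame: X_cam = R * X_lidar + t
        cam = [
            sum(rotation[row][k] * point[k] for k in range(3)) + translation[row]
            for row in range(3)
        ]
        xc, yc, zc = cam
        if zc <= 0:
            continue  # behind the camera, cannot be imaged
        u = fx * xc / zc + cx
        v = fy * yc / zc + cy
        if 0 <= u < width and 0 <= v < height:
            pixels.append((u, v, zc))
    return pixels


# Hypothetical calibration: sensors aligned, 640x480 image, 500-pixel focal length.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cloud = [(0, 0, 10), (1, 0, 10), (0, 0, -5), (100, 0, 1)]
projected = project_points(cloud, identity, [0.0, 0.0, 0.0], 500, 500, 320, 240, 640, 480)
```

Once the points are in pixel coordinates, those falling inside or near a visual detection box can be associated with it, and clusters with no matching box can flag targets the camera missed.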
To solve the problems that current synthetic aperture radar (SAR) image target detection methods cannot adapt to targets of different sizes and that complex image backgrounds lead to low detection accuracy, an improved SAR image small target detection method based on YOLOv7 is proposed in this study. The proposed method improves the feature extraction network by using Switchable Atrous Convolution (SAConv) in the backbone network to help the model capture target information at different scales, thus improving feature extraction for small targets. The attention-based DyHead module is embedded in the target detection head to reduce the impact of complex backgrounds and focus better on small targets. In addition, the NWD loss function is introduced and combined with the CIoU loss. Compared with the CIoU loss function typically used in YOLOv7, the NWD loss function pays more attention to small targets, further improving small target detection. The experimental results on the HRSID dataset indicate that the proposed method achieves mAP@0.5 and mAP@0.95 scores of 93.5% and 71.5%, respectively, increases of 7.2% and 7.6% over the baseline model. The proposed method can effectively complete the task of SAR image small target detection.
基金supported by the National Natural Science Foundation of China(No.62241109)the Tianjin Science and Technology Commissioner Project(No.20YDTPJC01110)。
文摘An improved model based on you only look once version 8(YOLOv8)is proposed to solve the problem of low detection accuracy due to the diversity of object sizes in optical remote sensing images.Firstly,the feature pyramid network(FPN)structure of the original YOLOv8 mode is replaced by the generalized-FPN(GFPN)structure in GiraffeDet to realize the"cross-layer"and"cross-scale"adaptive feature fusion,to enrich the semantic information and spatial information on the feature map to improve the target detection ability of the model.Secondly,a pyramid-pool module of multi atrous spatial pyramid pooling(MASPP)is designed by using the idea of atrous convolution and feature pyramid structure to extract multi-scale features,so as to improve the processing ability of the model for multi-scale objects.The experimental results show that the detection accuracy of the improved YOLOv8 model on DIOR dataset is 92%and mean average precision(mAP)is 87.9%,respectively 3.5%and 1.7%higher than those of the original model.It is proved the detection and classification ability of the proposed model on multi-dimensional optical remote sensing target has been improved.
Abstract: The application of deep learning to target detection in aerial images captured by Unmanned Aerial Vehicles (UAVs) has emerged as a prominent research focus. Due to the considerable distance between UAVs and the photographed objects, coupled with complex shooting environments, existing models often struggle to achieve accurate real-time target detection. In this paper, a You Only Look Once v8 (YOLOv8) model is modified in four aspects: the detection head, the up-sampling module, the feature extraction module, and the parameter optimization of positive sample screening. The resulting YOLO-S3DT model is proposed to improve the detection of small targets in aerial images. Experimental results show that all detection indexes of the proposed model are significantly improved without increasing the number of model parameters and with only limited growth in computation. Moreover, the model also performs best among the compared detection models, demonstrating its advancement within this category of tasks.
Funding: Supported by the National Natural Science Foundation of China (No. 52205548).
Abstract: To address the issues of unknown target size, blurred edges, background interference, and low contrast in infrared small target detection, this paper proposes a method based on density peaks searching and weighted multi-feature local difference. Firstly, an improved high-boost filter is used for preprocessing to eliminate background clutter and high-brightness interference, thereby increasing the probability of capturing real targets in the density peak search. Secondly, a triple-layer window is used to extract features from the area surrounding candidate targets, addressing the uncertainty of small target sizes. By calculating multi-feature local differences between the triple-layer windows, the problems of blurred target edges and low contrast are resolved. To balance the contribution of different features, intra-class distance is used to calculate weights, achieving weighted fusion of multi-feature local differences to obtain the weighted multi-feature local differences of candidate targets. The real targets are then extracted using the interquartile range. Experiments on datasets such as SIRST and IRSTD-1K show that the proposed method is suitable for various complex scene types and demonstrates good robustness and detection performance.
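The final step above, extracting real targets with the interquartile range, can be sketched as a simple outlier rule over candidate scores. This is only a minimal illustration under stated assumptions: the abstract does not give the exact rule, so the upper-fence form `Q3 + k * IQR`, the factor `k = 1.5`, and the function name `iqr_outliers` are all hypothetical choices.

```python
import statistics


def iqr_outliers(scores, k=1.5):
    """Return indices of scores above the upper fence Q3 + k * IQR.

    `scores` stands in for the weighted multi-feature local difference of
    each candidate target; `k` is a hypothetical fence factor.
    """
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    threshold = q3 + k * (q3 - q1)
    return [i for i, s in enumerate(scores) if s > threshold]


# Hypothetical candidate scores: one candidate stands out from the clutter.
candidates = [1, 2, 2, 3, 3, 3, 4, 4, 50]
print(iqr_outliers(candidates))
```

A quartile-based fence is attractive here because it adapts to the spread of the clutter responses in each image instead of relying on a fixed global threshold.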
Abstract: Unmanned aerial vehicle (UAV) imagery poses significant challenges for object detection due to extreme scale variations, high-density small targets (68% in the VisDrone dataset), and complex backgrounds. While YOLO-series models achieve speed-accuracy trade-offs via fixed convolution kernels and manual feature fusion, their rigid architectures struggle with multi-scale adaptability, as exemplified by YOLOv8n’s 36.4% mAP and 13.9% small-object AP on VisDrone2019. This paper presents YOLO-LE, a lightweight framework addressing these limitations through three novel designs: (1) We introduce the C2f-Dy and LDown modules to enhance the backbone’s sensitivity to small-object features while reducing backbone parameters, thereby improving model efficiency. (2) An adaptive feature fusion module is designed to dynamically integrate multi-scale feature maps, optimizing the neck structure, reducing neck complexity, and enhancing overall model performance. (3) We replace the original loss function with a distributed focal loss and incorporate a lightweight self-attention mechanism to improve small-object recognition and bounding box regression accuracy. Experimental results demonstrate that YOLO-LE achieves 39.9% mAP@0.5 on VisDrone2019, a 9.6% relative improvement over YOLOv8n, while maintaining a computational cost of 8.5 GFLOPs. This provides an efficient solution for UAV object detection in complex scenarios.
Funding: Supported in part by the National Natural Science Foundation of China Grants 62402085, 61972062, and 62306060; the Liaoning Doctoral Research Start-Up Fund 2023-BS-078; the Dalian Youth Science and Technology Star Project 2023RQ023; and the Liaoning Basic Research Project 2023JH2/101300191.
Abstract: Underwater target detection is extensively applied in domains such as underwater search and rescue, environmental monitoring, and marine resource surveys. It is crucial in enabling autonomous underwater robot operations and promoting ocean exploration. Nevertheless, low imaging quality, harsh underwater environments, and obscured objects considerably increase the difficulty of detecting underwater targets, making it difficult for current detection methods to achieve optimal performance. To enhance underwater object perception and improve target detection precision, we propose a lightweight underwater target detection method using You Only Look Once (YOLO) v8 with multi-scale cross-channel attention (MSCCA), named YOLOv8-UOD. In the proposed multi-scale cross-channel attention module, multi-scale attention (MSA) augments the variety of attentional perception by extracting information from innately diverse sensory fields. The cross-channel strategy utilizes RepVGG-based channel shuffling (RCS) and one-shot aggregation (OSA) to rearrange feature map channels according to specific rules. It aggregates all features only once in the final feature mapping, resulting in the extraction of more comprehensive and valuable feature information. The experimental results show that the proposed YOLOv8-UOD achieves a mAP50 of 95.67% and FLOPs of 23.8 G on the Underwater Robot Picking Contest 2017 (URPC2017) dataset, outperforming other methods in terms of detection precision and computational cost-efficiency.
Funding: Supported by the Key Laboratory Fund for Equipment Pre-Research (6142207210202).
Abstract: Infrared small target detection under complex cloud backgrounds suffers from low contrast between the background and the target and from insufficient noise suppression. To address this, an infrared small target detection method based on the tensor nuclear norm and direction residual weighting is proposed. The infrared image is first converted into an infrared patch tensor model. Then, exploiting the low-rank nature of the background tensor and the difference in contrast between the background and the target in different directions, we design a double-neighborhood local contrast method based on direction residual weighting (DNLCDRW) and combine it with the partial sum of tensor nuclear norm (PSTNN) to achieve effective background suppression and recovery of infrared small targets. Experiments show that the algorithm is effective in suppressing the background and improving the detection ability for the target.
Funding: Supported by the National Natural Science Foundation of China (61806024, 62206257); the Jilin Province Science and Technology Development Plan Key Research and Development Project (20210204050YY); the Wuxi University Research Start-up Fund for Introduced Talents (2023r004, 2023r006); the Jiangsu Engineering Research Center of Hyperconvergence Application and Security of IoT Devices; the Jiangsu Foreign Expert Workshop; and the Wuxi City Internet of Vehicles Key Laboratory.
Abstract: In this paper, a reasoning enhancement method based on RGCN (Relational Graph Convolutional Network) is proposed to improve the detection capability of UAVs (Unmanned Aerial Vehicles) for fast-moving military targets in urban battlefield environments. By combining military images with the publicly available VisDrone2019 dataset, a new dataset called VisMilitary was built, and multiple YOLO (You Only Look Once) models were tested on it. Due to the low-confidence problem caused by fuzzy targets, the performance of traditional YOLO models on real battlefield images decreases significantly. Therefore, we propose an improved RGCN inference model, which improves performance in complex environments by optimizing the data processing and graph network architecture. Experimental results show that the proposed method achieves an improvement of 0.4% to 1.7% in mAP@0.50, which demonstrates the effectiveness of the model in military target detection. This research provides a new technical path for UAV target detection in urban battlefields and offers useful insight for the application of deep learning in the military field.
Funding: Supported by the National Key R&D Program “Development and Application Verification of Underwater Intelligent Defect Detection Robot System for Large Hydropower Station Dams” (Project No. 2022YFB4703400), sub-topic 4 “Research on Intelligent Identification and Diagnosis of Dam Defects and Fine Inspection Equipment and Technology of Hydropower Stations” (Project No. 2022YFB4703404); supported in part by the National Natural Science Foundation of China under Grant 62371181 and in part by the Changzhou Science and Technology International Cooperation Program under Grant CZ20230029.
Abstract: Infrared small-target detection has important applications in many fields due to its high penetration capability and detection distance. This study introduces a detector called “YOLO-SDLUWD”, based on the YOLOv7 network, for small target detection in complex infrared backgrounds. “SDLUWD” refers to the combination of a Spatial Depth layer followed by a Convolutional layer (SD-Conv), a Linear Up-sampling fusion Path Aggregation Feature Pyramid Network (LU-PAFPN), and a training strategy based on the normalized Gaussian Wasserstein Distance loss (WD-loss) function. “YOLO-SDLUWD” aims to reduce the loss of detection accuracy that occurs when the maximum pooling downsampling layer in the backbone network discards important feature information, to support the interaction and fusion of high-dimensional and low-dimensional feature information, and to overcome the false alarm predictions induced by noise in small target images. The detector achieved a mAP@0.5 of 90.4% and a mAP@0.5:0.95 of 48.5% on IRIS-AG, an increase of 9%-11% over YOLOv7-tiny, outperforming other state-of-the-art target detectors in terms of accuracy and speed.
Funding: Supported in part by the National Natural Science Foundation of China under Grant 62271302 and the Shanghai Municipal Natural Science Foundation under Grant 20ZR1423500.
Abstract: Infrared images typically exhibit diverse backgrounds, each potentially containing noise and target-like interference elements. In complex backgrounds, infrared small targets are prone to being submerged by background noise due to their low pixel proportion and limited available features, leading to detection failure. To address this problem, this paper proposes an Attention Shift-Invariant Cross-Evolutionary Feature Fusion Network (ASCFNet) tailored for the detection of infrared weak and small targets. First, a Multidimensional Lightweight Pixel-level Attention Module (MLPA) is designed, which alleviates the suppression of small-target features during deep network propagation by combining channel reshaping, multi-scale parallel subnet architectures, and local cross-channel interactions. Then, a Multidimensional Shift-Invariant Recall Module (MSIR) is designed to keep the network unaffected by minor input perturbations when processing infrared images, by focusing on the model’s shift invariance. Subsequently, a Cross-Evolutionary Feature Fusion structure (CEFF) is designed to allow flexible and efficient integration of multidimensional feature information from different network hierarchies, thereby achieving complementarity and enhancement among features. Experimental results on three public datasets, SIRST, NUDT-SIRST, and IRST640, demonstrate that the proposed network outperforms advanced algorithms in the field. Specifically, on the NUDT-SIRST dataset, the mAP50, mAP50-95, and metrics reached 99.26%, 85.22%, and 99.31%, respectively. Visual evaluations of detection results in diverse scenarios indicate that the algorithm exhibits an increased detection rate and a reduced false alarm rate. The method balances accuracy and real-time performance, and achieves efficient and stable detection of infrared weak and small targets.
Funding: Supported in part by the Youth Innovation Promotion Association, Chinese Academy of Sciences under Grant 2022022, in part by the South China Sea Nova project of Hainan Province under Grant NHXXRCXM202340, and in part by the Scientific Research Foundation Project of Hainan Acoustics Laboratory under Grant ZKNZ2024001.
Abstract: Underwater target detection in forward-looking sonar (FLS) images is a challenging but promising endeavor. Existing neural-network-based methods have made notable progress, but there remains room for improvement because they overlook the unique characteristics of underwater environments. Considering the low imaging resolution, complex background environment, and large variation in target appearance in underwater sonar images, this paper designs a sonar image target detection Network based on Progressive sensitivity capture, named ProNet. It progressively captures the sensitive regions in the current image where potential effective targets may exist. Guided by this basic idea, the primary technical innovation of this paper is a foundational module structure for constructing a sonar target detection backbone network. This structure employs a multi-subspace mixed convolution module that initially maps sonar images into different subspaces and extracts local contextual features using varying convolutional receptive fields within these heterogeneous subspaces. Subsequently, a scale-aware aggregation module aggregates the heterogeneous features extracted from the different subspaces. Finally, the multi-scale attention structure further enhances the relational perception of the aggregated features. We evaluated ProNet on three FLS datasets of varying scenes, and experimental results indicate that ProNet outperforms the current state-of-the-art sonar image detectors and general target detectors.
Funding: Funded by the Jilin City Science and Technology Innovation Development Plan Project (No. 20240302014), the Jilin Provincial Department of Education Science and Technology Research Project (No. JJKH20250879KJ), and the Jilin Province Science and Technology Development Plan Project (No. YDZJ202401640ZYTS).
Abstract: Underwater imaging is frequently influenced by factors such as illumination, scattering, and refraction, which can result in low image contrast and blurriness. Moreover, the presence of numerous small, overlapping targets reduces detection accuracy. To address these challenges, first, green channel images are preprocessed to rectify color bias while improving contrast and clarity. Second, the YOLO-DBS network, which employs deformable convolution, is proposed to enhance feature learning from blurry underwater images. The ECA attention mechanism is also introduced to strengthen feature focus. Moreover, a bidirectional feature pyramid network is utilized for efficient multilayer feature fusion while removing nodes that contribute minimally to detection performance. In addition, the SIoU loss function, which considers factors such as angular error and distance deviation, is incorporated into the network. Validation on the RUOD dataset demonstrates that YOLO-DBS achieves an improvement of approximately 3.1% in mAP@0.5 compared with YOLOv8n and surpasses YOLOv9-tiny by 1.3%. YOLO-DBS reduces the parameter count by 32% relative to YOLOv8n, thereby demonstrating superior performance in real-time detection on underwater observation platforms.
Funding: Supported by the fund of the Henan Province Science and Technology Research Project (No. 242102210213).
Abstract: Addressing the challenges in detecting surface floating litter in artificial lakes, including complex environments, uneven illumination, and susceptibility to noise and weather, this paper proposes an efficient and lightweight Ghost-YOLO (You Only Look Once) v8 algorithm. The algorithm integrates advanced attention mechanisms and a small-target detection head to significantly enhance detection performance and efficiency. Firstly, an SE (Squeeze-and-Excitation) mechanism is incorporated into the backbone network to strengthen the extraction of resilient features and precise target localization. This mechanism models feature channel dependencies, enabling adaptive adjustment of channel importance and thereby improving recognition of floating litter targets. Secondly, a 160×160 small-target detection layer is designed in the feature fusion neck to mitigate semantic information loss due to varying target scales. This design enhances the fusion of deep and shallow semantic information, improving small target feature representation and enabling better capture and identification of tiny floating litter. Thirdly, to balance performance and efficiency, the GhostConv module replaces part of the conventional convolutions in the feature fusion neck. Additionally, a novel C2fGhost (CSPDarknet53 to 2-Stage Feature Pyramid Networks Ghost) module is introduced to further reduce network parameters. Lastly, to address the challenge of occlusion, a new loss function, WIoU (Wise Intersection over Union) v3, which incorporates a flexible and non-monotonic concentration approach, is adopted to improve detection rates for surface floating litter. Experiments show that the proposed Ghost-YOLO v8 model performs well on the Marine dataset: compared with the base model, precision and recall improve by 3.3 and 7.6 percentage points, respectively, mAP@0.5 and mAP@0.5:0.95 improve by 5.3 and 4.4 percentage points, and the computational volume is reduced by 1.88 MB, while the FPS value hardly decreases, so efficient real-time identification of floating debris on the water’s surface can be achieved cost-effectively.
Funding: This work was jointly supported by the Special Fund for Transformation and Upgrade of Jiangsu Industry and Information Industry-Key Core Technologies (Equipment) Key Industrialization Projects in 2022 (No. CMHI-2022-RDG-004): “Key Technology Research for Development of Intelligent Wind Power Operation and Maintenance Mothership in Deep Sea”.
Abstract: In real scenes, haze is unevenly distributed under the influence of air humidity, dust, aerosols, and similar factors, which lowers image quality and contrast. In such cases, it is difficult for a general-purpose detection network to detect the targets in the image. Thus, a dual subnet based on multi-task collaborative training (DSMCT) is proposed in this paper. Firstly, in the training phase, the Gated Context Aggregation Network (GCANet) is used as the supervisory network of YOLOX to promote the extraction of clean information in foggy scenes. In the test phase, only the YOLOX branch needs to be activated, which preserves the detection speed of the model. Secondly, the deformable convolution module is used to improve GCANet and enhance the model’s ability to capture details of non-homogeneous fog. Finally, the Coordinate Attention mechanism is introduced into the Vision Transformer, and the backbone network of YOLOX is redesigned. In this way, the feature extraction ability of the network for deep-level information is enhanced. The experimental results on the artificial fog dataset FOG_VOC and the real fog dataset RTTS show that the mAP of DSMCT reached 86.56% and 62.39%, respectively, which is 2.27% and 4.41% higher than the current most advanced detection model. The DSMCT network is practical and effective for target detection in real foggy scenes.
Funding: Supported in part by the National Natural Science Foundation of China (No. 62471034) and the Hebei Natural Science Foundation (No. F2023105001).
Abstract: In the field of remote sensing, the rapid and accurate acquisition of the category and location of airplanes has emerged as a prominent research topic. However, fuzzy remote sensing imaging and complex environmental interference affect airplane detection. In addition, the inconsistent size of remote sensing images and the low accuracy of small target detection are crucial challenges that need to be addressed. To tackle these issues, we propose a novel network, SDaDCS (SAHI-data augmentation-dilation-channel and spatial attention), based on the YOLOX model, the slicing aided hyper inference (SAHI) framework, a new data augmentation technique, and a dilation-channel and spatial (DCS) attention mechanism. Initially, we create a remote sensing dataset for airplane targets and introduce a new data augmentation technique based on Rotate-Mixup and mixed data augmentation to enhance data diversity. The DCS attention mechanism, which comprises a dilated convolution block, channel attention, and spatial attention, is designed to bolster the feature extraction and discrimination of the network. To address the difficulty of detecting small targets, we integrate the YOLOX model with the SAHI framework. Experimental results show that, compared to the original YOLOX model, the proposed SDaDCS remote sensing target detection algorithm improves overall accuracy by 13.6%. The experimental results validate the effectiveness of the proposed algorithm.
Funding: Supported by the National Key Research and Development Program of China (No. 2022YFE0196000) and the National Natural Science Foundation of China (No. 61502429).
Abstract: Target detection is an important task in computer vision research, and small target detection, for example in anomaly detection, has attracted particular attention. However, problems remain in this line of research: small target detection in complex environments is susceptible to background interference and yields poor detection results. To solve these issues, this study proposes a method that introduces the attention mechanism into the you only look once (YOLO) network. In addition, a self-made mask dataset was created and experiments were conducted on it. The results show that the detection performance of the proposed method is much better.
Funding: Supported by the Industry-University-Research Cooperation Fund Project of the Eighth Research Institute of China Aerospace Science and Technology Corporation (No. USCAST2021-5), the Major Scientific Instrument Research of the National Natural Science Foundation of China (No. 61627810), the National Science and Technology Major Program of China (No. 2018YFB1305003), and the National Defense Science and Technology Outstanding Youth Science Foundation (No. 2017-JCJQ-ZQ-031).
Abstract: Infrared detection technology has the advantages of all-weather detection and good concealment, and is widely used in long-distance target detection and tracking systems. However, the complex background, the strong noise, and the small scale and weak intensity of targets make the detection of infrared small targets very difficult. A multi-channel attention-based network is proposed in this paper to address the high missed detection and false alarm rates of traditional algorithms and the large model size, high complexity, and poor detection performance of deep learning algorithms. First, given the difficulty of extracting the features of multiscale, small, and dim infrared targets, multiple channels are designed based on dilated convolution to capture multiscale target features. Second, a coordinate attention block is incorporated in each channel to suppress background clutter adaptively and enhance target features. In addition, shallow detail features and deep abstract semantic features are fused by a contextual attention fusion block. Finally, experiments on the SIRST and MDFA datasets verify that, compared with other state-of-the-art methods, the proposed algorithm further improves detection performance with a smaller model size and lower computational complexity.
Funding: Supported by the National Natural Science Foundation of China (No. U1833203), the National Natural Science Foundation of China (No. 62301036), the Aviation Science Foundation (No. 2020Z019055001), and the China Postdoctoral Science Foundation Funded Project (No. 2022M720446).
Abstract: To address the high false alarm rate and low detection probability of infrared small target detection in complex low-altitude backgrounds, an infrared small target detection method based on improved weighted local contrast is proposed in this paper. First, the ratio information between the target and the local background is utilized as an enhancement factor. The local contrast is calculated by incorporating the heterogeneity between the target and the local background. Then, a local product weighting method is designed based on the spatial dissimilarity between target and background to further enhance the target while suppressing the background. Finally, the location of the target is obtained by adaptive threshold segmentation. Experimental results demonstrate that the method outperforms six existing algorithms on several evaluation metrics across different datasets containing targets such as unmanned aerial vehicles (UAVs).
Funding: Supported by National Sciences Foundation of China Grants (No. 61902158).
Abstract: This paper expounds upon a novel target detection methodology distinguished by its elevated discriminatory efficacy, specifically tailored for environments characterized by markedly low luminance levels. Conventional methodologies struggle with the challenges posed by luminosity fluctuations, especially in settings characterized by diminished radiance, further exacerbated by the utilization of suboptimal imaging instrumentation. The envisioned approach mandates a departure from the conventional YOLOX model, which exhibits inadequacies in mitigating these challenges. To enhance the efficacy of this approach in low-light conditions, the dehazing algorithm undergoes refinement, effecting a discerning regulation of the transmission rate at the pixel level, reducing it to values below 0.5, thereby resulting in an augmentation of image contrast. Subsequently, the coiflet wavelet transform is employed to discern and isolate high-discriminatory attributes by dismantling low-frequency image attributes and extracting high-frequency attributes across divergent axes. The utilization of CycleGAN serves to elevate the features of low-light imagery across an array of stylistic variances. Advanced computational methodologies are then employed to amalgamate and conflate intricate attributes originating from images characterized by distinct stylistic orientations, thereby augmenting the model’s erudition potential. Empirical validation conducted on the PASCAL VOC and MS COCO 2017 datasets substantiates pronounced advancements. The refined low-light enhancement algorithm yields a discernible 5.9% augmentation in the target detection evaluation index when compared to the original imagery. Mean Average Precision (mAP) undergoes enhancements of 9.45% and 0.052% in low-light visual renditions relative to conventional YOLOX outcomes. The envisaged approach presents a myriad of advantages over prevailing benchmark methodologies in the realm of target detection within environments marked by an acute scarcity of luminosity.
Funding: Supported by the National Natural Science Foundation of China (No. 51876114) and the Shanghai Engineering Research Center of Marine Renewable Energy (Grant No. 19DZ2254800).
Abstract: To address missed detections in water surface target detection when unmanned surface vehicle (USV) perception relies solely on visual algorithms, this paper proposes a method based on the fusion of visual detection and LiDAR point-cloud projection. Firstly, the visual recognition component employs an improved YOLOv7 algorithm, trained on a self-built dataset, for the detection of water surface targets. This algorithm modifies the original YOLOv7 architecture to a Slim-Neck structure, addressing the problem of excessive redundant information during feature extraction in the original YOLOv7 network model. At the same time, this modification reduces the computational burden of the detector and the inference time while maintaining accuracy. Secondly, to tackle the issue of sample imbalance in the self-built dataset, the slide loss function is introduced. Finally, the original Complete Intersection over Union (CIoU) loss function in the YOLOv7 algorithm is replaced with the Minimum Point Distance Intersection over Union (MPDIoU) loss function, which accelerates model learning and enhances robustness. To mitigate the missed recognitions caused by complex water surface conditions in purely visual algorithms, this paper further fuses LiDAR and camera data, projecting the three-dimensional point-cloud data from LiDAR onto a two-dimensional pixel plane. This significantly reduces the rate of missed detections for water surface targets.
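The projection of LiDAR points onto the image plane described above can be sketched with a standard pinhole camera model. This is a minimal illustration under stated assumptions, not the paper's calibration pipeline: the function name, the rotation and translation (LiDAR-to-camera extrinsics), and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical values, and lens distortion is ignored.

```python
def project_points(points_lidar, rotation, translation, fx, fy, cx, cy):
    """Project 3D LiDAR points to pixel coordinates with a pinhole model.

    rotation: 3x3 nested list, translation: 3-element list (LiDAR -> camera).
    Returns (u, v, depth) tuples for points in front of the camera.
    """
    pixels = []
    for x, y, z in points_lidar:
        # Rigid transform from the LiDAR frame into the camera frame.
        xc = rotation[0][0] * x + rotation[0][1] * y + rotation[0][2] * z + translation[0]
        yc = rotation[1][0] * x + rotation[1][1] * y + rotation[1][2] * z + translation[1]
        zc = rotation[2][0] * x + rotation[2][1] * y + rotation[2][2] * z + translation[2]
        if zc <= 0:
            continue  # point is behind the camera and cannot be imaged
        # Perspective division followed by the intrinsic mapping to pixels.
        pixels.append((fx * xc / zc + cx, fy * yc / zc + cy, zc))
    return pixels


# Hypothetical calibration: aligned frames, 100 px focal length, 640x480 image.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(project_points([(1.0, 0.5, 10.0)], identity, [0, 0, 0], 100, 100, 320, 240))
```

Once points are expressed in pixel coordinates, those falling inside a detector's bounding box can be associated with that detection, which is one common way such a fusion is used to confirm or recover targets.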
Abstract: Current synthetic aperture radar (SAR) image target detection methods cannot adapt to targets of different sizes, and complex image backgrounds lead to low detection accuracy. To solve these problems, an improved SAR image small target detection method based on YOLOv7 is proposed in this study. The proposed method improves the feature extraction network by using Switchable Around Convolution (SAConv) in the backbone network to help the model capture target information at different scales, thus improving the feature extraction ability for small targets. Based on the attention mechanism, the DyHead module is embedded in the target detection head to reduce the impact of complex backgrounds and better focus on small targets. In addition, the NWD loss function is introduced and combined with the CIoU loss. Compared to the CIoU loss function typically used in YOLOv7, the NWD loss function pays more attention to small targets, further improving the ability to detect them. The experimental results on the HRSID dataset indicate that the proposed method achieved mAP@0.5 and mAP@0.95 scores of 93.5% and 71.5%, respectively. Compared to the baseline model, this represents an increase of 7.2% and 7.6%, respectively. The proposed method can effectively complete the task of SAR image small target detection.
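To make the NWD idea above concrete, the sketch below computes the normalized Gaussian Wasserstein distance between two boxes and blends the resulting loss with a CIoU loss value. It is a minimal illustration under stated assumptions: the normalizing constant `c`, the blending weight `alpha`, and the function names are hypothetical, the CIoU loss is taken as a precomputed number, and the abstract does not say how the two losses are actually weighted.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Gaussian Wasserstein distance for (cx, cy, w, h) boxes.

    Each box is modelled as a 2D Gaussian with mean (cx, cy) and covariance
    diag(w^2 / 4, h^2 / 4). `c` is a dataset-dependent constant (assumed here).
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Closed-form squared 2-Wasserstein distance between the two Gaussians.
    w2_squared = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    return math.exp(-math.sqrt(w2_squared) / c)


def combined_loss(ciou_loss, box_a, box_b, alpha=0.5, c=12.8):
    """Weighted sum of a precomputed CIoU loss and the NWD loss (1 - NWD)."""
    return alpha * ciou_loss + (1.0 - alpha) * (1.0 - nwd(box_a, box_b, c))


# Two small, non-overlapping boxes still get a smooth, non-zero similarity.
print(nwd((10, 10, 4, 4), (15, 10, 4, 4)))
```

Unlike IoU, which drops to zero as soon as two small boxes stop overlapping, this similarity decays smoothly with centre offset and size difference, which is the property that makes it attractive for small targets.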