Visible and infrared (RGB-IR) fusion object detection plays an important role in security, disaster relief, and related applications. In recent years, deep-learning-based RGB-IR fusion detection methods have developed rapidly, but they still struggle with the complex and changing scenarios captured by drones, mainly for two reasons: (A) RGB-IR fusion detectors are susceptible to inferior inputs that degrade performance and stability. (B) RGB-IR fusion detectors are susceptible to redundant features that reduce accuracy and efficiency. In this paper, an innovative RGB-IR fusion detection framework based on global-local feature optimization, named GLFDet, is proposed to improve the detection performance and efficiency for drone-captured objects. The key components of GLFDet include a Global Feature Optimization (GFO) module, a Local Feature Optimization (LFO) module, and a Channel Separation Fusion (CSF) module. Specifically, GFO calculates the information content of the input image in the frequency domain and optimizes the features holistically. Then, LFO dynamically selects high-value features and filters out low-value features before fusion, which significantly improves the efficiency of fusion. Finally, CSF fuses the RGB and IR features across the corresponding channels, which avoids rearranging the channel relationships and enhances model stability. Extensive experimental results show that the proposed method achieves the best performance on three popular RGB-IR datasets: DroneVehicle, VEDAI, and LLVIP. In addition, GLFDet is more lightweight than other comparable models, making it more appealing for edge devices such as drones. The code is available at https://github.com/lao chen330/GLFDet.
Early detection of Parkinson’s disease remains a major clinical issue, especially during its prodromal stage, when symptoms are not evident or not distinct. To address this problem, we proposed a new deep learning-based approach for detecting Parkinson’s disease during the prodromal stage, before any overt symptoms develop. We used 5 publicly accessible datasets, including UCI Parkinson’s Voice, Spiral Drawings, PaHaW, NewHandPD, and PPMI, and implemented a dual-stream CNN–BiLSTM architecture with Fisher-weighted feature merging and SHAP-based explanation. The findings reveal that the model’s performance was superior, achieving 98.2% accuracy, an F1-score of 0.981, and an AUC of 0.991 on the UCI Voice dataset. The model’s performance on the remaining datasets was also comparable, with up to a 2–7 percent improvement in accuracy compared to existing strong models such as CNN–RNN–MLP, ILN–GNet, and CASENet. Across the evidence, the findings back the diagnostic promise of micro-tremor assessment and demonstrate that combining temporal and spatial features with a scatter-based segment in a multi-modal approach can be an effective and scalable platform for an “early,” interpretable PD screening system.
Traffic sign detection is a critical component of driving systems. Single-stage network-based traffic sign detection algorithms, renowned for their fast detection speeds and high accuracy, have become the dominant approach in current practice. However, in complex and dynamic traffic scenes, particularly with smaller traffic sign objects, challenges such as missed and false detections can reduce overall detection accuracy. To address this issue, this paper proposes a detection algorithm that integrates edge and shape information. Recognizing that traffic signs have specific shapes and distinct edge contours, this paper introduces an edge feature extraction branch within the backbone network, enabling adaptive fusion with features of the same hierarchical level. Additionally, a shape prior convolution module is designed to replace the first two convolutional modules of the backbone network, aiming to enhance the model's perception of objects with specific shapes and reduce its sensitivity to background noise. The algorithm was evaluated on the CCTSDB and TT100k datasets, and compared to YOLOv8s, the mAP50 values increased by 3.0% and 10.4%, respectively, demonstrating the effectiveness of the proposed method in improving the accuracy of traffic sign detection.
In recent years, with the rapid advancement of artificial intelligence, object detection algorithms have made significant strides in accuracy and computational efficiency. Notably, research and applications of Anchor-Free models have opened new avenues for real-time target detection in optical remote sensing images (ORSIs). However, in the realm of adversarial attacks, developing adversarial techniques tailored to Anchor-Free models remains challenging. Adversarial examples generated based on Anchor-Based models often exhibit poor transferability to these new model architectures. Furthermore, the growing diversity of Anchor-Free models poses additional hurdles to achieving robust transferability of adversarial attacks. This study presents an improved cross-conv-block feature fusion You Only Look Once (YOLO) architecture, engineered to facilitate the extraction of more comprehensive semantic features during the backpropagation process. To address the asymmetry between densely distributed objects in ORSIs and the corresponding detector outputs, a novel dense bounding box attack strategy is proposed. This approach uses a dense target bounding box loss in the calculation of the adversarial loss function. Furthermore, by integrating translation-invariant (TI) and momentum-iteration (MI) adversarial methodologies, the proposed framework significantly improves the transferability of adversarial attacks. Experimental results demonstrate that our method achieves superior adversarial attack performance, with adversarial transferability rates (ATR) of 67.53% on the NWPU VHR-10 dataset and 90.71% on the HRSC2016 dataset. Compared to ensemble adversarial attack and cascaded adversarial attack approaches, our method generates adversarial examples in an average of 0.64 s, representing an approximately 14.5% improvement in efficiency under equivalent conditions.
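The momentum-iteration (MI) component mentioned above can be illustrated with a minimal sketch. It follows the widely used momentum-iterative sign-gradient update (accumulate the L1-normalized gradient, step along its sign, clip to an L-infinity ball) and is not the authors' implementation: the function names, the toy loss, and the hyperparameters `eps`, `steps`, and `mu` are assumptions chosen for illustration, and the translation-invariant (TI) step, which smooths the gradient with a kernel, is omitted.

```python
def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def mi_fgsm(x0, grad_fn, eps=0.3, steps=10, mu=1.0):
    """Momentum-iterative sign-gradient ascent inside an L-infinity ball.

    x0      -- clean input as a list of floats
    grad_fn -- hypothetical callable returning dLoss/dx for a given x
    eps     -- maximum per-coordinate perturbation
    mu      -- momentum decay factor
    """
    alpha = eps / steps                      # per-step size
    x = list(x0)
    g = [0.0] * len(x0)                      # accumulated momentum
    for _ in range(steps):
        grad = grad_fn(x)
        l1 = sum(abs(v) for v in grad) or 1.0
        # Accumulate the L1-normalized gradient into the momentum term.
        g = [mu * gi + vi / l1 for gi, vi in zip(g, grad)]
        # Step along the sign of the momentum, then clip to the eps-ball.
        x = [xi + alpha * sign(gi) for xi, gi in zip(x, g)]
        x = [min(max(xi, ci - eps), ci + eps) for xi, ci in zip(x, x0)]
    return x


# Toy stand-in for a detector loss: squared distance to a target vector.
target = [0.4, 0.7]
toy_loss = lambda x: sum((xi - ti) ** 2 for xi, ti in zip(x, target))
toy_grad = lambda x: [2.0 * (xi - ti) for xi, ti in zip(x, target)]

clean = [0.5, 0.5]
adversarial = mi_fgsm(clean, toy_grad)
print(toy_loss(clean), toy_loss(adversarial))
```

In a real attack, `grad_fn` would come from backpropagating the adversarial loss (here, presumably the dense bounding box loss) through the surrogate detector.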
Camouflaged Object Detection (COD) aims to identify objects that share highly similar patterns, such as texture, intensity, and color, with their surrounding environment. Due to their intrinsic resemblance to the background, camouflaged objects often exhibit vague boundaries and varying scales, making it challenging to accurately locate targets and delineate their indistinct edges. To address this, we propose a novel camouflaged object detection network called Edge-Guided and Multi-scale Fusion Network (EGMFNet), which leverages edge-guided multi-scale integration for enhanced performance. The model incorporates two innovative components: a Multi-scale Fusion Module (MSFM) and an Edge-Guided Attention Module (EGA). These designs exploit multi-scale features to uncover subtle cues between candidate objects and the background while emphasizing camouflaged object boundaries. Moreover, recognizing the rich contextual information in fused features, we introduce a Dual-Branch Global Context Module (DGCM) to refine features using extensive global context, thereby generating more informative representations. Experimental results on four benchmark datasets demonstrate that EGMFNet outperforms state-of-the-art methods across five evaluation metrics. Specifically, on COD10K, our EGMFNet-P improves F_β by 4.8 points and reduces mean absolute error (MAE) by 0.006 compared with ZoomNeXt; on NC4K, it achieves a 3.6-point increase in F_β. On CAMO and CHAMELEON, it obtains 4.5-point increases in F_β. These consistent gains substantiate the superiority and robustness of EGMFNet.
In fire rescue scenarios, traditional manual operations are highly dangerous, as dense smoke, low visibility, extreme heat, and toxic gases not only hinder rescue efficiency but also endanger firefighters’ safety. Although intelligent rescue robots can enter hazardous environments in place of humans, smoke poses major challenges for human detection algorithms. These challenges include the attenuation of visible and infrared signals, complex thermal fields, and interference from background objects, all of which make it difficult to accurately identify trapped individuals. To address this problem, we propose VIF-YOLO, a visible–infrared fusion model for real-time human detection in dense smoke environments. The framework introduces a lightweight multimodal fusion (LMF) module based on learnable low-rank representation blocks to integrate visible and infrared images end to end, preserving fine details while enhancing salient features. In addition, an efficient multiscale attention (EMA) mechanism is incorporated into the YOLOv10n backbone to improve feature representation under low-light conditions. Extensive experiments on our newly constructed multimodal smoke human detection (MSHD) dataset demonstrate that VIF-YOLO achieves mAP50 of 99.5%, precision of 99.2%, and recall of 99.3%, outperforming YOLOv10n by a clear margin. Furthermore, when deployed on the NVIDIA Jetson Xavier NX, VIF-YOLO attains 40.6 FPS with an average inference latency of 24.6 ms, validating its real-time capability on edge-computing platforms. These results confirm that VIF-YOLO provides accurate, robust, and fast detection across complex backgrounds and diverse smoke conditions, ensuring reliable and rapid localization of individuals in need of rescue.
Impact craters are important for understanding lunar geologic evolution and surface erosion rates, among other uses. However, micro impact craters are numerous and their morphological characteristics are not obvious, resulting in low detection accuracy by deep learning models. Therefore, we proposed a new multi-scale fusion crater detection algorithm (MSF-CDA) based on YOLO11 to improve the accuracy of lunar impact crater detection, especially for small craters with a diameter of <1 km. Using images taken by the LROC (Lunar Reconnaissance Orbiter Camera) at the Chang’e-4 (CE-4) landing area, we constructed three separate datasets for craters with diameters of 0-70 m, 70-140 m, and >140 m. We then trained three submodels separately with these three datasets. Additionally, we designed a slicing-amplifying-slicing strategy to enhance the ability to extract features from small craters. To handle redundant predictions, we proposed a new Non-Maximum Suppression with Area Filtering method to fuse the results for overlapping targets across the multi-scale submodels. Finally, our new MSF-CDA method achieved high detection performance, with Precision, Recall, and F1 score values of 0.991, 0.987, and 0.989, respectively, effectively addressing the problems induced by the limited features and sample imbalance of small craters. Our MSF-CDA can provide strong data support for more in-depth study of the geological evolution of the lunar surface and finer geological age estimations. This strategy can also be used to detect other small objects with limited features and sample imbalance problems. We detected approximately 500,000 impact craters in an area of approximately 214 km² around the CE-4 landing area. By statistically analyzing the new data, we updated the distribution function of the number and diameter of impact craters. Finally, we identified the most suitable lighting conditions for detecting impact crater targets by analyzing the effect of different lighting conditions on the detection accuracy.
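The fusion of overlapping predictions from the three size-specific submodels can be sketched roughly as follows. The abstract does not spell out the Area Filtering rule, so this sketch makes an assumption: each submodel keeps only detections whose size falls inside the diameter range it was trained on, and the survivors then pass through ordinary greedy non-maximum suppression. The function names, the use of the longer box side as the diameter, and the IoU threshold are all hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def fuse_detections(per_model, size_ranges, iou_thr=0.5):
    """Merge detections from several size-specific submodels.

    per_model   -- one list of (box, score) pairs per submodel
    size_ranges -- one (min_diameter, max_diameter) pair per submodel,
                   in the same units as the box coordinates
    """
    candidates = []
    for detections, (lo, hi) in zip(per_model, size_ranges):
        for box, score in detections:
            # Assumed filter: approximate diameter by the longer box side
            # and keep only boxes inside this submodel's size range.
            diameter = max(box[2] - box[0], box[3] - box[1])
            if lo <= diameter < hi:
                candidates.append((box, score))
    # Greedy NMS: visit boxes by descending score, drop heavy overlaps.
    candidates.sort(key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in candidates:
        if all(iou(box, k[0]) < iou_thr for k in kept):
            kept.append((box, score))
    return kept


# Ranges mirror the three datasets: 0-70 m, 70-140 m, and >140 m.
ranges = [(0, 70), (70, 140), (140, float("inf"))]
small = [((0, 0, 50, 50), 0.90), ((0, 0, 100, 100), 0.95)]
medium = [((0, 0, 100, 100), 0.80), ((1, 1, 101, 101), 0.70)]
large = []
fused = fuse_detections([small, medium, large], ranges)
print(fused)
```

Filtering before suppression means a high-scoring box from the wrong submodel cannot suppress a correctly sized one, which is one plausible motivation for combining the two steps.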
Small object detection has been a focus of attention since the emergence of deep learning-based object detection. Although classical object detection frameworks have made significant contributions to the development of object detection, there are still many issues to be resolved in detecting small objects due to the inherent complexity and diversity of real-world visual scenes. In particular, the YOLO (You Only Look Once) series of detection models, renowned for their real-time performance, have undergone numerous adaptations aimed at improving the detection of small targets. In this survey, we summarize the state-of-the-art YOLO-based small object detection methods. This review presents a systematic categorization of YOLO-based approaches for small-object detection, organized into four methodological avenues, namely attention-based feature enhancement, detection-head optimization, loss function, and multi-scale feature fusion strategies. We then examine the principal challenges addressed by each category. Finally, we analyze the performance of these methods on public benchmarks and, by comparing current approaches, identify limitations and outline directions for future research.
Defect detection in printed circuit boards (PCB) remains challenging due to the difficulty of identifying small-scale defects, the inefficiency of conventional approaches, and the interference from complex backgrounds. To address these issues, this paper proposes SIM-Net, an enhanced detection framework derived from YOLOv11. The model integrates SPDConv to preserve fine-grained features for small object detection, introduces a novel convolutional partial attention module (C2PAM) to suppress redundant background information and highlight salient regions, and employs a multi-scale fusion network (MFN) with a multi-grain contextual module (MGCT) to strengthen contextual representation and accelerate inference. Experimental evaluations demonstrate that SIM-Net achieves 92.4% mAP, 92% accuracy, and 89.4% recall with an inference speed of 75.1 FPS, outperforming existing state-of-the-art methods. These results confirm the robustness and real-time applicability of SIM-Net for PCB defect inspection.
At inference time, deep neural networks are susceptible to backdoor attacks, which can produce attacker-controlled outputs when inputs contain carefully crafted triggers. Existing defense methods often focus on specific attack types or incur high costs, such as data cleaning or model fine-tuning. In contrast, we argue that it is possible to achieve effective and generalizable defense without removing triggers or incurring high model-cleaning costs. From the attacker’s perspective and based on characteristics of vulnerable neuron activation anomalies, we propose an Adaptive Feature Injection (AFI) method for black-box backdoor detection. AFI employs a pre-trained image encoder to extract multi-level deep features and constructs a dynamic weight fusion mechanism for precise identification and interception of poisoned samples. Specifically, we select the control samples with the largest feature differences from the clean dataset via feature-space analysis, and generate blended sample pairs with the test sample using dynamic linear interpolation. The detection statistic is computed by measuring the divergence G(x) in model output responses. We systematically evaluate the effectiveness of AFI against representative backdoor attacks, including BadNets, Blend, WaNet, and IAB, on three benchmark datasets: MNIST, CIFAR-10, and ImageNet. Experimental results show that AFI can effectively detect poisoned samples, achieving average detection rates of 95.20%, 94.15%, and 86.49% on these datasets, respectively. Compared with existing methods, AFI demonstrates strong cross-domain generalization ability and robustness to unknown attacks.
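The blending-and-divergence idea described above can be sketched as follows. The abstract does not define G(x), so this sketch makes an assumption and uses the mean KL divergence between the model output on the test sample and its outputs on linearly interpolated blends with control samples. The function names, the fixed interpolation weights (the paper describes them as dynamic), and the toy model with a dominant "trigger" feature are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def kl_divergence(p, q, tiny=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + tiny) / (qi + tiny)) for pi, qi in zip(p, q))


def blend(x, control, alpha):
    """Linear interpolation between a test sample and a control sample."""
    return [(1.0 - alpha) * xi + alpha * ci for xi, ci in zip(x, control)]


def divergence_statistic(model, x, controls, alphas=(0.25, 0.5)):
    """Assumed stand-in for G(x): mean output divergence under blending."""
    p = model(x)
    values = [kl_divergence(p, model(blend(x, c, a)))
              for c in controls for a in alphas]
    return sum(values) / len(values)


def toy_model(x):
    """Hypothetical two-class model; feature x[2] acts as a strong trigger."""
    return softmax([x[0], x[1] + 10.0 * x[2]])


controls = [[0.0, 2.0, 0.0]]
clean_sample = [2.0, 0.0, 0.0]
triggered_sample = [2.0, 0.0, 1.0]
print(divergence_statistic(toy_model, clean_sample, controls))
print(divergence_statistic(toy_model, triggered_sample, controls))
```

In this toy setting the triggered sample's output barely moves under blending while the clean sample's output shifts noticeably; how the actual method thresholds G(x) is not stated in the abstract.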
Deep learning has made significant progress in the field of oriented object detection for remote sensing images. However, existing methods still face challenges when dealing with difficult tasks such as multi-scale targets, complex backgrounds, and small objects in remote sensing. Keeping the model lightweight to address resource constraints in remote sensing scenarios while improving task performance remains a research hotspot. Therefore, we propose EM-YOLO, an enhanced multi-scale feature extraction lightweight network based on the YOLOv8s architecture, specifically optimized for the large target scale variations, diverse orientations, and numerous small objects in remote sensing images. Our main innovations are as follows. First, a dynamic snake convolution (DSC) is introduced into the backbone network to enhance the model’s feature extraction capability for oriented targets. Second, an innovative focusing-diffusion module is designed in the feature fusion neck to effectively integrate multi-scale feature information. Finally, we introduce the Layer-Adaptive Sparsity for magnitude-based Pruning (LASP) method to perform lightweight network pruning so that the model can better complete tasks in resource-constrained scenarios. Experimental results on the lightweight platform Orin demonstrate that the proposed method significantly outperforms the original YOLOv8s model in oriented remote sensing object detection tasks, and achieves comparable or superior performance to state-of-the-art methods on three authoritative remote sensing datasets (DOTA v1.0, DOTA v1.5, and HRSC2016).
In response to the challenges in highway pavement distress detection, such as multiple defect categories, difficulties in feature extraction for different damage types, and slow identification speeds, this paper proposes an enhanced pavement crack detection model named Star-YOLO11. This improved algorithm modifies the YOLO11 architecture by substituting the original C3k2 backbone network with a Star-s50 feature extraction network. The enhanced structure adjusts the number of stacked layers in the StarBlock module to optimize detection accuracy and improve model efficiency. To enhance the accuracy of pavement crack detection and improve model efficiency, three key modifications to the YOLO11 architecture are proposed. Firstly, the original C3k2 backbone is replaced with a StarBlock-based structure, forming the Star-s50 feature extraction backbone network. This lightweight redesign reduces computational complexity while maintaining detection precision. Secondly, to address the inefficiency of the original Partial Self-attention (PSA) mechanism in capturing localized crack features, the convolutional prior-aware Channel Prior Convolutional Attention (CPCA) mechanism is integrated into the channel dimension, creating a hybrid CPC-C2PSA attention structure. Thirdly, the original neck structure is upgraded to a Star Multi-Branch Auxiliary Feature Pyramid Network (SMAFPN) based on the Multi-Branch Auxiliary Feature Pyramid Network architecture, which adaptively fuses high-level semantic and low-level spatial information through Star-s50 connections and C3k2 extraction blocks. Additionally, a composite dataset augmentation strategy combining traditional and advanced augmentation techniques is developed. This strategy is validated on a specialized pavement dataset containing five distinct crack categories for comprehensive training and evaluation. Experimental results indicate that the proposed Star-YOLO11 achieves an accuracy of 89.9% (3.5% higher than the baseline), a mean average precision (mAP) of 90.3% (+2.6%), and an F1-score of 85.8% (+0.5%), while reducing the model size by 18.8% and reaching a frame rate of 225.73 frames per second (FPS) for real-time detection. It shows potential for lightweight deployment in pavement crack detection tasks.
With the widespread use of social media, the propagation of health-related rumors has become a significant public health threat. Existing methods for detecting health rumors predominantly rely on external knowledge or propagation structures, with only a few recent approaches attempting causal inference; however, these have not yet effectively integrated causal discovery with domain-specific knowledge graphs for detecting health rumors. In this study, we found that the combined use of causal discovery and domain-specific knowledge graphs can effectively identify implicit pseudo-causal logic embedded within texts, holding significant potential for health rumor detection. To this end, we propose CKDG, a dual-graph fusion framework based on causal logic and medical knowledge graphs. CKDG constructs a weighted causal graph to capture the implicit causal relationships in the text and introduces a medical knowledge graph to verify semantic consistency, thereby enhancing the ability to identify the misuse of professional terminology and pseudoscientific claims. In experiments conducted on a dataset comprising 8430 health rumors, CKDG achieved an accuracy of 91.28% and an F1 score of 90.38%, representing improvements of 5.11% and 3.29% over the best baseline, respectively. Our results indicate that the integrated use of causal discovery and domain-specific knowledge graphs offers significant advantages for health rumor detection systems. This method not only improves detection performance but also enhances the transparency and credibility of model decisions by tracing causal chains and sources of knowledge conflicts. We anticipate that this work will provide key technological support for the development of trustworthy health-information filtering systems, thereby improving the reliability of public health information on social media.
To address the challenge of real-time detection of unauthorized drone intrusions in complex low-altitude urban environments such as parks and airports, this paper proposes an enhanced MBS-YOLO (Multi-Branch Small Target Detection YOLO) model for anti-drone object detection, based on the YOLOv8 architecture. To overcome the limitations of existing methods in detecting small objects within complex backgrounds, we designed a C2f-Pu module with excellent feature extraction capability and a more compact parameter set, aiming to reduce the model’s computational complexity. To improve multi-scale feature fusion, we construct a Multi-Branch Feature Pyramid Network (MB-FPN) that employs a cross-level feature fusion strategy to enhance the model’s representation of small objects. Additionally, a shared detail-enhanced detection head is introduced to address the large size variations of Unmanned Aerial Vehicle (UAV) targets, thereby improving detection performance across different scales. Experimental results demonstrate that the proposed model achieves consistent improvements across multiple benchmarks. On the Det-Fly dataset, it improves precision by 3%, recall by 5.6%, and mAP50 by 4.5% compared with the baseline, while reducing parameters by 21.2%. Cross-validation on the VisDrone dataset further validates its robustness, yielding additional gains of 3.2% in precision, 6.1% in recall, and 4.8% in mAP50 over the original YOLOv8. These findings confirm the effectiveness of the proposed algorithm in enhancing UAV detection performance under complex scenarios.
Salient object detection (SOD) models struggle to simultaneously preserve global structure, maintain sharp object boundaries, and sustain computational efficiency in complex scenes. In this study, we propose SPSALNet, a task-driven two-stage (macro–micro) architecture that restructures the SOD process around superpixel representations. In the proposed approach, a “split-and-enhance” principle, introduced to our knowledge for the first time in the SOD literature, hierarchically classifies superpixels and then applies targeted refinement only to ambiguous or error-prone regions. At the macro stage, the image is partitioned into content-adaptive superpixel regions, and each superpixel is represented by a high-dimensional region-level feature vector. These representations define a regional decomposition problem in which superpixels are assigned to three classes: background, object interior, and transition regions. Superpixel tokens interact with a global feature vector from a deep network backbone through a cross-attention module and are projected into an enriched embedding space that jointly encodes local topology and global context. At the micro stage, the model employs a U-Net-based refinement process that allocates computational resources only to ambiguous transition regions. The image and distance–similarity maps derived from superpixels are processed through a dual-encoder pathway. Subsequently, channel-aware fusion blocks adaptively combine information from these two sources, producing sharper and more stable object boundaries. Experimental results show that SPSALNet achieves high accuracy with lower computational cost compared to recent competing methods. On the PASCAL-S and DUT-OMRON datasets, SPSALNet exhibits a clear performance advantage across all key metrics, and it ranks first on accuracy-oriented measures on HKU-IS. On the challenging DUT-OMRON benchmark, SPSALNet reaches an MAE of 0.034. Across all datasets, it preserves object boundaries and regional structure in a stable and competitive manner.
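For readers unfamiliar with the MAE figure quoted above, the metric is commonly computed in salient object detection as the mean absolute difference between a predicted saliency map and its binary ground-truth mask, both scaled to [0, 1]. The sketch below assumes that standard definition; the function name and the tiny 2x2 maps are illustrative only and do not come from the paper.

```python
def mean_absolute_error(pred, gt):
    """Mean absolute per-pixel difference between two equally sized maps.

    pred -- predicted saliency map, nested lists with values in [0, 1]
    gt   -- ground-truth mask, nested lists with values 0 or 1
    """
    total = 0.0
    count = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            total += abs(p - g)
            count += 1
    return total / count


# Illustrative 2x2 maps; a real evaluation averages over whole datasets.
prediction = [[0.9, 0.1],
              [0.2, 0.8]]
mask = [[1, 0],
        [0, 1]]
print(mean_absolute_error(prediction, mask))
```

Lower is better, so an MAE of 0.034 means the prediction deviates from the mask by 3.4% of the value range per pixel on average.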
Tomato is a major economic crop worldwide, and diseases on tomato leaves can significantly reduce both yield and quality. Traditional manual inspection is inefficient and highly subjective, making it difficult to meet the requirements of early disease identification in complex natural environments. To address this issue, this study proposes an improved YOLO11-based model, YOLO-SPDNet (Scale Sequence Fusion, Position-Channel Attention, and Dual Enhancement Network). The model integrates the SEAM (Self-Ensembling Attention Mechanism) semantic enhancement module, the MLCA (Mixed Local Channel Attention) lightweight attention mechanism, and the SPA (Scale-Position-Detail Awareness) module composed of SSFF (Scale Sequence Feature Fusion), TFE (Triple Feature Encoding), and CPAM (Channel and Position Attention Mechanism). These enhancements strengthen fine-grained lesion detection while keeping the model lightweight. Experimental results show that YOLO-SPDNet achieves an accuracy of 91.8%, a recall of 86.5%, and an mAP@0.5 of 90.6% on the test set, with a computational complexity of 12.5 GFLOPs. Furthermore, the model reaches a real-time inference speed of 987 FPS, making it suitable for deployment on mobile agricultural terminals and online monitoring systems. Comparative analysis and ablation studies further validate the reliability and practical applicability of the proposed model in complex natural scenes.
With the rapid expansion of drone applications, accurate detection of objects in aerial imagery has become crucial for intelligent transportation, urban management, and emergency rescue missions. However, existing methods face numerous challenges in practical deployment, including scale variation handling, feature degradation, and complex backgrounds. To address these issues, we propose Edge-enhanced and Detail-Capturing You Only Look Once (EHDC-YOLO), a novel framework for object detection in Unmanned Aerial Vehicle (UAV) imagery. Based on the You Only Look Once version 11 nano (YOLOv11n) baseline, EHDC-YOLO systematically introduces several architectural enhancements: (1) a Multi-Scale Edge Enhancement (MSEE) module that leverages multi-scale pooling and edge information to enhance boundary feature extraction; (2) an Enhanced Feature Pyramid Network (EFPN) that integrates P2-level features with Cross Stage Partial (CSP) structures and OmniKernel convolutions for better fine-grained representation; and (3) Dynamic Head (DyHead) with multi-dimensional attention mechanisms for enhanced cross-scale modeling and perspective adaptability. Comprehensive experiments on the Vision meets Drones for Detection (VisDrone-DET) 2019 dataset demonstrate that EHDC-YOLO achieves significant improvements, increasing mean Average Precision (mAP)@0.5 from 33.2% to 46.1% (an absolute improvement of 12.9 percentage points) and mAP@0.5:0.95 from 19.5% to 28.0% (an absolute improvement of 8.5 percentage points) compared with the YOLOv11n baseline, while maintaining a reasonable parameter count (2.81 M vs the baseline’s 2.58 M). Further ablation studies confirm the effectiveness of each proposed component, while visualization results highlight EHDC-YOLO’s superior performance in detecting objects and handling occlusions in complex drone scenarios.
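The gains above are stated in absolute percentage points, which is easy to confuse with relative improvement. As a small worked check using only the figures quoted in the abstract, the sketch below recomputes both views; the helper names are hypothetical.

```python
def absolute_gain(before, after):
    """Difference in percentage points, rounded to one decimal place."""
    return round(after - before, 1)


def relative_gain(before, after):
    """Improvement expressed as a fraction of the starting value."""
    return (after - before) / before


# Figures quoted in the abstract (values in percent, parameters in millions).
map50 = (33.2, 46.1)
map50_95 = (19.5, 28.0)
params_m = (2.58, 2.81)

print(absolute_gain(*map50), relative_gain(*map50))
print(absolute_gain(*map50_95), relative_gain(*map50_95))
print(relative_gain(*params_m))
```

Read this way, the 12.9-point gain in mAP@0.5 is a relative improvement of roughly 39% over the baseline, obtained with roughly 9% more parameters.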
An improved model based on you only look once version 8 (YOLOv8) is proposed to solve the problem of low detection accuracy caused by the diversity of object sizes in optical remote sensing images. Firstly, the feature pyramid network (FPN) structure of the original YOLOv8 model is replaced by the generalized-FPN (GFPN) structure from GiraffeDet to realize "cross-layer" and "cross-scale" adaptive feature fusion, enriching the semantic and spatial information in the feature map and improving the target detection ability of the model. Secondly, a multi atrous spatial pyramid pooling (MASPP) module is designed using the ideas of atrous convolution and the feature pyramid structure to extract multi-scale features, so as to improve the model's ability to process multi-scale objects. The experimental results show that the improved YOLOv8 model reaches a detection accuracy of 92% and a mean average precision (mAP) of 87.9% on the DIOR dataset, which are 3.5% and 1.7% higher, respectively, than those of the original model. This demonstrates that the detection and classification ability of the proposed model on multi-dimensional optical remote sensing targets has been improved.
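The atrous (dilated) convolution idea behind pyramid pooling modules of this kind can be shown with a minimal one-dimensional sketch. It only illustrates how the same kernel, applied in parallel at several dilation rates, covers progressively wider context; the internal design of MASPP is not given in the abstract, and the function names and the rates (1, 2, 4) are assumptions.

```python
def atrous_conv1d(signal, kernel, rate):
    """Zero-padded 1D cross-correlation whose taps sit `rate` samples apart.

    The output has the same length as the input; positions outside the
    signal are treated as zeros.
    """
    center = len(kernel) // 2
    output = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i + (j - center) * rate     # dilated tap position
            if 0 <= idx < len(signal):
                acc += weight * signal[idx]
        output.append(acc)
    return output


def pyramid_features(signal, kernel, rates=(1, 2, 4)):
    """Apply the same kernel at several dilation rates in parallel."""
    return [atrous_conv1d(signal, kernel, r) for r in rates]


def receptive_field(kernel_size, rate):
    """Span of input samples covered by one dilated kernel application."""
    return (kernel_size - 1) * rate + 1


impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
branches = pyramid_features(impulse, [1.0, 1.0, 1.0])
for rate, branch in zip((1, 2, 4), branches):
    print(rate, receptive_field(3, rate), branch)
```

A real module would use 2D convolutions with learned weights and then concatenate or fuse the branches, but the widening receptive field per rate is the same mechanism.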
A novel dual-branch decoding fusion convolutional neural network model (DDFNet), specifically designed for real-time salient object detection (SOD) on steel surfaces, is proposed. DDFNet is based on a standard encoder–decoder architecture and integrates three key innovations. First, we introduce a novel, lightweight multi-scale progressive aggregation residual network that effectively suppresses background interference and refines defect details, enabling efficient salient feature extraction. Then, we propose an innovative dual-branch decoding fusion structure, comprising the refined defect representation branch and the enhanced defect representation branch, which enhances accuracy in defect region identification and feature representation. Additionally, to further improve the detection of small and complex defects, we incorporate a multi-scale attention fusion module. Experimental results on the public ESDIs-SOD dataset show that DDFNet, with only 3.69 million parameters, achieves detection performance comparable to current state-of-the-art models, demonstrating its potential for real-time industrial applications. Furthermore, our DDFNet-L variant consistently outperforms leading methods in detection performance. The code is available at https://github.com/13140W/DDFNet.
Visible-infrared object detection leverages the day-night stable object perception capability of infrared images to enhance detection robustness in low-light environments by fusing the complementary information of visible and infrared images. However, the inherent differences in the imaging mechanisms of visible and infrared modalities make effective cross-modal fusion challenging. Furthermore, constrained by the physical characteristics of sensors and thermal diffusion effects, infrared images generally suffer from blurred object contours and missing details, making it difficult to extract object features effectively. To address these issues, we propose an infrared-visible image fusion network that realizes multimodal information fusion of infrared and visible images through a carefully designed multiscale fusion strategy. First, we design an adaptive gray-radiance enhancement (AGRE) module to strengthen the detail representation in infrared images, improving their usability in complex lighting scenarios. Next, we introduce a channel-spatial feature interaction (CSFI) module, which achieves efficient complementarity between the RGB and infrared (IR) modalities via dynamic channel switching and a spatial attention mechanism. Finally, we propose a multi-scale enhanced cross-attention fusion (MSECA) module, which optimizes the fusion of multi-level features through dynamic convolution and gating mechanisms and captures long-range complementary relationships of cross-modal features on a global scale, thereby enhancing the expressiveness of the fused features. Experiments on the KAIST, M3FD, and FLIR datasets demonstrate that our method delivers outstanding performance in daytime and nighttime scenarios. On the KAIST dataset, the miss rate drops to 5.99%, and further to 4.26% in night scenes. On the FLIR and M3FD datasets, it achieves AP50 scores of 79.4% and 88.9%, respectively.
Funding: Supported by the National Natural Science Foundation of China (No. 62276204), the Fundamental Research Funds for the Central Universities, China (No. YJSJ24011), the Natural Science Basic Research Program of Shaanxi, China (Nos. 2022JM-340 and 2023-JC-QN-0710), and the China Postdoctoral Science Foundation (Nos. 2020T130494 and 2018M633470).
Abstract: Visible and infrared (RGB-IR) fusion object detection plays an important role in security, disaster relief, and related applications. In recent years, deep-learning-based RGB-IR fusion detection methods have developed rapidly, but they still struggle with the complex and changing scenarios captured by drones, mainly for two reasons: (A) RGB-IR fusion detectors are susceptible to inferior inputs that degrade performance and stability; (B) RGB-IR fusion detectors are susceptible to redundant features that reduce accuracy and efficiency. In this paper, an innovative RGB-IR fusion detection framework based on global-local feature optimization, named GLFDet, is proposed to improve the detection performance and efficiency for drone-captured objects. The key components of GLFDet include a Global Feature Optimization (GFO) module, a Local Feature Optimization (LFO) module, and a Channel Separation Fusion (CSF) module. Specifically, GFO calculates the information content of the input image in the frequency domain and optimizes the features holistically. Then, LFO dynamically selects high-value features and filters out low-value features before fusion, which significantly improves the efficiency of fusion. Finally, CSF fuses the RGB and IR features across the corresponding channels, which avoids rearranging the channel relationships and enhances model stability. Extensive experimental results show that the proposed method achieves the best performance on three popular RGB-IR datasets: DroneVehicle, VEDAI, and LLVIP. In addition, GLFDet is more lightweight than other comparable models, making it more appealing for edge devices such as drones. The code is available at https://github.com/lao chen330/GLFDet.
Funding: Supported by Prince Sattam bin Abdulaziz University, project number PSAU/2025/03/32440.
Abstract: Early detection of Parkinson’s disease remains a major clinical issue, especially during the prodromal stage, when symptoms are not evident or not distinct. To address this problem, we propose a new deep learning-based approach for detecting Parkinson’s disease during the prodromal stage, before any overt symptoms develop. We used 5 publicly accessible datasets, including UCI Parkinson’s Voice, Spiral Drawings, PaHaW, NewHandPD, and PPMI, and implemented a dual-stream CNN–BiLSTM architecture with Fisher-weighted feature merging and SHAP-based explanation. The findings reveal that the model’s performance was superior and achieved 98.2%, an F1-score of 0.981, and an AUC of 0.991 on the UCI Voice dataset. The model’s performance on the remaining datasets was also comparable, with an improvement in accuracy of up to 2–7 percent over existing strong models such as CNN–RNN–MLP, ILN–GNet, and CASENet. Overall, the findings support the diagnostic promise of micro-tremor assessment and demonstrate that combining temporal and spatial features with a scatter-based segment in a multi-modal approach can be an effective and scalable platform for an early, interpretable PD screening system.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62572057, 62272049, U24A20331), the Beijing Natural Science Foundation (Grant Nos. 4232026, 4242020), and the Academic Research Projects of Beijing Union University (Grant No. ZK10202404).
Abstract: Traffic sign detection is a critical component of driving systems. Single-stage network-based traffic sign detection algorithms, renowned for their fast detection speeds and high accuracy, have become the dominant approach in current practice. However, in complex and dynamic traffic scenes, particularly with smaller traffic sign objects, missed and false detections can reduce overall detection accuracy. To address this issue, this paper proposes a detection algorithm that integrates edge and shape information. Recognizing that traffic signs have specific shapes and distinct edge contours, this paper introduces an edge feature extraction branch within the backbone network, enabling adaptive fusion with features of the same hierarchical level. Additionally, a shape prior convolution module is designed to replace the first two convolutional modules of the backbone network, with the aim of enhancing the model's perception of objects with specific shapes and reducing its sensitivity to background noise. The algorithm was evaluated on the CCTSDB and TT100k datasets; compared to YOLOv8s, the mAP50 values increased by 3.0% and 10.4%, respectively, demonstrating the effectiveness of the proposed method in improving the accuracy of traffic sign detection.
Abstract: In recent years, with the rapid advancement of artificial intelligence, object detection algorithms have made significant strides in accuracy and computational efficiency. Notably, research and applications of Anchor-Free models have opened new avenues for real-time target detection in optical remote sensing images (ORSIs). However, in the realm of adversarial attacks, developing adversarial techniques tailored to Anchor-Free models remains challenging. Adversarial examples generated based on Anchor-Based models often exhibit poor transferability to these new model architectures. Furthermore, the growing diversity of Anchor-Free models poses additional hurdles to achieving robust transferability of adversarial attacks. This study presents an improved cross-conv-block feature fusion You Only Look Once (YOLO) architecture, engineered to facilitate the extraction of more comprehensive semantic features during the backpropagation process. To address the asymmetry between densely distributed objects in ORSIs and the corresponding detector outputs, a novel dense bounding box attack strategy is proposed. This approach uses a dense target bounding box loss in the calculation of the adversarial loss function. Furthermore, by integrating translation-invariant (TI) and momentum-iteration (MI) adversarial methodologies, the proposed framework significantly improves the transferability of adversarial attacks. Experimental results demonstrate that our method achieves superior adversarial attack performance, with adversarial transferability rates (ATR) of 67.53% on the NWPU VHR-10 dataset and 90.71% on the HRSC2016 dataset. Compared to ensemble adversarial attack and cascaded adversarial attack approaches, our method generates adversarial examples in an average of 0.64 s, representing an approximately 14.5% improvement in efficiency under equivalent conditions.
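The momentum-iteration (MI) component named in the abstract above can be illustrated with a minimal sketch of the generic momentum-iterative sign-gradient update. This is not the authors' attack: the toy gradient function, the step count, and the budget `eps` are assumptions chosen only for illustration, and the translation-invariant (TI) step and the dense bounding box loss are omitted.

```python
def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def mi_step(x, grad, momentum, step_size, decay=1.0):
    """One momentum-iteration update on a flat list of pixel values."""
    l1 = sum(abs(g) for g in grad) or 1.0  # guard against a zero gradient
    # Accumulate the L1-normalised gradient into the running momentum.
    momentum = [decay * m + g / l1 for m, g in zip(momentum, grad)]
    # Move each pixel by a fixed step in the direction of the momentum sign.
    x = [xi + step_size * sign(m) for xi, m in zip(x, momentum)]
    return x, momentum


def mi_attack(x0, grad_fn, eps=0.03, steps=3, decay=1.0):
    """Run the iteration and keep the result within an L-infinity budget eps."""
    step_size = eps / steps
    x, momentum = list(x0), [0.0] * len(x0)
    for _ in range(steps):
        x, momentum = mi_step(x, grad_fn(x), momentum, step_size, decay)
        x = [min(max(xi, x0i - eps), x0i + eps) for xi, x0i in zip(x, x0)]
    return x


# Hypothetical loss 0.5 * sum((x_i - 0.5)^2); its gradient is x_i - 0.5.
x_adv = mi_attack([0.2, 0.8], lambda x: [xi - 0.5 for xi in x])
print(x_adv)
```

In a real attack, `grad_fn` would return the gradient of the detector's adversarial loss with respect to the input image.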
Funding: Financially supported by the Chongqing University of Technology Graduate Innovation Foundation (Grant No. gzlcx20253267).
Abstract: Camouflaged Object Detection (COD) aims to identify objects that share highly similar patterns, such as texture, intensity, and color, with their surrounding environment. Due to their intrinsic resemblance to the background, camouflaged objects often exhibit vague boundaries and varying scales, making it challenging to accurately locate targets and delineate their indistinct edges. To address this, we propose a novel camouflaged object detection network called Edge-Guided and Multi-scale Fusion Network (EGMFNet), which leverages edge-guided multi-scale integration for enhanced performance. The model incorporates two innovative components: a Multi-scale Fusion Module (MSFM) and an Edge-Guided Attention Module (EGA). These designs exploit multi-scale features to uncover subtle cues between candidate objects and the background while emphasizing camouflaged object boundaries. Moreover, recognizing the rich contextual information in fused features, we introduce a Dual-Branch Global Context Module (DGCM) to refine features using extensive global context, thereby generating more informative representations. Experimental results on four benchmark datasets demonstrate that EGMFNet outperforms state-of-the-art methods across five evaluation metrics. Specifically, on COD10K, our EGMFNet-P improves F_β by 4.8 points and reduces mean absolute error (MAE) by 0.006 compared with ZoomNeXt; on NC4K, it achieves a 3.6-point increase in F_β. On CAMO and CHAMELEON, it obtains 4.5-point increases in F_β. These consistent gains substantiate the superiority and robustness of EGMFNet.
Funding: Funded by the National Natural Science Foundation of China under Grant 62306128, the Leading Innovation Project of Changzhou Science and Technology Bureau under Grant CQ20230072, the Basic Science Research Project of Jiangsu Provincial Department of Education under Grant 23KJD520003, the Science and Technology Development Plan Project of Jilin Province under Grant 20240101382JC, and the National Key Research and Development Program of China under Grant 2023YFF1105102.
Abstract: In fire rescue scenarios, traditional manual operations are highly dangerous, as dense smoke, low visibility, extreme heat, and toxic gases not only hinder rescue efficiency but also endanger firefighters’ safety. Although intelligent rescue robots can enter hazardous environments in place of humans, smoke poses major challenges for human detection algorithms. These challenges include the attenuation of visible and infrared signals, complex thermal fields, and interference from background objects, all of which make it difficult to accurately identify trapped individuals. To address this problem, we propose VIF-YOLO, a visible–infrared fusion model for real-time human detection in dense smoke environments. The framework introduces a lightweight multimodal fusion (LMF) module based on learnable low-rank representation blocks to integrate visible and infrared images end to end, preserving fine details while enhancing salient features. In addition, an efficient multiscale attention (EMA) mechanism is incorporated into the YOLOv10n backbone to improve feature representation under low-light conditions. Extensive experiments on our newly constructed multimodal smoke human detection (MSHD) dataset demonstrate that VIF-YOLO achieves an mAP50 of 99.5%, a precision of 99.2%, and a recall of 99.3%, outperforming YOLOv10n by a clear margin. Furthermore, when deployed on the NVIDIA Jetson Xavier NX, VIF-YOLO attains 40.6 FPS with an average inference latency of 24.6 ms, validating its real-time capability on edge-computing platforms. These results confirm that VIF-YOLO provides accurate, robust, and fast detection across complex backgrounds and diverse smoke conditions, ensuring reliable and rapid localization of individuals in need of rescue.
Funding: The National Key Research and Development Program of China (Grant No. 2022YFF0711400) and the National Space Science Data Center Youth Open Project (Grant No. NSSDC2302001).
Abstract: Impact craters are important for understanding the evolution of lunar geology and surface erosion rates, among other functions. However, the morphological characteristics of micro impact craters are not obvious and the craters are numerous, resulting in low detection accuracy by deep learning models. Therefore, we propose a new multi-scale fusion crater detection algorithm (MSF-CDA) based on YOLO11 to improve the accuracy of lunar impact crater detection, especially for small craters with a diameter of <1 km. Using the images taken by the LROC (Lunar Reconnaissance Orbiter Camera) at the Chang’e-4 (CE-4) landing area, we constructed three separate datasets for craters with diameters of 0-70 m, 70-140 m, and >140 m. We then trained three submodels separately with these three datasets. Additionally, we designed a slicing-amplifying-slicing strategy to enhance the ability to extract features from small craters. To handle redundant predictions, we propose a new Non-Maximum Suppression with Area Filtering method to fuse the results for overlapping targets across the multi-scale submodels. Finally, our new MSF-CDA method achieved high detection performance, with Precision, Recall, and F1 score values of 0.991, 0.987, and 0.989, respectively, perfectly addressing the problems induced by the lesser features and sample imbalance of small craters. Our MSF-CDA can provide strong data support for more in-depth study of the geological evolution of the lunar surface and finer geological age estimations. This strategy can also be used to detect other small objects with lesser features and sample imbalance problems. We detected approximately 500,000 impact craters in an area of approximately 214 km² around the CE-4 landing area. By statistically analyzing the new data, we updated the distribution function of the number and diameter of impact craters. Finally, we identified the most suitable lighting conditions for detecting impact crater targets by analyzing the effect of different lighting conditions on the detection accuracy.
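As a quick consistency check on the metrics quoted above, the F1 score is the harmonic mean of precision and recall. The short sketch below recomputes it from the reported precision (0.991) and recall (0.987); the function name is our own and nothing here comes from the authors' code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for MSF-CDA.
f1 = f1_score(0.991, 0.987)
print(round(f1, 3))  # 0.989
```

The result agrees with the reported F1 score of 0.989 to three decimal places.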
Funding: Supported in part by the Chongqing Research Program of Basic Research and Frontier Technology under Grant CSTB2025NSCQ-GPX1309.
Abstract: Small object detection has been a focus of attention since the emergence of deep learning-based object detection. Although classical object detection frameworks have made significant contributions to the development of object detection, many issues remain unresolved in detecting small objects due to the inherent complexity and diversity of real-world visual scenes. In particular, the YOLO (You Only Look Once) series of detection models, renowned for their real-time performance, have undergone numerous adaptations aimed at improving the detection of small targets. In this survey, we summarize the state-of-the-art YOLO-based small object detection methods. This review presents a systematic categorization of YOLO-based approaches for small object detection, organized into four methodological avenues, namely attention-based feature enhancement, detection-head optimization, loss function design, and multi-scale feature fusion strategies. We then examine the principal challenges addressed by each category. Finally, we analyze the performance of these methods on public benchmarks and, by comparing current approaches, identify limitations and outline directions for future research.
Abstract: Defect detection in printed circuit boards (PCB) remains challenging due to the difficulty of identifying small-scale defects, the inefficiency of conventional approaches, and the interference from complex backgrounds. To address these issues, this paper proposes SIM-Net, an enhanced detection framework derived from YOLOv11. The model integrates SPDConv to preserve fine-grained features for small object detection, introduces a novel convolutional partial attention module (C2PAM) to suppress redundant background information and highlight salient regions, and employs a multi-scale fusion network (MFN) with a multi-grain contextual module (MGCT) to strengthen contextual representation and accelerate inference. Experimental evaluations demonstrate that SIM-Net achieves 92.4% mAP, 92% accuracy, and 89.4% recall with an inference speed of 75.1 FPS, outperforming existing state-of-the-art methods. These results confirm the robustness and real-time applicability of SIM-Net for PCB defect inspection.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 61972133), the Project of Leading Talents in Science and Technology Innovation for Thousands of People Plan in Henan Province (Grant No. 204200510021), and the Key Research and Development Plan Special Project of Henan Province (Grant No. 241111211400).
Abstract: At inference time, deep neural networks are susceptible to backdoor attacks, which can produce attacker-controlled outputs when inputs contain carefully crafted triggers. Existing defense methods often focus on specific attack types or incur high costs, such as data cleaning or model fine-tuning. In contrast, we argue that it is possible to achieve effective and generalizable defense without removing triggers or incurring high model-cleaning costs. From the attacker’s perspective and based on characteristics of vulnerable neuron activation anomalies, we propose an Adaptive Feature Injection (AFI) method for black-box backdoor detection. AFI employs a pre-trained image encoder to extract multi-level deep features and constructs a dynamic weight fusion mechanism for precise identification and interception of poisoned samples. Specifically, we select the control samples with the largest feature differences from the clean dataset via feature-space analysis, and generate blended sample pairs with the test sample using dynamic linear interpolation. The detection statistic is computed by measuring the divergence G(x) in model output responses. We systematically evaluate the effectiveness of AFI against representative backdoor attacks, including BadNets, Blend, WaNet, and IAB, on three benchmark datasets: MNIST, CIFAR-10, and ImageNet. Experimental results show that AFI can effectively detect poisoned samples, achieving average detection rates of 95.20%, 94.15%, and 86.49% on these datasets, respectively. Compared with existing methods, AFI demonstrates strong cross-domain generalization ability and robustness to unknown attacks.
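The blending-and-divergence step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the AFI implementation: the abstract does not specify the divergence, so KL divergence is assumed here; the fixed interpolation ratios stand in for the "dynamic" interpolation; and `toy_model` is a hypothetical two-class classifier used only to make the example runnable.

```python
import math


def blend(x, control, alpha):
    """Linear interpolation between a test sample and a control sample."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(x, control)]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions (assumed divergence)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def detection_statistic(model, x, control, alphas=(0.25, 0.5, 0.75)):
    """Mean divergence between the output on x and on its blended versions."""
    base = model(x)
    total = sum(kl_divergence(base, model(blend(x, control, a))) for a in alphas)
    return total / len(alphas)


def toy_model(x):
    """Hypothetical two-class softmax driven by the mean input value."""
    z = sum(x) / len(x)
    e0, e1 = math.exp(-z), math.exp(z)
    return [e0 / (e0 + e1), e1 / (e0 + e1)]


g = detection_statistic(toy_model, [0.0, 0.0], [1.0, 1.0])
print(g)
```

A detector would compare this statistic against a threshold calibrated on clean data; how AFI sets that threshold, and in which direction poisoned samples deviate, is not stated in the abstract.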
Funding: Funded by the Hainan Province Science and Technology Special Fund under Grant ZDYF2024GXJS292.
Abstract: Deep learning has made significant progress in the field of oriented object detection for remote sensing images. However, existing methods still face challenges when dealing with difficult tasks such as multi-scale targets, complex backgrounds, and small objects in remote sensing. Keeping the model lightweight to address resource constraints in remote sensing scenarios while improving task performance remains a research hotspot. Therefore, we propose EM-YOLO, an enhanced multi-scale feature extraction lightweight network based on the YOLOv8s architecture, specifically optimized for the large target scale variations, diverse orientations, and numerous small objects in remote sensing images. Our innovations lie in two main aspects. First, a dynamic snake convolution (DSC) is introduced into the backbone network to enhance the model’s feature extraction capability for oriented targets. Second, an innovative focusing-diffusion module is designed in the feature fusion neck to effectively integrate multi-scale feature information. Finally, we introduce the Layer-Adaptive Sparsity for magnitude-based Pruning (LASP) method to perform lightweight network pruning so that tasks can be completed better in resource-constrained scenarios. Experimental results on the lightweight platform Orin demonstrate that the proposed method significantly outperforms the original YOLOv8s model in oriented remote sensing object detection tasks, and achieves comparable or superior performance to state-of-the-art methods on three authoritative remote sensing datasets (DOTA v1.0, DOTA v1.5, and HRSC2016).
Funding: Funded by the Jiangxi SASAC Science and Technology Innovation Special Project and the Key Technology Research and Application Promotion of Highway Overload Digital Solution.
Abstract: In response to the challenges in highway pavement distress detection, such as multiple defect categories, difficulties in feature extraction for different damage types, and slow identification speeds, this paper proposes an enhanced pavement crack detection model named Star-YOLO11. This improved algorithm modifies the YOLO11 architecture by substituting the original C3k2 backbone network with a Star-s50 feature extraction network. The enhanced structure adjusts the number of stacked layers in the StarBlock module to optimize detection accuracy and improve model efficiency. To enhance the accuracy of pavement crack detection and improve model efficiency, three key modifications to the YOLO11 architecture are proposed. Firstly, the original C3k2 backbone is replaced with a StarBlock-based structure, forming the Star-s50 feature extraction backbone network. This lightweight redesign reduces computational complexity while maintaining detection precision. Secondly, to address the inefficiency of the original Partial Self-attention (PSA) mechanism in capturing localized crack features, the convolutional prior-aware Channel Prior Convolutional Attention (CPCA) mechanism is integrated into the channel dimension, creating a hybrid CPC-C2PSA attention structure. Thirdly, the original neck structure is upgraded to a Star Multi-Branch Auxiliary Feature Pyramid Network (SMAFPN) based on the Multi-Branch Auxiliary Feature Pyramid Network architecture, which adaptively fuses high-level semantic and low-level spatial information through Star-s50 connections and C3k2 extraction blocks. Additionally, a composite dataset augmentation strategy combining traditional and advanced augmentation techniques is developed. This strategy is validated on a specialized pavement dataset containing five distinct crack categories for comprehensive training and evaluation. Experimental results indicate that the proposed Star-YOLO11 achieves an accuracy of 89.9% (3.5% higher than the baseline), a mean average precision (mAP) of 90.3% (+2.6%), and an F1-score of 85.8% (+0.5%), while reducing the model size by 18.8% and reaching a frame rate of 225.73 frames per second (FPS) for real-time detection. It shows potential for lightweight deployment in pavement crack detection tasks.
Funding: Funded by the Hunan Provincial Natural Science Foundation of China (Grant No. 2025JJ70105) and the Hunan Provincial College Students’ Innovation and Entrepreneurship Training Program (Project No. S202411342056). The article processing charge (APC) was funded by Project No. 2025JJ70105.
Abstract: With the widespread use of social media, the propagation of health-related rumors has become a significant public health threat. Existing methods for detecting health rumors predominantly rely on external knowledge or propagation structures, with only a few recent approaches attempting causal inference; however, these have not yet effectively integrated causal discovery with domain-specific knowledge graphs for detecting health rumors. In this study, we found that the combined use of causal discovery and domain-specific knowledge graphs can effectively identify implicit pseudo-causal logic embedded within texts, holding significant potential for health rumor detection. To this end, we propose CKDG, a dual-graph fusion framework based on causal logic and medical knowledge graphs. CKDG constructs a weighted causal graph to capture the implicit causal relationships in the text and introduces a medical knowledge graph to verify semantic consistency, thereby enhancing the ability to identify the misuse of professional terminology and pseudoscientific claims. In experiments conducted on a dataset comprising 8430 health rumors, CKDG achieved an accuracy of 91.28% and an F1 score of 90.38%, representing improvements of 5.11% and 3.29% over the best baseline, respectively. Our results indicate that the integrated use of causal discovery and domain-specific knowledge graphs offers significant advantages for health rumor detection systems. This method not only improves detection performance but also enhances the transparency and credibility of model decisions by tracing causal chains and sources of knowledge conflicts. We anticipate that this work will provide key technological support for the development of trustworthy health-information filtering systems, thereby improving the reliability of public health information on social media.
Funding: Supported by the Key R&D Program of Xianyang City, Shaanxi Province (L2024-ZDYF-ZDYF-GY-0043).
Abstract: To address the challenge of real-time detection of unauthorized drone intrusions in complex low-altitude urban environments such as parks and airports, this paper proposes an enhanced MBS-YOLO (Multi-Branch Small Target Detection YOLO) model for anti-drone object detection, based on the YOLOv8 architecture. To overcome the limitations of existing methods in detecting small objects within complex backgrounds, we designed a C2f-Pu module with excellent feature extraction capability and a more compact parameter set, aiming to reduce the model’s computational complexity. To improve multi-scale feature fusion, we construct a Multi-Branch Feature Pyramid Network (MB-FPN) that employs a cross-level feature fusion strategy to enhance the model’s representation of small objects. Additionally, a shared detail-enhanced detection head is introduced to address the large size variations of Unmanned Aerial Vehicle (UAV) targets, thereby improving detection performance across different scales. Experimental results demonstrate that the proposed model achieves consistent improvements across multiple benchmarks. On the Det-Fly dataset, it improves precision by 3%, recall by 5.6%, and mAP50 by 4.5% compared with the baseline, while reducing parameters by 21.2%. Cross-validation on the VisDrone dataset further validates its robustness, yielding additional gains of 3.2% in precision, 6.1% in recall, and 4.8% in mAP50 over the original YOLOv8. These findings confirm the effectiveness of the proposed algorithm in enhancing UAV detection performance under complex scenarios.
Abstract: Salient object detection (SOD) models struggle to simultaneously preserve global structure, maintain sharp object boundaries, and sustain computational efficiency in complex scenes. In this study, we propose SPSALNet, a task-driven two-stage (macro–micro) architecture that restructures the SOD process around superpixel representations. In the proposed approach, a “split-and-enhance” principle, introduced to our knowledge for the first time in the SOD literature, hierarchically classifies superpixels and then applies targeted refinement only to ambiguous or error-prone regions. At the macro stage, the image is partitioned into content-adaptive superpixel regions, and each superpixel is represented by a high-dimensional region-level feature vector. These representations define a regional decomposition problem in which superpixels are assigned to three classes: background, object interior, and transition regions. Superpixel tokens interact with a global feature vector from a deep network backbone through a cross-attention module and are projected into an enriched embedding space that jointly encodes local topology and global context. At the micro stage, the model employs a U-Net-based refinement process that allocates computational resources only to ambiguous transition regions. The image and distance–similarity maps derived from superpixels are processed through a dual-encoder pathway. Subsequently, channel-aware fusion blocks adaptively combine information from these two sources, producing sharper and more stable object boundaries. Experimental results show that SPSALNet achieves high accuracy with lower computational cost compared to recent competing methods. On the PASCAL-S and DUT-OMRON datasets, SPSALNet exhibits a clear performance advantage across all key metrics, and it ranks first on accuracy-oriented measures on HKU-IS. On the challenging DUT-OMRON benchmark, SPSALNet reaches an MAE of 0.034. Across all datasets, it preserves object boundaries and regional structure in a stable and competitive manner.
Funding: Tianmin Tianyuan Boutique Vegetable Industry Technology Service Station (Grant No. 2024120011003081) and Development of Environmental Monitoring and Traceability System for Wuqing Agricultural Production Areas (Grant No. 2024120011001866).
Abstract: Tomato is a major economic crop worldwide, and diseases on tomato leaves can significantly reduce both yield and quality. Traditional manual inspection is inefficient and highly subjective, making it difficult to meet the requirements of early disease identification in complex natural environments. To address this issue, this study proposes an improved YOLO11-based model, YOLO-SPDNet (Scale Sequence Fusion, Position-Channel Attention, and Dual Enhancement Network). The model integrates the SEAM (Self-Ensembling Attention Mechanism) semantic enhancement module, the MLCA (Mixed Local Channel Attention) lightweight attention mechanism, and the SPA (Scale-Position-Detail Awareness) module composed of SSFF (Scale Sequence Feature Fusion), TFE (Triple Feature Encoding), and CPAM (Channel and Position Attention Mechanism). These enhancements strengthen fine-grained lesion detection while keeping the model lightweight. Experimental results show that YOLO-SPDNet achieves an accuracy of 91.8%, a recall of 86.5%, and an mAP@0.5 of 90.6% on the test set, with a computational complexity of 12.5 GFLOPs. Furthermore, the model reaches a real-time inference speed of 987 FPS, making it suitable for deployment on mobile agricultural terminals and online monitoring systems. Comparative analysis and ablation studies further validate the reliability and practical applicability of the proposed model in complex natural scenes.
Abstract: With the rapid expansion of drone applications, accurate detection of objects in aerial imagery has become crucial for intelligent transportation, urban management, and emergency rescue missions. However, existing methods face numerous challenges in practical deployment, including scale variation handling, feature degradation, and complex backgrounds. To address these issues, we propose Edge-enhanced and Detail-Capturing You Only Look Once (EHDC-YOLO), a novel framework for object detection in Unmanned Aerial Vehicle (UAV) imagery. Based on the You Only Look Once version 11 nano (YOLOv11n) baseline, EHDC-YOLO systematically introduces several architectural enhancements: (1) a Multi-Scale Edge Enhancement (MSEE) module that leverages multi-scale pooling and edge information to enhance boundary feature extraction; (2) an Enhanced Feature Pyramid Network (EFPN) that integrates P2-level features with Cross Stage Partial (CSP) structures and OmniKernel convolutions for better fine-grained representation; and (3) a Dynamic Head (DyHead) with multi-dimensional attention mechanisms for enhanced cross-scale modeling and perspective adaptability. Comprehensive experiments on the Vision meets Drones for Detection (VisDrone-DET) 2019 dataset demonstrate that EHDC-YOLO achieves significant improvements, increasing mean Average Precision (mAP)@0.5 from 33.2% to 46.1% (an absolute improvement of 12.9 percentage points) and mAP@0.5:0.95 from 19.5% to 28.0% (an absolute improvement of 8.5 percentage points) compared with the YOLOv11n baseline, while maintaining a reasonable parameter count (2.81 M vs. the baseline's 2.58 M). Further ablation studies confirm the effectiveness of each proposed component, while visualization results highlight EHDC-YOLO's superior performance in detecting objects and handling occlusions in complex drone scenarios.
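The distinction between absolute (percentage-point) and relative improvement is easy to blur, so the arithmetic behind the numbers above can be checked with a small sketch. The helper names are illustrative; the input values are the ones quoted in the abstract.

```python
def absolute_gain(baseline, improved):
    """Difference in percentage points between two percentages."""
    return improved - baseline


def relative_change(baseline, improved):
    """Change relative to the baseline, expressed in percent."""
    return (improved - baseline) / baseline * 100.0


# Values quoted in the abstract (YOLOv11n baseline vs. EHDC-YOLO).
map50_gain = absolute_gain(33.2, 46.1)        # percentage points
map5095_gain = absolute_gain(19.5, 28.0)      # percentage points
map50_relative = relative_change(33.2, 46.1)  # percent of the baseline
param_increase = relative_change(2.58, 2.81)  # percent more parameters

print(f"mAP@0.5 gain: {map50_gain:.1f} points ({map50_relative:.1f}% relative)")
print(f"mAP@0.5:0.95 gain: {map5095_gain:.1f} points")
print(f"parameter increase: {param_increase:.1f}%")
```

Under this reading, the detector gains 12.9 points of mAP@0.5 for a parameter increase of roughly 9%.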
Funding: Supported by the National Natural Science Foundation of China (No. 62241109) and the Tianjin Science and Technology Commissioner Project (No. 20YDTPJC01110).
Abstract: An improved model based on you only look once version 8 (YOLOv8) is proposed to address the low detection accuracy caused by the diversity of object sizes in optical remote sensing images. First, the feature pyramid network (FPN) structure of the original YOLOv8 model is replaced by the generalized-FPN (GFPN) structure from GiraffeDet to realize "cross-layer" and "cross-scale" adaptive feature fusion, which enriches the semantic and spatial information in the feature maps and improves the target detection ability of the model. Second, a pyramid pooling module, multi atrous spatial pyramid pooling (MASPP), is designed by combining atrous convolution with a feature pyramid structure to extract multi-scale features, improving the model's ability to handle multi-scale objects. The experimental results show that the improved YOLOv8 model reaches a detection accuracy of 92% and a mean average precision (mAP) of 87.9% on the DIOR dataset, which are 3.5% and 1.7% higher, respectively, than those of the original model. These results indicate that the detection and classification ability of the proposed model on multi-dimensional optical remote sensing targets has improved.
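The idea behind atrous (dilated) convolution, which the MASPP module builds on, can be illustrated in one dimension. The sketch below is not the MASPP module itself; it is a generic illustration with hypothetical function names. It shows how the same 3-tap kernel covers a wider input span as the dilation rate grows, and how running several rates in parallel yields multi-scale responses.

```python
def effective_kernel_size(kernel_size, dilation):
    """Input span covered by a kernel with gaps of (dilation - 1) samples."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """Valid-mode 1D dilated cross-correlation, as used in deep learning."""
    span = (len(kernel) - 1) * dilation
    return [
        sum(kernel[j] * signal[i + j * dilation] for j in range(len(kernel)))
        for i in range(len(signal) - span)
    ]


def multi_rate_features(signal, kernel, rates):
    """Apply the same kernel at several dilation rates (pyramid-style)."""
    return {rate: dilated_conv1d(signal, kernel, rate) for rate in rates}


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]  # simple difference filter
features = multi_rate_features(signal, kernel, rates=[1, 2, 3])

for rate, response in features.items():
    print(rate, effective_kernel_size(len(kernel), rate), response)
```

With a fixed number of weights, larger rates see a longer stretch of the input, which is why pooling several rates together helps with objects of different sizes.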
Funding: Supported in part by the National Key R&D Program of China (Grant No. 2023YFB3307604); the Shanxi Province Basic Research Program Youth Science Research Project (Grant Nos. 202303021212054 and 202303021212046); the Key Projects Supported by Hebei Natural Science Foundation (Grant No. E2024203125); the National Science Foundation of China (Grant No. 52105391); the Hebei Provincial Science and Technology Major Project (Grant No. 23280101Z); and the National Key Laboratory of Metal Forming Technology and Heavy Equipment Open Fund (Grant No. S2308100.W17).
Abstract: A novel dual-branch decoding fusion convolutional neural network model (DDFNet), specifically designed for real-time salient object detection (SOD) on steel surfaces, is proposed. Built on a standard encoder–decoder architecture, DDFNet integrates three key innovations. First, we introduce a novel, lightweight multi-scale progressive aggregation residual network that effectively suppresses background interference and refines defect details, enabling efficient salient feature extraction. Second, we propose an innovative dual-branch decoding fusion structure, comprising a refined defect representation branch and an enhanced defect representation branch, which improves accuracy in defect region identification and feature representation. Third, to further improve the detection of small and complex defects, we incorporate a multi-scale attention fusion module. Experimental results on the public ESDIs-SOD dataset show that DDFNet, with only 3.69 million parameters, achieves detection performance comparable to current state-of-the-art models, demonstrating its potential for real-time industrial applications. Furthermore, our DDFNet-L variant consistently outperforms leading methods in detection performance. The code is available at https://github.com/13140W/DDFNet.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62302086), the Natural Science Foundation of Liaoning Province (Grant No. 2023-MSBA-070), and the Fundamental Research Funds for the Central Universities (Grant No. N2317005).
Abstract: Visible-infrared object detection leverages the day-night stable object perception capability of infrared images to enhance detection robustness in low-light environments by fusing the complementary information of visible and infrared images. However, the inherent differences in the imaging mechanisms of the visible and infrared modalities make effective cross-modal fusion challenging. Furthermore, constrained by the physical characteristics of sensors and thermal diffusion effects, infrared images generally suffer from blurred object contours and missing details, making it difficult to extract object features effectively. To address these issues, we propose an infrared-visible image fusion network that realizes multimodal information fusion of infrared and visible images through a carefully designed multiscale fusion strategy. First, we design an adaptive gray-radiance enhancement (AGRE) module to strengthen the detail representation in infrared images, improving their usability in complex lighting scenarios. Next, we introduce a channel-spatial feature interaction (CSFI) module, which achieves efficient complementarity between the RGB and infrared (IR) modalities via dynamic channel switching and a spatial attention mechanism. Finally, we propose a multi-scale enhanced cross-attention fusion (MSECA) module, which optimizes the fusion of multi-level features through dynamic convolution and gating mechanisms and captures long-range complementary relationships of cross-modal features on a global scale, thereby enhancing the expressiveness of the fused features. Experiments on the KAIST, M3FD, and FLIR datasets demonstrate that our method delivers outstanding performance in daytime and nighttime scenarios. On the KAIST dataset, the miss rate drops to 5.99%, and further to 4.26% in night scenes. On the FLIR and M3FD datasets, it achieves AP50 scores of 79.4% and 88.9%, respectively.
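The abstract mentions gating mechanisms for combining RGB and IR features without giving details. As a rough illustration of the general idea only, the sketch below blends two feature vectors with a per-element sigmoid gate. It is not the MSECA module: the function names, the scalar weights `w_rgb`, `w_ir`, and `bias`, and the toy feature values are all hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(rgb, ir, w_rgb=1.0, w_ir=-1.0, bias=0.0):
    """Blend two equal-length feature vectors with a per-element gate.

    gate = sigmoid(w_rgb * rgb + w_ir * ir + bias)
    fused = gate * rgb + (1 - gate) * ir
    The weights are hypothetical fixed scalars; a real model would learn them.
    """
    if len(rgb) != len(ir):
        raise ValueError("feature vectors must have the same length")
    fused = []
    for r, i in zip(rgb, ir):
        gate = sigmoid(w_rgb * r + w_ir * i + bias)
        fused.append(gate * r + (1.0 - gate) * i)
    return fused


# Toy features (hypothetical): RGB is strong in the first slot, IR in the second.
rgb_features = [4.0, 0.0]
ir_features = [0.0, 4.0]
fused = gated_fusion(rgb_features, ir_features)
print([round(v, 3) for v in fused])
```

With these default weights the gate favors whichever modality has the larger activation, so each fused element stays close to the stronger of the two inputs.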