Objective To develop a depression recognition model by integrating the spirit-expression diagnostic framework of traditional Chinese medicine(TCM)with machine learning algorithms.The proposed model seeks to establish ...Objective To develop a depression recognition model by integrating the spirit-expression diagnostic framework of traditional Chinese medicine(TCM)with machine learning algorithms.The proposed model seeks to establish a TCM-informed tool for early depression screening,thereby bridging traditional diagnostic principles with modern computational approaches.Methods The study included patients with depression who visited the Shanghai Pudong New Area Mental Health Center from October 1,2022 to October 1,2023,as well as students and teachers from Shanghai University of Traditional Chinese Medicine during the same period as the healthy control group.Videos of 3–10 s were captured using a Xiaomi Pad 5,and the TCM spirit and expressions were determined by TCM experts(at least 3 out of 5 experts agreed to determine the category of TCM spirit and expressions).Basic information,facial images,and interview information were collected through a portable TCM intelligent analysis and diagnosis device,and facial diagnosis features were extracted using the Open CV computer vision library technology.Statistical analysis methods such as parametric and non-parametric tests were used to analyze the baseline data,TCM spirit and expression features,and facial diagnosis feature parameters of the two groups,to compare the differences in TCM spirit and expression and facial features.Five machine learning algorithms,including extreme gradient boosting(XGBoost),decision tree(DT),Bernoulli naive Bayes(BernoulliNB),support vector machine(SVM),and k-nearest neighbor(KNN)classification,were used to construct a depression recognition model based on the fusion of TCM spirit and expression features.The performance of the model was evaluated using metrics such as accuracy,precision,and the area under the receiver operating characteristic(ROC)curve(AUC).The model results were explained using the Shapley Additive exPlanations(SHAP).Results A total of 93 depression patients and 87 healthy individuals were ultimately included in this study.There was no statistically significant difference in the baseline characteristics between the two groups(P>0.05).The differences in the characteristics of the spirit and expressions in TCM and facial features between the two groups were shown as follows.(i)Quantispirit facial analysis revealed that depression patients exhibited significantly reduced facial spirit and luminance compared with healthy controls(P<0.05),with characteristic features such as sad expressions,facial erythema,and changes in the lip color ranging from erythematous to cyanotic.(ii)Depressed patients exhibited significantly lower values in facial complexion L,lip L,and a values,and gloss index,but higher values in facial complexion a and b,lip b,low gloss index,and matte index(all P<0.05).(iii)The results of multiple models show that the XGBoost-based depression recognition model,integrating the TCM“spirit-expression”diagnostic framework,achieved an accuracy of 98.61%and significantly outperformed four benchmark algorithms—DT,BernoulliNB,SVM,and KNN(P<0.01).(iv)The SHAP visualization results show that in the recognition model constructed by the XGBoost algorithm,the complexion b value,categories of facial spirit,high gloss index,low gloss index,categories of facial expression and texture features have significant contribution to the model.Conclusion This study demonstrates that integrating TCM spirit-expression diagnostic features with machine learning enables the construction of a high-precision depression detection model,offering a novel paradigm for objective depression diagnosis.展开更多
Arabic Sign Language(ArSL)recognition plays a vital role in enhancing the communication for the Deaf and Hard of Hearing(DHH)community.Researchers have proposed multiple methods for automated recognition of ArSL;howev...Arabic Sign Language(ArSL)recognition plays a vital role in enhancing the communication for the Deaf and Hard of Hearing(DHH)community.Researchers have proposed multiple methods for automated recognition of ArSL;however,these methods face multiple challenges that include high gesture variability,occlusions,limited signer diversity,and the scarcity of large annotated datasets.Existing methods,often relying solely on either skeletal data or video-based features,struggle with generalization and robustness,especially in dynamic and real-world conditions.This paper proposes a novel multimodal ensemble classification framework that integrates geometric features derived from 3D skeletal joint distances and angles with temporal features extracted from RGB videos using the Inflated 3D ConvNet(I3D).By fusing these complementary modalities at the feature level and applying a majority-voting ensemble of XGBoost,Random Forest,and Support Vector Machine classifiers,the framework robustly captures both spatial configurations and motion dynamics of sign gestures.Feature selection using the Pearson Correlation Coefficient further enhances efficiency by reducing redundancy.Extensive experiments on the ArabSign dataset,which includes RGB videos and corresponding skeletal data,demonstrate that the proposed approach significantly outperforms state-of-the-art methods,achieving an average F1-score of 97%using a majority-voting ensemble of XGBoost,Random Forest,and SVM classifiers,and improving recognition accuracy by more than 7%over previous best methods.This work not only advances the technical stateof-the-art in ArSL recognition but also provides a scalable,real-time solution for practical deployment in educational,social,and assistive communication technologies.Even though this study is about Arabic Sign Language,the framework proposed here can be extended to different sign languages,creating possibilities for potentially worldwide applicability in sign language recognition tasks.展开更多
Biometric characteristics are playing a vital role in security for the last few years.Human gait classification in video sequences is an important biometrics attribute and is used for security purposes.A new framework...Biometric characteristics are playing a vital role in security for the last few years.Human gait classification in video sequences is an important biometrics attribute and is used for security purposes.A new framework for human gait classification in video sequences using deep learning(DL)fusion assisted and posterior probability-based moth flames optimization(MFO)is proposed.In the first step,the video frames are resized and finetuned by two pre-trained lightweight DL models,EfficientNetB0 and MobileNetV2.Both models are selected based on the top-5 accuracy and less number of parameters.Later,both models are trained through deep transfer learning and extracted deep features fused using a voting scheme.In the last step,the authors develop a posterior probabilitybased MFO feature selection algorithm to select the best features.The selected features are classified using several supervised learning methods.The CASIA-B publicly available dataset has been employed for the experimental process.On this dataset,the authors selected six angles such as 0°,18°,90°,108°,162°,and 180°and obtained an average accuracy of 96.9%,95.7%,86.8%,90.0%,95.1%,and 99.7%.Results demonstrate comparable improvement in accuracy and significantly minimize the computational time with recent state-of-the-art techniques.展开更多
To address the challenges of small target detection and significant scale variations in unmanned aerial vehicle(UAV)aerial imagery,which often lead to missed and false detections,we propose Multi-scale Feature Fusion ...To address the challenges of small target detection and significant scale variations in unmanned aerial vehicle(UAV)aerial imagery,which often lead to missed and false detections,we propose Multi-scale Feature Fusion YOLO(MFF-YOLO),an enhanced algorithm based on YOLOv8s.Our approach introduces a Multi-scale Feature Fusion Strategy(MFFS),comprising the Multiple Features C2f(MFC)module and the Scale Sequence Feature Fusion(SSFF)module,to improve feature integration across different network levels.This enables more effective capture of fine-grained details and sequential multi-scale features.Furthermore,we incorporate Inner-CIoU,an improved loss function that uses auxiliary bounding boxes to enhance the regression quality of small object boxes.To ensure practicality for UAV deployment,we apply the Layer-adaptive Magnitude-based pruning(LAMP)method to significantly reduce model size and computational cost.Experiments on the VisDrone2019 dataset show that MFF-YOLO achieves a 5.7% increase in mean average precision(mAP)over the baseline,while reducing parameters by 8.5 million and computation by 17.5%.The results demonstrate that our method effectively improves detection performance in UAV aerial scenarios.展开更多
Fault diagnosis of rolling bearings is crucial for ensuring the stable operation of mechanical equipment and production safety in industrial environments.However,due to the nonlinearity and non-stationarity of collect...Fault diagnosis of rolling bearings is crucial for ensuring the stable operation of mechanical equipment and production safety in industrial environments.However,due to the nonlinearity and non-stationarity of collected vibration signals,single-modal methods struggle to capture fault features fully.This paper proposes a rolling bearing fault diagnosis method based on multi-modal information fusion.The method first employs the Hippopotamus Optimization Algorithm(HO)to optimize the number of modes in Variational Mode Decomposition(VMD)to achieve optimal modal decomposition performance.It combines Convolutional Neural Networks(CNN)and Gated Recurrent Units(GRU)to extract temporal features from one-dimensional time-series signals.Meanwhile,the Markovian Transition Field(MTF)is used to transform one-dimensional signals into two-dimensional images for spatial feature mining.Through visualization techniques,the effectiveness of generated images from different parameter combinations is compared to determine the optimal parameter configuration.A multi-modal network(GSTCN)is constructed by integrating Swin-Transformer and the Convolutional Block Attention Module(CBAM),where the attention module is utilized to enhance fault features.Finally,the fault features extracted from different modalities are deeply fused and fed into a fully connected layer to complete fault classification.Experimental results show that the GSTCN model achieves an average diagnostic accuracy of 99.5%across three datasets,significantly outperforming existing comparison methods.This demonstrates that the proposed model has high diagnostic precision and good generalization ability,providing an efficient and reliable solution for rolling bearing fault diagnosis.展开更多
The initial noise present in the depth images obtained with RGB-D sensors is a combination of hardware limitations in addition to the environmental factors,due to the limited capabilities of sensors,which also produce...The initial noise present in the depth images obtained with RGB-D sensors is a combination of hardware limitations in addition to the environmental factors,due to the limited capabilities of sensors,which also produce poor computer vision results.The common image denoising techniques tend to remove significant image details and also remove noise,provided they are based on space and frequency filtering.The updated framework presented in this paper is a novel denoising model that makes use of Boruta-driven feature selection using a Long Short-Term Memory Autoencoder(LSTMAE).The Boruta algorithm identifies the most useful depth features that are used to maximize the spatial structure integrity and reduce redundancy.An LSTMAE is then used to process these selected features and model depth pixel sequences to generate robust,noise-resistant representations.The system uses the encoder to encode the input data into a latent space that has been compressed before it is decoded to retrieve the clean image.Experiments on a benchmark data set show that the suggested technique attains a PSNR of 45 dB and an SSIM of 0.90,which is 10 dB higher than the performance of conventional convolutional autoencoders and 15 times higher than that of the wavelet-based models.Moreover,the feature selection step will decrease the input dimensionality by 40%,resulting in a 37.5%reduction in training time and a real-time inference rate of 200 FPS.Boruta-LSTMAE framework,therefore,offers a highly efficient and scalable system for depth image denoising,with a high potential to be applied to close-range 3D systems,such as robotic manipulation and gesture-based interfaces.展开更多
Traffic sign detection is a critical component of driving systems.Single-stage network-based traffic sign detection algorithms,renowned for their fast detection speeds and high accuracy,have become the dominant approa...Traffic sign detection is a critical component of driving systems.Single-stage network-based traffic sign detection algorithms,renowned for their fast detection speeds and high accuracy,have become the dominant approach in current practices.However,in complex and dynamic traffic scenes,particularly with smaller traffic sign objects,challenges such as missed and false detections can lead to reduced overall detection accuracy.To address this issue,this paper proposes a detection algorithm that integrates edge and shape information.Recognizing that traffic signs have specific shapes and distinct edge contours,this paper introduces an edge feature extraction branch within the backbone network,enabling adaptive fusion with features of the same hierarchical level.Additionally,a shape prior convolution module is designed to replaces the first two convolutional modules of the backbone network,aimed at enhancing the model's perception ability for specific shape objects and reducing its sensitivity to background noise.The algorithm was evaluated on the CCTSDB and TT100k datasets,and compared to YOLOv8s,the mAP50 values increased by 3.0%and 10.4%,respectively,demonstrating the effectiveness of the proposed method in improving the accuracy of traffic sign detection.展开更多
Small object detection has been a focus of attention since the emergence of deep learning-based object detection.Although classical object detection frameworks have made significant contributions to the development of...Small object detection has been a focus of attention since the emergence of deep learning-based object detection.Although classical object detection frameworks have made significant contributions to the development of object detection,there are still many issues to be resolved in detecting small objects due to the inherent complexity and diversity of real-world visual scenes.In particular,the YOLO(You Only Look Once)series of detection models,renowned for their real-time performance,have undergone numerous adaptations aimed at improving the detection of small targets.In this survey,we summarize the state-of-the-art YOLO-based small object detection methods.This review presents a systematic categorization of YOLO-based approaches for small-object detection,organized into four methodological avenues,namely attention-based feature enhancement,detection-head optimization,loss function,and multi-scale feature fusion strategies.We then examine the principal challenges addressed by each category.Finally,we analyze the performance of thesemethods on public benchmarks and,by comparing current approaches,identify limitations and outline directions for future research.展开更多
At inference time,deep neural networks are susceptible to backdoor attacks,which can produce attackercontrolled outputs when inputs contain carefully crafted triggers.Existing defense methods often focus on specific a...At inference time,deep neural networks are susceptible to backdoor attacks,which can produce attackercontrolled outputs when inputs contain carefully crafted triggers.Existing defense methods often focus on specific attack types or incur high costs,such as data cleaning or model fine-tuning.In contrast,we argue that it is possible to achieve effective and generalizable defense without removing triggers or incurring high model-cleaning costs.Fromthe attacker’s perspective and based on characteristics of vulnerable neuron activation anomalies,we propose an Adaptive Feature Injection(AFI)method for black-box backdoor detection.AFI employs a pre-trained image encoder to extract multi-level deep features and constructs a dynamic weight fusionmechanism for precise identification and interception of poisoned samples.Specifically,we select the control samples with the largest feature differences fromthe clean dataset via feature-space analysis,and generate blended sample pairs with the test sample using dynamic linear interpolation.The detection statistic is computed by measuring the divergence G(x)in model output responses.We systematically evaluate the effectiveness of AFI against representative backdoor attacks,including BadNets,Blend,WaNet,and IAB,on three benchmark datasets:MNIST,CIFAR-10,and ImageNet.Experimental results show that AFI can effectively detect poisoned samples,achieving average detection rates of 95.20%,94.15%,and 86.49%on these datasets,respectively.Compared with existing methods,AFI demonstrates strong cross-domain generalization ability and robustness to unknown attacks.展开更多
In recent years,with the rapid advancement of artificial intelligence,object detection algorithms have made significant strides in accuracy and computational efficiency.Notably,research and applications of Anchor-Free...In recent years,with the rapid advancement of artificial intelligence,object detection algorithms have made significant strides in accuracy and computational efficiency.Notably,research and applications of Anchor-Free models have opened new avenues for real-time target detection in optical remote sensing images(ORSIs).However,in the realmof adversarial attacks,developing adversarial techniques tailored to Anchor-Freemodels remains challenging.Adversarial examples generated based on Anchor-Based models often exhibit poor transferability to these new model architectures.Furthermore,the growing diversity of Anchor-Free models poses additional hurdles to achieving robust transferability of adversarial attacks.This study presents an improved cross-conv-block feature fusion You Only Look Once(YOLO)architecture,meticulously engineered to facilitate the extraction ofmore comprehensive semantic features during the backpropagation process.To address the asymmetry between densely distributed objects in ORSIs and the corresponding detector outputs,a novel dense bounding box attack strategy is proposed.This approach leverages dense target bounding boxes loss in the calculation of adversarial loss functions.Furthermore,by integrating translation-invariant(TI)and momentum-iteration(MI)adversarial methodologies,the proposed framework significantly improves the transferability of adversarial attacks.Experimental results demonstrate that our method achieves superior adversarial attack performance,with adversarial transferability rates(ATR)of 67.53%on the NWPU VHR-10 dataset and 90.71%on the HRSC2016 dataset.Compared to ensemble adversarial attack and cascaded adversarial attack approaches,our method generates adversarial examples in an average of 0.64 s,representing an approximately 14.5%improvement in efficiency under equivalent conditions.展开更多
Deep learning has made significant progress in the field of oriented object detection for remote sensing images.However,existing methods still face challenges when dealing with difficult tasks such as multi-scale targ...Deep learning has made significant progress in the field of oriented object detection for remote sensing images.However,existing methods still face challenges when dealing with difficult tasks such as multi-scale targets,complex backgrounds,and small objects in remote sensing.Maintaining model lightweight to address resource constraints in remote sensing scenarios while improving task completion for remote sensing tasks remains a research hotspot.Therefore,we propose an enhanced multi-scale feature extraction lightweight network EM-YOLO based on the YOLOv8s architecture,specifically optimized for the characteristics of large target scale variations,diverse orientations,and numerous small objects in remote sensing images.Our innovations lie in two main aspects:First,a dynamic snake convolution(DSC)is introduced into the backbone network to enhance the model’s feature extraction capability for oriented targets.Second,an innovative focusing-diffusion module is designed in the feature fusion neck to effectively integrate multi-scale feature information.Finally,we introduce Layer-Adaptive Sparsity for magnitude-based Pruning(LASP)method to perform lightweight network pruning to better complete tasks in resource-constrained scenarios.Experimental results on the lightweight platform Orin demonstrate that the proposed method significantly outperforms the original YOLOv8s model in oriented remote sensing object detection tasks,and achieves comparable or superior performance to state-of-the-art methods on three authoritative remote sensing datasets(DOTA v1.0,DOTA v1.5,and HRSC2016).展开更多
Roadbed disease detection is essential for maintaining road functionality.Ground penetrating radar(GPR)enables non-destructive detection without drilling.However,current identification often relies on manual inspectio...Roadbed disease detection is essential for maintaining road functionality.Ground penetrating radar(GPR)enables non-destructive detection without drilling.However,current identification often relies on manual inspection,which requires extensive experience,suffers from low efficiency,and is highly subjective.As the results are presented as radar images,image processing methods can be applied for fast and objective identification.Deep learning-based approaches now offer a robust solution for automated roadbed disease detection.This study proposes an enhanced Faster Region-based Convolutional Neural Networks(R-CNN)framework integrating ResNet-50 as the backbone and two-dimensional discrete Fourier spectrum transformation(2D-DFT)for frequency-domain feature fusion.A dedicated GPR image dataset comprising 1650 annotated images was constructed and augmented to 6600 images via median filtering,histogram equalization,and binarization.The proposed model segments defect regions,applies binary masking,and fuses frequency-domain features to improve small-target detection under noisy backgrounds.Experimental results show that the improved Faster R-CNN achieves a mean Average Precision(mAP)of 0.92,representing a 0.22 increase over the baseline.Precision improved by 26%while recall remained stable at 87%.The model was further validated on real urban road data,demonstrating robust detection capability even under interference.These findings highlight the potential of combining GPR with deep learning for efficient,non-destructive roadbed health monitoring.展开更多
In response to the challenges in highway pavement distress detection,such as multiple defect categories,difficulties in feature extraction for different damage types,and slow identification speeds,this paper proposes ...In response to the challenges in highway pavement distress detection,such as multiple defect categories,difficulties in feature extraction for different damage types,and slow identification speeds,this paper proposes an enhanced pavement crack detection model named Star-YOLO11.This improved algorithm modifies the YOLO11 architecture by substituting the original C3k2 backbone network with a Star-s50 feature extraction network.The enhanced structure adjusts the number of stacked layers in the StarBlock module to optimize detection accuracy and improve model efficiency.To enhance the accuracy of pavement crack detection and improve model efficiency,three key modifications to the YOLO11 architecture are proposed.Firstly,the original C3k2 backbone is replaced with a StarBlock-based structure,forming the Star-s50 feature extraction backbone network.This lightweight redesign reduces computational complexity while maintaining detection precision.Secondly,to address the inefficiency of the original Partial Self-attention(PSA)mechanism in capturing localized crack features,the convolutional prior-aware Channel Prior Convolutional Attention(CPCA)mechanism is integrated into the channel dimension,creating a hybrid CPC-C2PSA attention structure.Thirdly,the original neck structure is upgraded to a Star Multi-Branch Auxiliary Feature Pyramid Network(SMAFPN)based on the Multi-Branch Auxiliary Feature Pyramid Network architecture,which adaptively fuses high-level semantic and low-level spatial information through Star-s50 connections and C3k2 extraction blocks.Additionally,a composite dataset augmentation strategy combining traditional and advanced augmentation techniques is developed.This strategy is validated on a specialized pavement dataset containing five distinct crack categories for comprehensive training and evaluation.Experimental results indicate that the proposed Star-YOLO11 achieves an accuracy of 89.9%(3.5%higher than the baseline),a mean average precision(mAP)of 90.3%(+2.6%),and an F1-score of 85.8%(+0.5%),while reducing the model size by 18.8%and reaching a frame rate of 225.73 frames per second(FPS)for real-time detection.It shows potential for lightweight deployment in pavement crack detection tasks.展开更多
Tomato is a major economic crop worldwide,and diseases on tomato leaves can significantly reduce both yield and quality.Traditional manual inspection is inefficient and highly subjective,making it difficult to meet th...Tomato is a major economic crop worldwide,and diseases on tomato leaves can significantly reduce both yield and quality.Traditional manual inspection is inefficient and highly subjective,making it difficult to meet the requirements of early disease identification in complex natural environments.To address this issue,this study proposes an improved YOLO11-based model,YOLO-SPDNet(Scale Sequence Fusion,Position-Channel Attention,and Dual Enhancement Network).The model integrates the SEAM(Self-Ensembling Attention Mechanism)semantic enhancement module,the MLCA(Mixed Local Channel Attention)lightweight attention mechanism,and the SPA(Scale-Position-Detail Awareness)module composed of SSFF(Scale Sequence Feature Fusion),TFE(Triple Feature Encoding),and CPAM(Channel and Position Attention Mechanism).These enhancements strengthen fine-grained lesion detection while maintaining model lightweightness.Experimental results show that YOLO-SPDNet achieves an accuracy of 91.8%,a recall of 86.5%,and an mAP@0.5 of 90.6%on the test set,with a computational complexity of 12.5 GFLOPs.Furthermore,the model reaches a real-time inference speed of 987 FPS,making it suitable for deployment on mobile agricultural terminals and online monitoring systems.Comparative analysis and ablation studies further validate the reliability and practical applicability of the proposed model in complex natural scenes.展开更多
To address the challenge of real-time detection of unauthorized drone intrusions in complex low-altitude urban environments such as parks and airports,this paper proposes an enhanced MBS-YOLO(Multi-Branch Small Target...To address the challenge of real-time detection of unauthorized drone intrusions in complex low-altitude urban environments such as parks and airports,this paper proposes an enhanced MBS-YOLO(Multi-Branch Small Target Detection YOLO)model for anti-drone object detection,based on the YOLOv8 architecture.To overcome the limitations of existing methods in detecting small objects within complex backgrounds,we designed a C2f-Pu module with excellent feature extraction capability and a more compact parameter set,aiming to reduce the model’s computational complexity.To improve multi-scale feature fusion,we construct a Multi-Branch Feature Pyramid Network(MB-FPN)that employs a cross-level feature fusion strategy to enhance the model’s representation of small objects.Additionally,a shared detail-enhanced detection head is introduced to address the large size variations of Unmanned Aerial Vehicle(UAV)targets,thereby improving detection performance across different scales.Experimental results demonstrate that the proposed model achieves consistent improvements across multiple benchmarks.On the Det-Fly dataset,it improves precision by 3%,recall by 5.6%,and mAP50 by 4.5%compared with the baseline,while reducing parameters by 21.2%.Cross-validation on the VisDrone dataset further validates its robustness,yielding additional gains of 3.2%in precision,6.1%in recall,and 4.8%in mAP50 over the original YOLOv8.These findings confirm the effectiveness of the proposed algorithm in enhancing UAV detection performance under complex scenarios.展开更多
Human activity recognition(HAR)is a method to predict human activities from sensor signals using machine learning(ML)techniques.HAR systems have several applications in various domains,including medicine,surveillance,...Human activity recognition(HAR)is a method to predict human activities from sensor signals using machine learning(ML)techniques.HAR systems have several applications in various domains,including medicine,surveillance,behavioral monitoring,and posture analysis.Extraction of suitable information from sensor data is an important part of the HAR process to recognize activities accurately.Several research studies on HAR have utilizedMel frequency cepstral coefficients(MFCCs)because of their effectiveness in capturing the periodic pattern of sensor signals.However,existing MFCC-based approaches often fail to capture sufficient temporal variability,which limits their ability to distinguish between complex or imbalanced activity classes robustly.To address this gap,this study proposes a feature fusion strategy that merges time-based and MFCC features(MFCCT)to enhance activity representation.The merged features were fed to a convolutional neural network(CNN)integrated with long shortterm memory(LSTM)—DeepConvLSTM to construct the HAR model.The MFCCT features with DeepConvLSTM achieved better performance as compared to MFCCs and time-based features on PAMAP2,UCI-HAR,and WISDM by obtaining an accuracy of 97%,98%,and 97%,respectively.In addition,DeepConvLSTM outperformed the deep learning(DL)algorithms that have recently been employed in HAR.These results confirm that the proposed hybrid features are not only practical but also generalizable,making them applicable across diverse HAR datasets for accurate activity classification.展开更多
Camouflaged Object Detection(COD)aims to identify objects that share highly similar patterns—such as texture,intensity,and color—with their surrounding environment.Due to their intrinsic resemblance to the backgroun...Camouflaged Object Detection(COD)aims to identify objects that share highly similar patterns—such as texture,intensity,and color—with their surrounding environment.Due to their intrinsic resemblance to the background,camouflaged objects often exhibit vague boundaries and varying scales,making it challenging to accurately locate targets and delineate their indistinct edges.To address this,we propose a novel camouflaged object detection network called Edge-Guided and Multi-scale Fusion Network(EGMFNet),which leverages edge-guided multi-scale integration for enhanced performance.The model incorporates two innovative components:a Multi-scale Fusion Module(MSFM)and an Edge-Guided Attention Module(EGA).These designs exploit multi-scale features to uncover subtle cues between candidate objects and the background while emphasizing camouflaged object boundaries.Moreover,recognizing the rich contextual information in fused features,we introduce a Dual-Branch Global Context Module(DGCM)to refine features using extensive global context,thereby generatingmore informative representations.Experimental results on four benchmark datasets demonstrate that EGMFNet outperforms state-of-the-art methods across five evaluation metrics.Specifically,on COD10K,our EGMFNet-P improves F_(β)by 4.8 points and reduces mean absolute error(MAE)by 0.006 compared with ZoomNeXt;on NC4K,it achieves a 3.6-point increase in F_(β).OnCAMO and CHAMELEON,it obtains 4.5-point increases in F_(β),respectively.These consistent gains substantiate the superiority and robustness of EGMFNet.展开更多
[Objective]Leaf diseases significantly affect both the yield and quality of tea throughout the year.To address the issue of inadequate segmentation finesse in the current tea spot segmentation models,a novel diagnosis...[Objective]Leaf diseases significantly affect both the yield and quality of tea throughout the year.To address the issue of inadequate segmentation finesse in the current tea spot segmentation models,a novel diagnosis of the severity of tea spots was proposed in this research,designated as MDC-U-Net3+,to enhance segmentation accuracy on the base framework of U-Net3+.[Methods]Multi-scale feature fusion module(MSFFM)was incorporated into the backbone network of U-Net3+to obtain feature information across multiple receptive fields of diseased spots,thereby reducing the loss of features within the encoder.Dual multi-scale attention(DMSA)was incorporated into the skip connection process to mitigate the segmentation boundary ambiguity issue.This integration facilitates the comprehensive fusion of fine-grained and coarse-grained semantic information at full scale.Furthermore,the segmented mask image was subjected to conditional random fields(CRF)to enhance the optimization of the segmentation results[Results and Discussions]The improved model MDC-U-Net3+achieved a mean pixel accuracy(mPA)of 94.92%,accompanied by a mean Intersection over Union(mIoU)ratio of 90.9%.When compared to the mPA and mIoU of U-Net3+,MDC-U-Net3+model showed improvements of 1.85 and 2.12 percentage points,respectively.These results illustrated a more effective segmentation performance than that achieved by other classical semantic segmentation models.[Conclusions]The methodology presented herein could provide data support for automated disease detection and precise medication,consequently reducing the losses associated with tea diseases.展开更多
This research centers on structural health monitoring of bridges,a critical transportation infrastructure.Owing to the cumulative action of heavy vehicle loads,environmental variations,and material aging,bridge compon...This research centers on structural health monitoring of bridges,a critical transportation infrastructure.Owing to the cumulative action of heavy vehicle loads,environmental variations,and material aging,bridge components are prone to cracks and other defects,severely compromising structural safety and service life.Traditional inspection methods relying on manual visual assessment or vehicle-mounted sensors suffer from low efficiency,strong subjectivity,and high costs,while conventional image processing techniques and early deep learning models(e.g.,UNet,Faster R-CNN)still performinadequately in complex environments(e.g.,varying illumination,noise,false cracks)due to poor perception of fine cracks andmulti-scale features,limiting practical application.To address these challenges,this paper proposes CACNN-Net(CBAM-Augmented CNN),a novel dual-encoder architecture that innovatively couples a CNN for local detail extraction with a CBAM-Transformer for global context modeling.A key contribution is the dedicated Feature FusionModule(FFM),which strategically integratesmulti-scale features and focuses attention on crack regions while suppressing irrelevant noise.Experiments on bridge crack datasets demonstrate that CACNNNet achieves a precision of 77.6%,a recall of 79.4%,and an mIoU of 62.7%.These results significantly outperform several typical models(e.g.,UNet-ResNet34,Deeplabv3),confirming their superior accuracy and robust generalization,providing a high-precision automated solution for bridge crack detection and a novel network design paradigm for structural surface defect identification in complex scenarios,while future research may integrate physical features like depth information to advance intelligent infrastructure maintenance and digital twin management.展开更多
With the rapid expansion of drone applications,accurate detection of objects in aerial imagery has become crucial for intelligent transportation,urban management,and emergency rescue missions.However,existing methods ...With the rapid expansion of drone applications,accurate detection of objects in aerial imagery has become crucial for intelligent transportation,urban management,and emergency rescue missions.However,existing methods face numerous challenges in practical deployment,including scale variation handling,feature degradation,and complex backgrounds.To address these issues,we propose Edge-enhanced and Detail-Capturing You Only Look Once(EHDC-YOLO),a novel framework for object detection in Unmanned Aerial Vehicle(UAV)imagery.Based on the You Only Look Once version 11 nano(YOLOv11n)baseline,EHDC-YOLO systematically introduces several architectural enhancements:(1)a Multi-Scale Edge Enhancement(MSEE)module that leverages multi-scale pooling and edge information to enhance boundary feature extraction;(2)an Enhanced Feature Pyramid Network(EFPN)that integrates P2-level features with Cross Stage Partial(CSP)structures and OmniKernel convolutions for better fine-grained representation;and(3)Dynamic Head(DyHead)with multi-dimensional attention mechanisms for enhanced cross-scale modeling and perspective adaptability.Comprehensive experiments on the Vision meets Drones for Detection(VisDrone-DET)2019 dataset demonstrate that EHDC-YOLO achieves significant improvements,increasing mean Average Precision(mAP)@0.5 from 33.2%to 46.1%(an absolute improvement of 12.9 percentage points)and mAP@0.5:0.95 from 19.5%to 28.0%(an absolute improvement of 8.5 percentage points)compared with the YOLOv11n baseline,while maintaining a reasonable parameter count(2.81 M vs the baseline’s 2.58 M).Further ablation studies confirm the effectiveness of each proposed component,while visualization results highlight EHDC-YOLO’s superior performance in detecting objects and handling occlusions in complex drone scenarios.展开更多
基金General Program of National Natural Science Foundation of China(82474390)Construction Project of Pudong New Area Famous TCM Studios(National Pilot Zone for TCM Development,Shanghai)(PDZY-2025-0716)Shanghai Municipal Science and Technology Program Project Shanghai Key Laboratory of Health Identification and Assessment(21DZ2271000).
文摘Objective To develop a depression recognition model by integrating the spirit-expression diagnostic framework of traditional Chinese medicine(TCM)with machine learning algorithms.The proposed model seeks to establish a TCM-informed tool for early depression screening,thereby bridging traditional diagnostic principles with modern computational approaches.Methods The study included patients with depression who visited the Shanghai Pudong New Area Mental Health Center from October 1,2022 to October 1,2023,as well as students and teachers from Shanghai University of Traditional Chinese Medicine during the same period as the healthy control group.Videos of 3–10 s were captured using a Xiaomi Pad 5,and the TCM spirit and expressions were determined by TCM experts(at least 3 out of 5 experts agreed to determine the category of TCM spirit and expressions).Basic information,facial images,and interview information were collected through a portable TCM intelligent analysis and diagnosis device,and facial diagnosis features were extracted using the Open CV computer vision library technology.Statistical analysis methods such as parametric and non-parametric tests were used to analyze the baseline data,TCM spirit and expression features,and facial diagnosis feature parameters of the two groups,to compare the differences in TCM spirit and expression and facial features.Five machine learning algorithms,including extreme gradient boosting(XGBoost),decision tree(DT),Bernoulli naive Bayes(BernoulliNB),support vector machine(SVM),and k-nearest neighbor(KNN)classification,were used to construct a depression recognition model based on the fusion of TCM spirit and expression features.The performance of the model was evaluated using metrics such as accuracy,precision,and the area under the receiver operating characteristic(ROC)curve(AUC).The model results were explained using the Shapley Additive exPlanations(SHAP).Results A total of 93 depression patients and 87 healthy individuals were ultimately included in this study.There was no statistically significant difference in the baseline characteristics between the two groups(P>0.05).The differences in the characteristics of the spirit and expressions in TCM and facial features between the two groups were shown as follows.(i)Quantispirit facial analysis revealed that depression patients exhibited significantly reduced facial spirit and luminance compared with healthy controls(P<0.05),with characteristic features such as sad expressions,facial erythema,and changes in the lip color ranging from erythematous to cyanotic.(ii)Depressed patients exhibited significantly lower values in facial complexion L,lip L,and a values,and gloss index,but higher values in facial complexion a and b,lip b,low gloss index,and matte index(all P<0.05).(iii)The results of multiple models show that the XGBoost-based depression recognition model,integrating the TCM“spirit-expression”diagnostic framework,achieved an accuracy of 98.61%and significantly outperformed four benchmark algorithms—DT,BernoulliNB,SVM,and KNN(P<0.01).(iv)The SHAP visualization results show that in the recognition model constructed by the XGBoost algorithm,the complexion b value,categories of facial spirit,high gloss index,low gloss index,categories of facial expression and texture features have significant contribution to the model.Conclusion This study demonstrates that integrating TCM spirit-expression diagnostic features with machine learning enables the construction of a high-precision depression detection model,offering a novel paradigm for objective depression diagnosis.
基金funding this work through Research Group No.KS-2024-376.
文摘Arabic Sign Language(ArSL)recognition plays a vital role in enhancing the communication for the Deaf and Hard of Hearing(DHH)community.Researchers have proposed multiple methods for automated recognition of ArSL;however,these methods face multiple challenges that include high gesture variability,occlusions,limited signer diversity,and the scarcity of large annotated datasets.Existing methods,often relying solely on either skeletal data or video-based features,struggle with generalization and robustness,especially in dynamic and real-world conditions.This paper proposes a novel multimodal ensemble classification framework that integrates geometric features derived from 3D skeletal joint distances and angles with temporal features extracted from RGB videos using the Inflated 3D ConvNet(I3D).By fusing these complementary modalities at the feature level and applying a majority-voting ensemble of XGBoost,Random Forest,and Support Vector Machine classifiers,the framework robustly captures both spatial configurations and motion dynamics of sign gestures.Feature selection using the Pearson Correlation Coefficient further enhances efficiency by reducing redundancy.Extensive experiments on the ArabSign dataset,which includes RGB videos and corresponding skeletal data,demonstrate that the proposed approach significantly outperforms state-of-the-art methods,achieving an average F1-score of 97%using a majority-voting ensemble of XGBoost,Random Forest,and SVM classifiers,and improving recognition accuracy by more than 7%over previous best methods.This work not only advances the technical stateof-the-art in ArSL recognition but also provides a scalable,real-time solution for practical deployment in educational,social,and assistive communication technologies.Even though this study is about Arabic Sign Language,the framework proposed here can be extended to different sign languages,creating possibilities for potentially worldwide applicability in sign language recognition tasks.
基金King Saud University,Grant/Award Number:RSP2024R157。
文摘Biometric characteristics are playing a vital role in security for the last few years.Human gait classification in video sequences is an important biometrics attribute and is used for security purposes.A new framework for human gait classification in video sequences using deep learning(DL)fusion assisted and posterior probability-based moth flames optimization(MFO)is proposed.In the first step,the video frames are resized and finetuned by two pre-trained lightweight DL models,EfficientNetB0 and MobileNetV2.Both models are selected based on the top-5 accuracy and less number of parameters.Later,both models are trained through deep transfer learning and extracted deep features fused using a voting scheme.In the last step,the authors develop a posterior probabilitybased MFO feature selection algorithm to select the best features.The selected features are classified using several supervised learning methods.The CASIA-B publicly available dataset has been employed for the experimental process.On this dataset,the authors selected six angles such as 0°,18°,90°,108°,162°,and 180°and obtained an average accuracy of 96.9%,95.7%,86.8%,90.0%,95.1%,and 99.7%.Results demonstrate comparable improvement in accuracy and significantly minimize the computational time with recent state-of-the-art techniques.
基金supported by the National Natural Science Foundation of China(No.61976028).
文摘To address the challenges of small target detection and significant scale variations in unmanned aerial vehicle(UAV)aerial imagery,which often lead to missed and false detections,we propose Multi-scale Feature Fusion YOLO(MFF-YOLO),an enhanced algorithm based on YOLOv8s.Our approach introduces a Multi-scale Feature Fusion Strategy(MFFS),comprising the Multiple Features C2f(MFC)module and the Scale Sequence Feature Fusion(SSFF)module,to improve feature integration across different network levels.This enables more effective capture of fine-grained details and sequential multi-scale features.Furthermore,we incorporate Inner-CIoU,an improved loss function that uses auxiliary bounding boxes to enhance the regression quality of small object boxes.To ensure practicality for UAV deployment,we apply the Layer-adaptive Magnitude-based pruning(LAMP)method to significantly reduce model size and computational cost.Experiments on the VisDrone2019 dataset show that MFF-YOLO achieves a 5.7% increase in mean average precision(mAP)over the baseline,while reducing parameters by 8.5 million and computation by 17.5%.The results demonstrate that our method effectively improves detection performance in UAV aerial scenarios.
基金funded by the Jilin Provincial Department of Science and Technology,grant number 20230101208JC.
文摘Fault diagnosis of rolling bearings is crucial for ensuring the stable operation of mechanical equipment and production safety in industrial environments.However,due to the nonlinearity and non-stationarity of collected vibration signals,single-modal methods struggle to capture fault features fully.This paper proposes a rolling bearing fault diagnosis method based on multi-modal information fusion.The method first employs the Hippopotamus Optimization Algorithm(HO)to optimize the number of modes in Variational Mode Decomposition(VMD)to achieve optimal modal decomposition performance.It combines Convolutional Neural Networks(CNN)and Gated Recurrent Units(GRU)to extract temporal features from one-dimensional time-series signals.Meanwhile,the Markovian Transition Field(MTF)is used to transform one-dimensional signals into two-dimensional images for spatial feature mining.Through visualization techniques,the effectiveness of generated images from different parameter combinations is compared to determine the optimal parameter configuration.A multi-modal network(GSTCN)is constructed by integrating Swin-Transformer and the Convolutional Block Attention Module(CBAM),where the attention module is utilized to enhance fault features.Finally,the fault features extracted from different modalities are deeply fused and fed into a fully connected layer to complete fault classification.Experimental results show that the GSTCN model achieves an average diagnostic accuracy of 99.5%across three datasets,significantly outperforming existing comparison methods.This demonstrates that the proposed model has high diagnostic precision and good generalization ability,providing an efficient and reliable solution for rolling bearing fault diagnosis.
文摘The initial noise present in the depth images obtained with RGB-D sensors is a combination of hardware limitations in addition to the environmental factors,due to the limited capabilities of sensors,which also produce poor computer vision results.The common image denoising techniques tend to remove significant image details and also remove noise,provided they are based on space and frequency filtering.The updated framework presented in this paper is a novel denoising model that makes use of Boruta-driven feature selection using a Long Short-Term Memory Autoencoder(LSTMAE).The Boruta algorithm identifies the most useful depth features that are used to maximize the spatial structure integrity and reduce redundancy.An LSTMAE is then used to process these selected features and model depth pixel sequences to generate robust,noise-resistant representations.The system uses the encoder to encode the input data into a latent space that has been compressed before it is decoded to retrieve the clean image.Experiments on a benchmark data set show that the suggested technique attains a PSNR of 45 dB and an SSIM of 0.90,which is 10 dB higher than the performance of conventional convolutional autoencoders and 15 times higher than that of the wavelet-based models.Moreover,the feature selection step will decrease the input dimensionality by 40%,resulting in a 37.5%reduction in training time and a real-time inference rate of 200 FPS.Boruta-LSTMAE framework,therefore,offers a highly efficient and scalable system for depth image denoising,with a high potential to be applied to close-range 3D systems,such as robotic manipulation and gesture-based interfaces.
基金supported by the National Natural Science Foundation of China(Grant Nos.62572057,62272049,U24A20331)Beijing Natural Science Foundation(Grant Nos.4232026,4242020)Academic Research Projects of Beijing Union University(Grant No.ZK10202404).
文摘Traffic sign detection is a critical component of driving systems.Single-stage network-based traffic sign detection algorithms,renowned for their fast detection speeds and high accuracy,have become the dominant approach in current practices.However,in complex and dynamic traffic scenes,particularly with smaller traffic sign objects,challenges such as missed and false detections can lead to reduced overall detection accuracy.To address this issue,this paper proposes a detection algorithm that integrates edge and shape information.Recognizing that traffic signs have specific shapes and distinct edge contours,this paper introduces an edge feature extraction branch within the backbone network,enabling adaptive fusion with features of the same hierarchical level.Additionally,a shape prior convolution module is designed to replaces the first two convolutional modules of the backbone network,aimed at enhancing the model's perception ability for specific shape objects and reducing its sensitivity to background noise.The algorithm was evaluated on the CCTSDB and TT100k datasets,and compared to YOLOv8s,the mAP50 values increased by 3.0%and 10.4%,respectively,demonstrating the effectiveness of the proposed method in improving the accuracy of traffic sign detection.
基金supported in part by the by Chongqing Research Program of Basic Research and Frontier Technology under Grant CSTB2025NSCQ-GPX1309.
文摘Small object detection has been a focus of attention since the emergence of deep learning-based object detection.Although classical object detection frameworks have made significant contributions to the development of object detection,there are still many issues to be resolved in detecting small objects due to the inherent complexity and diversity of real-world visual scenes.In particular,the YOLO(You Only Look Once)series of detection models,renowned for their real-time performance,have undergone numerous adaptations aimed at improving the detection of small targets.In this survey,we summarize the state-of-the-art YOLO-based small object detection methods.This review presents a systematic categorization of YOLO-based approaches for small-object detection,organized into four methodological avenues,namely attention-based feature enhancement,detection-head optimization,loss function,and multi-scale feature fusion strategies.We then examine the principal challenges addressed by each category.Finally,we analyze the performance of thesemethods on public benchmarks and,by comparing current approaches,identify limitations and outline directions for future research.
基金supported by the National Natural Science Foundation of China Grant(No.61972133)Project of Leading Talents in Science and Technology Innovation for Thousands of People Plan in Henan Province Grant(No.204200510021)the Key Research and Development Plan Special Project of Henan Province Grant(No.241111211400).
文摘At inference time,deep neural networks are susceptible to backdoor attacks,which can produce attackercontrolled outputs when inputs contain carefully crafted triggers.Existing defense methods often focus on specific attack types or incur high costs,such as data cleaning or model fine-tuning.In contrast,we argue that it is possible to achieve effective and generalizable defense without removing triggers or incurring high model-cleaning costs.Fromthe attacker’s perspective and based on characteristics of vulnerable neuron activation anomalies,we propose an Adaptive Feature Injection(AFI)method for black-box backdoor detection.AFI employs a pre-trained image encoder to extract multi-level deep features and constructs a dynamic weight fusionmechanism for precise identification and interception of poisoned samples.Specifically,we select the control samples with the largest feature differences fromthe clean dataset via feature-space analysis,and generate blended sample pairs with the test sample using dynamic linear interpolation.The detection statistic is computed by measuring the divergence G(x)in model output responses.We systematically evaluate the effectiveness of AFI against representative backdoor attacks,including BadNets,Blend,WaNet,and IAB,on three benchmark datasets:MNIST,CIFAR-10,and ImageNet.Experimental results show that AFI can effectively detect poisoned samples,achieving average detection rates of 95.20%,94.15%,and 86.49%on these datasets,respectively.Compared with existing methods,AFI demonstrates strong cross-domain generalization ability and robustness to unknown attacks.
文摘In recent years,with the rapid advancement of artificial intelligence,object detection algorithms have made significant strides in accuracy and computational efficiency.Notably,research and applications of Anchor-Free models have opened new avenues for real-time target detection in optical remote sensing images(ORSIs).However,in the realmof adversarial attacks,developing adversarial techniques tailored to Anchor-Freemodels remains challenging.Adversarial examples generated based on Anchor-Based models often exhibit poor transferability to these new model architectures.Furthermore,the growing diversity of Anchor-Free models poses additional hurdles to achieving robust transferability of adversarial attacks.This study presents an improved cross-conv-block feature fusion You Only Look Once(YOLO)architecture,meticulously engineered to facilitate the extraction ofmore comprehensive semantic features during the backpropagation process.To address the asymmetry between densely distributed objects in ORSIs and the corresponding detector outputs,a novel dense bounding box attack strategy is proposed.This approach leverages dense target bounding boxes loss in the calculation of adversarial loss functions.Furthermore,by integrating translation-invariant(TI)and momentum-iteration(MI)adversarial methodologies,the proposed framework significantly improves the transferability of adversarial attacks.Experimental results demonstrate that our method achieves superior adversarial attack performance,with adversarial transferability rates(ATR)of 67.53%on the NWPU VHR-10 dataset and 90.71%on the HRSC2016 dataset.Compared to ensemble adversarial attack and cascaded adversarial attack approaches,our method generates adversarial examples in an average of 0.64 s,representing an approximately 14.5%improvement in efficiency under equivalent conditions.
基金funded by the Hainan Province Science and Technology Special Fund under Grant ZDYF2024GXJS292.
文摘Deep learning has made significant progress in the field of oriented object detection for remote sensing images.However,existing methods still face challenges when dealing with difficult tasks such as multi-scale targets,complex backgrounds,and small objects in remote sensing.Maintaining model lightweight to address resource constraints in remote sensing scenarios while improving task completion for remote sensing tasks remains a research hotspot.Therefore,we propose an enhanced multi-scale feature extraction lightweight network EM-YOLO based on the YOLOv8s architecture,specifically optimized for the characteristics of large target scale variations,diverse orientations,and numerous small objects in remote sensing images.Our innovations lie in two main aspects:First,a dynamic snake convolution(DSC)is introduced into the backbone network to enhance the model’s feature extraction capability for oriented targets.Second,an innovative focusing-diffusion module is designed in the feature fusion neck to effectively integrate multi-scale feature information.Finally,we introduce Layer-Adaptive Sparsity for magnitude-based Pruning(LASP)method to perform lightweight network pruning to better complete tasks in resource-constrained scenarios.Experimental results on the lightweight platform Orin demonstrate that the proposed method significantly outperforms the original YOLOv8s model in oriented remote sensing object detection tasks,and achieves comparable or superior performance to state-of-the-art methods on three authoritative remote sensing datasets(DOTA v1.0,DOTA v1.5,and HRSC2016).
基金supported by the Second Batch of Key Textbook Construction Projects of“14th Five-Year Plan”of Zhejiang Vocational Colleges(SZDJC-2412).
文摘Roadbed disease detection is essential for maintaining road functionality.Ground penetrating radar(GPR)enables non-destructive detection without drilling.However,current identification often relies on manual inspection,which requires extensive experience,suffers from low efficiency,and is highly subjective.As the results are presented as radar images,image processing methods can be applied for fast and objective identification.Deep learning-based approaches now offer a robust solution for automated roadbed disease detection.This study proposes an enhanced Faster Region-based Convolutional Neural Networks(R-CNN)framework integrating ResNet-50 as the backbone and two-dimensional discrete Fourier spectrum transformation(2D-DFT)for frequency-domain feature fusion.A dedicated GPR image dataset comprising 1650 annotated images was constructed and augmented to 6600 images via median filtering,histogram equalization,and binarization.The proposed model segments defect regions,applies binary masking,and fuses frequency-domain features to improve small-target detection under noisy backgrounds.Experimental results show that the improved Faster R-CNN achieves a mean Average Precision(mAP)of 0.92,representing a 0.22 increase over the baseline.Precision improved by 26%while recall remained stable at 87%.The model was further validated on real urban road data,demonstrating robust detection capability even under interference.These findings highlight the potential of combining GPR with deep learning for efficient,non-destructive roadbed health monitoring.
基金funded by the Jiangxi SASAC Science and Technology Innovation Special Project and the Key Technology Research and Application Promotion of Highway Overload Digital Solution.
文摘In response to the challenges in highway pavement distress detection,such as multiple defect categories,difficulties in feature extraction for different damage types,and slow identification speeds,this paper proposes an enhanced pavement crack detection model named Star-YOLO11.This improved algorithm modifies the YOLO11 architecture by substituting the original C3k2 backbone network with a Star-s50 feature extraction network.The enhanced structure adjusts the number of stacked layers in the StarBlock module to optimize detection accuracy and improve model efficiency.To enhance the accuracy of pavement crack detection and improve model efficiency,three key modifications to the YOLO11 architecture are proposed.Firstly,the original C3k2 backbone is replaced with a StarBlock-based structure,forming the Star-s50 feature extraction backbone network.This lightweight redesign reduces computational complexity while maintaining detection precision.Secondly,to address the inefficiency of the original Partial Self-attention(PSA)mechanism in capturing localized crack features,the convolutional prior-aware Channel Prior Convolutional Attention(CPCA)mechanism is integrated into the channel dimension,creating a hybrid CPC-C2PSA attention structure.Thirdly,the original neck structure is upgraded to a Star Multi-Branch Auxiliary Feature Pyramid Network(SMAFPN)based on the Multi-Branch Auxiliary Feature Pyramid Network architecture,which adaptively fuses high-level semantic and low-level spatial information through Star-s50 connections and C3k2 extraction blocks.Additionally,a composite dataset augmentation strategy combining traditional and advanced augmentation techniques is developed.This strategy is validated on a specialized pavement dataset containing five distinct crack categories for comprehensive training and evaluation.Experimental results indicate that the proposed Star-YOLO11 achieves an accuracy of 89.9%(3.5%higher than the baseline),a mean average precision(mAP)of 90.3%(+2.6%),and an F1-score of 85.8%(+0.5%),while reducing the model size by 18.8%and reaching a frame rate of 225.73 frames per second(FPS)for real-time detection.It shows potential for lightweight deployment in pavement crack detection tasks.
基金Tianmin Tianyuan Boutique Vegetable Industry Technology Service Station(Grant No.2024120011003081)Development of Environmental Monitoring and Traceability System for Wuqing Agricultural Production Areas(Grant No.2024120011001866)。
文摘Tomato is a major economic crop worldwide,and diseases on tomato leaves can significantly reduce both yield and quality.Traditional manual inspection is inefficient and highly subjective,making it difficult to meet the requirements of early disease identification in complex natural environments.To address this issue,this study proposes an improved YOLO11-based model,YOLO-SPDNet(Scale Sequence Fusion,Position-Channel Attention,and Dual Enhancement Network).The model integrates the SEAM(Self-Ensembling Attention Mechanism)semantic enhancement module,the MLCA(Mixed Local Channel Attention)lightweight attention mechanism,and the SPA(Scale-Position-Detail Awareness)module composed of SSFF(Scale Sequence Feature Fusion),TFE(Triple Feature Encoding),and CPAM(Channel and Position Attention Mechanism).These enhancements strengthen fine-grained lesion detection while maintaining model lightweightness.Experimental results show that YOLO-SPDNet achieves an accuracy of 91.8%,a recall of 86.5%,and an mAP@0.5 of 90.6%on the test set,with a computational complexity of 12.5 GFLOPs.Furthermore,the model reaches a real-time inference speed of 987 FPS,making it suitable for deployment on mobile agricultural terminals and online monitoring systems.Comparative analysis and ablation studies further validate the reliability and practical applicability of the proposed model in complex natural scenes.
基金supported by the Key R&D Programof Xianyang City,Shaanxi Province(L2024-ZDYF-ZDYF-GY-0043).
文摘To address the challenge of real-time detection of unauthorized drone intrusions in complex low-altitude urban environments such as parks and airports,this paper proposes an enhanced MBS-YOLO(Multi-Branch Small Target Detection YOLO)model for anti-drone object detection,based on the YOLOv8 architecture.To overcome the limitations of existing methods in detecting small objects within complex backgrounds,we designed a C2f-Pu module with excellent feature extraction capability and a more compact parameter set,aiming to reduce the model’s computational complexity.To improve multi-scale feature fusion,we construct a Multi-Branch Feature Pyramid Network(MB-FPN)that employs a cross-level feature fusion strategy to enhance the model’s representation of small objects.Additionally,a shared detail-enhanced detection head is introduced to address the large size variations of Unmanned Aerial Vehicle(UAV)targets,thereby improving detection performance across different scales.Experimental results demonstrate that the proposed model achieves consistent improvements across multiple benchmarks.On the Det-Fly dataset,it improves precision by 3%,recall by 5.6%,and mAP50 by 4.5%compared with the baseline,while reducing parameters by 21.2%.Cross-validation on the VisDrone dataset further validates its robustness,yielding additional gains of 3.2%in precision,6.1%in recall,and 4.8%in mAP50 over the original YOLOv8.These findings confirm the effectiveness of the proposed algorithm in enhancing UAV detection performance under complex scenarios.
基金supported by Princess Nourah bint Abdulrahman University,Riyadh,Saudi Arabia through the Researchers Supporting Project PNURSP2025R333.
文摘Human activity recognition(HAR)is a method to predict human activities from sensor signals using machine learning(ML)techniques.HAR systems have several applications in various domains,including medicine,surveillance,behavioral monitoring,and posture analysis.Extraction of suitable information from sensor data is an important part of the HAR process to recognize activities accurately.Several research studies on HAR have utilizedMel frequency cepstral coefficients(MFCCs)because of their effectiveness in capturing the periodic pattern of sensor signals.However,existing MFCC-based approaches often fail to capture sufficient temporal variability,which limits their ability to distinguish between complex or imbalanced activity classes robustly.To address this gap,this study proposes a feature fusion strategy that merges time-based and MFCC features(MFCCT)to enhance activity representation.The merged features were fed to a convolutional neural network(CNN)integrated with long shortterm memory(LSTM)—DeepConvLSTM to construct the HAR model.The MFCCT features with DeepConvLSTM achieved better performance as compared to MFCCs and time-based features on PAMAP2,UCI-HAR,and WISDM by obtaining an accuracy of 97%,98%,and 97%,respectively.In addition,DeepConvLSTM outperformed the deep learning(DL)algorithms that have recently been employed in HAR.These results confirm that the proposed hybrid features are not only practical but also generalizable,making them applicable across diverse HAR datasets for accurate activity classification.
基金financially supported byChongqingUniversity of Technology Graduate Innovation Foundation(Grant No.gzlcx20253267).
文摘Camouflaged Object Detection(COD)aims to identify objects that share highly similar patterns—such as texture,intensity,and color—with their surrounding environment.Due to their intrinsic resemblance to the background,camouflaged objects often exhibit vague boundaries and varying scales,making it challenging to accurately locate targets and delineate their indistinct edges.To address this,we propose a novel camouflaged object detection network called Edge-Guided and Multi-scale Fusion Network(EGMFNet),which leverages edge-guided multi-scale integration for enhanced performance.The model incorporates two innovative components:a Multi-scale Fusion Module(MSFM)and an Edge-Guided Attention Module(EGA).These designs exploit multi-scale features to uncover subtle cues between candidate objects and the background while emphasizing camouflaged object boundaries.Moreover,recognizing the rich contextual information in fused features,we introduce a Dual-Branch Global Context Module(DGCM)to refine features using extensive global context,thereby generatingmore informative representations.Experimental results on four benchmark datasets demonstrate that EGMFNet outperforms state-of-the-art methods across five evaluation metrics.Specifically,on COD10K,our EGMFNet-P improves F_(β)by 4.8 points and reduces mean absolute error(MAE)by 0.006 compared with ZoomNeXt;on NC4K,it achieves a 3.6-point increase in F_(β).OnCAMO and CHAMELEON,it obtains 4.5-point increases in F_(β),respectively.These consistent gains substantiate the superiority and robustness of EGMFNet.
文摘[Objective]Leaf diseases significantly affect both the yield and quality of tea throughout the year.To address the issue of inadequate segmentation finesse in the current tea spot segmentation models,a novel diagnosis of the severity of tea spots was proposed in this research,designated as MDC-U-Net3+,to enhance segmentation accuracy on the base framework of U-Net3+.[Methods]Multi-scale feature fusion module(MSFFM)was incorporated into the backbone network of U-Net3+to obtain feature information across multiple receptive fields of diseased spots,thereby reducing the loss of features within the encoder.Dual multi-scale attention(DMSA)was incorporated into the skip connection process to mitigate the segmentation boundary ambiguity issue.This integration facilitates the comprehensive fusion of fine-grained and coarse-grained semantic information at full scale.Furthermore,the segmented mask image was subjected to conditional random fields(CRF)to enhance the optimization of the segmentation results[Results and Discussions]The improved model MDC-U-Net3+achieved a mean pixel accuracy(mPA)of 94.92%,accompanied by a mean Intersection over Union(mIoU)ratio of 90.9%.When compared to the mPA and mIoU of U-Net3+,MDC-U-Net3+model showed improvements of 1.85 and 2.12 percentage points,respectively.These results illustrated a more effective segmentation performance than that achieved by other classical semantic segmentation models.[Conclusions]The methodology presented herein could provide data support for automated disease detection and precise medication,consequently reducing the losses associated with tea diseases.
基金supported by the National Natural Science Foundation of China(No.52308332)the General Scientific Research Project of the Education Department of Zhejiang Province(No.Y202455824).
文摘This research centers on structural health monitoring of bridges,a critical transportation infrastructure.Owing to the cumulative action of heavy vehicle loads,environmental variations,and material aging,bridge components are prone to cracks and other defects,severely compromising structural safety and service life.Traditional inspection methods relying on manual visual assessment or vehicle-mounted sensors suffer from low efficiency,strong subjectivity,and high costs,while conventional image processing techniques and early deep learning models(e.g.,UNet,Faster R-CNN)still performinadequately in complex environments(e.g.,varying illumination,noise,false cracks)due to poor perception of fine cracks andmulti-scale features,limiting practical application.To address these challenges,this paper proposes CACNN-Net(CBAM-Augmented CNN),a novel dual-encoder architecture that innovatively couples a CNN for local detail extraction with a CBAM-Transformer for global context modeling.A key contribution is the dedicated Feature FusionModule(FFM),which strategically integratesmulti-scale features and focuses attention on crack regions while suppressing irrelevant noise.Experiments on bridge crack datasets demonstrate that CACNNNet achieves a precision of 77.6%,a recall of 79.4%,and an mIoU of 62.7%.These results significantly outperform several typical models(e.g.,UNet-ResNet34,Deeplabv3),confirming their superior accuracy and robust generalization,providing a high-precision automated solution for bridge crack detection and a novel network design paradigm for structural surface defect identification in complex scenarios,while future research may integrate physical features like depth information to advance intelligent infrastructure maintenance and digital twin management.
文摘With the rapid expansion of drone applications,accurate detection of objects in aerial imagery has become crucial for intelligent transportation,urban management,and emergency rescue missions.However,existing methods face numerous challenges in practical deployment,including scale variation handling,feature degradation,and complex backgrounds.To address these issues,we propose Edge-enhanced and Detail-Capturing You Only Look Once(EHDC-YOLO),a novel framework for object detection in Unmanned Aerial Vehicle(UAV)imagery.Based on the You Only Look Once version 11 nano(YOLOv11n)baseline,EHDC-YOLO systematically introduces several architectural enhancements:(1)a Multi-Scale Edge Enhancement(MSEE)module that leverages multi-scale pooling and edge information to enhance boundary feature extraction;(2)an Enhanced Feature Pyramid Network(EFPN)that integrates P2-level features with Cross Stage Partial(CSP)structures and OmniKernel convolutions for better fine-grained representation;and(3)Dynamic Head(DyHead)with multi-dimensional attention mechanisms for enhanced cross-scale modeling and perspective adaptability.Comprehensive experiments on the Vision meets Drones for Detection(VisDrone-DET)2019 dataset demonstrate that EHDC-YOLO achieves significant improvements,increasing mean Average Precision(mAP)@0.5 from 33.2%to 46.1%(an absolute improvement of 12.9 percentage points)and mAP@0.5:0.95 from 19.5%to 28.0%(an absolute improvement of 8.5 percentage points)compared with the YOLOv11n baseline,while maintaining a reasonable parameter count(2.81 M vs the baseline’s 2.58 M).Further ablation studies confirm the effectiveness of each proposed component,while visualization results highlight EHDC-YOLO’s superior performance in detecting objects and handling occlusions in complex drone scenarios.