Convolutional neural network (CNN)-based technologies have been widely used in medical image segmentation because of their strong representation and generalization abilities. However, because they cannot effectively capture global information from images, CNNs can easily lose contours and textures in segmentation results. The transformer model, by contrast, can effectively capture long-range dependencies in an image, and combining a CNN with a transformer can extract both the local details and the global contextual features of the image. Motivated by this, we propose a multi-branch and multi-scale attention network (M2ANet) for medical image segmentation, whose architecture consists of three components. In the first component, we construct an adaptive multi-branch patch module for parallel extraction of image features to reduce the information loss caused by downsampling. In the second component, we apply a residual block to the well-known convolutional block attention module to enhance the network's ability to recognize important image features and to alleviate gradient vanishing. In the third component, we design a multi-scale feature fusion module, in which we adopt adaptive average pooling and position encoding to enhance contextual features, and then introduce multi-head attention to further enrich the feature representation. Finally, we validate the effectiveness and feasibility of the proposed M2ANet through comparative experiments on four benchmark medical image segmentation datasets, particularly with respect to preserving contours and textures.
Brain tumors present significant challenges in medical diagnosis and treatment, where early detection is crucial for reducing morbidity and mortality rates. This research introduces a novel deep learning model, the Progressive Layered U-Net (PLU-Net), designed to improve brain tumor segmentation accuracy from Magnetic Resonance Imaging (MRI) scans. The PLU-Net extends the standard U-Net architecture by incorporating progressive layering, attention mechanisms, and multi-scale data augmentation. The progressive layering involves a cascaded structure that refines segmentation masks across multiple stages, allowing the model to capture features at different scales and resolutions. Attention gates within the convolutional layers selectively focus on relevant features while suppressing irrelevant ones, enhancing the model's ability to delineate tumor boundaries. Additionally, multi-scale data augmentation techniques increase the diversity of training data and boost the model's generalization capabilities. Evaluated on the BraTS 2021 dataset, the PLU-Net achieved state-of-the-art performance with a Dice coefficient of 0.91, a specificity of 0.92, a sensitivity of 0.89, and a Hausdorff95 of 2.5, outperforming other modified U-Net architectures in segmentation accuracy. These results underscore the effectiveness of the PLU-Net in improving brain tumor segmentation from MRI scans, supporting clinicians in early diagnosis, treatment planning, and the development of new therapies.
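The overlap metrics quoted in this abstract (Dice coefficient, sensitivity, specificity) can all be derived from the four confusion counts of a binary mask. The following is a minimal sketch in plain Python, assuming flattened 0/1 masks; the function names and the toy masks are illustrative and are not taken from the PLU-Net paper.

```python
def confusion_counts(pred, truth):
    """Return (TP, FP, FN, TN) for two equal-length sequences of 0/1 labels."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    return tp, fp, fn, tn


def dice(tp, fp, fn):
    """Dice coefficient: 2*TP / (2*TP + FP + FN); defined as 1.0 if both masks are empty."""
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


def sensitivity(tp, fn):
    """Fraction of true foreground pixels that were detected."""
    return 1.0 if tp + fn == 0 else tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of true background pixels that were left as background."""
    return 1.0 if tn + fp == 0 else tn / (tn + fp)


# Toy flattened masks (hypothetical data, for illustration only).
pred = [1, 1, 1, 0, 0, 0, 1, 0]
truth = [1, 1, 0, 0, 0, 0, 1, 1]

tp, fp, fn, tn = confusion_counts(pred, truth)
print(dice(tp, fp, fn), sensitivity(tp, fn), specificity(tn, fp))
```

The Hausdorff95 distance is a boundary-distance metric and is not covered by this sketch.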
Segmenting skin lesions is critical for early skin cancer detection. Existing CNN and Transformer-based methods face challenges such as high computational complexity and limited adaptability to variations in lesion sizes. To overcome these limitations, we introduce MSAMamba-UNet, a lightweight model that integrates two novel architectures: Multi-Scale Mamba (MSMamba) and Adaptive Dynamic Gating Block (ADGB). MSMamba utilizes multi-scale decomposition and a parallel hierarchical structure to enhance the delineation of irregular lesion boundaries and sensitivity to small targets. ADGB dynamically selects convolutional kernels with varying receptive fields based on input features, improving the model's capacity to accommodate diverse lesion textures and scales. Additionally, we introduce a Mix Attention Fusion Block (MAF) to enhance shallow feature representation by integrating parallel channel and pixel attention mechanisms. Extensive evaluation of MSAMamba-UNet on the ISIC 2016, ISIC 2017, and ISIC 2018 datasets demonstrates competitive segmentation accuracy with only 0.056 M parameters and 0.069 GFLOPs. Our experiments revealed that MSAMamba-UNet achieved IoU scores of 85.53%, 85.47%, and 82.22%, as well as DSC scores of 92.20%, 92.17%, and 90.24%, respectively. These results underscore the lightweight design and effectiveness of MSAMamba-UNet.
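For a single prediction, IoU and DSC are tied by the identity DSC = 2·IoU / (1 + IoU). The sketch below applies that identity to the three IoU values quoted above. It is only a consistency check under the assumption that the identity is applied to the reported aggregate figures. Scores averaged per image over a dataset need not satisfy it exactly, and the abstract does not say how its averages were computed.

```python
def dice_from_iou(iou):
    """Convert an IoU (Jaccard) value in [0, 1] to the equivalent Dice score."""
    return 2 * iou / (1 + iou)


def iou_from_dice(dsc):
    """Inverse conversion: Dice score in [0, 1] back to IoU."""
    return dsc / (2 - dsc)


# IoU values quoted in the abstract for ISIC 2016, 2017, and 2018.
for iou in (0.8553, 0.8547, 0.8222):
    print(f"IoU {iou:.4f} -> DSC {100 * dice_from_iou(iou):.2f}%")
```

Under this identity the three IoU values map to roughly 92.20%, 92.17%, and 90.24%, matching the DSC figures in the abstract.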
Segmentation of the retinal vessels in the fundus is crucial for diagnosing ocular diseases. Retinal vessel images often suffer from category imbalance and large scale variations, which ultimately result in incomplete vessel segmentation and poor continuity. In this study, we propose CT-MFENet to address these issues. First, a context transformer (CT) integrates contextual feature information, which helps establish the connection between pixels and solve the problem of incomplete vessel continuity. Second, multi-scale dense residual networks are used instead of a traditional CNN to address the inadequate local feature extraction that occurs when the model encounters vessels at multiple scales. In the decoding stage, we introduce a local-global fusion module, which enhances the localization of vascular information and reduces the semantic gap between high- and low-level features. To address the class imbalance in retinal images, we propose a hybrid loss function that enhances the segmentation ability of the model for topological structures. We conducted experiments on the publicly available DRIVE, CHASEDB1, STARE, and IOSTAR datasets. The experimental results show that our CT-MFENet performs better than most existing methods, including the baseline U-Net.
Semantic segmentation plays a foundational role in biomedical image analysis, providing precise information about cellular, tissue, and organ structures in both biological and medical imaging modalities. Traditional approaches often fail in the face of challenges such as low contrast, morphological variability, and densely packed structures. Recent advancements in deep learning have transformed segmentation capabilities through the integration of fine-scale detail preservation, coarse-scale contextual modeling, and multi-scale feature fusion. This work provides a comprehensive analysis of state-of-the-art deep learning models, including U-Net variants, attention-based frameworks, and Transformer-integrated networks, highlighting innovations that improve accuracy, generalizability, and computational efficiency. Key architectural components such as convolution operations, shallow and deep blocks, skip connections, and hybrid encoders are examined for their roles in enhancing spatial representation and semantic consistency. We further discuss the importance of hierarchical and instance-aware segmentation and annotation in interpreting complex biological scenes and multiplexed medical images. By bridging methodological developments with diverse application domains, this paper outlines current trends and future directions for semantic segmentation, emphasizing its critical role in facilitating annotation, diagnosis, and discovery in biomedical research.
Despite its remarkable performance on natural images, the segment anything model (SAM) lacks domain-specific information in medical imaging and faces the challenge of losing local multi-scale information in the encoding phase. This paper presents a medical image segmentation model based on SAM with a local multi-scale feature encoder (LMSFE-SAM) to address these issues. Firstly, based on SAM, a local multi-scale feature encoder is introduced to improve the representation of features within the local receptive field, thereby supplying the Vision Transformer (ViT) branch in SAM with enriched local multi-scale contextual information. At the same time, a multiaxial Hadamard product module (MHPM) is incorporated into the local multi-scale feature encoder in a lightweight manner to reduce the quadratic complexity and noise interference. Subsequently, a cross-branch balancing adapter is designed to balance the local and global information between the local multi-scale feature encoder and the ViT encoder in SAM. Finally, to obtain a smaller input image size and to mitigate overlapping in patch embeddings, the size of the input image is reduced from 1024×1024 pixels to 256×256 pixels, and a multidimensional information adaptation component is developed, which includes feature adapters, position adapters, and channel-spatial adapters. This component effectively integrates the information from small-sized medical images into SAM, enhancing its suitability for clinical deployment. The proposed model demonstrates an average enhancement ranging from 0.0387 to 0.3191 across six objective evaluation metrics on the BUSI, DDTI, and TN3K datasets compared to eight other representative image segmentation models. This significantly enhances the performance of SAM on medical images, providing clinicians with a powerful tool in clinical diagnosis.
Semantic segmentation has made significant breakthroughs in various application fields, but achieving both accurate and efficient segmentation with limited computational resources remains a major challenge. To this end, we propose CGMISeg, an efficient semantic segmentation architecture based on a context-guided multi-scale interaction strategy, aiming to significantly reduce computational overhead while maintaining segmentation accuracy. CGMISeg consists of three core components: context-aware attention modulation, feature reconstruction, and cross-information fusion. Context-aware attention modulation is designed to capture key contextual information through channel and spatial attention mechanisms. The feature reconstruction module reconstructs contextual information from different scales, modeling key rectangular areas by capturing critical contextual information in both horizontal and vertical directions, thereby enhancing the focus on foreground features. The cross-information fusion module fuses the reconstructed high-level features with the original low-level features during upsampling, promoting multi-scale interaction and enhancing the model's ability to handle objects at different scales. We extensively evaluated CGMISeg on ADE20K, Cityscapes, and COCO-Stuff, three widely used benchmark datasets, and the experimental results show that CGMISeg exhibits significant advantages in segmentation performance, computational efficiency, and inference speed, clearly outperforming several mainstream methods, including SegFormer, Feedformer, and SegNeXt. Specifically, CGMISeg achieves 42.9% mIoU (mean Intersection over Union) and 15.7 FPS (frames per second) on the ADE20K dataset with 3.8 GFLOPs (giga floating-point operations), outperforming Feedformer and SegNeXt by 3.7% and 1.8% in mIoU, respectively, while also offering reduced computational complexity and faster inference. CGMISeg strikes an excellent balance between accuracy and efficiency, significantly enhancing both computational and inference performance while maintaining high precision, showcasing exceptional practical value and strong potential for widespread applications.
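The mIoU figure quoted above is conventionally computed from a class confusion matrix, averaging the per-class IoU. The following is a minimal sketch of that convention in plain Python, assuming rows index the true class and columns the predicted class; the 3-class matrix is made up for illustration, and the exact evaluation protocol used for CGMISeg is not stated in the abstract.

```python
def mean_iou(conf):
    """Mean IoU from a square confusion matrix (rows: true class, cols: predicted class).

    Classes that never appear in either the ground truth or the prediction are skipped.
    """
    n = len(conf)
    ious = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                       # true class c, predicted as another class
        fp = sum(conf[r][c] for r in range(n)) - tp  # another class, predicted as c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical pixel counts for a 3-class problem.
conf = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
print(round(mean_iou(conf), 4))
```

Here the per-class IoUs are 8/11, 7/13, and 9/12, and the mIoU is their unweighted mean.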
In high-risk industrial environments like nuclear power plants, precise defect identification and localization are essential for maintaining production stability and safety. However, the complexity of such a harsh environment leads to significant variations in the shape and size of the defects. To address this challenge, we propose the multivariate time series segmentation network (MSSN), which adopts a multiscale convolutional network with multi-stage and depth-separable convolutions for efficient feature extraction through variable-length templates. To tackle the classification difficulty caused by structural signal variance, MSSN employs logarithmic normalization to adjust instance distributions. Furthermore, it integrates classification with smoothing loss functions to accurately identify defect segments amid similar structural and defect signal subsequences. Evaluated on both the Mackey-Glass dataset and an industrial dataset, our algorithm achieves over 95% localization and demonstrates its capture capability on the synthetic dataset. On a nuclear plant's heat transfer tube dataset, it captures 90% of defect instances with a 75% middle localization F1 score.
Background: Coronary artery disease (CAD) is a major global health concern requiring efficient and accurate diagnostic methods. Manual interpretation of coronary computed tomography angiography (CTA) images is time-consuming and prone to interobserver variability, underscoring the need for automated segmentation and stenosis detection tools. Methods: This study presents a hybrid multi-scale 3D segmentation framework utilizing both 3D U-Net and Enhanced 3D U-Net architectures, designed to balance computational efficiency and anatomical precision. Processed CTA images from the ImageCAS dataset underwent data standardization, normalization, and augmentation. The framework applies ensemble learning to merge coarse and fine segmentation masks, followed by advanced post-processing techniques, including connected component analysis and centerline extraction, to refine vessel delineation. Stenosis regions are detected using the Enhanced 3D U-Net and morphological operations for accurate localization. Results: The proposed pipeline achieved near-perfect segmentation accuracy (0.9993) and a Dice similarity coefficient of 0.8539 for coronary artery delineation. Precision, recall, and F1 scores for stenosis detection were 0.8418, 0.8289, and 0.8397, respectively. The dual-model approach demonstrated robust performance across varied anatomical structures and effectively localized stenotic regions, indicating clear superiority over conventional models. Conclusion: This hybrid framework enables highly reliable and automated coronary artery segmentation and stenosis detection from 3D CTA images. By reducing reliance on manual interpretation and enhancing diagnostic consistency, the proposed method holds strong potential to improve clinical workflows for CAD diagnosis and management.
Organoids possess immense potential for unraveling the intricate functions of human tissues and facilitating preclinical disease treatment. Their applications span from high-throughput drug screening to the modeling of complex diseases, with some even achieving clinical translation. Changes in the overall size, shape, boundary, and other morphological features of organoids provide a noninvasive method for assessing organoid drug sensitivity. However, the precise segmentation of organoids in bright-field microscopy images is made difficult by the complexity of the organoid morphology and interference, including overlapping organoids, bubbles, dust particles, and cell fragments. This paper introduces the precision organoid segmentation technique (POST), which is a deep-learning algorithm for segmenting challenging organoids under simple bright-field imaging conditions. Unlike existing methods, POST accurately segments each organoid and eliminates various artifacts encountered during organoid culturing and imaging. Furthermore, it is sensitive to and aligns with measurements of organoid activity in drug sensitivity experiments. POST is expected to be a valuable tool for drug screening using organoids owing to its capability of automatically and rapidly eliminating interfering substances and thereby streamlining the organoid analysis and drug screening process.
Camouflaged Object Detection (COD) aims to identify objects that share highly similar patterns (such as texture, intensity, and color) with their surrounding environment. Due to their intrinsic resemblance to the background, camouflaged objects often exhibit vague boundaries and varying scales, making it challenging to accurately locate targets and delineate their indistinct edges. To address this, we propose a novel camouflaged object detection network called Edge-Guided and Multi-scale Fusion Network (EGMFNet), which leverages edge-guided multi-scale integration for enhanced performance. The model incorporates two innovative components: a Multi-scale Fusion Module (MSFM) and an Edge-Guided Attention Module (EGA). These designs exploit multi-scale features to uncover subtle cues between candidate objects and the background while emphasizing camouflaged object boundaries. Moreover, recognizing the rich contextual information in fused features, we introduce a Dual-Branch Global Context Module (DGCM) to refine features using extensive global context, thereby generating more informative representations. Experimental results on four benchmark datasets demonstrate that EGMFNet outperforms state-of-the-art methods across five evaluation metrics. Specifically, on COD10K, our EGMFNet-P improves F_β by 4.8 points and reduces mean absolute error (MAE) by 0.006 compared with ZoomNeXt; on NC4K, it achieves a 3.6-point increase in F_β. On CAMO and CHAMELEON, it obtains 4.5-point increases in F_β. These consistent gains substantiate the superiority and robustness of EGMFNet.
Background: Diabetic macular edema is a prevalent retinal condition and a leading cause of visual impairment among diabetic patients. Early detection of affected areas is beneficial for effective diagnosis and treatment. Traditionally, diagnosis relies on optical coherence tomography imaging interpreted by ophthalmologists. However, this manual image interpretation is often slow and subjective. Therefore, developing automated segmentation for macular edema images is essential to improve diagnostic efficiency and accuracy. Methods: To improve clinical diagnostic efficiency and accuracy, we proposed a SegNet network structure integrated with a convolutional block attention module (CBAM). This network introduces a multi-scale input module, the CBAM attention mechanism, and skip connections. The multi-scale input module enhances the network's perceptual capabilities, while the lightweight CBAM effectively fuses relevant features across channels and spatial dimensions, allowing for better learning of varying information levels. Results: Experimental results demonstrate that the proposed network achieves an IoU of 80.127% and an accuracy of 99.162%. Compared to the traditional segmentation network, this model has fewer parameters, faster training and testing speed, and superior performance on semantic segmentation tasks, indicating its high practical applicability. Conclusion: The C-SegNet proposed in this study enables accurate segmentation of diabetic macular edema lesion images, which facilitates quicker diagnosis for healthcare professionals.
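The gap between the reported accuracy (99.162%) and IoU (80.127%) is typical when the lesion occupies a small share of the image, because pixel accuracy also counts the many correctly labeled background pixels. The toy sketch below illustrates the effect on flattened binary masks; the data are hypothetical and unrelated to the study's images.

```python
def pixel_accuracy(pred, truth):
    """Share of pixels whose predicted label equals the ground-truth label."""
    return sum(1 for p, t in zip(pred, truth) if p == t) / len(truth)


def iou(pred, truth):
    """Intersection over union of the foreground (label 1) regions."""
    inter = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    union = sum(1 for p, t in zip(pred, truth) if p == 1 or t == 1)
    return 1.0 if union == 0 else inter / union


# 100 pixels, only 4 of them lesion; the prediction finds 2 of the 4.
truth = [1] * 4 + [0] * 96
pred = [1] * 2 + [0] * 98

print(pixel_accuracy(pred, truth), iou(pred, truth))
```

In this example half of the lesion is missed, yet accuracy stays at 0.98 while IoU drops to 0.5, which is why overlap metrics are usually read alongside accuracy for small targets.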
Detailed individual tree crown segmentation is highly relevant for the detection and monitoring of Fraxinus excelsior L. trees affected by ash dieback, a major threat to common ash populations across Europe. In this study, both fine and coarse crown segmentation methods were applied to close-range multispectral UAV imagery. The fine tree crown segmentation method utilized a novel unsupervised machine learning approach based on a blended NIR-NDVI image, whereas the coarse segmentation relied on the segment anything model (SAM). Both methods successfully delineated tree crown outlines; however, only the fine segmentation accurately captured internal canopy gaps. Despite these structural differences, mean NDVI values calculated per tree crown revealed no significant differences between the two approaches, indicating that coarse segmentation is sufficient for mean vegetation index assessments. Nevertheless, the fine segmentation revealed increased heterogeneity in NDVI values in more severely damaged trees, underscoring its value for detailed structural and health analyses. Furthermore, the fine segmentation workflow proved transferable to both individual UAV images and orthophotos from broader UAV surveys. For applications focused on structural integrity and spatial variation in canopy health, the fine segmentation approach is recommended.
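The per-crown statistics discussed above rest on the standard NDVI definition, NDVI = (NIR − Red) / (NIR + Red), evaluated over the pixels inside a crown mask. The following is a minimal sketch in plain Python. The reflectance values, the mask, and the function names are made up for illustration, and the study's blended NIR-NDVI segmentation step is not reproduced here.

```python
import statistics


def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel; 0.0 when both bands are zero."""
    denom = nir + red
    return 0.0 if denom == 0 else (nir - red) / denom


def crown_ndvi_stats(nir_band, red_band, mask):
    """Mean and population standard deviation of NDVI over pixels where mask == 1."""
    values = [
        ndvi(n, r)
        for nir_row, red_row, mask_row in zip(nir_band, red_band, mask)
        for n, r, m in zip(nir_row, red_row, mask_row)
        if m == 1
    ]
    return statistics.fmean(values), statistics.pstdev(values)


# Hypothetical 2x2 reflectance tiles and a crown mask that excludes one pixel.
nir_band = [[0.6, 0.5], [0.4, 0.8]]
red_band = [[0.2, 0.1], [0.4, 0.2]]
crown_mask = [[1, 1], [0, 1]]

mean_ndvi, ndvi_spread = crown_ndvi_stats(nir_band, red_band, crown_mask)
print(round(mean_ndvi, 4), round(ndvi_spread, 4))
```

The mean corresponds to the per-crown comparison between the two segmentation methods, while the spread is one simple way to express the within-crown heterogeneity that the fine segmentation is reported to reveal.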
Images taken in dim environments frequently exhibit issues like insufficient brightness, noise, color shifts, and loss of detail. These problems pose significant challenges to dark image enhancement tasks. Current approaches, while effective in global illumination modeling, often struggle to simultaneously suppress noise and preserve structural details, especially under heterogeneous lighting. Furthermore, misalignment between luminance and color channels introduces additional challenges to accurate enhancement. In response to these difficulties, we introduce a single-stage framework, M2ATNet, using a multi-scale multi-attention and Transformer architecture. First, to address the problems of texture blurring and residual noise, we design a multi-scale multi-attention denoising module (MMAD), which is applied separately to the luminance and color channels to enhance the structural and texture modeling capabilities. Second, to solve the misalignment of the luminance and color channels, we introduce the multi-channel feature fusion Transformer (CFFT) module, which effectively recovers dark details and corrects color shifts through cross-channel alignment and deep feature interaction. To guide the model to learn more stably and efficiently, we also fuse multiple types of loss functions to form a hybrid loss term. We extensively evaluate the proposed method on various standard datasets, including LOL-v1, LOL-v2, DICM, LIME, and NPE. Evaluations in terms of numerical metrics and visual quality demonstrate that M2ATNet consistently outperforms existing advanced approaches. Ablation studies further confirm the critical roles played by the MMAD and CFFT modules in detail preservation and visual fidelity under challenging illumination-deficient environments.
AIM: To construct an intelligent segmentation scheme for precise localization of central serous chorioretinopathy (CSC) leakage points, thereby enabling ophthalmologists to deliver accurate laser treatment without navigational laser equipment. METHODS: A dataset with dual labels (point-level and pixel-level) was first established based on fundus fluorescein angiography (FFA) images of CSC and subsequently divided into training (102 images), validation (40 images), and test (40 images) datasets. An intelligent segmentation method was then developed, based on the You Only Look Once version 8 Pose Estimation (YOLOv8-Pose) model and the segment anything model (SAM), to segment CSC leakage points. Next, the YOLOv8-Pose model was trained for 200 epochs, and the best-performing model was selected to form the optimal combination with SAM. Additionally, five classic U-Net series models [i.e., U-Net, recurrent residual U-Net (R2U-Net), attention U-Net (AttU-Net), recurrent residual attention U-Net (R2AttUNet), and nested U-Net (UNet++)] were initialized with three random seeds and trained for 200 epochs, resulting in a total of 15 baseline models for comparison. Finally, based on metrics including Dice similarity coefficient (DICE), intersection over union (IoU), precision, recall, precision-recall (PR) curve, and receiver operating characteristic (ROC) curve, the proposed method was compared with the baseline models through quantitative and qualitative experiments for leakage point segmentation, thereby demonstrating its effectiveness. RESULTS: As the number of training epochs increased, the mAP50-95, recall, and precision of the YOLOv8-Pose model rose significantly and then stabilized, and the model achieved a preliminary localization success rate of 90% (i.e., 36 images) for CSC leakage points in the 40 test images. Using expert-annotated pixel-level labels as the ground truth, the proposed method achieved a DICE of 57.13%, an IoU of 45.31%, a precision of 45.91%, a recall of 93.57%, an area under the PR curve (AUC-PR) of 0.78, and an area under the ROC curve (AUC-ROC) of 0.97, which enables more accurate segmentation of CSC leakage points. CONCLUSION: By combining the precise localization capability of the YOLOv8-Pose model with the robust and flexible segmentation ability of SAM, the proposed method not only demonstrates the effectiveness of the YOLOv8-Pose model in detecting keypoint coordinates of CSC leakage points from the perspective of application innovation but also establishes a novel approach for accurate segmentation of CSC leakage points through the "detect-then-segment" strategy, thereby providing a potential auxiliary means for the automatic and precise real-time localization of leakage points during traditional laser photocoagulation for CSC.
Accurate and efficient detection of building changes in remote sensing imagery is crucial for urban planning, disaster emergency response, and resource management. However, existing methods face challenges such as spectral similarity between buildings and backgrounds, sensor variations, and insufficient computational efficiency. To address these challenges, this paper proposes a novel Multi-scale Efficient Wavelet-based Change Detection Network (MewCDNet), which integrates the advantages of Convolutional Neural Networks and Transformers, balances computational costs, and achieves high-performance building change detection. The network employs EfficientNet-B4 as the backbone for hierarchical feature extraction, integrates multi-level feature maps through a multi-scale fusion strategy, and incorporates two key modules: Cross-temporal Difference Detection (CTDD) and Cross-scale Wavelet Refinement (CSWR). CTDD adopts a dual-branch architecture that combines pixel-wise differencing with semantic-aware Euclidean distance weighting to enhance the distinction between true changes and background noise. CSWR integrates Haar-based Discrete Wavelet Transform with multi-head cross-attention mechanisms, enabling cross-scale feature fusion while significantly improving edge localization and suppressing spurious changes. Extensive experiments on four benchmark datasets demonstrate MewCDNet's superiority over comparison methods: it achieves F1 scores of 91.54% on LEVIR, 93.70% on WHUCD, and 64.96% on S2Looking for building change detection. Furthermore, MewCDNet exhibits optimal performance on the multi-class SYSU dataset (F1: 82.71%), highlighting its exceptional generalization capability.
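Two building blocks named in this abstract, pixel-wise differencing and the Haar discrete wavelet transform, can be illustrated on their own. The sketch below computes an absolute difference map between two tiny single-channel images and then applies one level of an orthonormal 2D Haar transform. It is not the CTDD or CSWR module: there is no distance weighting or attention, the subband names and signs follow one common convention among several, and the images are hypothetical.

```python
def abs_difference(img_a, img_b):
    """Pixel-wise absolute difference of two equally sized 2D lists."""
    return [[abs(a - b) for a, b in zip(row_a, row_b)] for row_a, row_b in zip(img_a, img_b)]


def haar_dwt2(img):
    """One-level orthonormal 2D Haar transform of a 2D list with even height and width.

    Returns (approx, horizontal, vertical, diagonal), each half the input size.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    approx, horiz, vert, diag = [], [], [], []
    for i in range(0, h, 2):
        row_a, row_h, row_v, row_d = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            row_a.append((a + b + c + d) / 2)  # local average (low-pass in both directions)
            row_h.append((a + b - c - d) / 2)  # top row minus bottom row
            row_v.append((a - b + c - d) / 2)  # left column minus right column
            row_d.append((a - b - c + d) / 2)  # diagonal detail
        approx.append(row_a)
        horiz.append(row_h)
        vert.append(row_v)
        diag.append(row_d)
    return approx, horiz, vert, diag


# Hypothetical bi-temporal tiles: a structure appears in columns 1-3 at time 2.
time1 = [[0, 0, 0, 0] for _ in range(4)]
time2 = [[0, 1, 1, 1] for _ in range(4)]

diff = abs_difference(time1, time2)
approx, horiz, vert, diag = haar_dwt2(diff)
print(approx, vert)
```

In this example the change boundary falls inside the left 2×2 tiles, so it shows up as non-zero entries in the left-minus-right detail subband, while the approximation subband keeps a half-resolution summary of the changed area.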
Medical image segmentation is of critical importance in the domain of contemporary medical imaging. However, U-Net and its variants exhibit limitations in capturing complex nonlinear patterns and global contextual information. Although the subsequent U-KAN model enhances nonlinear representation capabilities, it still faces challenges such as gradient vanishing during deep network training and spatial detail loss during feature downsampling, resulting in insufficient segmentation accuracy for edge structures and minute lesions. To address these challenges, this paper proposes the RE-UKAN model, which innovatively improves upon U-KAN. Firstly, a residual network is introduced into the encoder to effectively mitigate gradient vanishing through cross-layer identity mappings, thus enhancing modelling capabilities for complex pathological structures. Secondly, Efficient Local Attention (ELA) is integrated to suppress spatial detail loss during downsampling, thereby improving the perception of edge structures and minute lesions. Experimental results on four public datasets demonstrate that RE-UKAN outperforms existing medical image segmentation methods across multiple evaluation metrics, with particularly outstanding performance on the TN-SCUI 2020 dataset, achieving an IoU of 88.18% and a Dice of 93.57%. Compared to the baseline model, it achieves improvements of 3.05% and 1.72%, respectively. These results fully demonstrate RE-UKAN's superior detail retention capability and boundary recognition accuracy in complex medical image segmentation tasks, providing a reliable solution for clinical precision segmentation.
Tomato is a major economic crop worldwide, and diseases on tomato leaves can significantly reduce both yield and quality. Traditional manual inspection is inefficient and highly subjective, making it difficult to meet the requirements of early disease identification in complex natural environments. To address this issue, this study proposes an improved YOLO11-based model, YOLO-SPDNet (Scale Sequence Fusion, Position-Channel Attention, and Dual Enhancement Network). The model integrates the SEAM (Self-Ensembling Attention Mechanism) semantic enhancement module, the MLCA (Mixed Local Channel Attention) lightweight attention mechanism, and the SPA (Scale-Position-Detail Awareness) module composed of SSFF (Scale Sequence Feature Fusion), TFE (Triple Feature Encoding), and CPAM (Channel and Position Attention Mechanism). These enhancements strengthen fine-grained lesion detection while keeping the model lightweight. Experimental results show that YOLO-SPDNet achieves an accuracy of 91.8%, a recall of 86.5%, and an mAP@0.5 of 90.6% on the test set, with a computational complexity of 12.5 GFLOPs. Furthermore, the model reaches a real-time inference speed of 987 FPS, making it suitable for deployment on mobile agricultural terminals and online monitoring systems. Comparative analysis and ablation studies further validate the reliability and practical applicability of the proposed model in complex natural scenes.
Weakly Supervised Semantic Segmentation (WSSS), which relies only on image-level labels, has attracted significant attention for its cost-effectiveness and scalability. Existing methods mainly enhance inter-class distinctions and employ data augmentation to mitigate semantic ambiguity and reduce spurious activations. However, they often neglect the complex contextual dependencies among image patches, resulting in incomplete local representations and limited segmentation accuracy. To address these issues, we propose the Context Patch Fusion with Class Token Enhancement (CPF-CTE) framework, which exploits contextual relations among patches to enrich feature representations and improve segmentation. At its core, the Contextual-Fusion Bidirectional Long Short-Term Memory (CF-BiLSTM) module captures spatial dependencies between patches and enables bidirectional information flow, yielding a more comprehensive understanding of spatial correlations. This strengthens feature learning and segmentation robustness. Moreover, we introduce learnable class tokens that dynamically encode and refine class-specific semantics, enhancing discriminative capability. By effectively integrating spatial and semantic cues, CPF-CTE produces richer and more accurate representations of image content. Extensive experiments on PASCAL VOC 2012 and MS COCO 2014 validate that CPF-CTE consistently surpasses prior WSSS methods.
Distributed Denial of Service (DDoS) attacks are among the most severe threats to network infrastructure, sometimes bypassing traditional detection algorithms because of their evolving complexity. Present Machine Learning (ML) techniques for DDoS attack detection normally use statistical features of network traffic, such as packet sizes and inter-arrival times. However, such techniques sometimes fail to capture complicated relations among traffic flows. In this paper, we present a new multi-scale ensemble strategy based on Graph Neural Networks (GNNs) for improving DDoS detection. Our technique divides traffic into macro- and micro-level elements, letting different GNN models capture both coarse-scale anomalies and subtle, stealthy attack patterns. By modeling network traffic as graph-structured data, GNNs efficiently learn intricate relations among network entities. The proposed ensemble learning algorithm combines the results of several GNNs to improve generalization, robustness, and scalability. Extensive experiments on three benchmark datasets (UNSW-NB15, CICIDS2017, and CICDDoS2019) show that our approach outperforms traditional machine learning and deep learning models in detecting both high-rate and low-rate (stealthy) DDoS attacks, with significant improvements in accuracy and recall. These findings demonstrate the proposed method’s applicability and robustness for real-world deployment in contexts where several DDoS patterns coexist.
Funding: Supported by the Natural Science Foundation of the Anhui Higher Education Institutions of China (Grant Nos. 2023AH040149 and 2024AH051915), the Anhui Provincial Natural Science Foundation (Grant No. 2208085MF168), the Science and Technology Innovation Tackle Plan Project of Maanshan (Grant No. 2024RGZN001), and the Scientific Research Fund Project of Anhui Medical University (Grant No. 2023xkj122).
Abstract: Medical image segmentation technologies based on convolutional neural networks (CNNs) have been widely used because of their strong representation and generalization abilities. However, because CNNs cannot effectively capture global information from images, they can easily lose contours and textures in segmentation results. Note that the transformer model can effectively capture long-range dependencies in the image, and, furthermore, combining the CNN and the transformer can effectively extract both local details and global contextual features of the image. Motivated by this, we propose a multi-branch and multi-scale attention network (M2ANet) for medical image segmentation, whose architecture consists of three components. Specifically, in the first component, we construct an adaptive multi-branch patch module for parallel extraction of image features to reduce information loss caused by downsampling. In the second component, we apply a residual block to the well-known convolutional block attention module to enhance the network’s ability to recognize important features of images and alleviate gradient vanishing. In the third component, we design a multi-scale feature fusion module, in which we adopt adaptive average pooling and position encoding to enhance contextual features, and then introduce multi-head attention to further enrich the feature representation. Finally, we validate the effectiveness and feasibility of the proposed M2ANet method through comparative experiments on four benchmark medical image segmentation datasets, particularly in the context of preserving contours and textures.
Abstract: Brain tumors present significant challenges in medical diagnosis and treatment, where early detection is crucial for reducing morbidity and mortality rates. This research introduces a novel deep learning model, the Progressive Layered U-Net (PLU-Net), designed to improve brain tumor segmentation accuracy from Magnetic Resonance Imaging (MRI) scans. The PLU-Net extends the standard U-Net architecture by incorporating progressive layering, attention mechanisms, and multi-scale data augmentation. The progressive layering involves a cascaded structure that refines segmentation masks across multiple stages, allowing the model to capture features at different scales and resolutions. Attention gates within the convolutional layers selectively focus on relevant features while suppressing irrelevant ones, enhancing the model's ability to delineate tumor boundaries. Additionally, multi-scale data augmentation techniques increase the diversity of training data and boost the model's generalization capabilities. Evaluated on the BraTS 2021 dataset, the PLU-Net achieved state-of-the-art performance with a Dice coefficient of 0.91, specificity of 0.92, sensitivity of 0.89, and Hausdorff95 of 2.5, outperforming other modified U-Net architectures in segmentation accuracy. These results underscore the effectiveness of the PLU-Net in improving brain tumor segmentation from MRI scans, supporting clinicians in early diagnosis, treatment planning, and the development of new therapies.
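For readers unfamiliar with the overlap metrics quoted above, the following is a minimal NumPy sketch of how the Dice coefficient, sensitivity, and specificity are commonly computed from a pair of binary masks. It is not the PLU-Net evaluation code; the function name `binary_segmentation_metrics`, the toy masks, and the small `eps` stabilizer are assumptions made for illustration.

```python
import numpy as np

def binary_segmentation_metrics(pred, target, eps=1e-8):
    """Return (dice, sensitivity, specificity) for two binary masks.

    Hypothetical helper for illustration; `eps` only guards against
    division by zero when a mask is empty.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)

    tp = np.sum(pred & target)      # foreground predicted as foreground
    fp = np.sum(pred & ~target)     # background predicted as foreground
    fn = np.sum(~pred & target)     # foreground predicted as background
    tn = np.sum(~pred & ~target)    # background predicted as background

    dice = 2 * tp / (2 * tp + fp + fn + eps)
    sensitivity = tp / (tp + fn + eps)   # true positive rate (recall)
    specificity = tn / (tn + fp + eps)   # true negative rate
    return float(dice), float(sensitivity), float(specificity)

# Toy 2x4 masks: 2 true positives, 1 false positive, 1 false negative, 4 true negatives.
pred = np.array([[1, 1, 0, 0],
                 [1, 0, 0, 0]])
target = np.array([[1, 1, 1, 0],
                   [0, 0, 0, 0]])
dice, sensitivity, specificity = binary_segmentation_metrics(pred, target)
print(dice, sensitivity, specificity)
```

Benchmark scores such as those reported for BraTS are usually averaged over cases and tumor sub-regions, so a single-mask computation like this is only the building block.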
Funding: Supported in part by the National Natural Science Foundation of China under Grant 62201201 and the Foundation of Henan Educational Committee under Grant 242102211042.
Abstract: Segmenting skin lesions is critical for early skin cancer detection. Existing CNN- and Transformer-based methods face challenges such as high computational complexity and limited adaptability to variations in lesion size. To overcome these limitations, we introduce MSAMamba-UNet, a lightweight model that integrates two novel architectures: Multi-Scale Mamba (MSMamba) and Adaptive Dynamic Gating Block (ADGB). MSMamba utilizes multi-scale decomposition and a parallel hierarchical structure to enhance the delineation of irregular lesion boundaries and sensitivity to small targets. ADGB dynamically selects convolutional kernels with varying receptive fields based on input features, improving the model’s capacity to accommodate diverse lesion textures and scales. Additionally, we introduce a Mix Attention Fusion Block (MAF) to enhance shallow feature representation by integrating parallel channel and pixel attention mechanisms. Extensive evaluation of MSAMamba-UNet on the ISIC 2016, ISIC 2017, and ISIC 2018 datasets demonstrates competitive segmentation accuracy with only 0.056 M parameters and 0.069 GFLOPs. In our experiments, MSAMamba-UNet achieved IoU scores of 85.53%, 85.47%, and 82.22%, as well as DSC scores of 92.20%, 92.17%, and 90.24%, respectively. These results underscore the lightweight design and effectiveness of MSAMamba-UNet.
Funding: The National Natural Science Foundation of China (No. 62266025).
Abstract: Segmentation of the retinal vessels in the fundus is crucial for diagnosing ocular diseases. Retinal vessel images often suffer from class imbalance and large scale variations, which ultimately result in incomplete vessel segmentation and poor continuity. In this study, we propose CT-MFENet to address these issues. First, a context transformer (CT) integrates contextual feature information, which helps establish connections between pixels and addresses the problem of poor vessel continuity. Second, multi-scale dense residual networks are used instead of a traditional CNN to address inadequate local feature extraction when the model encounters vessels at multiple scales. In the decoding stage, we introduce a local-global fusion module, which enhances the localization of vascular information and reduces the semantic gap between high- and low-level features. To address the class imbalance in retinal images, we propose a hybrid loss function that enhances the model's ability to segment topological structures. We conducted experiments on the publicly available DRIVE, CHASEDB1, STARE, and IOSTAR datasets. The experimental results show that our CT-MFENet performs better than most existing methods, including the baseline U-Net.
Funding: Open Access funding provided by the National Institutes of Health (NIH). The funding for this project was provided by the NCATS Intramural Fund.
Abstract: Semantic segmentation plays a foundational role in biomedical image analysis, providing precise information about cellular, tissue, and organ structures in both biological and medical imaging modalities. Traditional approaches often fail in the face of challenges such as low contrast, morphological variability, and densely packed structures. Recent advancements in deep learning have transformed segmentation capabilities through the integration of fine-scale detail preservation, coarse-scale contextual modeling, and multi-scale feature fusion. This work provides a comprehensive analysis of state-of-the-art deep learning models, including U-Net variants, attention-based frameworks, and Transformer-integrated networks, highlighting innovations that improve accuracy, generalizability, and computational efficiency. Key architectural components such as convolution operations, shallow and deep blocks, skip connections, and hybrid encoders are examined for their roles in enhancing spatial representation and semantic consistency. We further discuss the importance of hierarchical and instance-aware segmentation and annotation in interpreting complex biological scenes and multiplexed medical images. By bridging methodological developments with diverse application domains, this paper outlines current trends and future directions for semantic segmentation, emphasizing its critical role in facilitating annotation, diagnosis, and discovery in biomedical research.
Funding: Supported by the Natural Science Foundation Programme of Gansu Province (No. 24JRRA231), the National Natural Science Foundation of China (No. 62061023), and the Gansu Provincial Science and Technology Plan Key Research and Development Program Project (No. 24YFFA024).
Abstract: Despite its remarkable performance on natural images, the segment anything model (SAM) lacks domain-specific information in medical imaging and faces the challenge of losing local multi-scale information in the encoding phase. This paper presents a medical image segmentation model based on SAM with a local multi-scale feature encoder (LMSFE-SAM) to address these issues. Firstly, based on SAM, a local multi-scale feature encoder is introduced to improve the representation of features within the local receptive field, thereby supplying the Vision Transformer (ViT) branch in SAM with enriched local multi-scale contextual information. At the same time, a multiaxial Hadamard product module (MHPM) is incorporated into the local multi-scale feature encoder in a lightweight manner to reduce the quadratic complexity and noise interference. Subsequently, a cross-branch balancing adapter is designed to balance the local and global information between the local multi-scale feature encoder and the ViT encoder in SAM. Finally, to obtain a smaller input image size and to mitigate overlapping in patch embeddings, the size of the input image is reduced from 1024×1024 pixels to 256×256 pixels, and a multidimensional information adaptation component is developed, which includes feature adapters, position adapters, and channel-spatial adapters. This component effectively integrates the information from small-sized medical images into SAM, enhancing its suitability for clinical deployment. Compared with eight other representative image segmentation models, the proposed model demonstrates an average improvement ranging from 0.0387 to 0.3191 across six objective evaluation metrics on the BUSI, DDTI, and TN3K datasets. This significantly enhances the performance of SAM on medical images, providing clinicians with a powerful tool for clinical diagnosis.
Funding: Supported by the National Natural Science Foundation of China (62162007) and the Guizhou Provincial Basic Research Program (Natural Science) (No. QianKeHeJiChu-ZK[2024]YiBan079).
Abstract: Semantic segmentation has made significant breakthroughs in various application fields, but achieving both accurate and efficient segmentation with limited computational resources remains a major challenge. To this end, we propose CGMISeg, an efficient semantic segmentation architecture based on a context-guided multi-scale interaction strategy, aiming to significantly reduce computational overhead while maintaining segmentation accuracy. CGMISeg consists of three core components: context-aware attention modulation, feature reconstruction, and cross-information fusion. Context-aware attention modulation is carefully designed to capture key contextual information through channel and spatial attention mechanisms. The feature reconstruction module reconstructs contextual information from different scales, modeling key rectangular areas by capturing critical contextual information in both horizontal and vertical directions, thereby enhancing the focus on foreground features. The cross-information fusion module fuses the reconstructed high-level features with the original low-level features during upsampling, promoting multi-scale interaction and enhancing the model’s ability to handle objects at different scales. We extensively evaluated CGMISeg on ADE20K, Cityscapes, and COCO-Stuff, three widely used benchmark datasets, and the experimental results show that CGMISeg exhibits significant advantages in segmentation performance, computational efficiency, and inference speed, clearly outperforming several mainstream methods, including SegFormer, Feedformer, and SegNeXt. Specifically, CGMISeg achieves 42.9% mIoU (Mean Intersection over Union) and 15.7 FPS (Frames Per Second) on the ADE20K dataset with 3.8 GFLOPs (Giga Floating-point Operations), outperforming Feedformer and SegNeXt by 3.7% and 1.8% in mIoU, respectively, while also offering reduced computational complexity and faster inference. CGMISeg strikes an excellent balance between accuracy and efficiency, significantly enhancing both computational and inference performance while maintaining high precision, showcasing exceptional practical value and strong potential for widespread applications.
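The mIoU figure quoted above is a class-averaged overlap score. As a hedged illustration, not the evaluation code used for CGMISeg, mIoU is typically derived from a confusion matrix as sketched below. The function name `mean_iou` and the toy labels are assumptions; benchmark protocols may also ignore certain labels or accumulate the matrix over a whole dataset before averaging.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union from integer label maps.

    Hypothetical helper: rows of the confusion matrix index the ground
    truth class, columns index the predicted class.
    """
    pred = np.asarray(pred, dtype=np.int64).ravel()
    target = np.asarray(target, dtype=np.int64).ravel()

    # Encode each (target, pred) pair as one index and count occurrences.
    conf = np.bincount(target * num_classes + pred,
                       minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)

    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection

    valid = union > 0  # skip classes absent from both prediction and target
    return float(np.mean(intersection[valid] / union[valid]))

# Toy example with three classes; per-class IoUs are 1/3, 2/3 and 1/2.
target_labels = [0, 0, 1, 1, 2, 2]
pred_labels = [0, 1, 1, 1, 2, 0]
miou = mean_iou(pred_labels, target_labels, num_classes=3)
print(miou)
```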
Funding: Supported by the National Science and Technology Major Project of the Ministry of Science and Technology of China (2024ZD0608100) and the National Natural Science Foundation of China (62332017, U22A2022).
Abstract: In high-risk industrial environments like nuclear power plants, precise defect identification and localization are essential for maintaining production stability and safety. However, the complexity of such a harsh environment leads to significant variations in the shape and size of the defects. To address this challenge, we propose the multivariate time series segmentation network (MSSN), which adopts a multiscale convolutional network with multi-stage and depth-separable convolutions for efficient feature extraction through variable-length templates. To tackle the classification difficulty caused by structural signal variance, MSSN employs logarithmic normalization to adjust instance distributions. Furthermore, it integrates classification with smoothing loss functions to accurately identify defect segments amid similar structural and defect signal subsequences. Our algorithm, evaluated on both the Mackey-Glass dataset and an industrial dataset, achieves over 95% localization and demonstrates its capture capability on the synthetic dataset. On a nuclear plant's heat transfer tube dataset, it captures 90% of defect instances with a 75% middle localization F1 score.
Abstract: Background: Coronary artery disease (CAD) is a major global health concern requiring efficient and accurate diagnostic methods. Manual interpretation of coronary computed tomography angiography (CTA) images is time-consuming and prone to interobserver variability, underscoring the need for automated segmentation and stenosis detection tools. Methods: This study presents a hybrid multi-scale 3D segmentation framework utilizing both 3D U-Net and Enhanced 3D U-Net architectures, designed to balance computational efficiency and anatomical precision. Processed CTA images from the ImageCAS dataset underwent data standardization, normalization, and augmentation. The framework applies ensemble learning to merge coarse and fine segmentation masks, followed by advanced post-processing techniques, including connected component analysis and centerline extraction, to refine vessel delineation. Stenosis regions are detected using the Enhanced 3D U-Net and morphological operations for accurate localization. Results: The proposed pipeline achieved near-perfect segmentation accuracy (0.9993) and a Dice similarity coefficient of 0.8539 for coronary artery delineation. Precision, recall, and F1 scores for stenosis detection were 0.8418, 0.8289, and 0.8397, respectively. The dual-model approach demonstrated robust performance across varied anatomical structures and effectively localized stenotic regions, indicating clear superiority over conventional models. Conclusion: This hybrid framework enables highly reliable and automated coronary artery segmentation and stenosis detection from 3D CTA images. By reducing reliance on manual interpretation and enhancing diagnostic consistency, the proposed method holds strong potential to improve clinical workflows for CAD diagnosis and management.
Funding: Supported by the National Key R&D Program of China (No. 2022YFC2504403), the National Natural Science Foundation of China (No. 62172202), the Experiment Project of China Manned Space Program (No. HYZHXM01019), and the Fundamental Research Funds for the Central Universities from Southeast University (No. 3207032101C3).
Abstract: Organoids possess immense potential for unraveling the intricate functions of human tissues and facilitating preclinical disease treatment. Their applications span from high-throughput drug screening to the modeling of complex diseases, with some even achieving clinical translation. Changes in the overall size, shape, boundary, and other morphological features of organoids provide a noninvasive method for assessing organoid drug sensitivity. However, the precise segmentation of organoids in bright-field microscopy images is made difficult by the complexity of the organoid morphology and by interference, including overlapping organoids, bubbles, dust particles, and cell fragments. This paper introduces the precision organoid segmentation technique (POST), a deep-learning algorithm for segmenting challenging organoids under simple bright-field imaging conditions. Unlike existing methods, POST accurately segments each organoid and eliminates various artifacts encountered during organoid culturing and imaging. Furthermore, it is sensitive to and aligns with measurements of organoid activity in drug sensitivity experiments. POST is expected to be a valuable tool for drug screening using organoids owing to its capability of automatically and rapidly eliminating interfering substances, thereby streamlining the organoid analysis and drug screening process.
Funding: Financially supported by the Chongqing University of Technology Graduate Innovation Foundation (Grant No. gzlcx20253267).
Abstract: Camouflaged Object Detection (COD) aims to identify objects that share highly similar patterns (such as texture, intensity, and color) with their surrounding environment. Due to their intrinsic resemblance to the background, camouflaged objects often exhibit vague boundaries and varying scales, making it challenging to accurately locate targets and delineate their indistinct edges. To address this, we propose a novel camouflaged object detection network called Edge-Guided and Multi-scale Fusion Network (EGMFNet), which leverages edge-guided multi-scale integration for enhanced performance. The model incorporates two innovative components: a Multi-scale Fusion Module (MSFM) and an Edge-Guided Attention Module (EGA). These designs exploit multi-scale features to uncover subtle cues between candidate objects and the background while emphasizing camouflaged object boundaries. Moreover, recognizing the rich contextual information in fused features, we introduce a Dual-Branch Global Context Module (DGCM) to refine features using extensive global context, thereby generating more informative representations. Experimental results on four benchmark datasets demonstrate that EGMFNet outperforms state-of-the-art methods across five evaluation metrics. Specifically, on COD10K, our EGMFNet-P improves F_β by 4.8 points and reduces mean absolute error (MAE) by 0.006 compared with ZoomNeXt; on NC4K, it achieves a 3.6-point increase in F_β. On CAMO and CHAMELEON, it obtains 4.5-point increases in F_β, respectively. These consistent gains substantiate the superiority and robustness of EGMFNet.
Funding: Supported by the Guangdong Pharmaceutical University 2024 Higher Education Research Projects (GKP202403, GMP202402) and the Guangdong Pharmaceutical University College Students’ Innovation and Entrepreneurship Training Programs (Grant Nos. 202504302033, 202504302034, 202504302036, and 202504302244).
Abstract: Background: Diabetic macular edema is a prevalent retinal condition and a leading cause of visual impairment among diabetic patients. Early detection of affected areas is beneficial for effective diagnosis and treatment. Traditionally, diagnosis relies on optical coherence tomography images interpreted by ophthalmologists. However, this manual image interpretation is often slow and subjective. Therefore, developing automated segmentation for macular edema images is essential to improve diagnostic efficiency and accuracy. Methods: To improve clinical diagnostic efficiency and accuracy, we propose a SegNet network structure integrated with a convolutional block attention module (CBAM). This network introduces a multi-scale input module, the CBAM attention mechanism, and skip connections. The multi-scale input module enhances the network’s perceptual capabilities, while the lightweight CBAM effectively fuses relevant features across channels and spatial dimensions, allowing for better learning of information at varying levels. Results: Experimental results demonstrate that the proposed network achieves an IoU of 80.127% and an accuracy of 99.162%. Compared to the traditional segmentation network, this model has fewer parameters, faster training and testing speed, and superior performance on semantic segmentation tasks, indicating its high practical applicability. Conclusion: The C-SegNet proposed in this study enables accurate segmentation of diabetic macular edema lesion images, which facilitates quicker diagnosis for healthcare professionals.
Funding: This study was conducted within the project FraxVir, “Detection, characterisation and analyses of the occurrence of viruses and ash dieback in special stands of Fraxinus excelsior - a supplementary study to the FraxForFuture demonstration project”, and receives funding via the Waldklimafonds (WKF), funded by the German Federal Ministry of Food and Agriculture (BMEL) and the Federal Ministry for the Environment, Nature Conservation, Nuclear Safety and Consumer Protection (BMUV) and administered by the Agency for Renewable Resources (FNR) under grant agreement 2220WK40A4.
Abstract: Detailed individual tree crown segmentation is highly relevant for the detection and monitoring of Fraxinus excelsior L. trees affected by ash dieback, a major threat to common ash populations across Europe. In this study, both fine and coarse crown segmentation methods were applied to close-range multispectral UAV imagery. The fine tree crown segmentation method utilized a novel unsupervised machine learning approach based on a blended NIR-NDVI image, whereas the coarse segmentation relied on the segment anything model (SAM). Both methods successfully delineated tree crown outlines; however, only the fine segmentation accurately captured internal canopy gaps. Despite these structural differences, mean NDVI values calculated per tree crown revealed no significant differences between the two approaches, indicating that coarse segmentation is sufficient for mean vegetation index assessments. Nevertheless, the fine segmentation revealed increased heterogeneity in NDVI values in more severely damaged trees, underscoring its value for detailed structural and health analyses. Furthermore, the fine segmentation workflow proved transferable to both individual UAV images and orthophotos from broader UAV surveys. For applications focused on structural integrity and spatial variation in canopy health, the fine segmentation approach is recommended.
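As background for the per-crown NDVI comparison described above, the sketch below computes the standard NDVI, (NIR - Red) / (NIR + Red), and summarizes it inside a crown mask. It is an illustration under stated assumptions, not the study's workflow: the function names, the toy reflectance values, and the use of the standard deviation as a simple heterogeneity measure are all hypothetical choices.

```python
import numpy as np

def ndvi(nir, red, eps=1e-8):
    """Normalized Difference Vegetation Index per pixel.

    `eps` (an assumption) avoids division by zero on dark pixels.
    """
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    return (nir - red) / (nir + red + eps)

def crown_ndvi_stats(nir, red, crown_mask):
    """Mean and standard deviation of NDVI inside one crown mask."""
    values = ndvi(nir, red)[np.asarray(crown_mask, dtype=bool)]
    return float(values.mean()), float(values.std())

# Toy 2x2 reflectance bands; the crown covers the top row only.
nir_band = np.array([[0.8, 0.6],
                     [0.5, 0.2]])
red_band = np.array([[0.2, 0.2],
                     [0.5, 0.2]])
crown = np.array([[1, 1],
                  [0, 0]])
mean_ndvi, std_ndvi = crown_ndvi_stats(nir_band, red_band, crown)
print(mean_ndvi, std_ndvi)
```

A coarse mask that includes canopy gaps and a fine mask that excludes them can yield similar means while differing in spread, which is consistent with the contrast the abstract draws between the two segmentations.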
Funding: Funded by the National Natural Science Foundation of China, grant numbers 52374156 and 62476005.
Abstract: Images taken in dim environments frequently exhibit issues like insufficient brightness, noise, color shifts, and loss of detail. These problems pose significant challenges to dark image enhancement tasks. Current approaches, while effective in global illumination modeling, often struggle to simultaneously suppress noise and preserve structural details, especially under heterogeneous lighting. Furthermore, misalignment between luminance and color channels introduces additional challenges to accurate enhancement. In response to these difficulties, we introduce a single-stage framework, M2ATNet, built on multi-scale multi-attention and the Transformer architecture. First, to address the problems of texture blurring and residual noise, we design a multi-scale multi-attention denoising module (MMAD), which is applied separately to the luminance and color channels to enhance structural and texture modeling capabilities. Second, to solve the misalignment between the luminance and color channels, we introduce the multi-channel feature fusion Transformer (CFFT) module, which effectively recovers dark details and corrects color shifts through cross-channel alignment and deep feature interaction. To guide the model to learn more stably and efficiently, we also fuse multiple types of loss functions to form a hybrid loss term. We extensively evaluate the proposed method on various standard datasets, including LOL-v1, LOL-v2, DICM, LIME, and NPE. Evaluation in terms of numerical metrics and visual quality demonstrates that M2ATNet consistently outperforms existing advanced approaches. Ablation studies further confirm the critical roles played by the MMAD and CFFT modules in detail preservation and visual fidelity under challenging illumination-deficient environments.
Funding: Supported by the Shenzhen Science and Technology Program (No. JCYJ20240813152704006), the National Natural Science Foundation of China (No. 62401259), the Fundamental Research Funds for the Central Universities (No. NZ2024036), the Postdoctoral Fellowship Program of CPSF (No. GZC20242228), and the High Performance Computing Platform of Nanjing University of Aeronautics and Astronautics.
Abstract: AIM: To construct an intelligent segmentation scheme for precise localization of central serous chorioretinopathy (CSC) leakage points, thereby enabling ophthalmologists to deliver accurate laser treatment without navigational laser equipment. METHODS: A dataset with dual labels (point-level and pixel-level) was first established based on fundus fluorescein angiography (FFA) images of CSC and subsequently divided into training (102 images), validation (40 images), and test (40 images) datasets. An intelligent segmentation method was then developed, based on the You Only Look Once version 8 Pose Estimation (YOLOv8-Pose) model and the segment anything model (SAM), to segment CSC leakage points. Next, the YOLOv8-Pose model was trained for 200 epochs, and the best-performing model was selected to form the optimal combination with SAM. Additionally, five classic U-Net series models [i.e., U-Net, recurrent residual U-Net (R2U-Net), attention U-Net (AttU-Net), recurrent residual attention U-Net (R2AttUNet), and nested U-Net (UNet++)] were initialized with three random seeds and trained for 200 epochs, resulting in a total of 15 baseline models for comparison. Finally, based on metrics including the Dice similarity coefficient (DICE), intersection over union (IoU), precision, recall, precision-recall (PR) curve, and receiver operating characteristic (ROC) curve, the proposed method was compared with the baseline models through quantitative and qualitative experiments for leakage point segmentation, thereby demonstrating its effectiveness. RESULTS: As the number of training epochs increased, the mAP50-95, recall, and precision of the YOLOv8-Pose model increased significantly and then stabilized, and the model achieved a preliminary localization success rate of 90% (i.e., 36 images) for CSC leakage points in the 40 test images. Using expert-annotated pixel-level labels as the ground truth, the proposed method achieved a DICE of 57.13%, an IoU of 45.31%, a precision of 45.91%, a recall of 93.57%, an area under the PR curve (AUC-PR) of 0.78, and an area under the ROC curve (AUC-ROC) of 0.97, which enables more accurate segmentation of CSC leakage points. CONCLUSION: By combining the precise localization capability of the YOLOv8-Pose model with the robust and flexible segmentation ability of SAM, the proposed method demonstrates, from the perspective of application innovation, the effectiveness of the YOLOv8-Pose model in detecting keypoint coordinates of CSC leakage points. It also establishes a novel approach for accurate segmentation of CSC leakage points through the “detect-then-segment” strategy, thereby providing a potential auxiliary means for the automatic and precise real-time localization of leakage points during traditional laser photocoagulation for CSC.
Funding: Supported by the Henan Province Key R&D Project under Grant 241111210400, the Henan Provincial Science and Technology Research Project under Grants 252102211047, 252102211062, 252102211055 and 232102210069, the Jiangsu Provincial Scheme Double Initiative Plan JSS-CBS20230474, the XJTLU RDF-21-02-008, the Science and Technology Innovation Project of Zhengzhou University of Light Industry under Grant 23XNKJTD0205, and the Higher Education Teaching Reform Research and Practice Project of Henan Province under Grant 2024SJGLX0126.
Abstract: Accurate and efficient detection of building changes in remote sensing imagery is crucial for urban planning, disaster emergency response, and resource management. However, existing methods face challenges such as spectral similarity between buildings and backgrounds, sensor variations, and insufficient computational efficiency. To address these challenges, this paper proposes a novel Multi-scale Efficient Wavelet-based Change Detection Network (MewCDNet), which integrates the advantages of Convolutional Neural Networks and Transformers, balances computational costs, and achieves high-performance building change detection. The network employs EfficientNet-B4 as the backbone for hierarchical feature extraction, integrates multi-level feature maps through a multi-scale fusion strategy, and incorporates two key modules: Cross-temporal Difference Detection (CTDD) and Cross-scale Wavelet Refinement (CSWR). CTDD adopts a dual-branch architecture that combines pixel-wise differencing with semantic-aware Euclidean distance weighting to enhance the distinction between true changes and background noise. CSWR integrates a Haar-based Discrete Wavelet Transform with multi-head cross-attention mechanisms, enabling cross-scale feature fusion while significantly improving edge localization and suppressing spurious changes. Extensive experiments on four benchmark datasets demonstrate MewCDNet’s superiority over comparison methods, achieving F1 scores of 91.54% on LEVIR, 93.70% on WHUCD, and 64.96% on S2Looking for building change detection. Furthermore, MewCDNet exhibits optimal performance on the multi-class SYSU dataset (F1: 82.71%), highlighting its exceptional generalization capability.
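The abstract names a Haar-based Discrete Wavelet Transform inside CSWR but gives no implementation details. As a rough illustration of the transform itself only, a single-level 2D orthonormal Haar decomposition can be sketched in NumPy as follows. The function name `haar_dwt2` and the sub-band labels are assumptions (naming conventions for LH and HL differ between libraries), and the sketch says nothing about how MewCDNet combines the sub-bands with cross-attention.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D orthonormal Haar transform of an even-sized array.

    Returns four half-resolution sub-bands: a low-pass approximation and
    three detail bands that respond to edges.
    """
    x = np.asarray(x, dtype=np.float64)
    if x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("both dimensions must be even")

    # The four pixels of every non-overlapping 2x2 block.
    a = x[0::2, 0::2]  # top-left
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right

    ll = (a + b + c + d) / 2.0  # local average (approximation)
    lh = (a + b - c - d) / 2.0  # top minus bottom: horizontal edges
    hl = (a - b + c - d) / 2.0  # left minus right: vertical edges
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh

image = np.arange(16, dtype=np.float64).reshape(4, 4)
ll, lh, hl, hh = haar_dwt2(image)
print(ll.shape)
```

Because the transform is orthonormal, the total energy of the four sub-bands equals that of the input, and a constant image produces zeros in all three detail bands, which is why the detail bands are a natural place to look for edge information.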
Abstract: Medical image segmentation is of critical importance in contemporary medical imaging. However, U-Net and its variants exhibit limitations in capturing complex nonlinear patterns and global contextual information. Although the subsequent U-KAN model enhances nonlinear representation capabilities, it still faces challenges such as gradient vanishing during deep network training and spatial detail loss during feature downsampling, resulting in insufficient segmentation accuracy for edge structures and minute lesions. To address these challenges, this paper proposes the RE-UKAN model, which improves upon U-KAN in two ways. First, a residual network is introduced into the encoder to mitigate gradient vanishing through cross-layer identity mappings, thus enhancing modelling capabilities for complex pathological structures. Second, Efficient Local Attention (ELA) is integrated to suppress spatial detail loss during downsampling, thereby improving the perception of edge structures and minute lesions. Experimental results on four public datasets demonstrate that RE-UKAN outperforms existing medical image segmentation methods across multiple evaluation metrics, with particularly outstanding performance on the TN-SCUI 2020 dataset, achieving an IoU of 88.18% and a Dice of 93.57%. Compared to the baseline model, it achieves improvements of 3.05% and 1.72%, respectively. These results demonstrate RE-UKAN’s superior detail retention capability and boundary recognition accuracy in complex medical image segmentation tasks, providing a reliable solution for clinical precision segmentation.
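For readers unfamiliar with the IoU and Dice metrics reported above, the following is a minimal sketch of how they are commonly computed for a single pair of binary masks. The function name `dice_and_iou` and the convention of returning 1.0 when both masks are empty are assumptions for illustration; the paper's exact evaluation protocol (for example, per-image averaging) is not stated in the abstract.

```python
import numpy as np


def dice_and_iou(pred, target):
    """Dice coefficient and IoU for one pair of binary masks (sketch)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()  # pixels in both masks
    union = np.logical_or(pred, target).sum()   # pixels in either mask
    total = pred.sum() + target.sum()           # |pred| + |target|
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0, 1.0
    dice = 2.0 * inter / total
    iou = inter / union
    return float(dice), float(iou)
```

For a single mask pair the two metrics are linked by Dice = 2·IoU / (1 + IoU); scores averaged over a dataset need not satisfy this identity exactly.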
Funding: Tianmin Tianyuan Boutique Vegetable Industry Technology Service Station (Grant No. 2024120011003081); Development of Environmental Monitoring and Traceability System for Wuqing Agricultural Production Areas (Grant No. 2024120011001866).
Abstract: Tomato is a major economic crop worldwide, and diseases on tomato leaves can significantly reduce both yield and quality. Traditional manual inspection is inefficient and highly subjective, making it difficult to meet the requirements of early disease identification in complex natural environments. To address this issue, this study proposes an improved YOLO11-based model, YOLO-SPDNet (Scale Sequence Fusion, Position-Channel Attention, and Dual Enhancement Network). The model integrates the SEAM (Self-Ensembling Attention Mechanism) semantic enhancement module, the MLCA (Mixed Local Channel Attention) lightweight attention mechanism, and the SPA (Scale-Position-Detail Awareness) module composed of SSFF (Scale Sequence Feature Fusion), TFE (Triple Feature Encoding), and CPAM (Channel and Position Attention Mechanism). These enhancements strengthen fine-grained lesion detection while keeping the model lightweight. Experimental results show that YOLO-SPDNet achieves an accuracy of 91.8%, a recall of 86.5%, and an mAP@0.5 of 90.6% on the test set, with a computational complexity of 12.5 GFLOPs. Furthermore, the model reaches a real-time inference speed of 987 FPS, making it suitable for deployment on mobile agricultural terminals and online monitoring systems. Comparative analysis and ablation studies further validate the reliability and practical applicability of the proposed model in complex natural scenes.
Abstract: Weakly Supervised Semantic Segmentation (WSSS), which relies only on image-level labels, has attracted significant attention for its cost-effectiveness and scalability. Existing methods mainly enhance inter-class distinctions and employ data augmentation to mitigate semantic ambiguity and reduce spurious activations. However, they often neglect the complex contextual dependencies among image patches, resulting in incomplete local representations and limited segmentation accuracy. To address these issues, we propose the Context Patch Fusion with Class Token Enhancement (CPF-CTE) framework, which exploits contextual relations among patches to enrich feature representations and improve segmentation. At its core, the Contextual-Fusion Bidirectional Long Short-Term Memory (CF-BiLSTM) module captures spatial dependencies between patches and enables bidirectional information flow, yielding a more comprehensive understanding of spatial correlations. This strengthens feature learning and segmentation robustness. Moreover, we introduce learnable class tokens that dynamically encode and refine class-specific semantics, enhancing discriminative capability. By effectively integrating spatial and semantic cues, CPF-CTE produces richer and more accurate representations of image content. Extensive experiments on PASCAL VOC 2012 and MS COCO 2014 validate that CPF-CTE consistently surpasses prior WSSS methods.
Abstract: Distributed Denial of Service (DDoS) attacks are a severe threat to network infrastructure and sometimes bypass traditional diagnosis algorithms because of their evolving complexity. Present Machine Learning (ML) techniques for DDoS attack diagnosis normally rely on statistical features of network traffic, such as packet sizes and inter-arrival times. However, such techniques sometimes fail to capture complicated relations among traffic flows. In this paper, we present a new multi-scale ensemble strategy based on Graph Neural Networks (GNNs) for improving DDoS detection. Our technique divides traffic into macro- and micro-level elements, allowing different GNN models to capture both coarse-scale anomalies and subtle, stealthy attack patterns. By modeling network traffic as graph-structured data, GNNs efficiently learn intricate relations among network entities. The proposed ensemble learning algorithm combines the results of several GNNs to improve generalization, robustness, and scalability. Extensive experiments on three benchmark datasets (UNSW-NB15, CICIDS2017, and CICDDoS2019) show that our approach outperforms traditional machine learning and deep learning models in detecting both high-rate and low-rate (stealthy) DDoS attacks, with significant improvements in accuracy and recall. These findings demonstrate the proposed method’s applicability and robustness for real-world deployment in contexts where several DDoS patterns coexist.