Research has been conducted to reduce resource consumption in 3D medical image segmentation for diverse resource-constrained environments. However, decreasing the number of parameters to enhance computational efficiency can also lead to performance degradation. Moreover, these methods face challenges in balancing global and local features, increasing the risk of errors in multi-scale segmentation. This issue is particularly pronounced when segmenting small and complex structures within the human body. To address this problem, we propose a multi-stage hierarchical architecture composed of a detector and a segmentor. The detector extracts regions of interest (ROIs) in a 3D image, while the segmentor performs segmentation in the extracted ROI. Removing unnecessary areas in the detector allows the segmentation to be performed on a more compact input. The segmentor is designed with multiple stages, where each stage utilizes a different input size. It implements a stage-skipping mechanism that deactivates certain stages depending on the initial input size. This approach limits computation to the essential regions and thereby reduces computational overhead. The proposed framework preserves segmentation performance while reducing resource consumption, enabling segmentation even in resource-constrained environments.
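The detector-then-segmentor idea in the abstract above can be illustrated with a minimal sketch. It assumes the detector outputs a binary mask over the volume; the helper names `bounding_box` and `crop` and the `margin` parameter are hypothetical and not taken from the paper. The sketch only shows how discarding voxels outside the ROI gives the segmentor a more compact input.

```python
from typing import List, Tuple

Volume = List[List[List[float]]]  # indexed as volume[z][y][x]


def bounding_box(mask: Volume, margin: int = 0) -> Tuple[slice, slice, slice]:
    """Return slices covering every nonzero voxel of `mask`, padded by `margin`."""
    coords = [
        (z, y, x)
        for z, plane in enumerate(mask)
        for y, row in enumerate(plane)
        for x, value in enumerate(row)
        if value
    ]
    if not coords:
        raise ValueError("detector mask is empty; no ROI to crop")
    shape = (len(mask), len(mask[0]), len(mask[0][0]))
    slices = []
    for axis in range(3):
        lo = max(min(c[axis] for c in coords) - margin, 0)
        hi = min(max(c[axis] for c in coords) + margin + 1, shape[axis])
        slices.append(slice(lo, hi))
    return tuple(slices)


def crop(volume: Volume, box: Tuple[slice, slice, slice]) -> Volume:
    """Cut the ROI described by `box` out of `volume`."""
    zs, ys, xs = box
    return [[row[xs] for row in plane[ys]] for plane in volume[zs]]
```

A segmentor would then run on `crop(volume, bounding_box(detector_mask))` instead of the full volume. How the paper's stage-skipping mechanism maps the cropped size to active stages is not specified in the abstract, so it is not modeled here.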
Organoids possess immense potential for unraveling the intricate functions of human tissues and facilitating preclinical disease treatment. Their applications span from high-throughput drug screening to the modeling of complex diseases, with some even achieving clinical translation. Changes in the overall size, shape, boundary, and other morphological features of organoids provide a noninvasive method for assessing organoid drug sensitivity. However, the precise segmentation of organoids in bright-field microscopy images is made difficult by the complexity of the organoid morphology and interference, including overlapping organoids, bubbles, dust particles, and cell fragments. This paper introduces the precision organoid segmentation technique (POST), which is a deep-learning algorithm for segmenting challenging organoids under simple bright-field imaging conditions. Unlike existing methods, POST accurately segments each organoid and eliminates various artifacts encountered during organoid culturing and imaging. Furthermore, it is sensitive to and aligns with measurements of organoid activity in drug sensitivity experiments. POST is expected to be a valuable tool for drug screening using organoids owing to its capability of automatically and rapidly eliminating interfering substances and thereby streamlining the organoid analysis and drug screening process.
Medical image segmentation is of critical importance in the domain of contemporary medical imaging. However, U-Net and its variants exhibit limitations in capturing complex nonlinear patterns and global contextual information. Although the subsequent U-KAN model enhances nonlinear representation capabilities, it still faces challenges such as gradient vanishing during deep network training and spatial detail loss during feature downsampling, resulting in insufficient segmentation accuracy for edge structures and minute lesions. To address these challenges, this paper proposes the RE-UKAN model, which improves upon U-KAN. Firstly, a residual network is introduced into the encoder to effectively mitigate gradient vanishing through cross-layer identity mappings, thus enhancing modelling capabilities for complex pathological structures. Secondly, Efficient Local Attention (ELA) is integrated to suppress spatial detail loss during downsampling, thereby improving the perception of edge structures and minute lesions. Experimental results on four public datasets demonstrate that RE-UKAN outperforms existing medical image segmentation methods across multiple evaluation metrics, with particularly outstanding performance on the TN-SCUI 2020 dataset, achieving an IoU of 88.18% and a Dice of 93.57%. Compared to the baseline model, it achieves improvements of 3.05% and 1.72%, respectively. These results demonstrate RE-UKAN's superior detail retention capability and boundary recognition accuracy in complex medical image segmentation tasks, providing a reliable solution for clinical precision segmentation.
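For readers less familiar with the two metrics reported above, the following sketch computes IoU and Dice for a single pair of binary masks given as flat sequences. The function name `iou_and_dice` and the convention of returning 1.0 for two empty masks are assumptions made for illustration; the abstract does not describe the authors' evaluation code.

```python
def iou_and_dice(pred, target):
    """Return (IoU, Dice) for two equally long binary sequences."""
    tp = sum(1 for p, t in zip(pred, target) if p and t)  # overlap
    pred_pos = sum(1 for p in pred if p)
    target_pos = sum(1 for t in target if t)
    union = pred_pos + target_pos - tp
    if union == 0:
        return 1.0, 1.0  # both masks empty: treated as perfect agreement
    iou = tp / union
    dice = 2 * tp / (pred_pos + target_pos)
    return iou, dice
```

For one mask pair the two metrics are tied by Dice = 2·IoU / (1 + IoU). Figures averaged over many images, as dataset-level results typically are, need not satisfy this identity exactly.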
Surface polaritons, as surface electromagnetic waves propagating along the surface of a medium, have played a crucial role in enhancing the photonic spin Hall effect (PSHE) and developing highly sensitive refractive index (RI) sensors. Among them, the traditional surface plasmon polariton (SPP) based on noble metals is limited in its application beyond the near-infrared (IR) regime due to the large negative permittivity and optical losses. In this contribution, we theoretically propose a highly sensitive PSHE sensor with the structure Ge prism-SiC-Si:InAs-sensing medium, taking advantage of the hybrid surface plasmon phonon polariton (SPPhP) in the mid-IR regime. Here, heavily Si-doped InAs (Si:InAs) and SiC excite the SPP and the surface phonon polariton (SPhP), respectively, and the hybrid SPPhP is realized in this system. More importantly, the designed PSHE sensor based on this SPPhP mechanism achieves multi-stage RI measurements from 1.00025-1.00225 to 1.70025-1.70225, and the maximal intensity sensitivity and angle sensitivity can be up to 9.4×10^4 μm/RIU and 245°/RIU, respectively. These findings provide a new pathway for the enhancement of PSHE in the mid-IR regime, and offer new opportunities to develop highly sensitive RI sensors in multi-scenario applications, such as harmful gas monitoring and biosensing.
Quantitative analysis of aluminum-silicon (Al-Si) alloy microstructure is crucial for evaluating and controlling alloy performance. Conventional analysis methods rely on manual segmentation, which is inefficient and subjective, while fully supervised deep learning approaches require extensive and expensive pixel-level annotated data. Furthermore, existing semi-supervised methods still face challenges in handling the adhesion of adjacent primary silicon particles and effectively utilizing consistency in unlabeled data. To address these issues, this paper proposes a novel semi-supervised framework for Al-Si alloy microstructure image segmentation. First, we introduce a Rotational Uncertainty Correction Strategy (RUCS). This strategy employs multi-angle rotational perturbations and Monte Carlo sampling to assess prediction consistency, generating a pixel-wise confidence weight map. By integrating this map into the loss function, the model dynamically focuses on high-confidence regions, thereby improving generalization ability while reducing manual annotation pressure. Second, we design a Boundary Enhancement Module (BEM) to strengthen boundary feature extraction through erosion difference and multi-scale dilated convolutions. This module guides the model to focus on the boundary regions of adjacent particles, effectively resolving particle adhesion and improving segmentation accuracy. Systematic experiments were conducted on the Aluminum-Silicon Alloy Microstructure Dataset (ASAD). Results indicate that the proposed method performs exceptionally well with scarce labeled data. Specifically, using only 5% labeled data, our method improves the Jaccard index and Adjusted Rand Index (ARI) by 2.84 and 1.57 percentage points, respectively, and reduces the Variation of Information (VI) by 8.65 compared to state-of-the-art semi-supervised models, approaching the performance levels of 10% labeled data. These results demonstrate that the proposed method significantly enhances the accuracy and robustness of quantitative microstructure analysis while reducing annotation costs.
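The rotation-consistency idea behind RUCS can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' method: it uses only the four 90-degree rotations, omits Monte Carlo sampling entirely, and maps per-pixel variance to a confidence in [0, 1] with a formula chosen here for convenience. The names `rot90`, `confidence_map`, and the `predict` callback are hypothetical.

```python
def rot90(grid, k=1):
    """Rotate a 2D list counter-clockwise by k * 90 degrees."""
    for _ in range(k % 4):
        grid = [list(col) for col in zip(*grid)][::-1]
    return grid


def confidence_map(image, predict):
    """Per-pixel agreement of `predict` across the four 90-degree rotations.

    `predict` maps a 2D list to a same-shaped 2D list of values in [0, 1].
    """
    preds = []
    for k in range(4):
        rotated_pred = predict(rot90(image, k))
        preds.append(rot90(rotated_pred, 4 - k))  # undo the rotation
    conf = []
    for y in range(len(image)):
        row = []
        for x in range(len(image[0])):
            vals = [p[y][x] for p in preds]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            # Variance of values in [0, 1] is at most 0.25, so this lies in [0, 1].
            row.append(1.0 - 4.0 * var)
        conf.append(row)
    return conf
```

A map like this could weight a per-pixel loss on unlabeled images so that pixels with inconsistent predictions contribute less, which is the role the abstract assigns to the confidence weight map.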
This article studies the problem of image segmentation-based semantic communication in autonomous driving. In real traffic scenes, detecting objects (e.g., vehicles and pedestrians) is more important for guaranteeing driving safety, yet this is ignored in existing works. Therefore, we propose a vehicular image segmentation-oriented semantic communication system, termed VIS-SemCom, which focuses on transmitting and recovering the image semantic features of highly important objects to reduce transmission redundancy. First, we develop a semantic codec based on the Swin Transformer architecture, which expands the perceptual field and thus improves the segmentation accuracy. To improve the accuracy for important objects, we propose a multi-scale semantic extraction method that assigns different numbers of Swin Transformer blocks to semantic features of different resolutions. In addition, an importance-aware loss incorporating importance levels is devised, and an online hard example mining (OHEM) strategy is proposed to handle small-sample issues in the dataset. Finally, experimental results demonstrate that, compared to baseline image communication, the proposed VIS-SemCom achieves a significant mean intersection over union (mIoU) performance in the SNR regions, reduces the transmitted data volume by about 60% at 60% mIoU, and improves the segmentation accuracy of important objects.
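As a rough illustration of online hard example mining in general, the sketch below averages only the largest per-pixel losses so that easy pixels do not dominate training. The function name `ohem_mean_loss` and the `keep_ratio` parameter are hypothetical; the abstract does not say how VIS-SemCom selects hard examples or how this interacts with its importance-aware loss.

```python
def ohem_mean_loss(pixel_losses, keep_ratio=0.25):
    """Mean of the hardest `keep_ratio` fraction of per-pixel losses."""
    if not pixel_losses:
        raise ValueError("pixel_losses must not be empty")
    k = max(1, int(len(pixel_losses) * keep_ratio))  # always keep at least one
    hardest = sorted(pixel_losses, reverse=True)[:k]
    return sum(hardest) / k
```

With `keep_ratio=1.0` this reduces to the ordinary mean loss, which makes the effect of the mining step easy to compare in an ablation.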
Magnetic Resonance Imaging (MRI) plays a pivotal role in medical image analysis because of its ability to support disease detection and diagnosis. Fuzzy C-Means (FCM) clustering is widely used for MRI segmentation due to its ability to handle image uncertainty. However, FCM still has several limitations, including sensitivity to initialization, susceptibility to local optima, and high computational cost. To address these limitations, this study integrates Grey Wolf Optimization (GWO) with FCM to enhance cluster center selection, improving segmentation accuracy and robustness. Moreover, to further refine optimization, Fuzzy Entropy Clustering was utilized for the features that distinguish it from traditional objective functions. Fuzzy entropy effectively quantifies uncertainty, leading to more well-defined clusters, improved noise robustness, and better preservation of anatomical structures in MRI images. Despite these advantages, the iterative nature of GWO and FCM introduces significant computational overhead, which restricts their applicability to high-resolution medical images. To overcome this bottleneck, we propose a Parallelized-GWO-based FCM (P-GWO-FCM) approach using GPU acceleration, where both GWO optimization and FCM updates (centroid computation and membership matrix updates) are parallelized. By concurrently executing these processes, our approach efficiently distributes the computational workload, significantly reducing execution time while maintaining high segmentation accuracy. The proposed parallel method, P-GWO-FCM, was evaluated on both simulated and clinical brain MR images, focusing on segmenting white matter, gray matter, and cerebrospinal fluid regions. The results indicate significant improvements in segmentation accuracy, achieving a Jaccard Similarity (JS) of 0.92, a Partition Coefficient Index (PCI) of 0.91, a Partition Entropy Index (PEI) of 0.25, and a Davies-Bouldin Index (DBI) of 0.30. Experimental comparisons demonstrate that P-GWO-FCM outperforms existing methods in both segmentation accuracy and computational efficiency, making it a promising solution for real-time medical image segmentation.
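The two FCM updates named in the abstract above (membership matrix and centroid computation) follow the standard Fuzzy C-Means formulas, sketched here for 1D intensities in plain Python. This is the textbook serial algorithm only: the GWO initialization, the fuzzy-entropy objective, and the GPU parallelization described by the authors are not reproduced, and the function names and the `eps` guard are assumptions for illustration.

```python
def update_memberships(pixels, centers, m=2.0, eps=1e-12):
    """Standard FCM membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for x in pixels:
        # eps avoids division by zero when a pixel coincides with a center
        dists = [abs(x - c) + eps for c in centers]
        row = [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]
        memberships.append(row)
    return memberships


def update_centers(pixels, memberships, m=2.0):
    """Standard FCM centroid update: v_i = sum_k u_ik^m x_k / sum_k u_ik^m."""
    centers = []
    for i in range(len(memberships[0])):
        weights = [row[i] ** m for row in memberships]
        centers.append(sum(w * x for w, x in zip(weights, pixels)) / sum(weights))
    return centers


def fcm(pixels, centers, m=2.0, iterations=50):
    """Alternate the two updates for a fixed number of iterations."""
    for _ in range(iterations):
        memberships = update_memberships(pixels, centers, m)
        centers = update_centers(pixels, memberships, m)
    return centers, update_memberships(pixels, centers, m)
```

Each pixel's membership row depends only on that pixel and the current centers, and each centroid is a weighted reduction over pixels. That independence is what makes both steps natural candidates for the kind of parallel execution the abstract describes.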
Autonomous vehicles rely heavily on accurate and efficient scene segmentation for safe navigation and efficient operations. Traditional Bird's Eye View (BEV) methods for semantic scene segmentation, which leverage multimodal sensor fusion, often struggle with noisy data and demand high-performance GPUs, leading to sensor misalignment and performance degradation. This paper introduces the Enhanced Channel Attention BEV (ECABEV), a novel approach designed to address these challenges under insufficient GPU memory conditions. ECABEV integrates camera and radar data through a de-noise enhanced channel attention mechanism, which utilizes global average and max pooling to effectively filter out noise while preserving discriminative features. Furthermore, an improved fusion approach is proposed to efficiently merge categorical data across modalities. To reduce computational overhead, a bilinear interpolation layer normalization method is devised to ensure spatial feature fidelity. Moreover, a scalable cross-entropy loss function is designed to handle imbalanced classes with little sacrifice in computational efficiency. Extensive experiments on the nuScenes dataset demonstrate that ECABEV achieves state-of-the-art performance with an IoU of 39.961, using a lightweight ViT-B/14 backbone and lower resolution (224×224). Our approach highlights its cost-effectiveness and practical applicability, even on low-end devices. The code is publicly available at: https://github.com/YYF-CQU/ECABEV.git.
This study aimed to enhance the performance of semantic segmentation for autonomous driving by improving the 2DPASS model. Two novel improvements were proposed and implemented in this paper: dynamically adjusting the loss function ratio and integrating an attention mechanism (CBAM). First, the loss function weights were adjusted dynamically. A grid search identified 7:3 as the best ratio; it gives greater emphasis to the cross-entropy loss, which resulted in better segmentation performance. Second, CBAM was applied at different layers of the 2D encoder. Heatmap analysis revealed that introducing it after the second block of 2D image encoding produced the most effective enhancement of important feature representation. The number of training epochs was chosen experimentally, which improved model convergence and overall accuracy. To evaluate the proposed approach, experiments were conducted on the SemanticKITTI dataset. The results showed that the improved model achieved an mIoU of 64.31%, an improvement of 11.47 percentage points over the conventional 2DPASS model (baseline: 52.84%). It was more effective at detecting small and distant objects and clearly identifying boundaries between different classes. Issues such as noise and variations in data distribution affected its accuracy, indicating the need for further refinement. Overall, the proposed improvements to the 2DPASS model demonstrated the potential to advance semantic segmentation technology and contributed to a more reliable perception of complex, dynamic environments in autonomous vehicles. Accurate segmentation enhances the vehicle's ability to distinguish different objects, and this improvement directly supports safer navigation, robust decision-making, and efficient path planning, making it highly applicable to real-world deployment of autonomous systems in urban and highway settings.
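The 7:3 loss ratio and the reported mIoU figures can be made concrete with a short sketch. The abstract names only the cross-entropy term, so the second term is called `aux_loss` here as a placeholder; the function name `combined_loss` and the normalization by the ratio sum are assumptions, not the authors' implementation.

```python
def combined_loss(ce_loss, aux_loss, ratio=(7, 3)):
    """Weighted sum of two loss terms; the ratio is normalized to sum to 1."""
    w_ce, w_aux = ratio
    total = w_ce + w_aux
    return (w_ce * ce_loss + w_aux * aux_loss) / total


# Consistency check of the reported figures (percentages from the abstract):
baseline_miou = 52.84   # conventional 2DPASS
gain_points = 11.47     # reported improvement
improved_miou = round(baseline_miou + gain_points, 2)  # matches the reported 64.31
```

A grid search in this setting would evaluate validation mIoU for several candidate ratios such as (5, 5), (6, 4), (7, 3) and keep the best one; the candidate set used in the study is not given in the abstract.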
High-resolution remote sensing images (HRSIs) are now an essential data source for gathering surface information due to advancements in remote sensing data capture technologies. However, their significant scale changes and wealth of spatial details pose challenges for semantic segmentation. While convolutional neural networks (CNNs) excel at capturing local features, they are limited in modeling long-range dependencies. Conversely, transformers utilize multihead self-attention to integrate global context effectively, but this approach often incurs a high computational cost. This paper proposes a global-local multiscale context network (GLMCNet) to extract both global and local multiscale contextual information from HRSIs. A detail-enhanced filtering module (DEFM) is proposed at the end of the encoder to refine the encoder outputs further, thereby enhancing the key details extracted by the encoder and effectively suppressing redundant information. In addition, a global-local multiscale transformer block (GLMTB) is proposed in the decoding stage to enable the modeling of rich multiscale global and local information. We also design a stair fusion mechanism to transmit deep semantic information from deep to shallow layers progressively. Finally, we propose the semantic awareness enhancement module (SAEM), which further enhances the representation of multiscale semantic features through spatial attention and covariance channel attention. Extensive ablation analyses and comparative experiments were conducted to evaluate the performance of the proposed method. Specifically, our method achieved a mean Intersection over Union (mIoU) of 86.89% on the ISPRS Potsdam dataset and 84.34% on the ISPRS Vaihingen dataset, outperforming existing models such as ABCNet and BANet.
Advanced traffic monitoring systems encounter substantial challenges in vehicle detection and classification due to the limitations of conventional methods, which often demand extensive computational resources and struggle with diverse data acquisition techniques. This research presents a novel approach for vehicle classification and recognition in aerial image sequences, integrating multiple advanced techniques to enhance detection accuracy. The proposed model begins with preprocessing using Multiscale Retinex (MSR) to enhance image quality, followed by Expectation-Maximization (EM) Segmentation for precise foreground object identification. Vehicle detection is performed using the state-of-the-art YOLOv10 framework, while feature extraction incorporates Maximally Stable Extremal Regions (MSER), Dense Scale-Invariant Feature Transform (Dense SIFT), and Zernike Moments Features to capture distinct object characteristics. Feature optimization is further refined through a Hybrid Swarm-based Optimization algorithm, ensuring optimal feature selection for improved classification performance. The final classification is conducted using a Vision Transformer, leveraging its robust learning capabilities for enhanced accuracy. Experimental evaluations on benchmark datasets, including UAVDT and the Unmanned Aerial Vehicle Intruder Dataset (UAVID), demonstrate the superiority of the proposed approach, achieving an accuracy of 94.40% on UAVDT and 93.57% on UAVID. The results highlight the efficacy of the model in significantly enhancing vehicle detection and classification in aerial imagery, outperforming existing methodologies and offering a statistically validated improvement for intelligent traffic monitoring systems.
Weakly supervised semantic segmentation (WSSS) is a tricky task in which only category information is provided for segmentation prediction. Thus, the key stage of WSSS is to generate the pseudo labels. In convolutional neural network (CNN) based methods, class activation mapping (CAM) is used to obtain the pseudo labels, but it concentrates only on the most discriminative parts. Recently, transformer-based methods have utilized the attention map from the multi-headed self-attention (MHSA) module to predict pseudo labels, which usually contain obvious background noise and incoherent object areas. To solve the above problems, we use the Conformer as our backbone, which is a parallel network based on a CNN and a Transformer. The two branches generate pseudo labels and refine them independently, and can effectively combine the advantages of CNN and Transformer. However, the information exchange between the two branches of the parallel structure is not close enough. Thus, the parallel structure can result in pseudo labels with poor details, and the background noise still exists. To alleviate this problem, we propose the enhancing convolution CAM (ECCAM) model, which has three improved modules based on enhancing convolution: a deeper stem (DStem), a convolutional feed-forward network (CFFN), and a feature coupling unit with convolution (FCUConv). ECCAM gives the Conformer tighter interaction between the CNN and Transformer branches. Experimental verification shows that the proposed modules help the network perceive more local information from images, making the final segmentation results more refined. Compared with similar architectures, our modules greatly improve the semantic segmentation performance and achieve 70.2% mean intersection over union (mIoU) on the PASCAL VOC 2012 dataset.
Accurate segmentation of breast cancer in mammogram images plays a critical role in early diagnosis and treatment planning. As research in this domain continues to expand, various segmentation techniques have been proposed across classical image processing, machine learning (ML), deep learning (DL), and hybrid/ensemble models. This study conducts a systematic literature review using the PRISMA methodology, analyzing 57 selected articles to explore how these methods have evolved and been applied. The review highlights the strengths and limitations of each approach, identifies commonly used public datasets, and observes emerging trends in model integration and clinical relevance. By synthesizing current findings, this work provides a structured overview of segmentation strategies and outlines key considerations for developing more adaptable and explainable tools for breast cancer detection. Overall, our synthesis suggests that classical and ML methods are suitable for limited labels and computing resources, while DL models are preferable when pixel-level annotations and resources are available, and hybrid pipelines are most appropriate when fine-grained clinical precision is required.
Background: Diabetic macular edema is a prevalent retinal condition and a leading cause of visual impairment among diabetic patients. Early detection of affected areas is beneficial for effective diagnosis and treatment. Traditionally, diagnosis relies on optical coherence tomography images interpreted by ophthalmologists. However, this manual image interpretation is often slow and subjective. Therefore, developing automated segmentation for macular edema images is essential to improve diagnostic efficiency and accuracy. Methods: In order to improve clinical diagnostic efficiency and accuracy, we proposed a SegNet network structure integrated with a convolutional block attention module (CBAM). This network introduces a multi-scale input module, the CBAM attention mechanism, and skip connections. The multi-scale input module enhances the network's perceptual capabilities, while the lightweight CBAM effectively fuses relevant features across channels and spatial dimensions, allowing for better learning of varying information levels. Results: Experimental results demonstrate that the proposed network achieves an IoU of 80.127% and an accuracy of 99.162%. Compared to the traditional segmentation network, this model has fewer parameters, faster training and testing speed, and superior performance on semantic segmentation tasks, indicating its high practical applicability. Conclusion: The C-SegNet proposed in this study enables accurate segmentation of diabetic macular edema lesion images, which facilitates quicker diagnosis for healthcare professionals.
In image analysis, high-precision semantic segmentation predominantly relies on supervised learning. Despite significant advancements driven by deep learning techniques, challenges such as class imbalance and dynamic performance evaluation persist. Traditional weighting methods, often based on pre-statistical class counting, tend to overemphasize certain classes while neglecting others, particularly rare sample categories. Approaches like focal loss and other rare-sample segmentation techniques introduce multiple hyperparameters that require manual tuning, leading to increased experimental costs due to their instability. This paper proposes a novel CAWASeg framework to address these limitations. Our approach leverages Grad-CAM technology to generate class activation maps, identifying key feature regions that the model focuses on during decision-making. We introduce a Comprehensive Segmentation Performance Score (CSPS) to dynamically evaluate model performance by converting these activation maps into pseudo masks and comparing them with the ground truth. Additionally, we design two adaptive weights for each class: a Basic Weight (BW) and a Ratio Weight (RW), which the model adjusts during training based on real-time feedback. Extensive experiments on the COCO-Stuff, CityScapes, and ADE20k datasets demonstrate that our CAWASeg framework significantly improves segmentation performance for rare sample categories while enhancing overall segmentation accuracy. The proposed method offers a robust and efficient solution for addressing class imbalance in semantic segmentation tasks.
This systematic review aims to comprehensively examine and compare deep learning methods for brain tumor segmentation and classification using MRI and other imaging modalities, focusing on recent trends from 2022 to 2025. The primary objective is to evaluate methodological advancements, model performance, dataset usage, and existing challenges in developing clinically robust AI systems. We included peer-reviewed journal articles and high-impact conference papers published between 2022 and 2025, written in English, that proposed or evaluated deep learning methods for brain tumor segmentation and/or classification. Excluded were non-open-access publications, books, and non-English articles. A structured search was conducted across Scopus, Google Scholar, Wiley, and Taylor & Francis, with the last search performed in August 2025. Risk of bias was not formally quantified but considered during full-text screening based on dataset diversity, validation methods, and availability of performance metrics. We used narrative synthesis and tabular benchmarking to compare performance metrics (e.g., accuracy, Dice score) across model types (CNN, Transformer, Hybrid), imaging modalities, and datasets. A total of 49 studies were included (43 journal articles and 6 conference papers). These studies spanned over 9 public datasets (e.g., BraTS, Figshare, REMBRANDT, MOLAB) and utilized a range of imaging modalities, predominantly MRI. Hybrid models, especially ResViT and UNetFormer, consistently achieved high performance, with classification accuracy exceeding 98% and segmentation Dice scores above 0.90 across multiple studies. Transformers and hybrid architectures showed increasing adoption post-2023. Many studies lacked external validation and were evaluated only on a few benchmark datasets, raising concerns about generalizability and dataset bias. Few studies addressed clinical interpretability or uncertainty quantification. Despite promising results, particularly for hybrid deep learning models, widespread clinical adoption remains limited due to lack of validation, interpretability concerns, and real-world deployment barriers.
Inspections of power transmission lines (PTLs) conducted using unmanned aerial vehicles (UAVs) are complicated by the fine structure of the lines and complex backgrounds, making accurate and efficient segmentation challenging. This study presents the Wavelet-Guided Transformer U-Net (WGT-UNet) model, a new hybrid network that combines Convolutional Neural Networks (CNNs), Discrete Wavelet Transform (DWT), and Transformer architectures. The model's primary contribution is based on spatial and channel attention mechanisms derived from wavelet subbands to guide the Transformer's self-attention structure. Thus, low and high frequency components are separated at each stage using DWT, suppressing structural noise and making linear objects more prominent. The developed design is supported by multi-component hybrid cost functions that simultaneously solve class imbalance, edge sharpness, structural integrity, and spatial regularity issues. Furthermore, high segmentation success has been achieved in producing sharp boundaries and continuous line structures with the DWT-guided attention mechanism. Experiments conducted on the TTPLA dataset reveal that the version using the ConvNeXt backbone outperforms the current state-of-the-art approaches with an F1-Score of 79.33% and an Intersection over Union (IoU) value of 68.38%. The models and visual outputs of the developed method and all compared models can be accessed at https://github.com/burhanbarakli/WGT-UNET.
Satellite image segmentation plays a crucial role in remote sensing, supporting applications such as environmental monitoring, land use analysis, and disaster management. However, traditional segmentation methods often rely on large amounts of labeled data, which are costly and time-consuming to obtain, especially in large-scale or dynamic environments. To address this challenge, we propose the Semi-Supervised Multi-View Picture Fuzzy Clustering (SS-MPFC) algorithm, which improves segmentation accuracy and robustness, particularly in complex and uncertain remote sensing scenarios. SS-MPFC unifies three paradigms: semi-supervised learning, multi-view clustering, and picture fuzzy set theory. This integration allows the model to effectively utilize a small number of labeled samples, fuse complementary information from multiple data views, and handle the ambiguity and uncertainty inherent in satellite imagery. We design a novel objective function that jointly incorporates picture fuzzy membership functions across multiple views of the data, and embeds pairwise semi-supervised constraints (must-link and cannot-link) directly into the clustering process to enhance segmentation accuracy. Experiments conducted on several benchmark satellite datasets demonstrate that SS-MPFC significantly outperforms existing state-of-the-art methods in segmentation accuracy, noise robustness, and semantic interpretability. On the Augsburg dataset, SS-MPFC achieves a Purity of 0.8158 and an Accuracy of 0.6860, highlighting its outstanding robustness and efficiency. These results demonstrate that SS-MPFC offers a scalable and effective solution for real-world satellite-based monitoring systems, particularly in scenarios where rapid annotation is infeasible, such as wildfire tracking, agricultural monitoring, and dynamic urban mapping.
3D laser scanning technology is widely used in underground openings for high-precision, rapid, and nondestructive structural evaluations. Segmenting large 3D point cloud datasets, particularly in coal mine roadways with multi-scale targets, remains challenging. This paper proposes an enhanced segmentation method integrating improved PointNet++ with a coverage-voted strategy. The coverage-voted strategy reduces data while preserving multi-scale target topology. The segmentation is achieved using an enhanced PointNet++ algorithm with a normalization preprocessing head, resulting in a 94% accuracy for common supporting components. Ablation experiments show that the preprocessing head and coverage strategies increase segmentation accuracy by 20% and 2%, respectively, and improve Intersection over Union (IoU) for bearing plate segmentation by 58% and 20%. The accuracy of the current pretrained segmentation model may be affected by variations in surface support components, but it can be readily enhanced through re-optimization with additional labeled point cloud data. This proposed method, combined with a previously developed machine learning model that links rock bolt load and the deformation field of its bearing plate, provides a robust technique for simultaneously measuring the load of multiple rock bolts in a single laser scan.
The accurate segmentation of deep gray matter nuclei is critical for neuropathological research, disease diagnosis, and treatment. Existing methods rely on supervised learning, which requires large labeled datasets. It is challenging and time-consuming to obtain such datasets for medical image analysis. In addition, these methods, which are based on convolutional neural networks (CNNs), achieve only suboptimal performance due to the locality of convolutional operations. Vision Transformers (ViTs) efficiently model long-range dependencies and thus have the potential to outperform these methods in segmentation tasks. To address these issues, we propose a novel hybrid network based on self-supervised pre-training for deep gray matter nuclei segmentation. Specifically, we present a CNN-Transformer hybrid network (CTNet), whose encoder consists of a 3D CNN and a ViT to learn local spatial-detailed features and global semantic information. A self-supervised learning (SSL) approach that integrates rotation prediction and masked feature reconstruction is proposed to pre-train the CTNet, enabling the model to learn valuable visual representations from unlabeled data. We evaluate the effectiveness of our method on 3T and 7T human brain MRI datasets. The results demonstrate that our CTNet achieves better performance than other comparison models and that our pre-training strategy outperforms other advanced self-supervised methods. When the training set has only one sample, our pre-trained CTNet enhances segmentation performance, showing an 8.4% improvement in Dice similarity coefficient (DSC) compared to the randomly initialized CTNet.
Abstract: Research has been conducted to reduce resource consumption in 3D medical image segmentation for diverse resource-constrained environments. However, decreasing the number of parameters to enhance computational efficiency can also lead to performance degradation. Moreover, these methods face challenges in balancing global and local features, increasing the risk of errors in multi-scale segmentation. This issue is particularly pronounced when segmenting small and complex structures within the human body. To address this problem, we propose a multi-stage hierarchical architecture composed of a detector and a segmentor. The detector extracts regions of interest (ROIs) in a 3D image, while the segmentor performs segmentation in the extracted ROI. Removing unnecessary areas in the detector allows the segmentation to be performed on a more compact input. The segmentor is designed with multiple stages, where each stage utilizes a different input size. It implements a stage-skipping mechanism that deactivates certain stages based on the initial input size. This approach avoids unnecessary computation when segmenting the essential regions, reducing computational overhead. The proposed framework preserves segmentation performance while reducing resource consumption, enabling segmentation even in resource-constrained environments.
Funding: Supported by the National Key R&D Program of China (No. 2022YFC2504403), the National Natural Science Foundation of China (No. 62172202), the Experiment Project of China Manned Space Program (No. HYZHXM01019), and the Fundamental Research Funds for the Central Universities from Southeast University (No. 3207032101C3).
Abstract: Organoids possess immense potential for unraveling the intricate functions of human tissues and facilitating preclinical disease treatment. Their applications span from high-throughput drug screening to the modeling of complex diseases, with some even achieving clinical translation. Changes in the overall size, shape, boundary, and other morphological features of organoids provide a noninvasive method for assessing organoid drug sensitivity. However, the precise segmentation of organoids in bright-field microscopy images is made difficult by the complexity of the organoid morphology and interference, including overlapping organoids, bubbles, dust particles, and cell fragments. This paper introduces the precision organoid segmentation technique (POST), which is a deep-learning algorithm for segmenting challenging organoids under simple bright-field imaging conditions. Unlike existing methods, POST accurately segments each organoid and eliminates various artifacts encountered during organoid culturing and imaging. Furthermore, it is sensitive to and aligns with measurements of organoid activity in drug sensitivity experiments. POST is expected to be a valuable tool for drug screening using organoids owing to its capability of automatically and rapidly eliminating interfering substances and thereby streamlining the organoid analysis and drug screening process.
Abstract: Medical image segmentation is of critical importance in the domain of contemporary medical imaging. However, U-Net and its variants exhibit limitations in capturing complex nonlinear patterns and global contextual information. Although the subsequent U-KAN model enhances nonlinear representation capabilities, it still faces challenges such as gradient vanishing during deep network training and spatial detail loss during feature downsampling, resulting in insufficient segmentation accuracy for edge structures and minute lesions. To address these challenges, this paper proposes the RE-UKAN model, which innovatively improves upon U-KAN. Firstly, a residual network is introduced into the encoder to effectively mitigate gradient vanishing through cross-layer identity mappings, thus enhancing modelling capabilities for complex pathological structures. Secondly, Efficient Local Attention (ELA) is integrated to suppress spatial detail loss during downsampling, thereby improving the perception of edge structures and minute lesions. Experimental results on four public datasets demonstrate that RE-UKAN outperforms existing medical image segmentation methods across multiple evaluation metrics, with particularly outstanding performance on the TN-SCUI 2020 dataset, achieving an IoU of 88.18% and a Dice of 93.57%. Compared to the baseline model, it achieves improvements of 3.05% and 1.72%, respectively. These results fully demonstrate RE-UKAN’s superior detail retention capability and boundary recognition accuracy in complex medical image segmentation tasks, providing a reliable solution for clinical precision segmentation.
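The IoU and Dice figures quoted in several of these abstracts are overlap ratios between a predicted mask and a reference mask. The following is a minimal sketch of how the two metrics are commonly computed for one binary mask pair; the function name `iou_and_dice` and the nested-list mask format are assumptions made for illustration and are not taken from any of the listed papers.

```python
def iou_and_dice(pred, target):
    """Return (IoU, Dice) for two equally sized binary masks.

    Masks are nested lists of 0/1 values (an illustrative format, not a
    format used by any of the listed papers).
    """
    p = [v for row in pred for v in row]
    t = [v for row in target for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")

    intersection = sum(1 for a, b in zip(p, t) if a and b)
    pred_area = sum(1 for a in p if a)
    target_area = sum(1 for b in t if b)
    union = pred_area + target_area - intersection

    # Convention chosen here: two empty masks count as a perfect match.
    if union == 0:
        return 1.0, 1.0

    iou = intersection / union
    dice = 2 * intersection / (pred_area + target_area)
    return iou, dice


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
iou, dice = iou_and_dice(pred, target)
```

For this toy pair the intersection is 2 pixels and the union is 4, so IoU is 0.5 and Dice is 2/3. For a single mask pair the two are linked by Dice = 2·IoU / (1 + IoU); values averaged over a dataset need not satisfy that identity exactly.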
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 12175107), the Qing Lan Project of Jiangsu Province, the Hua Li Talents Program of Nanjing University of Posts and Telecommunications, the Natural Science Foundation of Nanjing Vocational University of Industry Technology (Grant No. YK22-02-08), and the Fund from the Research Center of Industrial Perception and Intelligent Manufacturing Equipment Engineering of Jiangsu Province, China (Grant No. ZK21-05-09).
Abstract: Surface polaritons, as surface electromagnetic waves propagating along the surface of a medium, have played a crucial role in enhancing the photonic spin Hall effect (PSHE) and developing highly sensitive refractive index (RI) sensors. Among them, the traditional surface plasmon polariton (SPP) based on noble metals has limited application beyond the near-infrared (IR) regime due to the large negative permittivity and optical losses. In this contribution, we theoretically propose a highly sensitive PSHE sensor with the structure of Ge prism-SiC-Si:InAs-sensing medium, by taking advantage of the hybrid surface plasmon phonon polariton (SPPhP) in the mid-IR regime. Here, heavily Si-doped InAs (Si:InAs) and SiC excite the SPP and surface phonon polariton (SPhP), and the hybrid SPPhP is realized in this system. More importantly, the designed PSHE sensor based on this SPPhP mechanism achieves multi-stage RI measurements from 1.00025-1.00225 to 1.70025-1.70225, and the maximal intensity sensitivity and angle sensitivity can be up to 9.4×10^4 μm/RIU and 245°/RIU, respectively. These findings provide a new pathway for the enhancement of PSHE in the mid-IR regime, and offer new opportunities to develop highly sensitive RI sensors in multi-scenario applications, such as harmful gas monitoring and biosensing.
Funding: Funded by the National Natural Science Foundation of China (52061020).
Abstract: Quantitative analysis of aluminum-silicon (Al-Si) alloy microstructure is crucial for evaluating and controlling alloy performance. Conventional analysis methods rely on manual segmentation, which is inefficient and subjective, while fully supervised deep learning approaches require extensive and expensive pixel-level annotated data. Furthermore, existing semi-supervised methods still face challenges in handling the adhesion of adjacent primary silicon particles and effectively utilizing consistency in unlabeled data. To address these issues, this paper proposes a novel semi-supervised framework for Al-Si alloy microstructure image segmentation. First, we introduce a Rotational Uncertainty Correction Strategy (RUCS). This strategy employs multi-angle rotational perturbations and Monte Carlo sampling to assess prediction consistency, generating a pixel-wise confidence weight map. By integrating this map into the loss function, the model dynamically focuses on high-confidence regions, thereby improving generalization ability while reducing manual annotation pressure. Second, we design a Boundary Enhancement Module (BEM) to strengthen boundary feature extraction through erosion difference and multi-scale dilated convolutions. This module guides the model to focus on the boundary regions of adjacent particles, effectively resolving particle adhesion and improving segmentation accuracy. Systematic experiments were conducted on the Aluminum-Silicon Alloy Microstructure Dataset (ASAD). Results indicate that the proposed method performs exceptionally well with scarce labeled data. Specifically, using only 5% labeled data, our method improves the Jaccard index and Adjusted Rand Index (ARI) by 2.84 and 1.57 percentage points, respectively, and reduces the Variation of Information (VI) by 8.65 compared to state-of-the-art semi-supervised models, approaching the performance levels of 10% labeled data. These results demonstrate that the proposed method significantly enhances the accuracy and robustness of quantitative microstructure analysis while reducing annotation costs.
Funding: National Natural Science Foundation of China under Grants No. 62171047, U22B2001, 62271065, 62001051; Beijing Natural Science Foundation under Grant L223027; BUPT Excellent Ph.D. Students Foundation under Grant CX2021114.
Abstract: This article studies the problem of image segmentation-based semantic communication in autonomous driving. In real traffic scenes, detecting objects (e.g., vehicles and pedestrians) is more important for guaranteeing driving safety, yet this is often ignored in existing work. Therefore, we propose a vehicular image segmentation-oriented semantic communication system, termed VIS-SemCom, focusing on transmitting and recovering image semantic features of highly important objects to reduce transmission redundancy. First, we develop a semantic codec based on the Swin Transformer architecture, which expands the perceptual field and thus improves the segmentation accuracy. To improve accuracy on the important objects, we propose a multi-scale semantic extraction method that assigns different numbers of Swin Transformer blocks to semantic features of different resolutions. Also, an importance-aware loss incorporating importance levels is devised, and an online hard example mining (OHEM) strategy is proposed to handle small sample issues in the dataset. Finally, experimental results demonstrate that the proposed VIS-SemCom can achieve a significant mean intersection over union (mIoU) performance in the SNR regions, a reduction of transmitted data volume by about 60% at 60% mIoU, and improved segmentation accuracy of important objects, compared to baseline image communication.
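Online hard example mining (OHEM), mentioned in the abstract above, is generally understood as averaging the training loss over only the hardest examples in a batch. The following is a minimal sketch of that general idea, assuming per-pixel loss values are already available as a flat list; the function name `ohem_mean_loss` and the parameters `keep_ratio` and `min_kept` are hypothetical, and the abstract does not state which OHEM variant VIS-SemCom uses.

```python
def ohem_mean_loss(pixel_losses, keep_ratio=0.25, min_kept=1):
    """Average only the largest losses (the 'hard' examples).

    keep_ratio and min_kept are illustrative hyperparameters, not values
    reported in the abstract.
    """
    if not pixel_losses:
        raise ValueError("pixel_losses must not be empty")
    # Number of examples to keep, never fewer than min_kept or more than available.
    k = max(min_kept, int(len(pixel_losses) * keep_ratio))
    k = min(k, len(pixel_losses))
    hardest = sorted(pixel_losses, reverse=True)[:k]
    return sum(hardest) / len(hardest)


losses = [0.1, 0.2, 2.0, 0.05, 1.0, 0.3, 0.02, 0.4]
hard_mean = ohem_mean_loss(losses, keep_ratio=0.25)
```

With eight losses and a keep ratio of 0.25, the two largest values (2.0 and 1.0) are kept and their mean is 1.5, so easy pixels contribute nothing to the update.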
Abstract: Magnetic Resonance Imaging (MRI) has a pivotal role in medical image analysis because of its ability to support disease detection and diagnosis. Fuzzy C-Means (FCM) clustering is widely used for MRI segmentation due to its ability to handle image uncertainty. However, FCM still has several limitations, including sensitivity to initialization, susceptibility to local optima, and high computational cost. To address these limitations, this study integrates Grey Wolf Optimization (GWO) with FCM to enhance cluster center selection, improving segmentation accuracy and robustness. Moreover, to further refine optimization, Fuzzy Entropy Clustering is used as the objective because of properties that distinguish it from traditional objective functions. Fuzzy entropy effectively quantifies uncertainty, leading to more well-defined clusters, improved noise robustness, and better preservation of anatomical structures in MRI images. Despite these advantages, the iterative nature of GWO and FCM introduces significant computational overhead, which restricts their applicability to high-resolution medical images. To overcome this bottleneck, we propose a Parallelized-GWO-based FCM (P-GWO-FCM) approach using GPU acceleration, where both GWO optimization and FCM updates (centroid computation and membership matrix updates) are parallelized. By concurrently executing these processes, our approach efficiently distributes the computational workload, significantly reducing execution time while maintaining high segmentation accuracy. The proposed parallel method, P-GWO-FCM, was evaluated on both simulated and clinical brain MR images, focusing on segmenting white matter, gray matter, and cerebrospinal fluid regions. The results indicate significant improvements in segmentation accuracy, achieving a Jaccard Similarity (JS) of 0.92, a Partition Coefficient Index (PCI) of 0.91, a Partition Entropy Index (PEI) of 0.25, and a Davies-Bouldin Index (DBI) of 0.30. Experimental comparisons demonstrate that P-GWO-FCM outperforms existing methods in both segmentation accuracy and computational efficiency, making it a promising solution for real-time medical image segmentation.
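The two FCM updates named in the abstract above (membership matrix update and centroid computation) can be sketched for 1D intensity values as follows. This is plain, sequential textbook FCM with fuzzifier m, written under the assumption of scalar intensities; it does not include the Grey Wolf Optimization step, the fuzzy entropy objective, or the GPU parallelization that P-GWO-FCM adds, and the function names are hypothetical.

```python
def memberships_for(values, centers, m=2.0, eps=1e-12):
    """Standard FCM membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for x in values:
        # eps avoids division by zero when a value coincides with a center.
        dists = [abs(x - c) + eps for c in centers]
        rows.append([1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
                     for d_i in dists])
    return rows


def fcm_1d(values, init_centers, m=2.0, iters=50):
    """Alternate membership and centroid updates for a fixed number of iterations."""
    centers = list(init_centers)
    for _ in range(iters):
        u = memberships_for(values, centers, m)
        for i in range(len(centers)):
            # Centroid update: mean of the values weighted by u_ik^m.
            weights = [row[i] ** m for row in u]
            centers[i] = sum(w * x for w, x in zip(weights, values)) / sum(weights)
    return centers, memberships_for(values, centers, m)


intensities = [0.0, 0.1, 0.2, 0.9, 1.0, 1.1]
centers, u = fcm_1d(intensities, init_centers=[0.0, 1.0])
```

The inner loops over values and clusters are independent of one another within each update, which is the property that makes the membership and centroid computations natural candidates for the kind of parallel execution the abstract describes.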
Funding: Funded by the National Natural Science Foundation of China (grant number 62262045), the Fundamental Research Funds for the Central Universities (grant number 2023CDJYGRH-YB11), and the Open Funding of SUGON Industrial Control and Security Center (grant number CUIT-SICSC-2025-03).
Abstract: Autonomous vehicles rely heavily on accurate and efficient scene segmentation for safe navigation and efficient operations. Traditional Bird’s Eye View (BEV) methods for semantic scene segmentation, which leverage multimodal sensor fusion, often struggle with noisy data and demand high-performance GPUs, leading to sensor misalignment and performance degradation. This paper introduces an Enhanced Channel Attention BEV (ECABEV), a novel approach designed to address these challenges under insufficient GPU memory conditions. ECABEV integrates camera and radar data through a de-noise enhanced channel attention mechanism, which utilizes global average and max pooling to effectively filter out noise while preserving discriminative features. Furthermore, an improved fusion approach is proposed to efficiently merge categorical data across modalities. To reduce computational overhead, a bilinear interpolation layer normalization method is devised to ensure spatial feature fidelity. Moreover, a scalable cross-entropy loss function is designed to handle the imbalanced classes with little sacrifice in computational efficiency. Extensive experiments on the nuScenes dataset demonstrate that ECABEV achieves state-of-the-art performance with an IoU of 39.961, using a lightweight ViT-B/14 backbone and lower resolution (224×224). Our approach highlights its cost-effectiveness and practical applicability, even on low-end devices. The code is publicly available at: https://github.com/YYF-CQU/ECABEV.git.
Abstract: This study aimed to enhance the performance of semantic segmentation for autonomous driving by improving the 2DPASS model. Two improvements were proposed and implemented in this paper: dynamically adjusting the loss function ratio and integrating an attention mechanism (CBAM). First, the loss function weights were adjusted dynamically. A grid search was used to select the best ratio, 7:3. This ratio gives greater emphasis to the cross-entropy loss, which resulted in better segmentation performance. Second, CBAM was applied at different layers of the 2D encoder. Heatmap analysis revealed that introducing it after the second block of 2D image encoding produced the most effective enhancement of important feature representation. The number of training epochs was chosen experimentally, which improved model convergence and overall accuracy. To evaluate the proposed approach, experiments were conducted on the SemanticKITTI database. The results showed that the improved model achieved an mIoU of 64.31%, an improvement of 11.47 percentage points over the conventional 2DPASS model (baseline: 52.84%). It was more effective at detecting small and distant objects and clearly identifying boundaries between different classes. Issues such as noise and variations in data distribution affected its accuracy, indicating the need for further refinement. Overall, the proposed improvements to the 2DPASS model demonstrated the potential to advance semantic segmentation technology and contributed to a more reliable perception of complex, dynamic environments in autonomous vehicles. Accurate segmentation enhances the vehicle’s ability to distinguish different objects, and this improvement directly supports safer navigation, robust decision-making, and efficient path planning, making it highly applicable to real-world deployment of autonomous systems in urban and highway settings.
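The 7:3 loss ratio and the grid search described in the abstract above can be illustrated with a small sketch. It assumes the total loss is a weighted sum of a cross-entropy term and one auxiliary term; the abstract does not name the second term, so `aux_loss` is a placeholder, and the `validate` callback and the candidate ratios are hypothetical stand-ins for a real train-and-evaluate loop.

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class for one sample."""
    return -math.log(max(probs[label], 1e-12))


def combined_loss(ce_loss, aux_loss, ratio=(7, 3)):
    """Weighted sum of two losses; ratio (7, 3) means 0.7 * CE + 0.3 * aux."""
    total = ratio[0] + ratio[1]
    return (ratio[0] / total) * ce_loss + (ratio[1] / total) * aux_loss


def grid_search_ratio(candidates, validate):
    """Return the candidate ratio with the highest validation score.

    `validate` is a hypothetical callback that would train and evaluate a
    model for a given ratio and return a score such as mIoU.
    """
    return max(candidates, key=validate)


ce = cross_entropy([0.25, 0.75], label=1)
loss = combined_loss(1.0, 2.0)            # 0.7 * 1.0 + 0.3 * 2.0

# Toy scorer used only to exercise the search; it is not a real validation run.
best = grid_search_ratio([(5, 5), (7, 3), (9, 1)], lambda r: -abs(r[0] - 7))
```

Normalizing the ratio to weights that sum to one keeps the overall loss scale comparable across candidates, so a grid search compares the balance between the terms and not their magnitude.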
Funding: Provided by the Science Research Project of Hebei Education Department under grant No. BJK2024115.
Abstract: High-resolution remote sensing images (HRSIs) are now an essential data source for gathering surface information due to advancements in remote sensing data capture technologies. However, their significant scale changes and wealth of spatial details pose challenges for semantic segmentation. While convolutional neural networks (CNNs) excel at capturing local features, they are limited in modeling long-range dependencies. Conversely, transformers utilize multihead self-attention to integrate global context effectively, but this approach often incurs a high computational cost. This paper proposes a global-local multiscale context network (GLMCNet) to extract both global and local multiscale contextual information from HRSIs. A detail-enhanced filtering module (DEFM) is proposed at the end of the encoder to refine the encoder outputs further, thereby enhancing the key details extracted by the encoder and effectively suppressing redundant information. In addition, a global-local multiscale transformer block (GLMTB) is proposed in the decoding stage to enable the modeling of rich multiscale global and local information. We also design a stair fusion mechanism to transmit deep semantic information from deep to shallow layers progressively. Finally, we propose the semantic awareness enhancement module (SAEM), which further enhances the representation of multiscale semantic features through spatial attention and covariance channel attention. Extensive ablation analyses and comparative experiments were conducted to evaluate the performance of the proposed method. Specifically, our method achieved a mean Intersection over Union (mIoU) of 86.89% on the ISPRS Potsdam dataset and 84.34% on the ISPRS Vaihingen dataset, outperforming existing models such as ABCNet and BANet.
Abstract: Advanced traffic monitoring systems encounter substantial challenges in vehicle detection and classification due to the limitations of conventional methods, which often demand extensive computational resources and struggle with diverse data acquisition techniques. This research presents a novel approach for vehicle classification and recognition in aerial image sequences, integrating multiple advanced techniques to enhance detection accuracy. The proposed model begins with preprocessing using Multiscale Retinex (MSR) to enhance image quality, followed by Expectation-Maximization (EM) Segmentation for precise foreground object identification. Vehicle detection is performed using the state-of-the-art YOLOv10 framework, while feature extraction incorporates Maximally Stable Extremal Regions (MSER), Dense Scale-Invariant Feature Transform (Dense SIFT), and Zernike Moments Features to capture distinct object characteristics. Feature optimization is further refined through a Hybrid Swarm-based Optimization algorithm, ensuring optimal feature selection for improved classification performance. The final classification is conducted using a Vision Transformer, leveraging its robust learning capabilities for enhanced accuracy. Experimental evaluations on benchmark datasets, including UAVDT and the Unmanned Aerial Vehicle Intruder Dataset (UAVID), demonstrate the superiority of the proposed approach, achieving an accuracy of 94.40% on UAVDT and 93.57% on UAVID. The results highlight the efficacy of the model in significantly enhancing vehicle detection and classification in aerial imagery, outperforming existing methodologies and offering a statistically validated improvement for intelligent traffic monitoring systems.
Abstract: Weakly supervised semantic segmentation (WSSS) is a tricky task in which only category information is provided for segmentation prediction. Thus, the key stage of WSSS is to generate the pseudo labels. In convolutional neural network (CNN) based methods, class activation mapping (CAM) is used to obtain the pseudo labels, but it concentrates only on the most discriminative parts. Recently, transformer-based methods have used attention maps from the multi-headed self-attention (MHSA) module to predict pseudo labels, which usually contain obvious background noise and incoherent object areas. To solve the above problems, we use the Conformer as our backbone, which is a parallel network based on a CNN and a Transformer. The two branches generate pseudo labels and refine them independently, and can effectively combine the advantages of CNN and Transformer. However, the information exchange between the two parallel branches is not tight enough. Thus, the parallel structure can result in poor detail in the pseudo labels, and the background noise still exists. To alleviate this problem, we propose the enhancing convolution CAM (ECCAM) model, which has three improved modules based on enhancing convolution: a deeper stem (DStem), a convolutional feed-forward network (CFFN), and a feature coupling unit with convolution (FCUConv). ECCAM gives the Conformer tighter interaction between the CNN and Transformer branches. Experimental verification shows that the proposed modules help the network perceive more local information from images, making the final segmentation results more refined. Compared with similar architectures, our modules greatly improve the semantic segmentation performance and achieve 70.2% mean intersection over union (mIoU) on the PASCAL VOC 2012 dataset.
Funding: Funded by BK21 FOUR (Fostering Outstanding Universities for Research) (No.: 5199990914048).
Abstract: Accurate segmentation of breast cancer in mammogram images plays a critical role in early diagnosis and treatment planning. As research in this domain continues to expand, various segmentation techniques have been proposed across classical image processing, machine learning (ML), deep learning (DL), and hybrid/ensemble models. This study conducts a systematic literature review using the PRISMA methodology, analyzing 57 selected articles to explore how these methods have evolved and been applied. The review highlights the strengths and limitations of each approach, identifies commonly used public datasets, and observes emerging trends in model integration and clinical relevance. By synthesizing current findings, this work provides a structured overview of segmentation strategies and outlines key considerations for developing more adaptable and explainable tools for breast cancer detection. Overall, our synthesis suggests that classical and ML methods are suitable for limited labels and computing resources, while DL models are preferable when pixel-level annotations and resources are available, and hybrid pipelines are most appropriate when fine-grained clinical precision is required.
Funding: Supported by the Guangdong Pharmaceutical University 2024 Higher Education Research Projects (GKP202403, GMP202402) and the Guangdong Pharmaceutical University College Students’ Innovation and Entrepreneurship Training Programs (Grant No. 202504302033, 202504302034, 202504302036, and 202504302244).
Abstract: Background: Diabetic macular edema is a prevalent retinal condition and a leading cause of visual impairment among diabetic patients. Early detection of affected areas is beneficial for effective diagnosis and treatment. Traditionally, diagnosis relies on optical coherence tomography imaging technology interpreted by ophthalmologists. However, this manual image interpretation is often slow and subjective. Therefore, developing automated segmentation for macular edema images is essential to improve diagnostic efficiency and accuracy. Methods: In order to improve clinical diagnostic efficiency and accuracy, we proposed a SegNet network structure integrated with a convolutional block attention module (CBAM). This network introduces a multi-scale input module, the CBAM attention mechanism, and skip connections. The multi-scale input module enhances the network’s perceptual capabilities, while the lightweight CBAM effectively fuses relevant features across channel and spatial dimensions, allowing for better learning of varying information levels. Results: Experimental results demonstrate that the proposed network achieves an IoU of 80.127% and an accuracy of 99.162%. Compared to the traditional segmentation network, this model has fewer parameters, faster training and testing speed, and superior performance on semantic segmentation tasks, indicating its high practical applicability. Conclusion: The C-SegNet proposed in this study enables accurate segmentation of diabetic macular edema lesion images, which facilitates quicker diagnosis for healthcare professionals.
Funding: Supported by the Funds for Central-Guided Local Science and Technology Development (Grant No. 202407AC110005), Key Technologies for the Construction of a Whole-Process Intelligent Service System for Neuroendocrine Neoplasm, and by the 2023 Opening Research Fund of Yunnan Key Laboratory of Digital Communications (YNJTKFB-20230686, YNKLDC-KFKT-202304).
Abstract: In image analysis, high-precision semantic segmentation predominantly relies on supervised learning. Despite significant advancements driven by deep learning techniques, challenges such as class imbalance and dynamic performance evaluation persist. Traditional weighting methods, often based on pre-statistical class counting, tend to overemphasize certain classes while neglecting others, particularly rare sample categories. Approaches like focal loss and other rare-sample segmentation techniques introduce multiple hyperparameters that require manual tuning, leading to increased experimental costs due to their instability. This paper proposes a novel CAWASeg framework to address these limitations. Our approach leverages Grad-CAM technology to generate class activation maps, identifying key feature regions that the model focuses on during decision-making. We introduce a Comprehensive Segmentation Performance Score (CSPS) to dynamically evaluate model performance by converting these activation maps into pseudo masks and comparing them with the ground truth. Additionally, we design two adaptive weights for each class: a Basic Weight (BW) and a Ratio Weight (RW), which the model adjusts during training based on real-time feedback. Extensive experiments on the COCO-Stuff, CityScapes, and ADE20k datasets demonstrate that our CAWASeg framework significantly improves segmentation performance for rare sample categories while enhancing overall segmentation accuracy. The proposed method offers a robust and efficient solution for addressing class imbalance in semantic segmentation tasks.
Abstract: This systematic review aims to comprehensively examine and compare deep learning methods for brain tumor segmentation and classification using MRI and other imaging modalities, focusing on recent trends from 2022 to 2025. The primary objective is to evaluate methodological advancements, model performance, dataset usage, and existing challenges in developing clinically robust AI systems. We included peer-reviewed journal articles and high-impact conference papers published between 2022 and 2025, written in English, that proposed or evaluated deep learning methods for brain tumor segmentation and/or classification. Excluded were non-open-access publications, books, and non-English articles. A structured search was conducted across Scopus, Google Scholar, Wiley, and Taylor & Francis, with the last search performed in August 2025. Risk of bias was not formally quantified but considered during full-text screening based on dataset diversity, validation methods, and availability of performance metrics. We used narrative synthesis and tabular benchmarking to compare performance metrics (e.g., accuracy, Dice score) across model types (CNN, Transformer, Hybrid), imaging modalities, and datasets. A total of 49 studies were included (43 journal articles and 6 conference papers). These studies spanned over 9 public datasets (e.g., BraTS, Figshare, REMBRANDT, MOLAB) and utilized a range of imaging modalities, predominantly MRI. Hybrid models, especially ResViT and UNetFormer, consistently achieved high performance, with classification accuracy exceeding 98% and segmentation Dice scores above 0.90 across multiple studies. Transformers and hybrid architectures showed increasing adoption post-2023. Many studies lacked external validation and were evaluated only on a few benchmark datasets, raising concerns about generalizability and dataset bias. Few studies addressed clinical interpretability or uncertainty quantification. Despite promising results, particularly for hybrid deep learning models, widespread clinical adoption remains limited due to lack of validation, interpretability concerns, and real-world deployment barriers.
Abstract: Inspections of power transmission lines (PTLs) conducted using unmanned aerial vehicles (UAVs) are complicated by the fine structure of the lines and complex backgrounds, making accurate and efficient segmentation challenging. This study presents the Wavelet-Guided Transformer U-Net (WGT-UNet) model, a new hybrid network that combines Convolutional Neural Networks (CNNs), Discrete Wavelet Transform (DWT), and Transformer architectures. The model’s primary contribution is the use of spatial and channel attention mechanisms derived from wavelet subbands to guide the Transformer’s self-attention structure. Low and high frequency components are separated at each stage using DWT, suppressing structural noise and making linear objects more prominent. The developed design is supported by multi-component hybrid cost functions that simultaneously address class imbalance, edge sharpness, structural integrity, and spatial regularity issues. Furthermore, high segmentation success has been achieved in producing sharp boundaries and continuous line structures with the DWT-guided attention mechanism. Experiments conducted on the TTPLA dataset reveal that the version using the ConvNeXt backbone outperforms the current state-of-the-art approaches with an F1-Score of 79.33% and an Intersection over Union (IoU) value of 68.38%. The models and visual outputs of the developed method and all compared models can be accessed at https://github.com/burhanbarakli/WGT-UNET.
Funding: Funded by Research Project THTETN.05/24-25, Vietnam Academy of Science and Technology.
Abstract: Satellite image segmentation plays a crucial role in remote sensing, supporting applications such as environmental monitoring, land use analysis, and disaster management. However, traditional segmentation methods often rely on large amounts of labeled data, which are costly and time-consuming to obtain, especially in large-scale or dynamic environments. To address this challenge, we propose the Semi-Supervised Multi-View Picture Fuzzy Clustering (SS-MPFC) algorithm, which improves segmentation accuracy and robustness, particularly in complex and uncertain remote sensing scenarios. SS-MPFC unifies three paradigms: semi-supervised learning, multi-view clustering, and picture fuzzy set theory. This integration allows the model to effectively use a small number of labeled samples, fuse complementary information from multiple data views, and handle the ambiguity and uncertainty inherent in satellite imagery. We design a novel objective function that jointly incorporates picture fuzzy membership functions across multiple views of the data and embeds pairwise semi-supervised constraints (must-link and cannot-link) directly into the clustering process to enhance segmentation accuracy. Experiments conducted on several benchmark satellite datasets demonstrate that SS-MPFC significantly outperforms existing state-of-the-art methods in segmentation accuracy, noise robustness, and semantic interpretability. On the Augsburg dataset, SS-MPFC achieves a Purity of 0.8158 and an Accuracy of 0.6860, highlighting its robustness and efficiency. These results demonstrate that SS-MPFC offers a scalable and effective solution for real-world satellite-based monitoring systems, particularly in scenarios where rapid annotation is infeasible, such as wildfire tracking, agricultural monitoring, and dynamic urban mapping.
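Two notions from this abstract, the Purity score and the pairwise must-link / cannot-link constraints, can be made concrete with a short sketch. It assumes hard cluster assignments (SS-MPFC itself works with picture fuzzy memberships, which are not modeled here), and the function names and the toy labels are hypothetical.

```python
from collections import Counter


def purity(true_labels, cluster_ids):
    """Fraction of samples that belong to the majority class of their cluster."""
    if len(true_labels) != len(cluster_ids) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    per_cluster = {}
    for label, cluster in zip(true_labels, cluster_ids):
        per_cluster.setdefault(cluster, Counter())[label] += 1
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(true_labels)


def violated_constraints(cluster_ids, must_link, cannot_link):
    """Count pairwise constraints that a hard clustering fails to satisfy."""
    broken_must = sum(1 for i, j in must_link if cluster_ids[i] != cluster_ids[j])
    broken_cannot = sum(1 for i, j in cannot_link if cluster_ids[i] == cluster_ids[j])
    return broken_must + broken_cannot


# Toy example: six pixels, three land-cover classes, two clusters.
labels = ["water", "water", "urban", "urban", "forest", "forest"]
clusters = [0, 0, 0, 1, 1, 1]
print(purity(labels, clusters))
print(violated_constraints(clusters, must_link=[(0, 1), (2, 3)],
                           cannot_link=[(0, 5), (3, 4)]))
```

A semi-supervised objective of the kind described would typically penalize such violations during optimization rather than count them afterwards; the counter above is only a way to inspect a finished clustering.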
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 52304139, 52325403) and the CCTEG Coal Mining Research Institute funding (Grant No. KCYJY-2024-MS-10).
Abstract: 3D laser scanning technology is widely used in underground openings for high-precision, rapid, and nondestructive structural evaluations. Segmenting large 3D point cloud datasets, particularly in coal mine roadways with multi-scale targets, remains challenging. This paper proposes an enhanced segmentation method integrating an improved PointNet++ with a coverage-voted strategy. The coverage-voted strategy reduces data volume while preserving multi-scale target topology. Segmentation is performed by an enhanced PointNet++ algorithm with a normalization preprocessing head, achieving 94% accuracy for common supporting components. Ablation experiments show that the preprocessing head and coverage strategies increase segmentation accuracy by 20% and 2%, respectively, and improve Intersection over Union (IoU) for bearing plate segmentation by 58% and 20%. The accuracy of the current pretrained segmentation model may be affected by variations in surface support components, but it can be readily enhanced through re-optimization with additional labeled point cloud data. The proposed method, combined with a previously developed machine learning model that links rock bolt load to the deformation field of its bearing plate, provides a robust technique for simultaneously measuring the load of multiple rock bolts in a single laser scan.
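For readers unfamiliar with the two metrics quoted here, per-point accuracy and per-class IoU on a labeled point cloud can be computed as in the sketch below. The class names and toy labels are invented for illustration, and the abstract does not specify how the authors aggregate their scores, so this shows only the standard definitions.

```python
def point_accuracy(true_labels, pred_labels):
    """Share of points whose predicted class matches the ground truth."""
    if len(true_labels) != len(pred_labels) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(true_labels, pred_labels) if t == p)
    return hits / len(true_labels)


def per_class_iou(true_labels, pred_labels):
    """IoU per class: |truth AND prediction| / |truth OR prediction|."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("inputs must have equal length")
    pairs = list(zip(true_labels, pred_labels))
    result = {}
    for cls in sorted(set(true_labels) | set(pred_labels)):
        intersection = sum(1 for t, p in pairs if t == cls and p == cls)
        union = sum(1 for t, p in pairs if t == cls or p == cls)
        result[cls] = intersection / union  # union >= 1 for every listed class
    return result


# Toy example with hypothetical support-component classes.
truth = ["bolt", "bolt", "plate", "plate", "mesh", "mesh"]
prediction = ["bolt", "plate", "plate", "plate", "mesh", "bolt"]
print(point_accuracy(truth, prediction))
print(per_class_iou(truth, prediction))
```

Small components such as bearing plates occupy few points, so their IoU can move sharply while overall accuracy barely changes, which is consistent with the abstract reporting both figures.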
Funding: Supported in part by the National Natural Science Foundation of China under Grant 62071405 and Grant 12175189.
Abstract: The accurate segmentation of deep gray matter nuclei is critical for neuropathological research, disease diagnosis, and treatment. Existing methods rely on supervised learning, which requires large labeled datasets. Obtaining such datasets for medical image analysis is challenging and time-consuming. In addition, methods based on convolutional neural networks (CNNs) achieve only suboptimal performance due to the locality of convolutional operations. Vision Transformers (ViTs) efficiently model long-range dependencies and thus have the potential to outperform these methods in segmentation tasks. To address these issues, we propose a novel hybrid network based on self-supervised pre-training for deep gray matter nuclei segmentation. Specifically, we present a CNN-Transformer hybrid network (CTNet), whose encoder consists of a 3D CNN and a ViT to learn local spatial-detailed features and global semantic information. A self-supervised learning (SSL) approach that integrates rotation prediction and masked feature reconstruction is proposed to pre-train the CTNet, enabling the model to learn valuable visual representations from unlabeled data. We evaluate the effectiveness of our method on 3T and 7T human brain MRI datasets. The results demonstrate that our CTNet achieves better performance than other comparison models and that our pre-training strategy outperforms other advanced self-supervised methods. When the training set has only one sample, our pre-trained CTNet enhances segmentation performance, showing an 8.4% improvement in Dice similarity coefficient (DSC) compared to the randomly initialized CTNet.
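The Dice similarity coefficient (DSC) used to report this result is defined as 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a reference mask B. The sketch below computes it for flattened binary masks; the function name and the convention of returning 1.0 when both masks are empty are assumptions, not details taken from the paper.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice similarity coefficient for two flattened binary masks."""
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    total = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * intersection / total


# Toy example: five voxels of one structure.
predicted = [1, 1, 0, 0, 1]
reference = [1, 0, 0, 1, 1]
print(dice_coefficient(predicted, reference))
```

For a 3D volume the same function applies after flattening the voxel grid into a single sequence.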