Organoids possess immense potential for unraveling the intricate functions of human tissues and facilitating preclinical disease treatment.Their applications span from high-throughput drug screening to the modeling of...Organoids possess immense potential for unraveling the intricate functions of human tissues and facilitating preclinical disease treatment.Their applications span from high-throughput drug screening to the modeling of complex diseases,with some even achieving clinical translation.Changes in the overall size,shape,boundary,and other morphological features of organoids provide a noninvasive method for assessing organoid drug sensitivity.However,the precise segmentation of organoids in bright-field microscopy images is made difficult by the complexity of the organoid morphology and interference,including overlapping organoids,bubbles,dust particles,and cell fragments.This paper introduces the precision organoid segmentation technique(POST),which is a deep-learning algorithm for segmenting challenging organoids under simple bright-field imaging conditions.Unlike existing methods,POST accurately segments each organoid and eliminates various artifacts encountered during organoid culturing and imaging.Furthermore,it is sensitive to and aligns with measurements of organoid activity in drug sensitivity experiments.POST is expected to be a valuable tool for drug screening using organoids owing to its capability of automatically and rapidly eliminating interfering substances and thereby streamlining the organoid analysis and drug screening process.展开更多
Brain tumors require precise segmentation for diagnosis and treatment plans due to their complex morphology and heterogeneous characteristics.While MRI-based automatic brain tumor segmentation technology reduces the b...Brain tumors require precise segmentation for diagnosis and treatment plans due to their complex morphology and heterogeneous characteristics.While MRI-based automatic brain tumor segmentation technology reduces the burden on medical staff and provides quantitative information,existing methodologies and recent models still struggle to accurately capture and classify the fine boundaries and diverse morphologies of tumors.In order to address these challenges and maximize the performance of brain tumor segmentation,this research introduces a novel SwinUNETR-based model by integrating a new decoder block,the Hierarchical Channel-wise Attention Decoder(HCAD),into a powerful SwinUNETR encoder.The HCAD decoder block utilizes hierarchical features and channelspecific attention mechanisms to further fuse information at different scales transmitted from the encoder and preserve spatial details throughout the reconstruction phase.Rigorous evaluations on the recent BraTS GLI datasets demonstrate that the proposed SwinHCAD model achieved superior and improved segmentation accuracy on both the Dice score and HD95 metrics across all tumor subregions(WT,TC,and ET)compared to baseline models.In particular,the rationale and contribution of the model design were clarified through ablation studies to verify the effectiveness of the proposed HCAD decoder block.The results of this study are expected to greatly contribute to enhancing the efficiency of clinical diagnosis and treatment planning by increasing the precision of automated brain tumor segmentation.展开更多
AIM:To construct an intelligent segmentation scheme for precise localization of central serous chorioretinopathy(CSC)leakage points,thereby enabling ophthalmologists to deliver accurate laser treatment without navigat...AIM:To construct an intelligent segmentation scheme for precise localization of central serous chorioretinopathy(CSC)leakage points,thereby enabling ophthalmologists to deliver accurate laser treatment without navigational laser equipment.METHODS:A dataset with dual labels(point-level and pixel-level)was first established based on fundus fluorescein angiography(FFA)images of CSC and subsequently divided into training(102 images),validation(40 images),and test(40 images)datasets.An intelligent segmentation method was then developed,based on the You Only Look Once version 8 Pose Estimation(YOLOv8-Pose)model and segment anything model(SAM),to segment CSC leakage points.Next,the YOLOv8-Pose model was trained for 200 epochs,and the best-performing model was selected to form the optimal combination with SAM.Additionally,the classic five types of U-Net series models[i.e.,U-Net,recurrent residual U-Net(R2U-Net),attention U-Net(AttU-Net),recurrent residual attention U-Net(R2AttUNet),and nested U-Net(UNet^(++))]were initialized with three random seeds and trained for 200 epochs,resulting in a total of 15 baseline models for comparison.Finally,based on the metrics including Dice similarity coefficient(DICE),intersection over union(IoU),precision,recall,precisionrecall(PR)curve,and receiver operating characteristic(ROC)curve,the proposed method was compared with baseline models through quantitative and qualitative experiments for leakage point segmentation,thereby demonstrating its effectiveness.RESULTS:With the increase of training epochs,the mAP50-95,Recall,and precision of the YOLOv8-Pose model showed a significant increase and tended to stabilize,and it achieved a preliminary localization success rate of 90%(i.e.,36 images)for CSC leakage points in 40 test images.Using manually expert-annotated pixel-level labels as the ground truth,the proposed method achieved outcomes with a DICE of 57.13%,an IoU of 45.31%,a precision of 45.91%,a recall of 93.57%,an area under the PR curve(AUC-PR)of 0.78 and an area under the ROC curve(AUC-ROC)of 0.97,which enables more accurate segmentation of CSC leakage points.CONCLUSION:By combining the precise localization capability of the YOLOv8-Pose model with the robust and flexible segmentation ability of SAM,the proposed method not only demonstrates the effectiveness of the YOLOv8-Pose model in detecting keypoint coordinates of CSC leakage points from the perspective of application innovation but also establishes a novel approach for accurate segmentation of CSC leakage points through the“detect-then-segment”strategy,thereby providing a potential auxiliary means for the automatic and precise realtime localization of leakage points during traditional laser photocoagulation for CSC.展开更多
Detailed individual tree crown segmentation is highly relevant for the detection and monitoring of Fraxinus excelsior L.trees affected by ash dieback,a major threat to common ash populations across Europe.In this stud...Detailed individual tree crown segmentation is highly relevant for the detection and monitoring of Fraxinus excelsior L.trees affected by ash dieback,a major threat to common ash populations across Europe.In this study,both fine and coarse crown segmentation methods were applied to close-range multispectral UAV imagery.The fine tree crown segmentation method utilized a novel unsupervised machine learning approach based on a blended NIR-NDVI image,whereas the coarse segmentation relied on the segment anything model(SAM).Both methods successfully delineated tree crown outlines,however,only the fine segmentation accurately captured internal canopy gaps.Despite these structural differences,mean NDVI values calculated per tree crown revealed no significant differences between the two approaches,indicating that coarse segmentation is sufficient for mean vegetation index assessments.Nevertheless,the fine segmentation revealed increased heterogeneity in NDVI values in more severely damaged trees,underscoring its value for detailed structural and health analyses.Furthermore,the fine segmentation workflow proved transferable to both individual UAV images and orthophotos from broader UAV surveys.For applications focused on structural integrity and spatial variation in canopy health,the fine segmentation approach is recommended.展开更多
Microscopy imaging is fundamental in analyzing bacterial morphology and dynamics,offering critical insights into bacterial physiology and pathogenicity.Image segmentation techniques enable quantitative analysis of bac...Microscopy imaging is fundamental in analyzing bacterial morphology and dynamics,offering critical insights into bacterial physiology and pathogenicity.Image segmentation techniques enable quantitative analysis of bacterial structures,facilitating precise measurement of morphological variations and population behaviors at single-cell resolution.This paper reviews advancements in bacterial image segmentation,emphasizing the shift from traditional thresholding and watershed methods to deep learning-driven approaches.Convolutional neural networks(CNNs),U-Net architectures,and three-dimensional(3D)frameworks excel at segmenting dense biofilms and resolving antibiotic-induced morphological changes.These methods combine automated feature extraction with physics-informed postprocessing.Despite progress,challenges persist in computational efficiency,cross-species generalizability,and integration with multimodal experimental workflows.Future progress will depend on improving model robustness across species and imaging modalities,integrating multimodal data for phenotype-function mapping,and developing standard pipelines that link computational tools with clinical diagnostics.These innovations will expand microbial phenotyping beyond structural analysis,enabling deeper insights into bacterial physiology and ecological interactions.展开更多
Advanced traffic monitoring systems encounter substantial challenges in vehicle detection and classification due to the limitations of conventional methods,which often demand extensive computational resources and stru...Advanced traffic monitoring systems encounter substantial challenges in vehicle detection and classification due to the limitations of conventional methods,which often demand extensive computational resources and struggle with diverse data acquisition techniques.This research presents a novel approach for vehicle classification and recognition in aerial image sequences,integrating multiple advanced techniques to enhance detection accuracy.The proposed model begins with preprocessing using Multiscale Retinex(MSR)to enhance image quality,followed by Expectation-Maximization(EM)Segmentation for precise foreground object identification.Vehicle detection is performed using the state-of-the-art YOLOv10 framework,while feature extraction incorporates Maximally Stable Extremal Regions(MSER),Dense Scale-Invariant Feature Transform(Dense SIFT),and Zernike Moments Features to capture distinct object characteristics.Feature optimization is further refined through a Hybrid Swarm-based Optimization algorithm,ensuring optimal feature selection for improved classification performance.The final classification is conducted using a Vision Transformer,leveraging its robust learning capabilities for enhanced accuracy.Experimental evaluations on benchmark datasets,including UAVDT and the Unmanned Aerial Vehicle Intruder Dataset(UAVID),demonstrate the superiority of the proposed approach,achieving an accuracy of 94.40%on UAVDT and 93.57%on UAVID.The results highlight the efficacy of the model in significantly enhancing vehicle detection and classification in aerial imagery,outperforming existing methodologies and offering a statistically validated improvement for intelligent traffic monitoring systems compared to existing approaches.展开更多
Modern manufacturing processes have become more reliant on automation because of the accelerated transition from Industry 3.0 to Industry 4.0.Manual inspection of products on assembly lines remains inefficient,prone t...Modern manufacturing processes have become more reliant on automation because of the accelerated transition from Industry 3.0 to Industry 4.0.Manual inspection of products on assembly lines remains inefficient,prone to errors and lacks consistency,emphasizing the need for a reliable and automated inspection system.Leveraging both object detection and image segmentation approaches,this research proposes a vision-based solution for the detection of various kinds of tools in the toolkit using deep learning(DL)models.Two Intel RealSense D455f depth cameras were arranged in a top down configuration to capture both RGB and depth images of the toolkits.After applying multiple constraints and enhancing them through preprocessing and augmentation,a dataset consisting of 3300 annotated RGB-D photos was generated.Several DL models were selected through a comprehensive assessment of mean Average Precision(mAP),precision-recall equilibrium,inference latency(target≥30 FPS),and computational burden,resulting in a preference for YOLO and Region-based Convolutional Neural Networks(R-CNN)variants over ViT-based models due to the latter’s increased latency and resource requirements.YOLOV5,YOLOV8,YOLOV11,Faster R-CNN,and Mask R-CNN were trained on the annotated dataset and evaluated using key performance metrics(Recall,Accuracy,F1-score,and Precision).YOLOV11 demonstrated balanced excellence with 93.0%precision,89.9%recall,and a 90.6%F1-score in object detection,as well as 96.9%precision,95.3%recall,and a 96.5%F1-score in instance segmentation with an average inference time of 25 ms per frame(≈40 FPS),demonstrating real-time performance.Leveraging these results,a YOLOV11-based windows application was successfully deployed in a real-time assembly line environment,where it accurately processed live video streams to detect and segment tools within toolkits,demonstrating its practical effectiveness in industrial automation.The application is capable of precisely measuring socket dimensions by utilising edge detection techniques on YOLOv11 segmentation masks,in addition to detection and segmentation.This makes it possible to do specification-level quality control right on the assembly line,which improves the ability to examine things in real time.The implementation is a big step forward for intelligent manufacturing in the Industry 4.0 paradigm.It provides a scalable,efficient,and accurate way to do automated inspection and dimensional verification activities.展开更多
Weakly Supervised Semantic Segmentation(WSSS),which relies only on image-level labels,has attracted significant attention for its cost-effectiveness and scalability.Existing methods mainly enhance inter-class distinct...Weakly Supervised Semantic Segmentation(WSSS),which relies only on image-level labels,has attracted significant attention for its cost-effectiveness and scalability.Existing methods mainly enhance inter-class distinctions and employ data augmentation to mitigate semantic ambiguity and reduce spurious activations.However,they often neglect the complex contextual dependencies among image patches,resulting in incomplete local representations and limited segmentation accuracy.To address these issues,we propose the Context Patch Fusion with Class Token Enhancement(CPF-CTE)framework,which exploits contextual relations among patches to enrich feature repre-sentations and improve segmentation.At its core,the Contextual-Fusion Bidirectional Long Short-Term Memory(CF-BiLSTM)module captures spatial dependencies between patches and enables bidirectional information flow,yield-ing a more comprehensive understanding of spatial correlations.This strengthens feature learning and segmentation robustness.Moreover,we introduce learnable class tokens that dynamically encode and refine class-specific semantics,enhancing discriminative capability.By effectively integrating spatial and semantic cues,CPF-CTE produces richer and more accurate representations of image content.Extensive experiments on PASCAL VOC 2012 and MS COCO 2014 validate that CPF-CTE consistently surpasses prior WSSS methods.展开更多
This systematic review aims to comprehensively examine and compare deep learning methods for brain tumor segmentation and classification using MRI and other imaging modalities,focusing on recent trends from 2022 to 20...This systematic review aims to comprehensively examine and compare deep learning methods for brain tumor segmentation and classification using MRI and other imaging modalities,focusing on recent trends from 2022 to 2025.The primary objective is to evaluate methodological advancements,model performance,dataset usage,and existing challenges in developing clinically robust AI systems.We included peer-reviewed journal articles and highimpact conference papers published between 2022 and 2025,written in English,that proposed or evaluated deep learning methods for brain tumor segmentation and/or classification.Excluded were non-open-access publications,books,and non-English articles.A structured search was conducted across Scopus,Google Scholar,Wiley,and Taylor&Francis,with the last search performed in August 2025.Risk of bias was not formally quantified but considered during full-text screening based on dataset diversity,validation methods,and availability of performance metrics.We used narrative synthesis and tabular benchmarking to compare performance metrics(e.g.,accuracy,Dice score)across model types(CNN,Transformer,Hybrid),imaging modalities,and datasets.A total of 49 studies were included(43 journal articles and 6 conference papers).These studies spanned over 9 public datasets(e.g.,BraTS,Figshare,REMBRANDT,MOLAB)and utilized a range of imaging modalities,predominantly MRI.Hybrid models,especially ResViT and UNetFormer,consistently achieved high performance,with classification accuracy exceeding 98%and segmentation Dice scores above 0.90 across multiple studies.Transformers and hybrid architectures showed increasing adoption post2023.Many studies lacked external validation and were evaluated only on a few benchmark datasets,raising concerns about generalizability and dataset bias.Few studies addressed clinical interpretability or uncertainty quantification.Despite promising results,particularly for hybrid deep learning models,widespread clinical adoption remains limited due to lack of validation,interpretability concerns,and real-world deployment barriers.展开更多
High-resolution remote sensing images(HRSIs)are now an essential data source for gathering surface information due to advancements in remote sensing data capture technologies.However,their significant scale changes an...High-resolution remote sensing images(HRSIs)are now an essential data source for gathering surface information due to advancements in remote sensing data capture technologies.However,their significant scale changes and wealth of spatial details pose challenges for semantic segmentation.While convolutional neural networks(CNNs)excel at capturing local features,they are limited in modeling long-range dependencies.Conversely,transformers utilize multihead self-attention to integrate global context effectively,but this approach often incurs a high computational cost.This paper proposes a global-local multiscale context network(GLMCNet)to extract both global and local multiscale contextual information from HRSIs.A detail-enhanced filtering module(DEFM)is proposed at the end of the encoder to refine the encoder outputs further,thereby enhancing the key details extracted by the encoder and effectively suppressing redundant information.In addition,a global-local multiscale transformer block(GLMTB)is proposed in the decoding stage to enable the modeling of rich multiscale global and local information.We also design a stair fusion mechanism to transmit deep semantic information from deep to shallow layers progressively.Finally,we propose the semantic awareness enhancement module(SAEM),which further enhances the representation of multiscale semantic features through spatial attention and covariance channel attention.Extensive ablation analyses and comparative experiments were conducted to evaluate the performance of the proposed method.Specifically,our method achieved a mean Intersection over Union(mIoU)of 86.89%on the ISPRS Potsdam dataset and 84.34%on the ISPRS Vaihingen dataset,outperforming existing models such as ABCNet and BANet.展开更多
3D laser scanning technology is widely used in underground openings for high-precision,rapid,and nondestructive structural evaluations.Segmenting large 3D point cloud datasets,particularly in coal mine roadways with m...3D laser scanning technology is widely used in underground openings for high-precision,rapid,and nondestructive structural evaluations.Segmenting large 3D point cloud datasets,particularly in coal mine roadways with multi-scale targets,remains challenging.This paper proposes an enhanced segmentation method integrating improved PointNet++with a coverage-voted strategy.The coverage-voted strategy reduces data while preserving multi-scale target topology.The segmentation is achieved using an enhanced PointNet++algorithm with a normalization preprocessing head,resulting in a 94%accuracy for common supporting components.Ablation experiments show that the preprocessing head and coverage strategies increase segmentation accuracy by 20%and 2%,respectively,and improve Intersection over Union(IoU)for bearing plate segmentation by 58%and 20%.The accuracy of the current pretraining segmentation model may be affected by variations in surface support components,but it can be readily enhanced through re-optimization with additional labeled point cloud data.This proposed method,combined with a previously developed machine learning model that links rock bolt load and the deformation field of its bearing plate,provides a robust technique for simultaneously measuring the load of multiple rock bolts in a single laser scan.展开更多
Cabin cables,as critical components of an aircraft's electrical system,significantly impact the operational efficiency and safety of the aircraft.The existing cable segmentation methods in civil aviation cabins ar...Cabin cables,as critical components of an aircraft's electrical system,significantly impact the operational efficiency and safety of the aircraft.The existing cable segmentation methods in civil aviation cabins are limited,especially in automation,heavily dependent on large amounts of data and resources,lacking the flexibility to adapt to different scenarios.To address these challenges,this paper introduces a novel image segmentation model,CableSAM,specifically designed for automated segmentation of cabin cables.CableSAM improves segmentation efficiency and accuracy using knowledge distillation and employs a context ensemble strategy.It accurately segments cables in various scenarios with minimal input prompts.Comparative experiments on three cable datasets demonstrate that CableSAM surpasses other advanced cable segmentation methods in performance.展开更多
Automatic segmentation and recognition of content and element information in color geological map are of great significance for researchers to analyze the distribution of mineral resources and predict disaster informa...Automatic segmentation and recognition of content and element information in color geological map are of great significance for researchers to analyze the distribution of mineral resources and predict disaster information.This article focuses on color planar raster geological map(geological maps include planar geological maps,columnar maps,and profiles).While existing deep learning approaches are often used to segment general images,their performance is limited due to complex elements,diverse regional features,and complicated backgrounds for color geological map in the domain of geoscience.To address the issue,a color geological map segmentation model is proposed that combines the Felz clustering algorithm and an improved SE-UNet deep learning network(named GeoMSeg).Firstly,a symmetrical encoder-decoder structure backbone network based on UNet is constructed,and the channel attention mechanism SENet has been incorporated to augment the network’s capacity for feature representation,enabling the model to purposefully extract map information.The SE-UNet network is employed for feature extraction from the geological map and obtain coarse segmentation results.Secondly,the Felz clustering algorithm is used for super pixel pre-segmentation of geological maps.The coarse segmentation results are refined and modified based on the super pixel pre-segmentation results to obtain the final segmentation results.This study applies GeoMSeg to the constructed dataset,and the experimental results show that the algorithm proposed in this paper has superior performance compared to other mainstream map segmentation models,with an accuracy of 91.89%and a MIoU of 71.91%.展开更多
The Internet of Vehicles (IoV) has become an important direction in the field of intelligent transportation, in which vehicle positioning is a crucial part. SLAM (Simultaneous Localization and Mapping) technology play...The Internet of Vehicles (IoV) has become an important direction in the field of intelligent transportation, in which vehicle positioning is a crucial part. SLAM (Simultaneous Localization and Mapping) technology plays a crucial role in vehicle localization and navigation. Traditional Simultaneous Localization and Mapping (SLAM) systems are designed for use in static environments, and they can result in poor performance in terms of accuracy and robustness when used in dynamic environments where objects are in constant movement. To address this issue, a new real-time visual SLAM system called MG-SLAM has been developed. Based on ORB-SLAM2, MG-SLAM incorporates a dynamic target detection process that enables the detection of both known and unknown moving objects. In this process, a separate semantic segmentation thread is required to segment dynamic target instances, and the Mask R-CNN algorithm is applied on the Graphics Processing Unit (GPU) to accelerate segmentation. To reduce computational cost, only key frames are segmented to identify known dynamic objects. Additionally, a multi-view geometry method is adopted to detect unknown moving objects. The results demonstrate that MG-SLAM achieves higher precision, with an improvement from 0.2730 m to 0.0135 m in precision. Moreover, the processing time required by MG-SLAM is significantly reduced compared to other dynamic scene SLAM algorithms, which illustrates its efficacy in locating objects in dynamic scenes.展开更多
Indoor scene semantic segmentation is essential for enabling robots to understand and interact with their environments effectively.However,numerous challenges remain unresolved,particularly in single-robot systems,whi...Indoor scene semantic segmentation is essential for enabling robots to understand and interact with their environments effectively.However,numerous challenges remain unresolved,particularly in single-robot systems,which often struggle with the complexity and variability of indoor scenes.To address these limitations,we introduce a novel multi-robot collaborative framework based on multiplex interactive learning(MPIL)in which each robot specialises in a distinct visual task within a unified multitask architecture.During training,the framework employs task-specific decoders and cross-task feature sharing to enhance collaborative optimisation.At inference time,robots operate independently with optimised models,enabling scalable,asynchronous and efficient deployment in real-world scenarios.Specifically,MPIL employs specially designed modules that integrate RGB and depth data,refine feature representations and facilitate the simultaneous execution of multiple tasks,such as instance segmentation,scene classification and semantic segmentation.By leveraging these modules,distinct agents within multi-robot systems can effectively handle specialised tasks,thereby enhancing the overall system's flexibility and adaptability.This collaborative effort maximises the strengths of each robot,resulting in a more comprehensive understanding of environments.Extensive experiments on two public benchmark datasets demonstrate MPIL's competitive performance compared to state-of-the-art approaches,highlighting the effectiveness and robustness of our multi-robot system in complex indoor environments.展开更多
Considering the three-dimensional(3D) U-Net lacks sufficient local feature extraction for image features and lacks attention to the fusion of high-and low-level features, we propose a new model called 3DMAU-Net based ...Considering the three-dimensional(3D) U-Net lacks sufficient local feature extraction for image features and lacks attention to the fusion of high-and low-level features, we propose a new model called 3DMAU-Net based on the 3D U-Net architecture for liver region segmentation. Our model replaces the last two layers of the 3D U-Net with a sliding window-based multilayer perceptron(SMLP), enabling better extraction of local image features. We also design a high-and low-level feature fusion dilated convolution block that focuses on local features and better supplements the surrounding information of the target region. This block is embedded in the entire encoding process, ensuring that the overall network is not simply downsampling. Before each feature extraction, the input features are processed by the dilated convolution block. We validate our experiments on the liver tumor segmentation challenge 2017(Lits2017) dataset, and our model achieves a Dice coefficient of 0.95, which is an improvement of 0.015 compared to the 3D U-Net model. Furthermore, we compare our results with other segmentation methods, and our model consistently outperforms them.展开更多
Brain tumor segmentation is critical in clinical diagnosis and treatment planning.Existing methods for brain tumor segmentation with missing modalities often struggle when dealing with multiple missing modalities,a co...Brain tumor segmentation is critical in clinical diagnosis and treatment planning.Existing methods for brain tumor segmentation with missing modalities often struggle when dealing with multiple missing modalities,a common scenario in real-world clinical settings.These methods primarily focus on handling a single missing modality at a time,making them insufficiently robust for the additional complexity encountered with incomplete data containing various missing modality combinations.Additionally,most existing methods rely on single models,which may limit their performance and increase the risk of overfitting the training data.This work proposes a novel method called the ensemble adversarial co-training neural network(EACNet)for accurate brain tumor segmentation from multi-modal magnetic resonance imaging(MRI)scans with multiple missing modalities.The proposed method consists of three key modules:the ensemble of pre-trained models,which captures diverse feature representations from the MRI data by employing an ensemble of pre-trained models;adversarial learning,which leverages a competitive training approach involving two models;a generator model,which creates realistic missing data,while sub-networks acting as discriminators learn to distinguish real data from the generated“fake”data.Co-training framework utilizes the information extracted by the multimodal path(trained on complete scans)to guide the learning process in the path handling missing modalities.The model potentially compensates for missing information through co-training interactions by exploiting the relationships between available modalities and the tumor segmentation task.EACNet was evaluated on the BraTS2018 and BraTS2020 challenge datasets and achieved state-of-the-art and competitive performance respectively.Notably,the segmentation results for the whole tumor(WT)dice similarity coefficient(DSC)reached 89.27%,surpassing the performance of existing methods.The analysis suggests that the ensemble approach offers potential benefits,and the adversarial co-training contributes to the increased robustness and accuracy of EACNet for brain tumor segmentation of MRI scans with missing modalities.The experimental results show that EACNet has promising results for the task of brain tumor segmentation of MRI scans with missing modalities and is a better candidate for real-world clinical applications.展开更多
Few-shot image semantic segmentation aims to achieve pixel-level classification for novel classes using only a few labeled examples.The method first trains the segmentation model on base classes,and then adapts it to ...Few-shot image semantic segmentation aims to achieve pixel-level classification for novel classes using only a few labeled examples.The method first trains the segmentation model on base classes,and then adapts it to novel classes.Although existing methods have achieved remarkable performance in few-shot image semantic segmentation,they still face the following challenges.Traditional methods typically rely on mask average pooling to generate single-category prototype vectors and perform feature matching via metric learning,but they exhibit significant limitations in modeling inter-category relationships and addressing complex background interference.Inspired by the analogy-based transfer mechanisms in cognitive psychology,we propose a Generalized Prototype Network(GPNet)to enhance the model's generalization ability for unseen categories and improve robustness in feature matching.GPNet consists of two key modules.The first is a generalized prototype enhancement module,which explores potential inter-category relationships to construct more discriminative category prototype representations.The second is a multi-scale feature alignment module,which dynamically aligns support and query features across multiple scales using an attention mechanism,thus mitigating background interference in complex scenarios.Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art approaches on several few-shot semantic segmentation benchmarks,validating its effectiveness and generalization capabilities.展开更多
Medical image segmentation has become a cornerstone for many healthcare applications,allowing for the automated extraction of critical information from images such as Computed Tomography(CT)scans,Magnetic Resonance Im...Medical image segmentation has become a cornerstone for many healthcare applications,allowing for the automated extraction of critical information from images such as Computed Tomography(CT)scans,Magnetic Resonance Imaging(MRIs),and X-rays.The introduction of U-Net in 2015 has significantly advanced segmentation capabilities,especially for small datasets commonly found in medical imaging.Since then,various modifications to the original U-Net architecture have been proposed to enhance segmentation accuracy and tackle challenges like class imbalance,data scarcity,and multi-modal image processing.This paper provides a detailed review and comparison of several U-Net-based architectures,focusing on their effectiveness in medical image segmentation tasks.We evaluate performance metrics such as Dice Similarity Coefficient(DSC)and Intersection over Union(IoU)across different U-Net variants including HmsU-Net,CrossU-Net,mResU-Net,and others.Our results indicate that architectural enhancements such as transformers,attention mechanisms,and residual connections improve segmentation performance across diverse medical imaging applications,including tumor detection,organ segmentation,and lesion identification.The study also identifies current challenges in the field,including data variability,limited dataset sizes,and issues with class imbalance.Based on these findings,the paper suggests potential future directions for improving the robustness and clinical applicability of U-Net-based models in medical image segmentation.展开更多
This paper presents CW-HRNet,a high-resolution,lightweight crack segmentation network designed to address challenges in complex scenes with slender,deformable,and blurred crack structures.The model incorporates two ke...This paper presents CW-HRNet,a high-resolution,lightweight crack segmentation network designed to address challenges in complex scenes with slender,deformable,and blurred crack structures.The model incorporates two key modules:Constrained Deformable Convolution(CDC),which stabilizes geometric alignment by applying a tanh limiter and learnable scaling factor to the predicted offsets,and the Wavelet Frequency Enhancement Module(WFEM),which decomposes features using Haar wavelets to preserve low-frequency structures while enhancing high-frequency boundaries and textures.Evaluations on the CrackSeg9k benchmark demonstrate CW-HRNet’s superior performance,achieving 82.39%mIoU with only 7.49M parameters and 10.34 GFLOPs,outperforming HrSegNet-B48 by 1.83% in segmentation accuracy with minimal complexity overhead.The model also shows strong cross-dataset generalization,achieving 60.01%mIoU and 66.22%F1 on Asphalt3k without fine-tuning.These results highlight CW-HRNet’s favorable accuracyefficiency trade-off for real-world crack segmentation tasks.展开更多
基金supported by the National Key R&D Program of China(No.2022YFC2504403)the National Natural Science Foundation of China(No.62172202)+1 种基金the Experiment Project of China Manned Space Program(No.HYZHXM01019)the Fundamental Research Funds for the Central Universities from Southeast University(No.3207032101C3)。
文摘Organoids possess immense potential for unraveling the intricate functions of human tissues and facilitating preclinical disease treatment.Their applications span from high-throughput drug screening to the modeling of complex diseases,with some even achieving clinical translation.Changes in the overall size,shape,boundary,and other morphological features of organoids provide a noninvasive method for assessing organoid drug sensitivity.However,the precise segmentation of organoids in bright-field microscopy images is made difficult by the complexity of the organoid morphology and interference,including overlapping organoids,bubbles,dust particles,and cell fragments.This paper introduces the precision organoid segmentation technique(POST),which is a deep-learning algorithm for segmenting challenging organoids under simple bright-field imaging conditions.Unlike existing methods,POST accurately segments each organoid and eliminates various artifacts encountered during organoid culturing and imaging.Furthermore,it is sensitive to and aligns with measurements of organoid activity in drug sensitivity experiments.POST is expected to be a valuable tool for drug screening using organoids owing to its capability of automatically and rapidly eliminating interfering substances and thereby streamlining the organoid analysis and drug screening process.
基金supported by Institute of Information&Communications Technology Planning&Evaluation(IITP)under the Metaverse Support Program to Nurture the Best Talents(IITP-2024-RS-2023-00254529)grant funded by the Korea government(MSIT).
文摘Brain tumors require precise segmentation for diagnosis and treatment plans due to their complex morphology and heterogeneous characteristics.While MRI-based automatic brain tumor segmentation technology reduces the burden on medical staff and provides quantitative information,existing methodologies and recent models still struggle to accurately capture and classify the fine boundaries and diverse morphologies of tumors.In order to address these challenges and maximize the performance of brain tumor segmentation,this research introduces a novel SwinUNETR-based model by integrating a new decoder block,the Hierarchical Channel-wise Attention Decoder(HCAD),into a powerful SwinUNETR encoder.The HCAD decoder block utilizes hierarchical features and channelspecific attention mechanisms to further fuse information at different scales transmitted from the encoder and preserve spatial details throughout the reconstruction phase.Rigorous evaluations on the recent BraTS GLI datasets demonstrate that the proposed SwinHCAD model achieved superior and improved segmentation accuracy on both the Dice score and HD95 metrics across all tumor subregions(WT,TC,and ET)compared to baseline models.In particular,the rationale and contribution of the model design were clarified through ablation studies to verify the effectiveness of the proposed HCAD decoder block.The results of this study are expected to greatly contribute to enhancing the efficiency of clinical diagnosis and treatment planning by increasing the precision of automated brain tumor segmentation.
基金Supported by the Shenzhen Science and Technology Program(No.JCYJ20240813152704006)the National Natural Science Foundation of China(No.62401259)+2 种基金the Fundamental Research Funds for the Central Universities(No.NZ2024036)the Postdoctoral Fellowship Program of CPSF(No.GZC20242228)High Performance Computing Platform of Nanjing University of Aeronautics and Astronautics。
文摘AIM:To construct an intelligent segmentation scheme for precise localization of central serous chorioretinopathy(CSC)leakage points,thereby enabling ophthalmologists to deliver accurate laser treatment without navigational laser equipment.METHODS:A dataset with dual labels(point-level and pixel-level)was first established based on fundus fluorescein angiography(FFA)images of CSC and subsequently divided into training(102 images),validation(40 images),and test(40 images)datasets.An intelligent segmentation method was then developed,based on the You Only Look Once version 8 Pose Estimation(YOLOv8-Pose)model and segment anything model(SAM),to segment CSC leakage points.Next,the YOLOv8-Pose model was trained for 200 epochs,and the best-performing model was selected to form the optimal combination with SAM.Additionally,the classic five types of U-Net series models[i.e.,U-Net,recurrent residual U-Net(R2U-Net),attention U-Net(AttU-Net),recurrent residual attention U-Net(R2AttUNet),and nested U-Net(UNet^(++))]were initialized with three random seeds and trained for 200 epochs,resulting in a total of 15 baseline models for comparison.Finally,based on the metrics including Dice similarity coefficient(DICE),intersection over union(IoU),precision,recall,precisionrecall(PR)curve,and receiver operating characteristic(ROC)curve,the proposed method was compared with baseline models through quantitative and qualitative experiments for leakage point segmentation,thereby demonstrating its effectiveness.RESULTS:With the increase of training epochs,the mAP50-95,Recall,and precision of the YOLOv8-Pose model showed a significant increase and tended to stabilize,and it achieved a preliminary localization success rate of 90%(i.e.,36 images)for CSC leakage points in 40 test images.Using manually expert-annotated pixel-level labels as the ground truth,the proposed method achieved outcomes with a DICE of 57.13%,an IoU of 45.31%,a precision of 45.91%,a recall of 93.57%,an area under the PR curve(AUC-PR)of 0.78 and an area under the ROC curve(AUC-ROC)of 0.97,which enables more accurate segmentation of CSC leakage points.CONCLUSION:By combining the precise localization capability of the YOLOv8-Pose model with the robust and flexible segmentation ability of SAM,the proposed method not only demonstrates the effectiveness of the YOLOv8-Pose model in detecting keypoint coordinates of CSC leakage points from the perspective of application innovation but also establishes a novel approach for accurate segmentation of CSC leakage points through the“detect-then-segment”strategy,thereby providing a potential auxiliary means for the automatic and precise realtime localization of leakage points during traditional laser photocoagulation for CSC.
基金This study was conducted within the project FraxVir“Detection,characterisation and analyses of the occurrence of viruses and ash dieback in special stands of Fraxinus excelsior-a supplementary study to the FraxForFuture demonstration project”and receives funding via the Waldklimafonds(WKF)funded by the German Federal Ministry of Food and Agriculture(BMEL)and Federal Ministry for the Environment,Nature Conservation,Nuclear Safety and Consumer Protection(BMUV)administrated by the Agency for Renewable Resources(FNR)under grant agreement 2220WK40A4.
文摘Detailed individual tree crown segmentation is highly relevant for the detection and monitoring of Fraxinus excelsior L.trees affected by ash dieback,a major threat to common ash populations across Europe.In this study,both fine and coarse crown segmentation methods were applied to close-range multispectral UAV imagery.The fine tree crown segmentation method utilized a novel unsupervised machine learning approach based on a blended NIR-NDVI image,whereas the coarse segmentation relied on the segment anything model(SAM).Both methods successfully delineated tree crown outlines,however,only the fine segmentation accurately captured internal canopy gaps.Despite these structural differences,mean NDVI values calculated per tree crown revealed no significant differences between the two approaches,indicating that coarse segmentation is sufficient for mean vegetation index assessments.Nevertheless,the fine segmentation revealed increased heterogeneity in NDVI values in more severely damaged trees,underscoring its value for detailed structural and health analyses.Furthermore,the fine segmentation workflow proved transferable to both individual UAV images and orthophotos from broader UAV surveys.For applications focused on structural integrity and spatial variation in canopy health,the fine segmentation approach is recommended.
基金financially supported by the Open Project Program of Wuhan National Laboratory for Optoelectronics(No.2022WNLOKF009)the National Natural Science Foundation of China(No.62475216)+2 种基金the Key Research and Development Program of Shaanxi(No.2024GH-ZDXM-37)the Fujian Provincial Natural Science Foundation of China(No.2024J01060)the Startup Program of XMU,and the Fundamental Research Funds for the Central Universities.
文摘Microscopy imaging is fundamental in analyzing bacterial morphology and dynamics,offering critical insights into bacterial physiology and pathogenicity.Image segmentation techniques enable quantitative analysis of bacterial structures,facilitating precise measurement of morphological variations and population behaviors at single-cell resolution.This paper reviews advancements in bacterial image segmentation,emphasizing the shift from traditional thresholding and watershed methods to deep learning-driven approaches.Convolutional neural networks(CNNs),U-Net architectures,and three-dimensional(3D)frameworks excel at segmenting dense biofilms and resolving antibiotic-induced morphological changes.These methods combine automated feature extraction with physics-informed postprocessing.Despite progress,challenges persist in computational efficiency,cross-species generalizability,and integration with multimodal experimental workflows.Future progress will depend on improving model robustness across species and imaging modalities,integrating multimodal data for phenotype-function mapping,and developing standard pipelines that link computational tools with clinical diagnostics.These innovations will expand microbial phenotyping beyond structural analysis,enabling deeper insights into bacterial physiology and ecological interactions.
文摘Advanced traffic monitoring systems encounter substantial challenges in vehicle detection and classification due to the limitations of conventional methods,which often demand extensive computational resources and struggle with diverse data acquisition techniques.This research presents a novel approach for vehicle classification and recognition in aerial image sequences,integrating multiple advanced techniques to enhance detection accuracy.The proposed model begins with preprocessing using Multiscale Retinex(MSR)to enhance image quality,followed by Expectation-Maximization(EM)Segmentation for precise foreground object identification.Vehicle detection is performed using the state-of-the-art YOLOv10 framework,while feature extraction incorporates Maximally Stable Extremal Regions(MSER),Dense Scale-Invariant Feature Transform(Dense SIFT),and Zernike Moments Features to capture distinct object characteristics.Feature optimization is further refined through a Hybrid Swarm-based Optimization algorithm,ensuring optimal feature selection for improved classification performance.The final classification is conducted using a Vision Transformer,leveraging its robust learning capabilities for enhanced accuracy.Experimental evaluations on benchmark datasets,including UAVDT and the Unmanned Aerial Vehicle Intruder Dataset(UAVID),demonstrate the superiority of the proposed approach,achieving an accuracy of 94.40%on UAVDT and 93.57%on UAVID.The results highlight the efficacy of the model in significantly enhancing vehicle detection and classification in aerial imagery,outperforming existing methodologies and offering a statistically validated improvement for intelligent traffic monitoring systems compared to existing approaches.
文摘Modern manufacturing processes have become more reliant on automation because of the accelerated transition from Industry 3.0 to Industry 4.0.Manual inspection of products on assembly lines remains inefficient,prone to errors and lacks consistency,emphasizing the need for a reliable and automated inspection system.Leveraging both object detection and image segmentation approaches,this research proposes a vision-based solution for the detection of various kinds of tools in the toolkit using deep learning(DL)models.Two Intel RealSense D455f depth cameras were arranged in a top down configuration to capture both RGB and depth images of the toolkits.After applying multiple constraints and enhancing them through preprocessing and augmentation,a dataset consisting of 3300 annotated RGB-D photos was generated.Several DL models were selected through a comprehensive assessment of mean Average Precision(mAP),precision-recall equilibrium,inference latency(target≥30 FPS),and computational burden,resulting in a preference for YOLO and Region-based Convolutional Neural Networks(R-CNN)variants over ViT-based models due to the latter’s increased latency and resource requirements.YOLOV5,YOLOV8,YOLOV11,Faster R-CNN,and Mask R-CNN were trained on the annotated dataset and evaluated using key performance metrics(Recall,Accuracy,F1-score,and Precision).YOLOV11 demonstrated balanced excellence with 93.0%precision,89.9%recall,and a 90.6%F1-score in object detection,as well as 96.9%precision,95.3%recall,and a 96.5%F1-score in instance segmentation with an average inference time of 25 ms per frame(≈40 FPS),demonstrating real-time performance.Leveraging these results,a YOLOV11-based windows application was successfully deployed in a real-time assembly line environment,where it accurately processed live video streams to detect and segment tools within toolkits,demonstrating its practical effectiveness in industrial automation.The application is capable of precisely measuring socket dimensions by utilising edge detection techniques on YOLOv11 segmentation masks,in addition to detection and segmentation.This makes it possible to do specification-level quality control right on the assembly line,which improves the ability to examine things in real time.The implementation is a big step forward for intelligent manufacturing in the Industry 4.0 paradigm.It provides a scalable,efficient,and accurate way to do automated inspection and dimensional verification activities.
文摘Weakly Supervised Semantic Segmentation(WSSS),which relies only on image-level labels,has attracted significant attention for its cost-effectiveness and scalability.Existing methods mainly enhance inter-class distinctions and employ data augmentation to mitigate semantic ambiguity and reduce spurious activations.However,they often neglect the complex contextual dependencies among image patches,resulting in incomplete local representations and limited segmentation accuracy.To address these issues,we propose the Context Patch Fusion with Class Token Enhancement(CPF-CTE)framework,which exploits contextual relations among patches to enrich feature repre-sentations and improve segmentation.At its core,the Contextual-Fusion Bidirectional Long Short-Term Memory(CF-BiLSTM)module captures spatial dependencies between patches and enables bidirectional information flow,yield-ing a more comprehensive understanding of spatial correlations.This strengthens feature learning and segmentation robustness.Moreover,we introduce learnable class tokens that dynamically encode and refine class-specific semantics,enhancing discriminative capability.By effectively integrating spatial and semantic cues,CPF-CTE produces richer and more accurate representations of image content.Extensive experiments on PASCAL VOC 2012 and MS COCO 2014 validate that CPF-CTE consistently surpasses prior WSSS methods.
文摘This systematic review aims to comprehensively examine and compare deep learning methods for brain tumor segmentation and classification using MRI and other imaging modalities,focusing on recent trends from 2022 to 2025.The primary objective is to evaluate methodological advancements,model performance,dataset usage,and existing challenges in developing clinically robust AI systems.We included peer-reviewed journal articles and highimpact conference papers published between 2022 and 2025,written in English,that proposed or evaluated deep learning methods for brain tumor segmentation and/or classification.Excluded were non-open-access publications,books,and non-English articles.A structured search was conducted across Scopus,Google Scholar,Wiley,and Taylor&Francis,with the last search performed in August 2025.Risk of bias was not formally quantified but considered during full-text screening based on dataset diversity,validation methods,and availability of performance metrics.We used narrative synthesis and tabular benchmarking to compare performance metrics(e.g.,accuracy,Dice score)across model types(CNN,Transformer,Hybrid),imaging modalities,and datasets.A total of 49 studies were included(43 journal articles and 6 conference papers).These studies spanned over 9 public datasets(e.g.,BraTS,Figshare,REMBRANDT,MOLAB)and utilized a range of imaging modalities,predominantly MRI.Hybrid models,especially ResViT and UNetFormer,consistently achieved high performance,with classification accuracy exceeding 98%and segmentation Dice scores above 0.90 across multiple studies.Transformers and hybrid architectures showed increasing adoption post2023.Many studies lacked external validation and were evaluated only on a few benchmark datasets,raising concerns about generalizability and dataset bias.Few studies addressed clinical interpretability or uncertainty quantification.Despite promising results,particularly for hybrid deep learning models,widespread clinical adoption remains limited due to lack of validation,interpretability concerns,and real-world deployment barriers.
基金provided by the Science Research Project of Hebei Education Department under grant No.BJK2024115.
文摘High-resolution remote sensing images(HRSIs)are now an essential data source for gathering surface information due to advancements in remote sensing data capture technologies.However,their significant scale changes and wealth of spatial details pose challenges for semantic segmentation.While convolutional neural networks(CNNs)excel at capturing local features,they are limited in modeling long-range dependencies.Conversely,transformers utilize multihead self-attention to integrate global context effectively,but this approach often incurs a high computational cost.This paper proposes a global-local multiscale context network(GLMCNet)to extract both global and local multiscale contextual information from HRSIs.A detail-enhanced filtering module(DEFM)is proposed at the end of the encoder to refine the encoder outputs further,thereby enhancing the key details extracted by the encoder and effectively suppressing redundant information.In addition,a global-local multiscale transformer block(GLMTB)is proposed in the decoding stage to enable the modeling of rich multiscale global and local information.We also design a stair fusion mechanism to transmit deep semantic information from deep to shallow layers progressively.Finally,we propose the semantic awareness enhancement module(SAEM),which further enhances the representation of multiscale semantic features through spatial attention and covariance channel attention.Extensive ablation analyses and comparative experiments were conducted to evaluate the performance of the proposed method.Specifically,our method achieved a mean Intersection over Union(mIoU)of 86.89%on the ISPRS Potsdam dataset and 84.34%on the ISPRS Vaihingen dataset,outperforming existing models such as ABCNet and BANet.
基金supported by the National Natural Science Foundation of China(Grant Nos.52304139,52325403)the CCTEG Coal Mining Research Institute funding(Grant No.KCYJY-2024-MS-10).
文摘3D laser scanning technology is widely used in underground openings for high-precision,rapid,and nondestructive structural evaluations.Segmenting large 3D point cloud datasets,particularly in coal mine roadways with multi-scale targets,remains challenging.This paper proposes an enhanced segmentation method integrating improved PointNet++with a coverage-voted strategy.The coverage-voted strategy reduces data while preserving multi-scale target topology.The segmentation is achieved using an enhanced PointNet++algorithm with a normalization preprocessing head,resulting in a 94%accuracy for common supporting components.Ablation experiments show that the preprocessing head and coverage strategies increase segmentation accuracy by 20%and 2%,respectively,and improve Intersection over Union(IoU)for bearing plate segmentation by 58%and 20%.The accuracy of the current pretraining segmentation model may be affected by variations in surface support components,but it can be readily enhanced through re-optimization with additional labeled point cloud data.This proposed method,combined with a previously developed machine learning model that links rock bolt load and the deformation field of its bearing plate,provides a robust technique for simultaneously measuring the load of multiple rock bolts in a single laser scan.
基金supported by the Innovation Foundation of National Commercial Aircraft Manufacturing Engineering Technology Research Center(No.COMAC-SFGS-2022-1877)in part by the National Natural Science Foundation of China(No.92048301)。
文摘Cabin cables,as critical components of an aircraft's electrical system,significantly impact the operational efficiency and safety of the aircraft.The existing cable segmentation methods in civil aviation cabins are limited,especially in automation,heavily dependent on large amounts of data and resources,lacking the flexibility to adapt to different scenarios.To address these challenges,this paper introduces a novel image segmentation model,CableSAM,specifically designed for automated segmentation of cabin cables.CableSAM improves segmentation efficiency and accuracy using knowledge distillation and employs a context ensemble strategy.It accurately segments cables in various scenarios with minimal input prompts.Comparative experiments on three cable datasets demonstrate that CableSAM surpasses other advanced cable segmentation methods in performance.
基金financially supported by the Natural Science Foundation of China(42301492)the Open Fund of Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering(2022SDSJ04,2024SDSJ03)+1 种基金the Opening Fund of Key Laboratory of Geological Survey and Evaluation of Ministry of Education(GLAB 2023ZR01,GLAB2024ZR08)the Fundamental Research Funds for the Central Universities.
文摘Automatic segmentation and recognition of content and element information in color geological map are of great significance for researchers to analyze the distribution of mineral resources and predict disaster information.This article focuses on color planar raster geological map(geological maps include planar geological maps,columnar maps,and profiles).While existing deep learning approaches are often used to segment general images,their performance is limited due to complex elements,diverse regional features,and complicated backgrounds for color geological map in the domain of geoscience.To address the issue,a color geological map segmentation model is proposed that combines the Felz clustering algorithm and an improved SE-UNet deep learning network(named GeoMSeg).Firstly,a symmetrical encoder-decoder structure backbone network based on UNet is constructed,and the channel attention mechanism SENet has been incorporated to augment the network’s capacity for feature representation,enabling the model to purposefully extract map information.The SE-UNet network is employed for feature extraction from the geological map and obtain coarse segmentation results.Secondly,the Felz clustering algorithm is used for super pixel pre-segmentation of geological maps.The coarse segmentation results are refined and modified based on the super pixel pre-segmentation results to obtain the final segmentation results.This study applies GeoMSeg to the constructed dataset,and the experimental results show that the algorithm proposed in this paper has superior performance compared to other mainstream map segmentation models,with an accuracy of 91.89%and a MIoU of 71.91%.
基金funded by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China(grant number 22KJD440001)Changzhou Science&Technology Program(grant number CJ20220232).
文摘The Internet of Vehicles (IoV) has become an important direction in the field of intelligent transportation, in which vehicle positioning is a crucial part. SLAM (Simultaneous Localization and Mapping) technology plays a crucial role in vehicle localization and navigation. Traditional Simultaneous Localization and Mapping (SLAM) systems are designed for use in static environments, and they can result in poor performance in terms of accuracy and robustness when used in dynamic environments where objects are in constant movement. To address this issue, a new real-time visual SLAM system called MG-SLAM has been developed. Based on ORB-SLAM2, MG-SLAM incorporates a dynamic target detection process that enables the detection of both known and unknown moving objects. In this process, a separate semantic segmentation thread is required to segment dynamic target instances, and the Mask R-CNN algorithm is applied on the Graphics Processing Unit (GPU) to accelerate segmentation. To reduce computational cost, only key frames are segmented to identify known dynamic objects. Additionally, a multi-view geometry method is adopted to detect unknown moving objects. The results demonstrate that MG-SLAM achieves higher precision, with an improvement from 0.2730 m to 0.0135 m in precision. Moreover, the processing time required by MG-SLAM is significantly reduced compared to other dynamic scene SLAM algorithms, which illustrates its efficacy in locating objects in dynamic scenes.
基金supported by the National Natural Science Foundation of China under Grant 62373009.
文摘Indoor scene semantic segmentation is essential for enabling robots to understand and interact with their environments effectively.However,numerous challenges remain unresolved,particularly in single-robot systems,which often struggle with the complexity and variability of indoor scenes.To address these limitations,we introduce a novel multi-robot collaborative framework based on multiplex interactive learning(MPIL)in which each robot specialises in a distinct visual task within a unified multitask architecture.During training,the framework employs task-specific decoders and cross-task feature sharing to enhance collaborative optimisation.At inference time,robots operate independently with optimised models,enabling scalable,asynchronous and efficient deployment in real-world scenarios.Specifically,MPIL employs specially designed modules that integrate RGB and depth data,refine feature representations and facilitate the simultaneous execution of multiple tasks,such as instance segmentation,scene classification and semantic segmentation.By leveraging these modules,distinct agents within multi-robot systems can effectively handle specialised tasks,thereby enhancing the overall system's flexibility and adaptability.This collaborative effort maximises the strengths of each robot,resulting in a more comprehensive understanding of environments.Extensive experiments on two public benchmark datasets demonstrate MPIL's competitive performance compared to state-of-the-art approaches,highlighting the effectiveness and robustness of our multi-robot system in complex indoor environments.
基金supported by the Shandong Provincial Natural Science Foundation (Nos.ZR2023MF062 and ZR2021MF115)the Introduction and Cultivation Program for Young Innovative Talents of Universities in Shandong (No.2021QCYY003)。
文摘Considering the three-dimensional(3D) U-Net lacks sufficient local feature extraction for image features and lacks attention to the fusion of high-and low-level features, we propose a new model called 3DMAU-Net based on the 3D U-Net architecture for liver region segmentation. Our model replaces the last two layers of the 3D U-Net with a sliding window-based multilayer perceptron(SMLP), enabling better extraction of local image features. We also design a high-and low-level feature fusion dilated convolution block that focuses on local features and better supplements the surrounding information of the target region. This block is embedded in the entire encoding process, ensuring that the overall network is not simply downsampling. Before each feature extraction, the input features are processed by the dilated convolution block. We validate our experiments on the liver tumor segmentation challenge 2017(Lits2017) dataset, and our model achieves a Dice coefficient of 0.95, which is an improvement of 0.015 compared to the 3D U-Net model. Furthermore, we compare our results with other segmentation methods, and our model consistently outperforms them.
基金supported by Gansu Natural Science Foundation Programme(No.24JRRA231)National Natural Science Foundation of China(No.62061023)Gansu Provincial Education,Science and Technology Innovation and Industry(No.2021CYZC-04)。
文摘Brain tumor segmentation is critical in clinical diagnosis and treatment planning.Existing methods for brain tumor segmentation with missing modalities often struggle when dealing with multiple missing modalities,a common scenario in real-world clinical settings.These methods primarily focus on handling a single missing modality at a time,making them insufficiently robust for the additional complexity encountered with incomplete data containing various missing modality combinations.Additionally,most existing methods rely on single models,which may limit their performance and increase the risk of overfitting the training data.This work proposes a novel method called the ensemble adversarial co-training neural network(EACNet)for accurate brain tumor segmentation from multi-modal magnetic resonance imaging(MRI)scans with multiple missing modalities.The proposed method consists of three key modules:the ensemble of pre-trained models,which captures diverse feature representations from the MRI data by employing an ensemble of pre-trained models;adversarial learning,which leverages a competitive training approach involving two models;a generator model,which creates realistic missing data,while sub-networks acting as discriminators learn to distinguish real data from the generated“fake”data.Co-training framework utilizes the information extracted by the multimodal path(trained on complete scans)to guide the learning process in the path handling missing modalities.The model potentially compensates for missing information through co-training interactions by exploiting the relationships between available modalities and the tumor segmentation task.EACNet was evaluated on the BraTS2018 and BraTS2020 challenge datasets and achieved state-of-the-art and competitive performance respectively.Notably,the segmentation results for the whole tumor(WT)dice similarity coefficient(DSC)reached 89.27%,surpassing the performance of existing methods.The analysis suggests that the ensemble approach offers potential benefits,and the adversarial co-training contributes to the increased robustness and accuracy of EACNet for brain tumor segmentation of MRI scans with missing modalities.The experimental results show that EACNet has promising results for the task of brain tumor segmentation of MRI scans with missing modalities and is a better candidate for real-world clinical applications.
文摘Few-shot image semantic segmentation aims to achieve pixel-level classification for novel classes using only a few labeled examples.The method first trains the segmentation model on base classes,and then adapts it to novel classes.Although existing methods have achieved remarkable performance in few-shot image semantic segmentation,they still face the following challenges.Traditional methods typically rely on mask average pooling to generate single-category prototype vectors and perform feature matching via metric learning,but they exhibit significant limitations in modeling inter-category relationships and addressing complex background interference.Inspired by the analogy-based transfer mechanisms in cognitive psychology,we propose a Generalized Prototype Network(GPNet)to enhance the model's generalization ability for unseen categories and improve robustness in feature matching.GPNet consists of two key modules.The first is a generalized prototype enhancement module,which explores potential inter-category relationships to construct more discriminative category prototype representations.The second is a multi-scale feature alignment module,which dynamically aligns support and query features across multiple scales using an attention mechanism,thus mitigating background interference in complex scenarios.Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art approaches on several few-shot semantic segmentation benchmarks,validating its effectiveness and generalization capabilities.
文摘Medical image segmentation has become a cornerstone for many healthcare applications,allowing for the automated extraction of critical information from images such as Computed Tomography(CT)scans,Magnetic Resonance Imaging(MRIs),and X-rays.The introduction of U-Net in 2015 has significantly advanced segmentation capabilities,especially for small datasets commonly found in medical imaging.Since then,various modifications to the original U-Net architecture have been proposed to enhance segmentation accuracy and tackle challenges like class imbalance,data scarcity,and multi-modal image processing.This paper provides a detailed review and comparison of several U-Net-based architectures,focusing on their effectiveness in medical image segmentation tasks.We evaluate performance metrics such as Dice Similarity Coefficient(DSC)and Intersection over Union(IoU)across different U-Net variants including HmsU-Net,CrossU-Net,mResU-Net,and others.Our results indicate that architectural enhancements such as transformers,attention mechanisms,and residual connections improve segmentation performance across diverse medical imaging applications,including tumor detection,organ segmentation,and lesion identification.The study also identifies current challenges in the field,including data variability,limited dataset sizes,and issues with class imbalance.Based on these findings,the paper suggests potential future directions for improving the robustness and clinical applicability of U-Net-based models in medical image segmentation.
文摘This paper presents CW-HRNet,a high-resolution,lightweight crack segmentation network designed to address challenges in complex scenes with slender,deformable,and blurred crack structures.The model incorporates two key modules:Constrained Deformable Convolution(CDC),which stabilizes geometric alignment by applying a tanh limiter and learnable scaling factor to the predicted offsets,and the Wavelet Frequency Enhancement Module(WFEM),which decomposes features using Haar wavelets to preserve low-frequency structures while enhancing high-frequency boundaries and textures.Evaluations on the CrackSeg9k benchmark demonstrate CW-HRNet’s superior performance,achieving 82.39%mIoU with only 7.49M parameters and 10.34 GFLOPs,outperforming HrSegNet-B48 by 1.83% in segmentation accuracy with minimal complexity overhead.The model also shows strong cross-dataset generalization,achieving 60.01%mIoU and 66.22%F1 on Asphalt3k without fine-tuning.These results highlight CW-HRNet’s favorable accuracyefficiency trade-off for real-world crack segmentation tasks.