Recent studies indicate that millions of individuals suffer from renal diseases, with renal carcinoma, a type of kidney cancer, emerging as both a chronic illness and a significant cause of mortality. Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) have become essential tools for diagnosing and assessing kidney disorders. However, accurate analysis of these medical images is critical for detecting and evaluating tumor severity. This study introduces an integrated hybrid framework that combines three complementary deep learning models for kidney tumor segmentation from MRI images. The proposed framework fuses a customized U-Net and Mask R-CNN using a weighted scheme to achieve semantic and instance-level segmentation. The fused outputs are further refined through edge detection using Stochastic Feature Mapping Neural Networks (SFMNN), while volumetric consistency is ensured through Improved Mini-Batch K-Means (IMBKM) clustering integrated with an Encoder-Decoder Convolutional Neural Network (EDCNN). The outputs of these three stages are combined through a weighted fusion mechanism, with optimal weights determined empirically. Experiments on MRI scans from the TCGA-KIRC dataset demonstrate that the proposed hybrid framework significantly outperforms standalone models, achieving a Dice Score of 92.5%, an IoU of 87.8%, a Precision of 93.1%, a Recall of 90.8%, and a Hausdorff Distance of 2.8 mm. These findings validate that the weighted integration of complementary architectures effectively overcomes key limitations in kidney tumor segmentation, leading to improved diagnostic accuracy and robustness in medical image analysis.
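The weighted fusion and the overlap metrics named in the abstract above can be illustrated with a minimal sketch. It assumes each stage emits a per-pixel tumor probability map of the same shape; the function names, the 0.5 threshold, and any weights passed in are hypothetical and are not taken from the paper, which only states that the weights were determined empirically.

```python
import numpy as np

def weighted_fusion(prob_maps, weights, threshold=0.5):
    """Fuse per-model probability maps by a normalized weighted average,
    then threshold to a binary mask. Weights are illustrative assumptions."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                               # normalize so weights sum to 1
    stacked = np.stack(prob_maps, axis=0)         # shape: (n_models, H, W)
    fused = np.tensordot(w, stacked, axes=1)      # shape: (H, W)
    return fused >= threshold

def dice_score(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred, target = np.asarray(pred, bool), np.asarray(target, bool)
    denom = pred.sum() + target.sum()
    if denom == 0:                                # both masks empty: perfect match
        return 1.0
    return 2.0 * np.logical_and(pred, target).sum() / denom

def iou_score(pred, target):
    """IoU = |A ∩ B| / |A ∪ B| for binary masks."""
    pred, target = np.asarray(pred, bool), np.asarray(target, bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0
    return np.logical_and(pred, target).sum() / union
```

In practice the weights would be tuned on a validation split, for example by a small grid search that maximizes the Dice score of the fused mask.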
Deep learning technology has shown impressive performance in various vision tasks such as image classification, object detection and semantic segmentation. In particular, recent advances in deep learning techniques have brought encouraging performance to fine-grained image classification, which aims to distinguish subordinate-level categories such as bird species or dog breeds. This task is extremely challenging due to high intra-class and low inter-class variance. In this paper, we review four types of deep learning based fine-grained image classification approaches: general convolutional neural networks (CNNs), part detection based, ensemble of networks based and visual attention based approaches. In addition, deep learning based semantic segmentation approaches are covered. The region proposal based and fully convolutional networks based approaches for semantic segmentation are introduced in turn.
Neurons can be abstractly represented as skeletons due to the filament nature of neurites. With the rapid development of imaging and image analysis techniques, an increasing amount of neuron skeleton data is being produced. In some scientific studies, it is necessary to dissect the axons and dendrites, which is typically done manually and is both tedious and time-consuming. To automate this process, we have developed a method that relies solely on neuronal skeletons using Geometric Deep Learning (GDL). We demonstrate the effectiveness of this method using pyramidal neurons in mammalian brains, and the results are promising for its application in neuroscience studies.
The process of segmenting point cloud data into several homogeneous areas, with points in the same region having the same attributes, is known as 3D segmentation. Segmentation is challenging with point cloud data due to substantial redundancy, fluctuating sample density and lack of apparent organization. The research area has a wide range of robotics applications, including intelligent vehicles, autonomous mapping and navigation. A number of researchers have introduced various methodologies and algorithms. Deep learning has been successfully applied to a spectrum of 2D vision domains as a prevailing AI method. However, due to the specific problems of processing point clouds with deep neural networks, deep learning on point clouds is still in its initial stages. This study examines many strategies that have been proposed for 3D instance and semantic segmentation and gives a complete assessment of current developments in deep learning-based 3D segmentation. The benefits, drawbacks, and design mechanisms of these approaches are studied and discussed. This study evaluates the competitiveness of various segmentation algorithms on several publicly accessible datasets, as well as the most often used pipelines, their advantages and limits, insightful findings and intriguing future research directions.
Image semantic segmentation is an important branch of computer vision with a wide variety of practical applications such as medical image analysis, autonomous driving, and virtual or augmented reality. In recent years, due to the remarkable performance of transformers and multilayer perceptrons (MLPs) in computer vision, which is comparable to that of convolutional neural networks (CNNs), a substantial number of image semantic segmentation works have aimed at developing different types of deep learning architecture. This survey aims to provide a comprehensive overview of deep learning methods in the field of general image semantic segmentation. Firstly, the commonly used image segmentation datasets are listed. Next, extensive pioneering works are studied in depth from multiple perspectives (e.g., network structures, feature fusion methods, attention mechanisms), and are divided into four categories according to network architecture: CNN-based architectures, transformer-based architectures, MLP-based architectures, and others. Furthermore, this paper presents some common evaluation metrics and compares the respective advantages and limitations of popular techniques, both in terms of architectural design and their experimental results on the most widely used datasets. Finally, possible future research directions and challenges are discussed for the reference of other researchers.
Microscopy imaging is fundamental in analyzing bacterial morphology and dynamics, offering critical insights into bacterial physiology and pathogenicity. Image segmentation techniques enable quantitative analysis of bacterial structures, facilitating precise measurement of morphological variations and population behaviors at single-cell resolution. This paper reviews advancements in bacterial image segmentation, emphasizing the shift from traditional thresholding and watershed methods to deep learning-driven approaches. Convolutional neural networks (CNNs), U-Net architectures, and three-dimensional (3D) frameworks excel at segmenting dense biofilms and resolving antibiotic-induced morphological changes. These methods combine automated feature extraction with physics-informed postprocessing. Despite progress, challenges persist in computational efficiency, cross-species generalizability, and integration with multimodal experimental workflows. Future progress will depend on improving model robustness across species and imaging modalities, integrating multimodal data for phenotype-function mapping, and developing standard pipelines that link computational tools with clinical diagnostics. These innovations will expand microbial phenotyping beyond structural analysis, enabling deeper insights into bacterial physiology and ecological interactions.
This systematic review aims to comprehensively examine and compare deep learning methods for brain tumor segmentation and classification using MRI and other imaging modalities, focusing on recent trends from 2022 to 2025. The primary objective is to evaluate methodological advancements, model performance, dataset usage, and existing challenges in developing clinically robust AI systems. We included peer-reviewed journal articles and high-impact conference papers published between 2022 and 2025, written in English, that proposed or evaluated deep learning methods for brain tumor segmentation and/or classification. Excluded were non-open-access publications, books, and non-English articles. A structured search was conducted across Scopus, Google Scholar, Wiley, and Taylor & Francis, with the last search performed in August 2025. Risk of bias was not formally quantified but considered during full-text screening based on dataset diversity, validation methods, and availability of performance metrics. We used narrative synthesis and tabular benchmarking to compare performance metrics (e.g., accuracy, Dice score) across model types (CNN, Transformer, Hybrid), imaging modalities, and datasets. A total of 49 studies were included (43 journal articles and 6 conference papers). These studies spanned over 9 public datasets (e.g., BraTS, Figshare, REMBRANDT, MOLAB) and utilized a range of imaging modalities, predominantly MRI. Hybrid models, especially ResViT and UNetFormer, consistently achieved high performance, with classification accuracy exceeding 98% and segmentation Dice scores above 0.90 across multiple studies. Transformers and hybrid architectures showed increasing adoption post-2023. Many studies lacked external validation and were evaluated only on a few benchmark datasets, raising concerns about generalizability and dataset bias. Few studies addressed clinical interpretability or uncertainty quantification. Despite promising results, particularly for hybrid deep learning models, widespread clinical adoption remains limited due to lack of validation, interpretability concerns, and real-world deployment barriers.
Modern manufacturing processes have become more reliant on automation because of the accelerated transition from Industry 3.0 to Industry 4.0. Manual inspection of products on assembly lines remains inefficient, error-prone and inconsistent, emphasizing the need for a reliable and automated inspection system. Leveraging both object detection and image segmentation approaches, this research proposes a vision-based solution for the detection of various kinds of tools in a toolkit using deep learning (DL) models. Two Intel RealSense D455f depth cameras were arranged in a top-down configuration to capture both RGB and depth images of the toolkits. After applying multiple constraints and enhancing the images through preprocessing and augmentation, a dataset consisting of 3300 annotated RGB-D images was generated. Several DL models were selected through a comprehensive assessment of mean Average Precision (mAP), precision-recall equilibrium, inference latency (target ≥ 30 FPS), and computational burden, resulting in a preference for YOLO and Region-based Convolutional Neural Network (R-CNN) variants over ViT-based models due to the latter's increased latency and resource requirements. YOLOv5, YOLOv8, YOLOv11, Faster R-CNN, and Mask R-CNN were trained on the annotated dataset and evaluated using key performance metrics (Recall, Accuracy, F1-score, and Precision). YOLOv11 demonstrated balanced excellence with 93.0% precision, 89.9% recall, and a 90.6% F1-score in object detection, as well as 96.9% precision, 95.3% recall, and a 96.5% F1-score in instance segmentation, with an average inference time of 25 ms per frame (≈ 40 FPS), demonstrating real-time performance. Leveraging these results, a YOLOv11-based Windows application was successfully deployed in a real-time assembly line environment, where it accurately processed live video streams to detect and segment tools within toolkits, demonstrating its practical effectiveness in industrial automation. In addition to detection and segmentation, the application can precisely measure socket dimensions by applying edge detection techniques to YOLOv11 segmentation masks. This enables specification-level quality control directly on the assembly line and improves real-time inspection capability. The implementation is a significant step forward for intelligent manufacturing in the Industry 4.0 paradigm, providing a scalable, efficient, and accurate approach to automated inspection and dimensional verification.
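The idea of measuring a part's dimensions from a segmentation mask, as described above, can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes a binary mask for a single object, uses a basic 4-neighbour boundary test in place of whatever edge detector the application uses, and takes a hypothetical `mm_per_pixel` calibration factor that would in practice come from the camera setup.

```python
import numpy as np

def mask_edges(mask):
    """Return boundary pixels: mask pixels with at least one 4-neighbour
    outside the mask (image borders count as outside)."""
    m = np.asarray(mask, bool)
    p = np.pad(m, 1, constant_values=False)
    # A pixel is interior if its up, down, left and right neighbours are all set.
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return m & ~interior

def extent_mm(mask, mm_per_pixel):
    """Axis-aligned width and height of the mask boundary, in millimetres.
    mm_per_pixel is an assumed calibration constant."""
    rows, cols = np.nonzero(mask_edges(mask))
    if rows.size == 0:
        return 0.0, 0.0
    width = (cols.max() - cols.min() + 1) * mm_per_pixel
    height = (rows.max() - rows.min() + 1) * mm_per_pixel
    return float(width), float(height)
```

An axis-aligned extent is only a reasonable proxy when the part is not rotated in the image; a rotated part would need a minimum-area bounding rectangle or a fitted circle instead.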
Advanced traffic monitoring systems encounter substantial challenges in vehicle detection and classification due to the limitations of conventional methods, which often demand extensive computational resources and struggle with diverse data acquisition techniques. This research presents a novel approach for vehicle classification and recognition in aerial image sequences, integrating multiple advanced techniques to enhance detection accuracy. The proposed model begins with preprocessing using Multiscale Retinex (MSR) to enhance image quality, followed by Expectation-Maximization (EM) Segmentation for precise foreground object identification. Vehicle detection is performed using the state-of-the-art YOLOv10 framework, while feature extraction incorporates Maximally Stable Extremal Regions (MSER), Dense Scale-Invariant Feature Transform (Dense SIFT), and Zernike Moments Features to capture distinct object characteristics. Feature selection is then refined through a Hybrid Swarm-based Optimization algorithm to improve classification performance. The final classification is conducted using a Vision Transformer, leveraging its robust learning capabilities for enhanced accuracy. Experimental evaluations on benchmark datasets, including UAVDT and the Unmanned Aerial Vehicle Intruder Dataset (UAVID), demonstrate the superiority of the proposed approach, achieving an accuracy of 94.40% on UAVDT and 93.57% on UAVID. The results highlight the efficacy of the model in enhancing vehicle detection and classification in aerial imagery, outperforming existing methodologies and offering a statistically validated improvement for intelligent traffic monitoring systems.
Deep learning-based methods have become alternatives to traditional numerical weather prediction systems, offering faster computation and the ability to utilize large historical datasets. However, the application of deep learning to medium-range regional weather forecasting with limited data remains a significant challenge. In this work, three key solutions are proposed: (1) motivated by the need to improve model performance in data-scarce regional forecasting scenarios, the authors innovatively apply semantic segmentation models to better capture spatiotemporal features and improve prediction accuracy; (2) recognizing the challenge of overfitting and the inability of traditional noise-based data augmentation methods to effectively enhance model robustness, a novel learnable Gaussian noise mechanism is introduced that allows the model to adaptively optimize perturbations for different locations, ensuring more effective learning; and (3) to address the issue of error accumulation in autoregressive prediction, as well as the challenge of learning difficulty and the lack of intermediate data utilization in one-shot prediction, the authors propose a cascade prediction approach that effectively resolves these problems while significantly improving model forecasting performance. The method achieves a competitive result in the East China Regional AI Medium Range Weather Forecasting Competition. Ablation experiments further validate the effectiveness of each component, highlighting their contributions to enhancing prediction performance.
The convolutional neural network (CNN) method based on DeepLabv3+ has several problems in the semantic segmentation of high-resolution remote sensing images, such as a fixed receptive field size in feature extraction, lack of semantic information, a high decoder upsampling factor, and insufficient ability to retain detail. A hierarchical feature fusion network (HFFNet) was proposed. Firstly, a combination of transformer and CNN architectures was employed for feature extraction from images of varying resolutions. The extracted features were processed independently. Subsequently, the features from the transformer and CNN were fused under the guidance of features from different sources. This fusion process assisted in restoring information more comprehensively during the decoding stage. Furthermore, a spatial channel attention module was designed in the final stage of decoding to refine features and reduce the semantic gap between shallow CNN features and deep decoder features. The experimental results showed that HFFNet had superior performance on the UAVid, LoveDA, Potsdam, and Vaihingen datasets, and its intersection-over-union (IoU) score was better than that of DeepLabv3+ and other competing methods, showing strong generalization ability.
Early detection of the Covid-19 disease is essential due to its high rate of infection, which has affected tens of millions of people, and its high mortality, put at 7%. For that purpose, a multi-stage model was developed. The first stage optimizes the images using dynamic adaptive histogram equalization, performs semantic segmentation using DeepLabv3Plus, and then augments the data by flipping it horizontally, rotating it, and flipping it vertically. The second stage builds a custom convolutional neural network model using several models pre-trained on ImageNet. Finally, the model compares the pre-trained data to the new output, while repeatedly trimming the best-performing models to reduce complexity and improve memory efficiency. Several experiments were done using different techniques and parameters. The proposed model achieved an average accuracy of 99.6% and an area under the curve of 0.996 in Covid-19 detection. This paper discusses how to train a customized intelligent convolutional neural network using various parameters on a set of chest X-rays with an accuracy of 99.6%.
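The augmentation step named above (horizontal flip, rotation, vertical flip) can be sketched with NumPy. This is an illustration under assumptions the abstract does not settle: each transform is applied to the original image to yield a separate augmented copy, and the rotation is a single 90-degree turn. The function name `augment` is hypothetical.

```python
import numpy as np

def augment(image):
    """Return three augmented copies of a 2D (or H x W x C) image:
    horizontal flip, 90-degree rotation, vertical flip.
    The rotation angle is an assumption; the source does not state it."""
    image = np.asarray(image)
    return [
        np.fliplr(image),                 # mirror left-right
        np.rot90(image, k=1, axes=(0, 1)),  # rotate 90 degrees counter-clockwise
        np.flipud(image),                 # mirror top-bottom
    ]
```

When segmentation masks accompany the images, the same transform must be applied to each mask so that labels stay aligned with pixels.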
Semantic segmentation of eye images is a complex task with important applications in human–computer interaction, cognitive science, and neuroscience. Achieving real-time, accurate, and robust segmentation algorithms is crucial for computationally limited portable devices such as those used for augmented reality and virtual reality. With the rapid advancements in deep learning, many network models have been developed specifically for eye image segmentation. Some methods divide the segmentation process into multiple stages to miniaturize model parameters while enhancing output through post-processing techniques to improve segmentation accuracy. These approaches significantly increase the inference time. Other networks adopt more complex encoding and decoding modules to achieve end-to-end output, which requires substantial computation. Therefore, balancing the model's size, accuracy, and computational complexity is essential. To address these challenges, we propose a lightweight asymmetric UNet architecture and a projection loss function. We utilize ResNet-3 layer blocks to enhance feature extraction efficiency in the encoding stage. In the decoding stage, we employ regular convolutions and skip connections to upscale the feature maps from the latent space to the original image size, balancing the model size and segmentation accuracy. In addition, we leverage the geometric features of the eye region and design a projection loss function to further improve the segmentation accuracy without adding any inference cost. We validate our approach on the OpenEDS2019 dataset for virtual reality and achieve state-of-the-art performance with 95.33% mean intersection over union (mIoU). Our model has only 0.63M parameters and runs at 350 FPS, which are 68% and 200% of the corresponding figures for the state-of-the-art model RITNet, respectively.
Segmenting a breast ultrasound image is still challenging due to the presence of speckle noise, dependency on the operator, and the variation of image quality. This paper presents the UltraSegNet architecture, which addresses these challenges through three key technical innovations: (1) a modified ResNet-50 backbone with sequential 3×3 convolutions that preserves the fine anatomical details needed for finding lesion boundaries; (2) a computationally efficient regional attention mechanism that works on high-resolution features without a transformer's extra memory cost; and (3) an adaptive feature fusion strategy that adjusts local and global features based on how the image is being used. Extensive evaluation on two distinct datasets demonstrates UltraSegNet's superior performance. On the BUSI dataset, it obtains a precision of 0.915, a recall of 0.908, and an F1 score of 0.911. On the UDAIT dataset, it achieves robust performance across the board, with a precision of 0.901 and a recall of 0.894. Importantly, these improvements are achieved at clinically feasible computation times, taking 235 ms per image on standard GPU hardware. Notably, UltraSegNet performs particularly well on difficult small lesions (less than 10 mm), achieving a detection accuracy of 0.891. This is a substantial improvement over traditional methods, which struggle with small-scale features: standard models achieve only 0.63–0.71 accuracy. This improvement in small lesion detection is particularly crucial for early-stage breast cancer identification. Results from this work demonstrate that UltraSegNet can be practically deployed in clinical workflows to improve breast cancer screening accuracy.
BACKGROUND Upper gastrointestinal (UGI) diseases present diagnostic challenges during endoscopy due to visual similarities, indistinct boundaries, and observer variability, which can lead to missed diagnoses and delayed treatment. Automated segmentation using deep learning (DL) models offers the potential to assist endoscopists, improve diagnostic accuracy, and reduce workload. However, multi-class UGI disease segmentation remains underexplored, with limited annotated datasets and insufficient focus on clinical validation. This study hypothesizes that comparative analysis of different DL architectures can identify models suitable for clinical application, providing actionable insights to reduce diagnostic errors and support clinical decision-making in endoscopic practice. AIM To evaluate 17 state-of-the-art DL models for multi-class UGI disease segmentation, emphasizing clinical translation and real-world applicability. METHODS This study evaluated 17 DL models spanning convolutional neural network (CNN)-, transformer-, and Mamba-based architectures using a self-collected dataset from two hospitals in Macao and Xiangyang (3313 images, 9 classes) and the public EDD2020 dataset (386 images, 5 classes). Models were assessed for segmentation performance and performance-efficiency trade-off. Statistical analyses were conducted to examine performance differences across architectures. Generalization capability was measured through a cross-dataset evaluation (training models on the self-collected dataset and testing on the EDD2020 dataset). RESULTS Swin-UMamba achieved the highest segmentation performance across both datasets [intersection over union (IoU): 89.06% ± 0.20% self-collected, 77.53% ± 0.32% EDD2020], followed by SegFormer (IoU: 88.94% ± 0.38% self-collected, 77.20% ± 0.98% EDD2020) and ConvNeXt + UPerNet (IoU: 88.48% ± 0.09% self-collected, 76.90% ± 0.61% EDD2020). Statistical analyses showed no significant differences between paradigms, though hierarchical architectures with pre-trained encoders consistently outperformed simpler designs. SegFormer achieved the best balance of accuracy and computational efficiency with a performance-efficiency trade-off score of 92.02%, making it suitable for real-time clinical use. Cross-dataset evaluation revealed significant performance drops, with generalization retention rates of 64.78% to 71.52%. Transformer-based models, particularly pyramid vision transformer v2 + efficient multi-scale convolutional decoding (IoU: 63.35% ± 1.44%), generalized better than CNN- and Mamba-based models. CONCLUSION Hierarchical architectures like Swin-UMamba and SegFormer show promise for UGI disease segmentation, reducing missed diagnoses and improving workflows, but robust clinical validation is crucial for real-world deployment.
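A short sketch may help make the two quantities reported above concrete: multi-class mean IoU and a generalization retention rate. The abstract does not define the retention rate, so the ratio used here (cross-dataset IoU divided by in-domain IoU) is an assumption, and both function names are hypothetical.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean IoU over classes for integer label maps.
    Classes absent from both prediction and target are skipped."""
    pred, target = np.asarray(pred), np.asarray(target)
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue                      # class not present anywhere
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious)) if ious else float("nan")

def retention_rate(cross_dataset_iou, in_domain_iou):
    """Assumed definition: share of in-domain IoU kept on the unseen
    dataset, as a percentage."""
    return 100.0 * cross_dataset_iou / in_domain_iou
```

Under this assumed definition, a model that scores 80% IoU on its training distribution and 60% on the external dataset retains 75% of its performance.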
Semantic segmentation plays a foundational role in biomedical image analysis, providing precise information about cellular, tissue, and organ structures in both biological and medical imaging modalities. Traditional approaches often fail in the face of challenges such as low contrast, morphological variability, and densely packed structures. Recent advancements in deep learning have transformed segmentation capabilities through the integration of fine-scale detail preservation, coarse-scale contextual modeling, and multi-scale feature fusion. This work provides a comprehensive analysis of state-of-the-art deep learning models, including U-Net variants, attention-based frameworks, and Transformer-integrated networks, highlighting innovations that improve accuracy, generalizability, and computational efficiency. Key architectural components such as convolution operations, shallow and deep blocks, skip connections, and hybrid encoders are examined for their roles in enhancing spatial representation and semantic consistency. We further discuss the importance of hierarchical and instance-aware segmentation and annotation in interpreting complex biological scenes and multiplexed medical images. By bridging methodological developments with diverse application domains, this paper outlines current trends and future directions for semantic segmentation, emphasizing its critical role in facilitating annotation, diagnosis, and discovery in biomedical research.
Remote sensing image segmentation has a wide range of applications in land cover classification, urban building recognition, crop monitoring, and other fields. In recent years, with the rapid development of deep learning, remote sensing image segmentation models based on deep learning have gradually emerged and produced a large number of research achievements. This article reviews the latest deep learning based achievements in remote sensing image segmentation and explores future development directions. Firstly, the basic concepts, characteristics, classification, tasks, and commonly used datasets of remote sensing images are presented. Secondly, the segmentation models based on deep learning are classified and summarized, and the principles, characteristics, and applications of various models are presented. Then, the key technologies involved in deep learning based remote sensing image segmentation are introduced. Finally, the future development directions and application prospects of remote sensing image segmentation are discussed. By reviewing the latest research from the perspective of deep learning, this article can provide reference and inspiration for further research on remote sensing image segmentation.
Automated prostate cancer detection in magnetic resonance imaging (MRI) scans is of significant importance for cancer patient management. Most existing computer-aided diagnosis systems adopt segmentation methods, while object detection approaches have recently shown promising results. The authors have (1) carefully compared the performance of well-developed segmentation and object detection methods in localising prostate imaging reporting and data system (PIRADS)-labelled prostate lesions on MRI scans; (2) proposed an additional customised set of lesion-level localisation sensitivity and precision metrics; (3) proposed efficient ways to ensemble the segmentation and object detection methods for improved performance. The ground-truth (GT)-perspective lesion-level sensitivity and prediction-perspective lesion-level precision are reported, to quantify the ratios of true positive voxels detected by the algorithms over the number of voxels in the GT-labelled regions and predicted regions, respectively. The two networks are trained independently on data from 549 clinical patients with PIRADS-V2 as GT labels, and tested on 161 internal and 100 external MRI scans. At the lesion level, nnDetection outperforms nnUNet for detecting both PIRADS ≥ 3 and PIRADS ≥ 4 lesions in the majority of cases. For example, at an average of 3 false positive predictions per patient, nnDetection achieves a greater Intersection-over-Union (IoU)-based sensitivity than nnUNet for detecting PIRADS ≥ 3 lesions, being 80.78% ± 1.50% versus 60.40% ± 1.64% (p < 0.01). At the voxel level, nnUNet is in general superior or comparable to nnDetection. The proposed ensemble methods achieve improved or comparable lesion-level accuracy in all tested clinical scenarios. For example, at 3 false positives, the lesion-wise ensemble method achieves 82.24% ± 1.43% sensitivity versus 80.78% ± 1.50% (nnDetection) and 60.40% ± 1.64% (nnUNet) for detecting PIRADS ≥ 3 lesions. Consistent conclusions are also drawn from results on the external data set.
Remote sensing images contain a wealth of geospatial information. Image semantic segmentation plays a crucial role in accurately identifying geospatial categories and extracting relevant data. In recent years, deep learning has brought significant breakthroughs to the semantic segmentation of remote sensing images, substantially improving performance. This paper investigates the application of deep learning to remote sensing image semantic segmentation, covering methods based on Convolutional Neural Networks (CNNs) and on Transformers. It compares their structural characteristics and applicable scenarios in depth, summarizes the achievements and shortcomings of existing research, and provides technical references and theoretical support for future studies, thereby contributing to the further development of deep learning in remote sensing. The results indicate that CNN-based methods still hold advantages in extracting local features and achieving efficient segmentation, whereas Transformers address the limitations of CNNs in global context modeling and long-range dependency capture. The collaborative integration of CNNs and Transformers is therefore expected to become an important research direction for improving model performance.
Numerous automatic fabric defect detection algorithms have been proposed. Traditional machine vision algorithms set separate parameters for different textures and defects and rely on manually designed features to complete the detection. To overcome these limitations, deep learning-based algorithms can extract more complex image features and perform better in image classification and object detection. This paper proposes a pixel-level defect segmentation methodology using DeepLabv3+, a classical semantic segmentation network. Three DeepLabv3+ networks are constructed, based on ResNet-18, ResNet-50 and MobileNetV2, and are trained and tested on datasets built from captured or publicly released images. The experimental results show that the three DeepLabv3+ networks perform similarly on the four indicators used (Precision, Recall, F1-score and Accuracy), demonstrating that they achieve defect detection and semantic segmentation and providing new ideas and technical support for fabric defect detection.
Funding: Funded by the Ongoing Research Funding Program-Research Chairs (ORF-RC-2025-2400), King Saud University, Riyadh, Saudi Arabia.
Abstract: Recent studies indicate that millions of individuals suffer from renal diseases, with renal carcinoma, a type of kidney cancer, emerging as both a chronic illness and a significant cause of mortality. Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) have become essential tools for diagnosing and assessing kidney disorders. However, accurate analysis of these medical images is critical for detecting and evaluating tumor severity. This study introduces an integrated hybrid framework that combines three complementary deep learning models for kidney tumor segmentation from MRI images. The proposed framework fuses a customized U-Net and Mask R-CNN using a weighted scheme to achieve semantic and instance-level segmentation. The fused outputs are further refined through edge detection using Stochastic Feature Mapping Neural Networks (SFMNN), while volumetric consistency is ensured through Improved Mini-Batch K-Means (IMBKM) clustering integrated with an Encoder-Decoder Convolutional Neural Network (EDCNN). The outputs of these three stages are combined through a weighted fusion mechanism, with optimal weights determined empirically. Experiments on MRI scans from the TCGA-KIRC dataset demonstrate that the proposed hybrid framework significantly outperforms standalone models, achieving a Dice Score of 92.5%, an IoU of 87.8%, a Precision of 93.1%, a Recall of 90.8%, and a Hausdorff Distance of 2.8 mm. These findings validate that the weighted integration of complementary architectures effectively overcomes key limitations in kidney tumor segmentation, leading to improved diagnostic accuracy and robustness in medical image analysis.
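The weighted fusion and the overlap metrics named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the flattened-list mask representation, the 0.5 threshold, and the example weights are all assumptions chosen for illustration, and the abstract does not state how the actual weights were selected beyond "empirically".

```python
from typing import List, Sequence


def fuse_probabilities(prob_maps: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> List[float]:
    """Weighted average of per-pixel tumor probabilities from several models.

    Weights are normalised to sum to 1 (an illustrative choice).
    """
    if len(prob_maps) != len(weights):
        raise ValueError("need exactly one weight per probability map")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    fused = [0.0] * len(prob_maps[0])
    for probs, weight in zip(prob_maps, weights):
        for i, p in enumerate(probs):
            fused[i] += (weight / total) * p
    return fused


def binarize(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn fused probabilities into a binary mask (threshold is assumed)."""
    return [1 if p >= threshold else 0 for p in probs]


def dice(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks."""
    inter = sum(p & t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 if denom == 0 else 2.0 * inter / denom


def iou(pred: Sequence[int], target: Sequence[int]) -> float:
    """IoU = |A ∩ B| / |A ∪ B| for binary masks."""
    inter = sum(p & t for p, t in zip(pred, target))
    union = sum(p | t for p, t in zip(pred, target))
    return 1.0 if union == 0 else inter / union


# Hypothetical per-pixel outputs of three stages on a six-pixel image.
stage_a = [0.9, 0.8, 0.4, 0.2, 0.1, 0.7]
stage_b = [0.8, 0.6, 0.6, 0.1, 0.2, 0.3]
stage_c = [0.7, 0.9, 0.5, 0.3, 0.0, 0.4]
mask = binarize(fuse_probabilities([stage_a, stage_b, stage_c], [0.5, 0.3, 0.2]))
```

Dice and IoU are computed on the same binary masks and are related by Dice = 2·IoU / (1 + IoU) for a single mask pair, which is why a method's Dice score is always at least as high as its IoU.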
Funding: Supported by the National Natural Science Foundation of China (Nos. 61373121 and 61328205), the Program for Sichuan Provincial Science Fund for Distinguished Young Scholars (No. 13QNJJ0149), the Fundamental Research Funds for the Central Universities, and the China Scholarship Council (No. 201507000032).
Abstract: Deep learning has shown impressive performance in various vision tasks such as image classification, object detection and semantic segmentation. In particular, recent advances in deep learning techniques bring encouraging performance to fine-grained image classification, which aims to distinguish subordinate-level categories such as bird species or dog breeds. This task is extremely challenging due to high intra-class and low inter-class variance. In this paper, we review four types of deep learning based fine-grained image classification approaches: general convolutional neural networks (CNNs), part detection based, ensemble of networks based, and visual attention based approaches. Deep learning based semantic segmentation approaches are also covered, with region proposal based and fully convolutional network based approaches introduced in turn.
Funding: Supported by the Simons Foundation, the National Natural Science Foundation of China (No. NSFC61405038), and the Fujian provincial fund (No. 2020J01453).
Abstract: Neurons can be abstractly represented as skeletons due to the filament nature of neurites. With the rapid development of imaging and image analysis techniques, an increasing amount of neuron skeleton data is being produced. In some scientific studies, it is necessary to dissect the axons and dendrites, which is typically done manually and is both tedious and time-consuming. To automate this process, we have developed a method that relies solely on neuronal skeletons using Geometric Deep Learning (GDL). We demonstrate the effectiveness of this method using pyramidal neurons in mammalian brains, and the results are promising for its application in neuroscience studies.
Funding: This research was supported by the BB21 plus funded by Busan Metropolitan City and Busan Institute for Talent and Lifelong Education (BIT) and a grant from Tongmyong University Innovated University Research Park (I-URP) funded by Busan Metropolitan City, Republic of Korea.
Abstract: 3D segmentation is the process of dividing point cloud data into several homogeneous regions such that points in the same region share the same attributes. Segmentation is challenging with point cloud data due to substantial redundancy, fluctuating sample density and lack of apparent organization. The research area has a wide range of robotics applications, including intelligent vehicles, autonomous mapping and navigation, and researchers have introduced a variety of methodologies and algorithms. Deep learning has been successfully applied to a range of 2D vision domains as a prevailing AI method. However, due to the specific problems of processing point clouds with deep neural networks, deep learning on point clouds is still in its initial stages. This study examines many strategies that have been proposed for 3D instance and semantic segmentation and gives a complete assessment of current developments in deep learning-based 3D segmentation. The benefits, drawbacks, and design mechanisms of these approaches are studied and discussed. The study also evaluates how competitive various segmentation algorithms are on several publicly accessible datasets, and covers the most often used pipelines, their advantages and limits, insightful findings and promising future research directions.
Funding: Supported by the Major Science and Technology Project of Hainan Province (Grant No. ZDKJ2020012), the National Natural Science Foundation of China (Grant Nos. 62162024 and 62162022), Key Projects in Hainan Province (Grants ZDYF2021GXJS003 and ZDYF2020040), and the Graduate Innovation Project (Grant No. Qhys2021-187).
Abstract: Image semantic segmentation is an important branch of computer vision with a wide variety of practical applications such as medical image analysis, autonomous driving, and virtual or augmented reality. In recent years, transformers and multilayer perceptrons (MLPs) have shown remarkable performance in computer vision, comparable to that of convolutional neural networks (CNNs), and a substantial body of image semantic segmentation work has aimed at developing different types of deep learning architecture. This survey aims to provide a comprehensive overview of deep learning methods in the field of general image semantic segmentation. First, the commonly used image segmentation datasets are listed. Next, extensive pioneering works are studied in depth from multiple perspectives (e.g., network structures, feature fusion methods, attention mechanisms) and are divided into four categories according to network architecture: CNN-based architectures, transformer-based architectures, MLP-based architectures, and others. Furthermore, this paper presents some common evaluation metrics and compares the respective advantages and limitations of popular techniques, both in terms of architectural design and their experimental results on the most widely used datasets. Finally, possible future research directions and challenges are discussed for the reference of other researchers.
Funding: Financially supported by the Open Project Program of Wuhan National Laboratory for Optoelectronics (No. 2022WNLOKF009), the National Natural Science Foundation of China (No. 62475216), the Key Research and Development Program of Shaanxi (No. 2024GH-ZDXM-37), the Fujian Provincial Natural Science Foundation of China (No. 2024J01060), the Startup Program of XMU, and the Fundamental Research Funds for the Central Universities.
Abstract: Microscopy imaging is fundamental in analyzing bacterial morphology and dynamics, offering critical insights into bacterial physiology and pathogenicity. Image segmentation techniques enable quantitative analysis of bacterial structures, facilitating precise measurement of morphological variations and population behaviors at single-cell resolution. This paper reviews advancements in bacterial image segmentation, emphasizing the shift from traditional thresholding and watershed methods to deep learning-driven approaches. Convolutional neural networks (CNNs), U-Net architectures, and three-dimensional (3D) frameworks excel at segmenting dense biofilms and resolving antibiotic-induced morphological changes. These methods combine automated feature extraction with physics-informed postprocessing. Despite progress, challenges persist in computational efficiency, cross-species generalizability, and integration with multimodal experimental workflows. Future progress will depend on improving model robustness across species and imaging modalities, integrating multimodal data for phenotype-function mapping, and developing standard pipelines that link computational tools with clinical diagnostics. These innovations will expand microbial phenotyping beyond structural analysis, enabling deeper insights into bacterial physiology and ecological interactions.
Abstract: This systematic review aims to comprehensively examine and compare deep learning methods for brain tumor segmentation and classification using MRI and other imaging modalities, focusing on recent trends from 2022 to 2025. The primary objective is to evaluate methodological advancements, model performance, dataset usage, and existing challenges in developing clinically robust AI systems. We included peer-reviewed journal articles and high-impact conference papers published between 2022 and 2025, written in English, that proposed or evaluated deep learning methods for brain tumor segmentation and/or classification. Excluded were non-open-access publications, books, and non-English articles. A structured search was conducted across Scopus, Google Scholar, Wiley, and Taylor & Francis, with the last search performed in August 2025. Risk of bias was not formally quantified but was considered during full-text screening based on dataset diversity, validation methods, and availability of performance metrics. We used narrative synthesis and tabular benchmarking to compare performance metrics (e.g., accuracy, Dice score) across model types (CNN, Transformer, Hybrid), imaging modalities, and datasets. A total of 49 studies were included (43 journal articles and 6 conference papers). These studies spanned over 9 public datasets (e.g., BraTS, Figshare, REMBRANDT, MOLAB) and utilized a range of imaging modalities, predominantly MRI. Hybrid models, especially ResViT and UNetFormer, consistently achieved high performance, with classification accuracy exceeding 98% and segmentation Dice scores above 0.90 across multiple studies. Transformers and hybrid architectures showed increasing adoption post-2023. Many studies lacked external validation and were evaluated only on a few benchmark datasets, raising concerns about generalizability and dataset bias. Few studies addressed clinical interpretability or uncertainty quantification. Despite promising results, particularly for hybrid deep learning models, widespread clinical adoption remains limited due to lack of validation, interpretability concerns, and real-world deployment barriers.
Abstract: Modern manufacturing processes have become more reliant on automation because of the accelerated transition from Industry 3.0 to Industry 4.0. Manual inspection of products on assembly lines remains inefficient, error-prone and inconsistent, emphasizing the need for a reliable and automated inspection system. Leveraging both object detection and image segmentation approaches, this research proposes a vision-based solution for the detection of various kinds of tools in a toolkit using deep learning (DL) models. Two Intel RealSense D455f depth cameras were arranged in a top-down configuration to capture both RGB and depth images of the toolkits. After applying multiple constraints and enhancing the images through preprocessing and augmentation, a dataset consisting of 3300 annotated RGB-D photos was generated. Several DL models were selected through a comprehensive assessment of mean Average Precision (mAP), precision-recall equilibrium, inference latency (target ≥ 30 FPS), and computational burden, resulting in a preference for YOLO and Region-based Convolutional Neural Network (R-CNN) variants over ViT-based models due to the latter's increased latency and resource requirements. YOLOv5, YOLOv8, YOLOv11, Faster R-CNN, and Mask R-CNN were trained on the annotated dataset and evaluated using key performance metrics (Recall, Accuracy, F1-score, and Precision). YOLOv11 demonstrated balanced excellence with 93.0% precision, 89.9% recall, and a 90.6% F1-score in object detection, as well as 96.9% precision, 95.3% recall, and a 96.5% F1-score in instance segmentation, with an average inference time of 25 ms per frame (≈40 FPS), demonstrating real-time performance. Building on these results, a YOLOv11-based Windows application was successfully deployed in a real-time assembly line environment, where it accurately processed live video streams to detect and segment tools within toolkits, demonstrating its practical effectiveness in industrial automation. In addition to detection and segmentation, the application can precisely measure socket dimensions by applying edge detection techniques to YOLOv11 segmentation masks. This enables specification-level quality control directly on the assembly line and improves real-time inspection capability. The implementation is a significant step forward for intelligent manufacturing in the Industry 4.0 paradigm, providing a scalable, efficient, and accurate way to perform automated inspection and dimensional verification.
Abstract: Advanced traffic monitoring systems encounter substantial challenges in vehicle detection and classification due to the limitations of conventional methods, which often demand extensive computational resources and struggle with diverse data acquisition techniques. This research presents a novel approach for vehicle classification and recognition in aerial image sequences, integrating multiple advanced techniques to enhance detection accuracy. The proposed model begins with preprocessing using Multiscale Retinex (MSR) to enhance image quality, followed by Expectation-Maximization (EM) Segmentation for precise foreground object identification. Vehicle detection is performed using the state-of-the-art YOLOv10 framework, while feature extraction incorporates Maximally Stable Extremal Regions (MSER), Dense Scale-Invariant Feature Transform (Dense SIFT), and Zernike Moments Features to capture distinct object characteristics. The features are further refined through a Hybrid Swarm-based Optimization algorithm, which selects features for improved classification performance. The final classification is conducted using a Vision Transformer, leveraging its robust learning capabilities for enhanced accuracy. Experimental evaluations on benchmark datasets, including UAVDT and the Unmanned Aerial Vehicle Intruder Dataset (UAVID), demonstrate the superiority of the proposed approach, which achieves an accuracy of 94.40% on UAVDT and 93.57% on UAVID. The results highlight the efficacy of the model in enhancing vehicle detection and classification in aerial imagery, offering a statistically validated improvement over existing approaches for intelligent traffic monitoring systems.
Funding: Supported by the National Natural Science Foundation of China [grant number 62376217], the Young Elite Scientists Sponsorship Program by CAST [grant number 2023QNRC001], and the Joint Research Project for Meteorological Capacity Improvement [grant number 24NLTSZ003].
Abstract: Deep learning-based methods have become alternatives to traditional numerical weather prediction systems, offering faster computation and the ability to utilize large historical datasets. However, the application of deep learning to medium-range regional weather forecasting with limited data remains a significant challenge. In this work, three key solutions are proposed: (1) motivated by the need to improve model performance in data-scarce regional forecasting scenarios, the authors apply semantic segmentation models to better capture spatiotemporal features and improve prediction accuracy; (2) recognizing the challenge of overfitting and the inability of traditional noise-based data augmentation methods to effectively enhance model robustness, a novel learnable Gaussian noise mechanism is introduced that allows the model to adaptively optimize perturbations for different locations, ensuring more effective learning; and (3) to address error accumulation in autoregressive prediction, as well as the learning difficulty and lack of intermediate data utilization in one-shot prediction, the authors propose a cascade prediction approach that resolves these problems while significantly improving forecasting performance. The method achieves a competitive result in the East China Regional AI Medium Range Weather Forecasting Competition. Ablation experiments further validate the effectiveness of each component, highlighting their contributions to enhancing prediction performance.
Funding: Supported by the National Natural Science Foundation of China (No. 52374155) and the Anhui Provincial Natural Science Foundation (No. 2308085MF218).
Abstract: The convolutional neural network (CNN) method based on DeepLabv3+ has several problems in the semantic segmentation of high-resolution remote sensing images, such as a fixed receptive field size for feature extraction, lack of semantic information, a large decoder upsampling factor, and insufficient detail retention. A hierarchical feature fusion network (HFFNet) is proposed to address them. First, a combination of transformer and CNN architectures is employed for feature extraction from images of varying resolutions, and the extracted features are processed independently. Subsequently, the features from the transformer and CNN are fused under the guidance of features from different sources. This fusion process helps restore information more comprehensively during the decoding stage. Furthermore, a spatial channel attention module is designed in the final stage of decoding to refine features and reduce the semantic gap between shallow CNN features and deep decoder features. The experimental results show that HFFNet has superior performance on the UAVid, LoveDA, Potsdam, and Vaihingen datasets, and its cross-linking index is better than that of DeepLabv3+ and other competing methods, showing strong generalization ability.
Funding: This work was supported by a National Research Foundation of Korea grant funded by the Korean Government (Ministry of Science and ICT) (NRF-2020R1A2B5B02002478). There was no additional external funding received for this study.
Abstract: Early detection of Covid-19 is essential due to its high rate of infection, which has affected tens of millions of people, and its high mortality, cited as 7%. For that purpose, a multi-stage model was developed. The first stage enhances the images using dynamic adaptive histogram equalization, performs semantic segmentation using DeepLabv3Plus, and then augments the data by flipping it horizontally, rotating it, and flipping it vertically. The second stage builds a custom convolutional neural network model using several ImageNet-pre-trained models. Finally, the model compares the pre-trained data to the new output, while repeatedly trimming the best-performing models to reduce complexity and improve memory efficiency. Several experiments were done using different techniques and parameters. The proposed model achieved an average accuracy of 99.6% and an area under the curve of 0.996 in Covid-19 detection. This paper discusses how to train a customized convolutional neural network using various parameters on a set of chest X-rays to reach this accuracy.
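The augmentation step described in this abstract (horizontal flip, rotation, vertical flip) can be sketched as follows. This is an illustrative assumption rather than the paper's pipeline: the abstract gives neither the rotation angle nor whether the transforms are chained or applied separately, so the sketch uses a 90° clockwise rotation and returns each variant separately. Images are plain nested lists to keep the example dependency-free, and all function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # rows of pixel intensities


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img: Image) -> Image:
    """Mirror the image top to bottom."""
    return [list(row) for row in img[::-1]]


def rotate_90_clockwise(img: Image) -> Image:
    """Rotate by 90 degrees clockwise (the angle is an assumption)."""
    return [list(row) for row in zip(*img[::-1])]


def augment(img: Image) -> List[Image]:
    """Return the three augmented variants of one input image."""
    return [flip_horizontal(img), rotate_90_clockwise(img), flip_vertical(img)]


variants = augment([[1, 2],
                    [3, 4]])
```

When augmentation is applied to a segmentation dataset, the same geometric transform has to be applied to the label mask as to the image so that pixels and labels stay aligned.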
Funding: Supported by the HFIPS Director's Foundation (YZJJ202207-TS), the National Natural Science Foundation of China (82371931), the Natural Science Foundation of Anhui Province (2008085MC69), the Natural Science Foundation of Hefei City (2021033), the General Scientific Research Project of Anhui Provincial Health Commission (AHWJ2021b150), the Collaborative Innovation Program of Hefei Science Center, CAS (2021HSC-CIP013), and the Anhui Province Key Research and Development Project (202204295107020004).
Abstract: Semantic segmentation of eye images is a complex task with important applications in human–computer interaction, cognitive science, and neuroscience. Achieving real-time, accurate, and robust segmentation is crucial for computationally limited portable devices such as augmented reality and virtual reality headsets. With the rapid advancements in deep learning, many network models have been developed specifically for eye image segmentation. Some methods divide the segmentation process into multiple stages to miniaturize model parameters, then enhance the output through post-processing techniques to improve segmentation accuracy. These approaches significantly increase the inference time. Other networks adopt more complex encoding and decoding modules to achieve end-to-end output, which requires substantial computation. Balancing the model's size, accuracy, and computational complexity is therefore essential. To address these challenges, we propose a lightweight asymmetric UNet architecture and a projection loss function. We utilize ResNet-3 layer blocks to enhance feature extraction efficiency in the encoding stage. In the decoding stage, we employ regular convolutions and skip connections to upscale the feature maps from the latent space to the original image size, balancing the model size and segmentation accuracy. In addition, we leverage the geometric features of the eye region and design a projection loss function to further improve the segmentation accuracy without adding any inference cost. We validate our approach on the OpenEDS2019 dataset for virtual reality and achieve state-of-the-art performance with 95.33% mean intersection over union (mIoU). Our model has only 0.63M parameters and runs at 350 FPS, which are 68% and 200% of the corresponding figures for the state-of-the-art model RITNet, respectively.
Funding: Funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R435), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Segmenting a breast ultrasound image is still challenging due to the presence of speckle noise, dependency on the operator, and variation in image quality. This paper presents the UltraSegNet architecture, which addresses these challenges through three key technical innovations: (1) a modified ResNet-50 backbone with sequential 3×3 convolutions that preserves the fine anatomical details needed for finding lesion boundaries; (2) a computationally efficient regional attention mechanism that works on high-resolution features without a transformer's extra memory cost; and (3) an adaptive feature fusion strategy that adjusts local and global features based on how the image is being used. Extensive evaluation on two distinct datasets demonstrates UltraSegNet's superior performance. On the BUSI dataset, it obtains a precision of 0.915, a recall of 0.908, and an F1 score of 0.911. On the UDAIT dataset, it achieves robust performance across the board, with a precision of 0.901 and a recall of 0.894. Importantly, these improvements are achieved at clinically feasible computation times, taking 235 ms per image on standard GPU hardware. Notably, UltraSegNet performs particularly well on difficult small lesions (less than 10 mm), achieving a detection accuracy of 0.891. This is a large improvement over traditional methods, which struggle with small-scale features: standard models achieve only 0.63–0.71 accuracy. This improvement in small lesion detection is particularly crucial for early-stage breast cancer identification. The results demonstrate that UltraSegNet can be practically deployed in clinical workflows to improve breast cancer screening accuracy.
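As a quick consistency check on the BUSI figures above, the F1 score can be recomputed as the harmonic mean of precision and recall. The helper name below is hypothetical, and the sketch assumes the reported F1 was derived from the same aggregate precision and recall; a per-image or per-class average could differ slightly.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall: F1 = 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Precision and recall as reported for the BUSI dataset in the abstract.
busi_f1 = f1_score(0.915, 0.908)
print(round(busi_f1, 3))  # 0.911, matching the reported F1 score
```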
Funding: Supported by the Guangdong Basic and Applied Basic Research Foundation, No. 2021B1515130003; the Key Research and Development Plan of Hubei Province, No. 2022BCE034; and the Natural Science Foundation of Hubei Province, No. 2024AFB1054.
Abstract: BACKGROUND: Upper gastrointestinal (UGI) diseases present diagnostic challenges during endoscopy due to visual similarities, indistinct boundaries, and observer variability, which can lead to missed diagnoses and delayed treatment. Automated segmentation using deep learning (DL) models offers the potential to assist endoscopists, improve diagnostic accuracy, and reduce workload. However, multi-class UGI disease segmentation remains underexplored, with limited annotated datasets and insufficient focus on clinical validation. This study hypothesizes that comparative analysis of different DL architectures can identify models suitable for clinical application, providing actionable insights to reduce diagnostic errors and support clinical decision-making in endoscopic practice. AIM: To evaluate 17 state-of-the-art DL models for multi-class UGI disease segmentation, emphasizing clinical translation and real-world applicability. METHODS: This study evaluated 17 DL models spanning convolutional neural network (CNN)-, transformer-, and Mamba-based architectures using a self-collected dataset from two hospitals in Macao and Xiangyang (3313 images, 9 classes) and the public EDD2020 dataset (386 images, 5 classes). Models were assessed for segmentation performance and performance-efficiency trade-off. Statistical analyses were conducted to examine performance differences across architectures. Generalization capability was measured through a cross-dataset evaluation (training models on the self-collected dataset and testing on the EDD2020 dataset). RESULTS: Swin-UMamba achieved the highest segmentation performance across both datasets [intersection over union (IoU): 89.06% ± 0.20% self-collected, 77.53% ± 0.32% EDD2020], followed by SegFormer (IoU: 88.94% ± 0.38% self-collected, 77.20% ± 0.98% EDD2020) and ConvNeXt+UPerNet (IoU: 88.48% ± 0.09% self-collected, 76.90% ± 0.61% EDD2020). Statistical analyses showed no significant differences between paradigms, though hierarchical architectures with pre-trained encoders consistently outperformed simpler designs. SegFormer achieved the best balance of accuracy and computational efficiency with a performance-efficiency trade-off score of 92.02%, making it suitable for real-time clinical use. Cross-dataset evaluation revealed significant performance drops, with generalization retention rates of 64.78% to 71.52%. Transformer-based models, particularly pyramid vision transformer v2 + efficient multi-scale convolutional decoding (IoU: 63.35% ± 1.44%), generalized better than CNN- and Mamba-based models. CONCLUSION: Hierarchical architectures like Swin-UMamba and SegFormer show promise for UGI disease segmentation, reducing missed diagnoses and improving workflows, but robust clinical validation is crucial for real-world deployment.
Funding: Open Access funding provided by the National Institutes of Health (NIH). The funding for this project was provided by the NCATS Intramural Fund.
Abstract: Semantic segmentation plays a foundational role in biomedical image analysis, providing precise information about cellular, tissue, and organ structures in both biological and medical imaging modalities. Traditional approaches often fail in the face of challenges such as low contrast, morphological variability, and densely packed structures. Recent advancements in deep learning have transformed segmentation capabilities through the integration of fine-scale detail preservation, coarse-scale contextual modeling, and multi-scale feature fusion. This work provides a comprehensive analysis of state-of-the-art deep learning models, including U-Net variants, attention-based frameworks, and Transformer-integrated networks, highlighting innovations that improve accuracy, generalizability, and computational efficiency. Key architectural components such as convolution operations, shallow and deep blocks, skip connections, and hybrid encoders are examined for their roles in enhancing spatial representation and semantic consistency. We further discuss the importance of hierarchical and instance-aware segmentation and annotation in interpreting complex biological scenes and multiplexed medical images. By bridging methodological developments with diverse application domains, this paper outlines current trends and future directions for semantic segmentation, emphasizing its critical role in facilitating annotation, diagnosis, and discovery in biomedical research.
Abstract: Remote sensing image segmentation has a wide range of applications in land cover classification, urban building recognition, crop monitoring, and other fields. In recent years, with the rapid development of deep learning, remote sensing image segmentation models based on deep learning have emerged and produced a large body of research. This article reviews the latest deep learning-based achievements in remote sensing image segmentation and explores future development directions. First, the basic concepts, characteristics, classification, tasks, and commonly used datasets of remote sensing images are presented. Second, deep learning-based segmentation models are classified and summarized, and the principles, characteristics, and applications of the various models are presented. Then, the key technologies involved in deep learning remote sensing image segmentation are introduced. Finally, the future development directions and application prospects of remote sensing image segmentation are discussed. The review can provide reference and inspiration for research on remote sensing image segmentation.
Funding: National Natural Science Foundation of China, Grant/Award Number: 62303275; International Alliance for Cancer Early Detection, Grant/Award Numbers: C28070/A30912, C73666/A31378; Wellcome/EPSRC Centre for Interventional and Surgical Sciences, Grant/Award Number: 203145Z/16/Z.
Abstract: Automated prostate cancer detection in magnetic resonance imaging (MRI) scans is of significant importance for cancer patient management. Most existing computer-aided diagnosis systems adopt segmentation methods, while object detection approaches have recently shown promising results. The authors have (1) carefully compared the performance of the most developed segmentation and object detection methods in localising prostate imaging reporting and data system (PIRADS)-labelled prostate lesions on MRI scans; (2) proposed an additional customised set of lesion-level localisation sensitivity and precision metrics; and (3) proposed efficient ways to ensemble the segmentation and object detection methods for improved performance. The ground-truth (GT)-perspective lesion-level sensitivity and the prediction-perspective lesion-level precision are reported; they quantify the ratio of true-positive voxels detected by the algorithms to the number of voxels in the GT-labelled regions and in the predicted regions, respectively. The two networks (nnUNet and nnDetection) are trained independently on data from 549 clinical patients with PIRADS-V2 as GT labels, and tested on 161 internal and 100 external MRI scans. At the lesion level, nnDetection outperforms nnUNet for detecting both PIRADS≥3 and PIRADS≥4 lesions in the majority of cases. For example, at an average of 3 false-positive predictions per patient, nnDetection achieves a greater Intersection-over-Union (IoU)-based sensitivity than nnUNet for detecting PIRADS≥3 lesions: 80.78% ± 1.50% versus 60.40% ± 1.64% (p<0.01). At the voxel level, nnUNet is in general superior or comparable to nnDetection. The proposed ensemble methods achieve improved or comparable lesion-level accuracy in all tested clinical scenarios. For example, at 3 false positives, the lesion-wise ensemble method achieves 82.24% ± 1.43% sensitivity versus 80.78% ± 1.50% (nnDetection) and 60.40% ± 1.64% (nnUNet) for detecting PIRADS≥3 lesions. Consistent conclusions are also drawn from results on the external data set.
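The voxel-overlap ratios described in the abstract above can be illustrated with a minimal Python sketch. It assumes a single GT lesion and a single predicted region, each given as a set of voxel coordinates; the function name `overlap_ratios` and the toy data are hypothetical, and the paper's actual lesion-matching rules and thresholds are not stated in the abstract.

```python
# Illustrative sketch only: the lesion-matching procedure used in the paper
# is not described in the abstract, so one GT lesion is compared with one
# predicted region directly.

def overlap_ratios(gt_voxels, pred_voxels):
    """Return (sensitivity, precision) for one GT lesion and one prediction.

    sensitivity = overlapping voxels / voxels in the GT-labelled region
    precision   = overlapping voxels / voxels in the predicted region
    """
    gt = set(gt_voxels)
    pred = set(pred_voxels)
    true_positive = len(gt & pred)  # voxels present in both regions
    sensitivity = true_positive / len(gt) if gt else 0.0
    precision = true_positive / len(pred) if pred else 0.0
    return sensitivity, precision


# Toy example with (z, y, x) voxel coordinates (hypothetical data).
gt_lesion = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
pred_region = {(0, 0, 1), (0, 1, 1), (0, 2, 1)}

sens, prec = overlap_ratios(gt_lesion, pred_region)
print(f"sensitivity={sens:.2f}, precision={prec:.2f}")
```

In this toy case two voxels overlap, so the GT-perspective ratio is 2/4 and the prediction-perspective ratio is 2/3.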
Abstract: Remote sensing images contain a wealth of geospatial information. Image semantic segmentation plays a crucial role in accurately identifying different geospatial categories and extracting relevant data. In recent years, deep learning has brought significant breakthroughs to the semantic segmentation of remote sensing images, markedly enhancing its performance. This paper investigates the application of deep learning to remote sensing image semantic segmentation, covering Convolutional Neural Network (CNN)-based and Transformer-based semantic segmentation methods. It conducts an in-depth comparison of their structural characteristics and applicable scenarios, summarizes the achievements and shortcomings of existing research, and provides technical references and theoretical support for future studies, thereby contributing to the further development of deep learning in the field of remote sensing. The results indicate that CNN-based semantic segmentation methods still hold advantages in extracting local features and achieving efficient segmentation, whereas Transformers address the limitations of CNNs in global context modeling and long-range dependency capture. Therefore, the collaborative integration of CNNs and Transformers will become an important research direction for enhancing model performance.
Funding: Supported by the National Natural Science Foundation of China (61876106) and the Shanghai Local Capacity-Building Project (19030501200).
Abstract: Numerous automatic fabric defect detection algorithms have been proposed. Traditional machine vision algorithms set separate parameters for different textures and defects and rely on manually designed features to complete the detection. To overcome these limitations, deep learning-based algorithms can extract more complex image features and perform better in image classification and object detection. This paper proposes a pixel-level defect segmentation methodology using DeepLabv3+, a classical semantic segmentation network. Three DeepLabv3+ networks are constructed based on ResNet-18, ResNet-50 and MobileNetV2, and are trained and tested on data sets built from captured or publicly available images. The experimental results show that the three DeepLabv3+ networks perform similarly on the four indicators used (Precision, Recall, F1-score and Accuracy), demonstrating that they can achieve defect detection and semantic segmentation and providing new ideas and technical support for fabric defect detection.
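As a rough illustration of the four indicators named in the abstract above, the following Python sketch computes pixel-level Precision, Recall, F1-score and Accuracy for a binary defect mask. The function name `pixel_metrics` and the toy masks are hypothetical; the abstract does not say how the paper aggregates these values over images or classes, so this covers only a single image with one defect class.

```python
# Illustrative sketch only: standard confusion-matrix definitions applied to
# one binary mask pair (1 = defect pixel, 0 = background).

def pixel_metrics(gt_mask, pred_mask):
    """Compute pixel-level metrics from two equally sized 2D binary masks."""
    tp = fp = fn = tn = 0
    for gt_row, pred_row in zip(gt_mask, pred_mask):
        for g, p in zip(gt_row, pred_row):
            if g and p:
                tp += 1  # defect pixel correctly predicted
            elif p:
                fp += 1  # background predicted as defect
            elif g:
                fn += 1  # defect pixel missed
            else:
                tn += 1  # background correctly predicted
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Toy 3x3 masks (hypothetical data).
gt = [[0, 1, 1],
      [0, 1, 0],
      [0, 0, 0]]
pred = [[0, 1, 0],
        [0, 1, 1],
        [0, 0, 0]]

metrics = pixel_metrics(gt, pred)
print(metrics)
```

Here two defect pixels are found, one is missed and one background pixel is flagged, which gives a precision and recall of 2/3 each and an accuracy of 7/9.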