Image segmentation is crucial for various research areas. Many computer vision applications depend on segmenting images to understand the scene, such as autonomous driving, surveillance systems, robotics, and medical imaging. With the recent advances in deep learning (DL) and its remarkable results in image segmentation, more attention has been drawn to its use in medical image segmentation. This article introduces a survey of the state-of-the-art deep convolutional neural network (CNN) models and mechanisms utilized in image segmentation. First, segmentation models are categorized based on their model architecture and primary working principle. Then, CNN categories are described, and various models are discussed within each category. Compared with other existing surveys, several applications with multiple architectural adaptations are discussed within each category. A comparative summary is included to give the reader insights into utilized architectures in different applications and datasets. This study focuses on medical image segmentation applications, where the most widely used architectures are illustrated, and other promising models are suggested that have proven their success in different domains. Finally, the present work discusses current limitations and solutions along with future trends in the field.
Objective To propose two novel methods based on deep learning for computer-aided tongue diagnosis, including tongue image segmentation and tongue color classification, improving their diagnostic accuracy. Methods LabelMe was used to label the tongue mask and the Snake model to optimize the labeling results. A new dataset was constructed for tongue image segmentation. Tongue color was marked to build a classified dataset for network training. In this research, the Inception + Atrous Spatial Pyramid Pooling (ASPP) + UNet (IAUNet) method was proposed for tongue image segmentation, based on the existing UNet, Inception, and atrous convolution. Moreover, the Tongue Color Classification Net (TCCNet) was constructed with reference to ResNet, Inception, and Triplet-Loss. Several important measurement indexes were selected to evaluate and compare the effects of the novel and existing methods for tongue segmentation and tongue color classification. IAUNet was compared with existing mainstream methods such as UNet and DeepLabV3+ for tongue segmentation. TCCNet for tongue color classification was compared with VGG16 and GoogLeNet. Results IAUNet can accurately segment the tongue from original images. The results showed that the Mean Intersection over Union (MIoU) of IAUNet reached 96.30%, and its Mean Pixel Accuracy (MPA), mean Average Precision (mAP), F1-Score, G-Score, and Area Under Curve (AUC) reached 97.86%, 99.18%, 96.71%, 96.82%, and 99.71%, respectively, suggesting that IAUNet produced better segmentation than other methods, with fewer parameters. Triplet-Loss was applied in the proposed TCCNet to separate different embedded colors. The experiment yielded ideal results, with the F1-Score and mAP of TCCNet reaching 88.86% and 93.49%, respectively. Conclusion IAUNet based on deep learning for tongue segmentation is better than traditional ones. IAUNet not only produces ideal tongue segmentation, but also performs better than PSPNet, SegNet, UNet, and DeepLabV3+, the traditional networks. As for tongue color classification, the proposed network, TCCNet, had better F1-Score and mAP values than other neural networks such as VGG16 and GoogLeNet.
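The MIoU and MPA figures quoted in the tongue-segmentation abstract are conventionally derived from a per-class confusion matrix. The sketch below shows that conventional computation on a toy two-class example; it is not the authors' evaluation code, and the function names and the toy labels are assumptions made for illustration.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Average of TP / (TP + FP + FN) over classes that appear at all."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


def mean_pixel_accuracy(cm):
    """Average of per-class recall (TP / ground-truth pixels of that class)."""
    accs = [cm[c][c] / sum(cm[c]) for c in range(len(cm)) if sum(cm[c])]
    return sum(accs) / len(accs) if accs else 0.0


# Toy flattened masks (hypothetical): 0 = background, 1 = tongue.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=2)
print(mean_iou(cm), mean_pixel_accuracy(cm))
```

Each class here has three correct pixels, one false positive, and one false negative, so the per-class IoU is 3/5 and the per-class pixel accuracy is 3/4.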
Modern manufacturing processes have become more reliant on automation because of the accelerated transition from Industry 3.0 to Industry 4.0. Manual inspection of products on assembly lines remains inefficient, error-prone, and inconsistent, emphasizing the need for a reliable and automated inspection system. Leveraging both object detection and image segmentation approaches, this research proposes a vision-based solution for the detection of various kinds of tools in a toolkit using deep learning (DL) models. Two Intel RealSense D455f depth cameras were arranged in a top-down configuration to capture both RGB and depth images of the toolkits. After applying multiple constraints and enhancing the images through preprocessing and augmentation, a dataset consisting of 3300 annotated RGB-D images was generated. Several DL models were selected through a comprehensive assessment of mean Average Precision (mAP), precision-recall equilibrium, inference latency (target ≥ 30 FPS), and computational burden, resulting in a preference for YOLO and Region-based Convolutional Neural Network (R-CNN) variants over ViT-based models due to the latter's increased latency and resource requirements. YOLOv5, YOLOv8, YOLOv11, Faster R-CNN, and Mask R-CNN were trained on the annotated dataset and evaluated using key performance metrics (Recall, Accuracy, F1-score, and Precision). YOLOv11 demonstrated balanced excellence with 93.0% precision, 89.9% recall, and a 90.6% F1-score in object detection, as well as 96.9% precision, 95.3% recall, and a 96.5% F1-score in instance segmentation, with an average inference time of 25 ms per frame (≈ 40 FPS), demonstrating real-time performance. Leveraging these results, a YOLOv11-based Windows application was successfully deployed in a real-time assembly line environment, where it accurately processed live video streams to detect and segment tools within toolkits, demonstrating its practical effectiveness in industrial automation. In addition to detection and segmentation, the application can precisely measure socket dimensions by applying edge detection techniques to YOLOv11 segmentation masks. This enables specification-level quality control directly on the assembly line and improves real-time inspection capability. The implementation is a significant step forward for intelligent manufacturing in the Industry 4.0 paradigm, providing a scalable, efficient, and accurate approach to automated inspection and dimensional verification.
Medical image segmentation, i.e., labeling structures of interest in medical images, is crucial for disease diagnosis and treatment in radiology. In reversible data hiding in medical images (RDHMI), segmentation consists of only two regions: the focal and nonfocal regions. The focal region mainly contains information for diagnosis, while the nonfocal region serves as the monochrome background. The traditional segmentation methods currently used in RDHMI are inaccurate for complex medical images, and manual segmentation is time-consuming, poorly reproducible, and operator-dependent. Implementing state-of-the-art deep learning (DL) models would bring key benefits, but the lack of domain-specific labels for existing medical datasets makes it impossible. To address this problem, this study provides labels for existing medical datasets based on a hybrid segmentation approach to facilitate the implementation of DL segmentation models in this domain. First, an initial segmentation based on a 3×3 kernel is performed to analyze identified contour pixels before classifying pixels into focal and nonfocal regions. Then, several human expert raters evaluate and classify the generated labels into accurate and inaccurate labels. The inaccurate labels undergo manual segmentation by medical practitioners and are scored based on a hierarchical voting scheme before being assigned to the proposed dataset. To ensure reliability and integrity in the proposed dataset, we evaluate the accurate automated labels against labels manually segmented by medical practitioners using five assessment metrics: dice coefficient, Jaccard index, precision, recall, and accuracy. The experimental results show that labels in the proposed dataset are consistent with the subjective judgment of human experts, with an average accuracy score of 94% and dice coefficient scores between 90% and 99%. The study further proposes a ResNet-UNet with concatenated spatial and channel squeeze and excitation (scSE) architecture for semantic segmentation to validate and illustrate the usefulness of the proposed dataset. The results demonstrate the superior performance of the proposed architecture in accurately separating the focal and nonfocal regions compared to state-of-the-art architectures. Dataset information is released under the following URL: https://www.kaggle.com/lordamoah/datasets (accessed on 31 March 2025).
Segmenting a breast ultrasound image is still challenging due to the presence of speckle noise, dependency on the operator, and the variation of image quality. This paper presents the UltraSegNet architecture, which addresses these challenges through three key technical innovations: (1) a modified ResNet-50 backbone with sequential 3×3 convolutions to keep the fine anatomical details that are needed for finding lesion boundaries; (2) a computationally efficient regional attention mechanism that works on high-resolution features without incurring a transformer's extra memory cost; and (3) an adaptive feature fusion strategy that changes local and global features based on how the image is being used. Extensive evaluation on two distinct datasets demonstrates UltraSegNet's superior performance: on the BUSI dataset, it obtains a precision of 0.915, a recall of 0.908, and an F1 score of 0.911. On the UDAIT dataset, it achieves robust performance across the board, with a precision of 0.901 and a recall of 0.894. Importantly, these improvements are achieved at clinically feasible computation times, taking 235 ms per image on standard GPU hardware. Notably, UltraSegNet performs particularly well on difficult small lesions (less than 10 mm), achieving a detection accuracy of 0.891. This is a substantial improvement over traditional methods, which struggle with small-scale features: standard models achieve only 0.63–0.71 accuracy. This improvement in small lesion detection is particularly crucial for early-stage breast cancer identification. Results from this work demonstrate that UltraSegNet can be practically deployed in clinical workflows to improve breast cancer screening accuracy.
Myasthenia Gravis (MG) is an autoimmune neuromuscular disease. Given that extraocular muscle manifestations are the initial and primary symptoms in most patients, ocular muscle assessment is regarded as a necessary early screening tool. To overcome the limitations of the manual clinical method, an intuitive idea is to collect data via imaging devices, followed by analysis or processing using Deep Learning (DL) techniques (particularly image segmentation approaches) to enable automatic MG evaluation. Unfortunately, their clinical applications in this field have not been thoroughly explored. To bridge this gap, our study prospectively establishes a new DL-based system to promote the diagnosis of MG disease, with a complete workflow including facial data acquisition, eye region localization, and ocular structure segmentation. Experimental results demonstrate that the proposed system achieves superior segmentation performance on ocular structures. Moreover, it markedly improves the diagnostic accuracy of doctors. In the future, this endeavor can offer highly promising MG monitoring tools for healthcare professionals, patients, and regions with limited medical resources.
Deep learning (DL), derived from the domain of Artificial Neural Networks (ANN), forms one of the most essential components of modern deep learning algorithms. DL segmentation models rely on layer-by-layer convolution-based feature representation, guided by forward and backward propagation. A critical aspect of this process is the selection of an appropriate activation function (AF) to ensure robust model learning. However, existing activation functions often fail to effectively address the vanishing gradient problem or are complicated by the need for manual parameter tuning. Most current research on activation function design focuses on classification tasks using natural image datasets such as MNIST, CIFAR-10, and CIFAR-100. To address this gap, this study proposes Med-ReLU, a novel activation function specifically designed for medical image segmentation. Med-ReLU prevents deep learning models from suffering dead neurons or vanishing gradient issues. It is a hybrid activation function that combines the properties of ReLU and Softsign. For positive inputs, Med-ReLU adopts the linear behavior of ReLU to avoid vanishing gradients, while for negative inputs, it exhibits the Softsign's polynomial convergence, ensuring robust training and avoiding inactive neurons across the training set. The training performance and segmentation accuracy of Med-ReLU have been thoroughly evaluated, demonstrating stable learning behavior and resistance to overfitting. It consistently outperforms state-of-the-art activation functions in medical image segmentation tasks. Designed as a parameter-free function, Med-ReLU is simple to implement in complex deep learning architectures, and its effectiveness spans various neural network models and anomaly detection scenarios.
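The abstract describes Med-ReLU only in words (ReLU-like for positive inputs, Softsign-like for negative inputs) and does not give its formula. The sketch below is one plausible reading of that description, not the published definition: the piecewise form and the names `med_relu_sketch` and `med_relu_sketch_grad` are assumptions.

```python
def softsign(x):
    """Standard Softsign: x / (1 + |x|), bounded in (-1, 1)."""
    return x / (1.0 + abs(x))


def med_relu_sketch(x):
    """Assumed hybrid: identity for x > 0 (as in ReLU), Softsign for x <= 0."""
    return x if x > 0 else softsign(x)


def med_relu_sketch_grad(x):
    """Derivative of the assumed hybrid.

    The positive branch has slope 1. The negative branch has slope
    1 / (1 + |x|)^2, which decays polynomially and never reaches zero,
    so no input produces a strictly dead unit.
    """
    return 1.0 if x > 0 else 1.0 / (1.0 + abs(x)) ** 2


for value in (-3.0, -1.0, 0.0, 2.0):
    print(value, med_relu_sketch(value), med_relu_sketch_grad(value))
```

Under this reading the function needs no tunable parameter, which matches the abstract's "parameter-free" claim; whether the published function uses exactly this negative branch cannot be confirmed from the abstract alone.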
Deep learning now underpins many state-of-the-art systems for biomedical image and signal processing, enabling automated lesion detection, physiological monitoring, and therapy planning with accuracy that rivals expert performance. This survey reviews the principal model families (convolutional, recurrent, generative, reinforcement, autoencoder, and transfer-learning approaches), emphasising how their architectural choices map to tasks such as segmentation, classification, reconstruction, and anomaly detection. A dedicated treatment of multimodal fusion networks shows how imaging features can be integrated with genomic profiles and clinical records to yield more robust, context-aware predictions. To support clinical adoption, we outline post-hoc explainability techniques (Grad-CAM, SHAP, LIME) and describe emerging intrinsically interpretable designs that expose decision logic to end users. Regulatory guidance from the U.S. FDA, the European Medicines Agency, and the EU AI Act is summarised, linking transparency and lifecycle-monitoring requirements to concrete development practices. Remaining challenges, such as data imbalance, computational cost, privacy constraints, and cross-domain generalization, are discussed alongside promising solutions such as federated learning, uncertainty quantification, and lightweight 3-D architectures. The article therefore offers researchers, clinicians, and policymakers a concise, practice-oriented roadmap for deploying trustworthy deep-learning systems in healthcare.
Automated prostate cancer detection in magnetic resonance imaging (MRI) scans is of significant importance for cancer patient management. Most existing computer-aided diagnosis systems adopt segmentation methods, while object detection approaches have recently shown promising results. The authors have (1) carefully compared the performance of the most developed segmentation and object detection methods in localising prostate imaging reporting and data system (PIRADS)-labelled prostate lesions on MRI scans; (2) proposed an additional customised set of lesion-level localisation sensitivity and precision; and (3) proposed efficient ways to ensemble the segmentation and object detection methods for improved performance. The ground-truth (GT)-perspective lesion-level sensitivity and prediction-perspective lesion-level precision are reported, to quantify the ratios of true positive voxels detected by the algorithms over the number of voxels in the GT-labelled regions and predicted regions, respectively. The two networks are trained independently on data from 549 clinical patients with PIRADS-V2 as GT labels, and tested on 161 internal and 100 external MRI scans. At the lesion level, nnDetection outperforms nnUNet for detecting both PIRADS ≥ 3 and PIRADS ≥ 4 lesions in the majority of cases. For example, at an average of 3 false positive predictions per patient, nnDetection achieves a greater Intersection-over-Union (IoU)-based sensitivity than nnUNet for detecting PIRADS ≥ 3 lesions, being 80.78% ± 1.50% versus 60.40% ± 1.64% (p < 0.01). At the voxel level, nnUNet is in general superior or comparable to nnDetection. The proposed ensemble methods achieve improved or comparable lesion-level accuracy in all tested clinical scenarios. For example, at 3 false positives, the lesion-wise ensemble method achieves 82.24% ± 1.43% sensitivity versus 80.78% ± 1.50% (nnDetection) and 60.40% ± 1.64% (nnUNet) for detecting PIRADS ≥ 3 lesions. Consistent conclusions are also drawn from results on the external data set.
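The two customised lesion-level metrics in the prostate MRI abstract are described as ratios of true positive voxels over the voxels in the GT region and in the predicted region. A minimal sketch of that description follows, assuming one GT lesion and one prediction, each represented as a set of voxel coordinates; the function name and the toy coordinates are hypothetical, and the paper's exact matching rules are not given in the abstract.

```python
def lesion_overlap_ratios(gt_voxels, pred_voxels):
    """Return (GT-perspective sensitivity, prediction-perspective precision).

    sensitivity = |GT ∩ prediction| / |GT|
    precision   = |GT ∩ prediction| / |prediction|
    """
    gt = set(gt_voxels)
    pred = set(pred_voxels)
    true_positive = len(gt & pred)
    sensitivity = true_positive / len(gt) if gt else 0.0
    precision = true_positive / len(pred) if pred else 0.0
    return sensitivity, precision


# Toy lesion (hypothetical): four GT voxels, three predicted voxels, two shared.
gt_lesion = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
predicted = {(0, 0, 1), (0, 1, 1), (0, 2, 1)}
sensitivity, precision = lesion_overlap_ratios(gt_lesion, predicted)
print(sensitivity, precision)
```

Reporting both ratios separates two failure modes: a prediction that covers only part of a lesion lowers sensitivity, while a prediction that spills far outside the lesion lowers precision.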
This paper presents a novel computerized technique for the segmentation of nuclei in hematoxylin and eosin (H&E) stained histopathology images. The purpose of this study is to overcome the challenges faced in automated nuclei segmentation due to the diversity of nuclei structures that arise from differences in tissue types and staining protocols, as well as the segmentation of variable-sized and overlapping nuclei. To this end, the approach proposed in this study uses an ensemble of the UNet architecture with various Convolutional Neural Network (CNN) architectures as encoder backbones, along with stain normalization and test-time augmentation, to improve segmentation accuracy. Additionally, this paper employs a Structure-Preserving Color Normalization (SPCN) technique as a preprocessing step for stain normalization. The proposed model was trained and tested on both single-organ and multi-organ datasets, yielding an F1 score of 84.11%, mean Intersection over Union (IoU) of 81.67%, dice score of 84.11%, accuracy of 92.58%, and precision of 83.78% on the multi-organ dataset, and an F1 score of 87.04%, mean IoU of 86.66%, dice score of 87.04%, accuracy of 96.69%, and precision of 87.57% on the single-organ dataset. These findings demonstrate that the proposed model ensemble, coupled with the right pre-processing and post-processing techniques, enhances nuclei segmentation capabilities.
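The nuclei-segmentation abstract reports identical F1 and dice scores on each dataset (84.11% and 87.04%). This is expected if both are computed on binary masks from the same pixel counts, because the two quantities are algebraically the same. A short derivation, writing TP, FP, and FN for true positive, false positive, and false negative pixel counts (symbols introduced here, not taken from the abstract):

```latex
\begin{align}
P &= \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \\
F_1 &= \frac{2PR}{P + R}
     = \frac{2\,TP^2}{TP\,(TP + FN) + TP\,(TP + FP)}
     = \frac{2\,TP}{2\,TP + FP + FN}, \\
\mathrm{Dice} &= \frac{2\,|A \cap B|}{|A| + |B|}
     = \frac{2\,TP}{(TP + FP) + (TP + FN)}
     = F_1 .
\end{align}
```

Here A is the predicted mask and B the ground-truth mask. Whether the authors aggregated the two scores in the same way is an assumption, but the matching values are consistent with it.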
BACKGROUND Upper gastrointestinal (UGI) diseases present diagnostic challenges during endoscopy due to visual similarities, indistinct boundaries, and observer variability, which can lead to missed diagnoses and delayed treatment. Automated segmentation using deep learning (DL) models offers the potential to assist endoscopists, improve diagnostic accuracy, and reduce workload. However, multi-class UGI disease segmentation remains underexplored, with limited annotated datasets and insufficient focus on clinical validation. This study hypothesizes that comparative analysis of different DL architectures can identify models suitable for clinical application, providing actionable insights to reduce diagnostic errors and support clinical decision-making in endoscopic practice. AIM To evaluate 17 state-of-the-art DL models for multi-class UGI disease segmentation, emphasizing clinical translation and real-world applicability. METHODS This study evaluated 17 DL models spanning convolutional neural network (CNN)-, transformer-, and Mamba-based architectures using a self-collected dataset from two hospitals in Macao and Xiangyang (3313 images, 9 classes) and the public EDD2020 dataset (386 images, 5 classes). Models were assessed for segmentation performance and performance-efficiency trade-off. Statistical analyses were conducted to examine performance differences across architectures. Generalization capability was measured through a cross-dataset evaluation (training models on the self-collected dataset and testing on the EDD2020 dataset). RESULTS Swin-UMamba achieved the highest segmentation performance across both datasets [intersection over union (IoU): 89.06% ± 0.20% self-collected, 77.53% ± 0.32% EDD2020], followed by SegFormer (IoU: 88.94% ± 0.38% self-collected, 77.20% ± 0.98% EDD2020) and ConvNeXt + UPerNet (IoU: 88.48% ± 0.09% self-collected, 76.90% ± 0.61% EDD2020). Statistical analyses showed no significant differences between paradigms, though hierarchical architectures with pre-trained encoders consistently outperformed simpler designs. SegFormer achieved the best balance of accuracy and computational efficiency with a performance-efficiency trade-off score of 92.02%, making it suitable for real-time clinical use. Cross-dataset evaluation revealed significant performance drops, with generalization retention rates of 64.78% to 71.52%. Transformer-based models, particularly pyramid vision transformer v2 + efficient multi-scale convolutional decoding (IoU: 63.35% ± 1.44%), generalized better than CNN- and Mamba-based models. CONCLUSION Hierarchical architectures like Swin-UMamba and SegFormer show promise for UGI disease segmentation, reducing missed diagnoses and improving workflows, but robust clinical validation is crucial for real-world deployment.
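The UGI abstract reports "generalization retention rates" without defining them. One common reading is the share of in-domain performance that survives the move to the unseen dataset. The sketch below uses that assumed definition; the function name and the input values are invented for illustration and are not results from the study.

```python
def retention_rate(in_domain_iou, cross_dataset_iou):
    """Assumed definition: cross-dataset IoU as a percentage of in-domain IoU."""
    if in_domain_iou <= 0:
        raise ValueError("in-domain IoU must be positive")
    return 100.0 * cross_dataset_iou / in_domain_iou


# Hypothetical model: 80.0% IoU on its training distribution,
# 56.0% IoU when tested on a different dataset.
example_retention = retention_rate(80.0, 56.0)
print(f"retention: {example_retention:.2f}%")
```

Under this reading, a retention rate near 65% to 72% means roughly a third of the in-domain IoU is lost on the external dataset, which is consistent with the abstract's description of "significant performance drops".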
Deep learning is widely used for lesion segmentation in medical images due to its breakthrough performance. Loss functions are critical in a deep learning pipeline, and they play an important role in segmentation performance. Dice loss is the most commonly used loss function in medical image segmentation, but it also has some disadvantages. In this paper, we discuss the advantages and disadvantages of the Dice loss function, and group the extensions of the Dice loss according to the purpose of each improvement. The performance of some extensions is compared according to core references. Because different loss functions perform differently in different tasks, automatic loss function selection is a potential direction for future work.
Microstructural classification is typically done manually by human experts, which gives rise to uncertainties due to subjectivity and reduces the overall efficiency. A high-throughput characterization method is proposed based on deep learning, rapid acquisition technology, and mathematical statistics for the recognition, segmentation, and quantification of microstructure in weathering steel. The segmentation results showed that this method was accurate and efficient, and the segmentation of inclusions and the pearlite phase achieved accuracies of 89.95% and 90.86%, respectively. The time required for batch processing by MIPAR software, involving thresholding segmentation, morphological processing, and small-area deletion, was 1.05 s for a single image. By comparison, our system required only 0.102 s, which is about ten times faster than the commercial software. The quantification results were extracted from large volumes of sequential image data (150 mm^(2), 62,216 images, 1024×1024 pixels), which ensures comprehensive statistics. Microstructure information, such as the three-dimensional density distribution and the frequency of the minimum spatial distance of inclusions on the 150 mm^(2) sample surface, was quantified by extracting the coordinates and sizes of individual features. The result is a refined characterization method for two-dimensional structures and spatial information that is unattainable manually or with software. It will be useful for understanding the properties and behavior of weathering steel, and for reducing reliance on physical testing.
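The "ten times faster" claim in the weathering-steel abstract follows directly from the two per-image timings it quotes. The arithmetic can be checked as below; the extrapolation to all 62,216 images assumes sequential processing at a constant per-image time, which the abstract does not state, and the variable names are illustrative.

```python
# Per-image timings and image count quoted in the abstract.
MIPAR_SECONDS_PER_IMAGE = 1.05
DL_SECONDS_PER_IMAGE = 0.102
NUM_IMAGES = 62_216

speedup = MIPAR_SECONDS_PER_IMAGE / DL_SECONDS_PER_IMAGE

# Assumption: images are processed one after another at constant speed.
mipar_hours = NUM_IMAGES * MIPAR_SECONDS_PER_IMAGE / 3600
dl_hours = NUM_IMAGES * DL_SECONDS_PER_IMAGE / 3600

print(f"speedup: {speedup:.2f}x")
print(f"full dataset: {mipar_hours:.1f} h vs {dl_hours:.1f} h")
```

The ratio comes out slightly above ten, and under the sequential assumption the full image set would take roughly 18 hours with the thresholding pipeline against under 2 hours with the deep learning system.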
A large number of nodule minerals exist in the deep sea. Because of the difficulty of imaging, the high economic cost, and the high accuracy required of resource assessment, large-scale planned commercial mining has not yet been conducted. Only experimental mining has been carried out in areas with high mineral density and obvious benefits after mineral resource assessment. As an efficient method for deep-sea mineral resource assessment, the deep towing system is equipped with a visual system for mineral resource analysis using collected images and videos, which has become a key component of resource assessment. Therefore, high accuracy in deep-sea mineral image segmentation is the primary goal of the segmentation algorithm. In this paper, the existing deep-sea nodule mineral image segmentation algorithms are studied in depth and divided into traditional and deep learning-based segmentation methods, and the advantages and disadvantages of each are compared and summarized. The deep learning methods show great advantages in deep-sea mineral image segmentation, with large improvements in segmentation accuracy and efficiency compared with the traditional methods. Then, the mineral image datasets and segmentation evaluation metrics are listed. Finally, possible future research topics and improvement measures are discussed for the reference of other researchers.
Although deep learning methods have been widely applied in medical image lesion segmentation, it is still challenging to apply them to segmenting ischemic stroke lesions, which differ from brain tumors in lesion characteristics, segmentation difficulty, algorithm maturity, and segmentation accuracy. Three main stages are used to describe the manifestations of stroke. For acute ischemic stroke, the size of the lesions is similar to that of brain tumors, and current deep learning methods have been able to achieve high segmentation accuracy. For sub-acute and chronic ischemic stroke, the segmentation results of mainstream deep learning algorithms are still unsatisfactory, as lesions in these stages are small and diffuse. Using three scientific search engines, namely CNKI, Web of Science, and Google Scholar, this paper aims to comprehensively understand the state-of-the-art deep learning algorithms applied to segmenting ischemic stroke lesions. For the first time, this paper discusses the current situation, challenges, and development directions of deep learning algorithms applied to ischemic stroke lesion segmentation in different stages. In the future, a system that can directly identify different stroke stages and automatically select a suitable network architecture for stroke lesion segmentation needs to be proposed.
Accurate segmentation of CT images of liver tumors is an important adjunct for the diagnosis and treatment of liver diseases. In recent years, owing to great improvements in hardware, many deep learning-based methods have been proposed for automatic liver segmentation. Among them are the plain neural networks headed by FCN and the residual neural networks headed by ResNet, both of which have many variations. They have achieved notable results in medical image segmentation. In this paper, we first select five representative structures, i.e., FCN, U-Net, SegNet, ResNet, and DenseNet, to investigate their performance on liver segmentation. Since the original ResNet and DenseNet cannot perform image segmentation directly, we make some adjustments so that they can perform liver segmentation. Our experimental results show that DenseNet performs the best on liver segmentation, followed by ResNet. Both perform much better than SegNet, U-Net, and FCN. Among SegNet, U-Net, and FCN, U-Net performs the best, followed by SegNet. FCN performs the worst.
As colon cancer is among the top causes of death, there is a growing interest in developing improved techniques for the early detection of colon polyps. Given the close relation between colon polyps and colon cancer, their detection helps avoid cancer cases. The increase in the availability of colorectal screening tests and in the number of colonoscopies has increased the burden on medical personnel. In this article, the application of deep learning techniques for the detection and segmentation of colon polyps in colonoscopies is presented. Four techniques were implemented and evaluated: Mask-RCNN, PANet, Cascade R-CNN, and Hybrid Task Cascade (HTC). These were trained and tested using the CVC-Colon database, ETIS-LARIB Polyp, and a proprietary dataset. Three experiments were conducted to assess the techniques' performance: (1) training and testing using each database independently, (2) merging the databases and testing on each database independently using a merged test set, and (3) training on each dataset and testing on the merged test set. In our experiments, the PANet architecture had the best performance in polyp detection, and HTC was the most accurate at segmenting them. This approach allows us to employ deep learning techniques to assist healthcare professionals in the medical diagnosis of colon cancer. It is anticipated that this approach can be part of a framework for semi-automated polyp detection in colonoscopies.
Every day, websites and personal archives create more and more photos. The size of these archives is immeasurable. The ease of use of these huge digital image collections contributes to their popularity. However, not all of these archives provide relevant indexing information. As a result, it is difficult to find the data that the user is interested in. Therefore, in order to determine the significance of the data, it is important to identify the contents in an informative manner. Image annotation is one of the most challenging domains in multimedia research and computer vision. Hence, in this paper, an Adaptive Convolutional Deep Learning Model (ACDLM) is developed for automatic image annotation. Initially, the databases (Corel 5K, MSRC v2) are collected from an open-source system and consist of some labelled images (for the training phase) and some unlabeled images. After that, the images are sent to the pre-processing step, which includes colour space quantization and a texture colour class map. The pre-processed images are sent to the segmentation stage, which uses J-image segmentation (JSEG) for efficient labelling. The final step is automatic annotation using ACDLM, which is a combination of a Convolutional Neural Network (CNN) and the Honey Badger Algorithm (HBA). Based on the proposed classifier, the unlabeled images are labelled. The proposed methodology is implemented in MATLAB, and performance is evaluated by metrics such as accuracy, precision, recall, and F1_Measure.
Image segmentation is attracting increasing attention in the field of medical image analysis. Given its widespread use across various medical applications, ensuring and improving segmentation accuracy has become a crucial topic of research. With advances in deep learning, researchers have developed numerous methods that combine Transformers and convolutional neural networks (CNNs) to create highly accurate models for medical image segmentation. However, efforts to further enhance accuracy by developing larger and more complex models, or by training with more extensive datasets, significantly increase computational resource consumption. To address this problem, we propose BiCLIP-nnFormer (the prefix "Bi" refers to the use of two distinct CLIP models), a virtual multimodal instrument that leverages CLIP models to enhance the segmentation performance of the medical segmentation model nnFormer. Since the two CLIP models (PMC-CLIP and CoCa-CLIP) are pre-trained on large datasets, they do not require additional training, thus conserving computation resources. These models are used offline to extract image and text embeddings from medical images. These embeddings are then processed by the proposed 3D CLIP adapter, which adapts the CLIP knowledge for segmentation tasks by fine-tuning. Finally, the adapted embeddings are fused with feature maps extracted from the nnFormer encoder to generate predicted masks. This process enriches the representation capabilities of the feature maps by integrating global multimodal information, leading to more precise segmentation predictions. We demonstrate the superiority of BiCLIP-nnFormer and the effectiveness of using CLIP models to enhance nnFormer through experiments on two public datasets, namely the Synapse multi-organ segmentation dataset (Synapse) and the Automatic Cardiac Diagnosis Challenge dataset (ACDC), as well as a self-annotated lung multi-category segmentation dataset (LMCS).
Microseismic monitoring is essential for understanding subsurface dynamics and optimizing oil and gas production. However, traditional methods for the automatic detection of microseismic events rely heavily on characteristic functions and human intervention, often resulting in suboptimal performance when dealing with complex and noisy data. In this study, we propose a novel approach that leverages a deep learning framework to extract multiscale features from microseismic data using a TransUNet neural network. Our model integrates the advantages of the Transformer and UNet architectures to achieve high accuracy in multivariate image segmentation and precise picking of P-wave and S-wave first arrivals simultaneously. We validate our approach using both synthetic and field microseismic datasets recorded from gas storage monitoring and roof fracturing in a coal seam. The robustness of the proposed method has been verified by testing on synthetic data with various levels of Gaussian noise and real background noise extracted from field data. Comparisons of the proposed method with UNet and SwinUNet in terms of model architecture and classification performance demonstrate that TransUNet achieves the optimal balance between architecture and inference speed. With relatively low inference time and network complexity, it operates effectively in high-precision microseismic phase picking. This advancement holds significant promise for enhancing microseismic monitoring technology in hydraulic fracturing and reservoir monitoring applications.
Funding: Supported by the Information Technology Industry Development Agency (ITIDA), Egypt (Project No. CFP181).
Abstract: Image segmentation is crucial for various research areas. Many computer vision applications depend on segmenting images to understand the scene, such as autonomous driving, surveillance systems, robotics, and medical imaging. With the recent advances in deep learning (DL) and its confounding results in image segmentation, more attention has been drawn to its use in medical image segmentation. This article introduces a survey of the state-of-the-art deep convolution neural network (CNN) models and mechanisms utilized in image segmentation. First, segmentation models are categorized based on their model architecture and primary working principle. Then, CNN categories are described, and various models are discussed within each category. Compared with other existing surveys, several applications with multiple architectural adaptations are discussed within each category. A comparative summary is included to give the reader insights into utilized architectures in different applications and datasets. This study focuses on medical image segmentation applications, where the most widely used architectures are illustrated, and other promising models are suggested that have proven their success in different domains. Finally, the present work discusses current limitations and solutions along with future trends in the field.
Funding: Scientific Research Project of the Education Department of Hunan Province (20C1435); Open Fund Project for Computer Science and Technology of Hunan University of Chinese Medicine (2018JK05).
Abstract: Objective: To propose two novel methods based on deep learning for computer-aided tongue diagnosis, including tongue image segmentation and tongue color classification, improving their diagnostic accuracy. Methods: LabelMe was used to label the tongue mask and the Snake model to optimize the labeling results. A new dataset was constructed for tongue image segmentation. Tongue color was marked to build a classified dataset for network training. In this research, the Inception + Atrous Spatial Pyramid Pooling (ASPP) + UNet (IAUNet) method was proposed for tongue image segmentation, based on the existing UNet, Inception, and atrous convolution. Moreover, the Tongue Color Classification Net (TCCNet) was constructed with reference to ResNet, Inception, and Triplet-Loss. Several important measurement indexes were selected to evaluate and compare the effects of the novel and existing methods for tongue segmentation and tongue color classification. IAUNet was compared with existing mainstream methods such as UNet and DeepLabV3+ for tongue segmentation. TCCNet for tongue color classification was compared with VGG16 and GoogLeNet. Results: IAUNet can accurately segment the tongue from original images. The results showed that the Mean Intersection over Union (MIoU) of IAUNet reached 96.30%, and its Mean Pixel Accuracy (MPA), mean Average Precision (mAP), F1-Score, G-Score, and Area Under Curve (AUC) reached 97.86%, 99.18%, 96.71%, 96.82%, and 99.71%, respectively, suggesting IAUNet produced better segmentation than other methods, with fewer parameters. Triplet-Loss was applied in the proposed TCCNet to separate different embedded colors. The experiment yielded ideal results, with the F1-Score and mAP of TCCNet reaching 88.86% and 93.49%, respectively. Conclusion: IAUNet based on deep learning for tongue segmentation is better than traditional ones. IAUNet can not only produce ideal tongue segmentation, but also has better effects than those of PSPNet, SegNet, UNet, and DeepLabV3+, the traditional networks. As for tongue color classification, the proposed network, TCCNet, had better F1-Score and mAP values as compared with other neural networks such as VGG16 and GoogLeNet.
Abstract: Modern manufacturing processes have become more reliant on automation because of the accelerated transition from Industry 3.0 to Industry 4.0. Manual inspection of products on assembly lines remains inefficient, error-prone, and inconsistent, emphasizing the need for a reliable and automated inspection system. Leveraging both object detection and image segmentation approaches, this research proposes a vision-based solution for the detection of various kinds of tools in a toolkit using deep learning (DL) models. Two Intel RealSense D455f depth cameras were arranged in a top-down configuration to capture both RGB and depth images of the toolkits. After applying multiple constraints and enhancing the images through preprocessing and augmentation, a dataset consisting of 3300 annotated RGB-D photos was generated. Several DL models were selected through a comprehensive assessment of mean Average Precision (mAP), precision-recall equilibrium, inference latency (target ≥30 FPS), and computational burden, resulting in a preference for YOLO and Region-based Convolutional Neural Network (R-CNN) variants over ViT-based models due to the latter's increased latency and resource requirements. YOLOV5, YOLOV8, YOLOV11, Faster R-CNN, and Mask R-CNN were trained on the annotated dataset and evaluated using key performance metrics (Recall, Accuracy, F1-score, and Precision). YOLOV11 demonstrated balanced excellence with 93.0% precision, 89.9% recall, and a 90.6% F1-score in object detection, as well as 96.9% precision, 95.3% recall, and a 96.5% F1-score in instance segmentation, with an average inference time of 25 ms per frame (≈40 FPS), demonstrating real-time performance. Leveraging these results, a YOLOV11-based Windows application was successfully deployed in a real-time assembly line environment, where it accurately processed live video streams to detect and segment tools within toolkits, demonstrating its practical effectiveness in industrial automation. In addition to detection and segmentation, the application is capable of precisely measuring socket dimensions by applying edge detection techniques to YOLOv11 segmentation masks. This enables specification-level quality control directly on the assembly line, which improves real-time inspection capability. The implementation is a significant step forward for intelligent manufacturing in the Industry 4.0 paradigm. It provides a scalable, efficient, and accurate way to perform automated inspection and dimensional verification.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62072250, 61772281, 61702235, U1636117, U1804263, 62172435, 61872203 and 61802212); the Zhongyuan Science and Technology Innovation Leading Talent Project of China (Grant No. 214200510019); the Suqian Municipal Science and Technology Plan Project in 2020 (S202015); the Plan for Scientific Talent of Henan Province (Grant No. 2018JR0018); the Opening Project of Guangdong Provincial Key Laboratory of Information Security Technology (Grant No. 2020B1212060078); and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD) Fund.
Abstract: Medical image segmentation, i.e., labeling structures of interest in medical images, is crucial for disease diagnosis and treatment in radiology. In reversible data hiding in medical images (RDHMI), segmentation consists of only two regions: the focal and nonfocal regions. The focal region mainly contains information for diagnosis, while the nonfocal region serves as the monochrome background. The traditional segmentation methods currently used in RDHMI are inaccurate for complex medical images, and manual segmentation is time-consuming, poorly reproducible, and operator-dependent. Implementing state-of-the-art deep learning (DL) models would bring key benefits, but the lack of domain-specific labels for existing medical datasets makes it impossible. To address this problem, this study provides labels for existing medical datasets based on a hybrid segmentation approach to facilitate the implementation of DL segmentation models in this domain. First, an initial segmentation based on a 3×3 kernel is performed to analyze identified contour pixels before classifying pixels into focal and nonfocal regions. Then, several human expert raters evaluate and classify the generated labels into accurate and inaccurate labels. The inaccurate labels undergo manual segmentation by medical practitioners and are scored based on a hierarchical voting scheme before being assigned to the proposed dataset. To ensure reliability and integrity in the proposed dataset, we evaluate the accurate automated labels against labels manually segmented by medical practitioners using five assessment metrics: dice coefficient, Jaccard index, precision, recall, and accuracy. The experimental results show that labels in the proposed dataset are consistent with the subjective judgment of human experts, with an average accuracy score of 94% and dice coefficient scores between 90% and 99%. The study further proposes a ResNet-UNet with concatenated spatial and channel squeeze and excitation (scSE) architecture for semantic segmentation to validate and illustrate the usefulness of the proposed dataset. The results demonstrate the superior performance of the proposed architecture in accurately separating the focal and nonfocal regions compared to state-of-the-art architectures. Dataset information is released under the following URL: https://www.kaggle.com/lordamoah/datasets (accessed on 31 March 2025).
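The five assessment metrics named in this abstract (dice coefficient, Jaccard index, precision, recall, accuracy) all follow from the same confusion counts of a binary mask comparison. The sketch below shows the standard definitions on flattened 0/1 masks; it is an illustration, not the study's evaluation code, and the function name `binary_mask_metrics` and the convention of returning 0.0 for an empty denominator are assumptions.

```python
from typing import Dict, Sequence


def binary_mask_metrics(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """Standard overlap metrics for two flattened binary masks (1 = focal)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")

    # Confusion counts over all pixels.
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)

    def ratio(num: float, den: float) -> float:
        # Assumed convention: an undefined ratio is reported as 0.0.
        return num / den if den else 0.0

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "jaccard": ratio(tp, tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }
```

Dice and Jaccard are monotonically related (dice = 2J / (1 + J)), so they rank label sets identically; reporting both mainly helps comparison with other papers.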
Funding: Funded by the Princess Nourah bint Abdulrahman University Researchers Supporting Project (No. PNURSP2025R435), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Segmenting a breast ultrasound image is still challenging due to the presence of speckle noise, dependency on the operator, and variation in image quality. This paper presents the UltraSegNet architecture, which addresses these challenges through three key technical innovations: (1) a modified ResNet-50 backbone with sequential 3×3 convolutions to keep the fine anatomical details needed for finding lesion boundaries; (2) a computationally efficient regional attention mechanism that works on high-resolution features without a transformer's extra memory; and (3) an adaptive feature fusion strategy that changes local and global features based on how the image is being used. Extensive evaluation on two distinct datasets demonstrates UltraSegNet's superior performance. On the BUSI dataset, it obtains a precision of 0.915, a recall of 0.908, and an F1 score of 0.911. On the UDAIT dataset, it achieves robust performance across the board, with a precision of 0.901 and a recall of 0.894. Importantly, these improvements are achieved at clinically feasible computation times, taking 235 ms per image on standard GPU hardware. Notably, UltraSegNet performs well on difficult small lesions (less than 10 mm), achieving a detection accuracy of 0.891. This is a substantial improvement over traditional methods, which struggle with small-scale features: standard models achieve only 0.63–0.71 accuracy. This improvement in small lesion detection is particularly crucial for early-stage breast cancer identification. Results from this work demonstrate that UltraSegNet can be practically deployed in clinical workflows to improve breast cancer screening accuracy.
Funding: Funded by the National High Level Hospital Clinical Research Funding (No. BJ-2023-111).
Abstract: Myasthenia Gravis (MG) is an autoimmune neuromuscular disease. Given that extraocular muscle manifestations are the initial and primary symptoms in most patients, ocular muscle assessment is regarded as a necessary early screening tool. To overcome the limitations of the manual clinical method, an intuitive idea is to collect data via imaging devices, followed by analysis or processing using Deep Learning (DL) techniques (particularly image segmentation approaches) to enable automatic MG evaluation. Unfortunately, their clinical applications in this field have not been thoroughly explored. To bridge this gap, our study prospectively establishes a new DL-based system to promote the diagnosis of MG disease, with a complete workflow including facial data acquisition, eye region localization, and ocular structure segmentation. Experimental results demonstrate that the proposed system achieves superior segmentation performance on ocular structures. Moreover, it markedly improves the diagnostic accuracy of doctors. In the future, this endeavor can offer highly promising MG monitoring tools for healthcare professionals, patients, and regions with limited medical resources.
Funding: The researchers would like to thank the Deanship of Graduate Studies and Scientific Research at Qassim University for financial support (QU-APC-2025).
Abstract: Deep learning (DL), derived from the domain of Artificial Neural Networks (ANN), forms one of the most essential components of modern deep learning algorithms. DL segmentation models rely on layer-by-layer convolution-based feature representation, guided by forward and backward propagation. A critical aspect of this process is the selection of an appropriate activation function (AF) to ensure robust model learning. However, existing activation functions often fail to effectively address the vanishing gradient problem or are complicated by the need for manual parameter tuning. Most current research on activation function design focuses on classification tasks using natural image datasets such as MNIST, CIFAR-10, and CIFAR-100. To address this gap, this study proposes Med-ReLU, a novel activation function specifically designed for medical image segmentation. Med-ReLU prevents deep learning models from suffering dead neurons or vanishing gradient issues. It is a hybrid activation function that combines the properties of ReLU and Softsign. For positive inputs, Med-ReLU adopts the linear behavior of ReLU to avoid vanishing gradients, while for negative inputs, it exhibits Softsign's polynomial convergence, ensuring robust training and avoiding inactive neurons across the training set. The training performance and segmentation accuracy of Med-ReLU have been thoroughly evaluated, demonstrating stable learning behavior and resistance to overfitting. It consistently outperforms state-of-the-art activation functions in medical image segmentation tasks. Designed as a parameter-free function, Med-ReLU is simple to implement in complex deep learning architectures, and its effectiveness spans various neural network models and anomaly detection scenarios.
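The abstract describes Med-ReLU only qualitatively: linear like ReLU for positive inputs, Softsign-like for negative inputs, and parameter-free. A minimal sketch of one function with that shape is given below. The exact published formula is not stated here, so using the standard Softsign x / (1 + |x|) unchanged on the negative branch is an assumption, and the names `med_relu_sketch` and `med_relu_sketch_grad` are hypothetical.

```python
def softsign(x: float) -> float:
    """Standard Softsign: x / (1 + |x|), bounded in (-1, 1)."""
    return x / (1.0 + abs(x))


def med_relu_sketch(x: float) -> float:
    """Hybrid activation: identity for x > 0, Softsign otherwise (assumed form)."""
    return x if x > 0 else softsign(x)


def med_relu_sketch_grad(x: float) -> float:
    """Derivative of the sketch above.

    Positive branch: slope 1, as in ReLU.
    Negative branch: 1 / (1 + |x|)^2, which decays polynomially and is
    never exactly zero, so a unit keeps receiving some gradient.
    """
    return 1.0 if x > 0 else 1.0 / (1.0 + abs(x)) ** 2
```

Under this assumed form, both the function and its slope are continuous at zero (the slope is 1 on either side), and no tunable parameter is introduced, in contrast to Leaky ReLU or PReLU.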
Funding: Supported by the Science Committee of the Ministry of Higher Education and Science of the Republic of Kazakhstan within the framework of grant AP23489899, "Applying Deep Learning and Neuroimaging Methods for Brain Stroke Diagnosis".
Abstract: Deep learning now underpins many state-of-the-art systems for biomedical image and signal processing, enabling automated lesion detection, physiological monitoring, and therapy planning with accuracy that rivals expert performance. This survey reviews the principal model families (convolutional, recurrent, generative, reinforcement, autoencoder, and transfer-learning approaches), emphasising how their architectural choices map to tasks such as segmentation, classification, reconstruction, and anomaly detection. A dedicated treatment of multimodal fusion networks shows how imaging features can be integrated with genomic profiles and clinical records to yield more robust, context-aware predictions. To support clinical adoption, we outline post-hoc explainability techniques (Grad-CAM, SHAP, LIME) and describe emerging intrinsically interpretable designs that expose decision logic to end users. Regulatory guidance from the U.S. FDA, the European Medicines Agency, and the EU AI Act is summarised, linking transparency and lifecycle-monitoring requirements to concrete development practices. Remaining challenges (data imbalance, computational cost, privacy constraints, and cross-domain generalization) are discussed alongside promising solutions such as federated learning, uncertainty quantification, and lightweight 3-D architectures. The article therefore offers researchers, clinicians, and policymakers a concise, practice-oriented roadmap for deploying trustworthy deep-learning systems in healthcare.
Funding: National Natural Science Foundation of China, Grant/Award Number: 62303275; International Alliance for Cancer Early Detection, Grant/Award Numbers: C28070/A30912, C73666/A31378; Wellcome/EPSRC Centre for Interventional and Surgical Sciences, Grant/Award Number: 203145Z/16/Z.
Abstract: Automated prostate cancer detection in magnetic resonance imaging (MRI) scans is of significant importance for cancer patient management. Most existing computer-aided diagnosis systems adopt segmentation methods, while object detection approaches have recently shown promising results. The authors have (1) carefully compared the performance of the most-developed segmentation and object detection methods in localising prostate imaging reporting and data system (PIRADS)-labelled prostate lesions on MRI scans; (2) proposed an additional customised set of lesion-level localisation sensitivity and precision metrics; and (3) proposed efficient ways to ensemble the segmentation and object detection methods for improved performance. The ground-truth (GT)-perspective lesion-level sensitivity and prediction-perspective lesion-level precision are reported, to quantify the ratios of true positive voxels detected by the algorithms over the number of voxels in the GT-labelled regions and in the predicted regions, respectively. The two networks are trained independently on data from 549 clinical patients with PIRADS-V2 as GT labels, and tested on 161 internal and 100 external MRI scans. At the lesion level, nnDetection outperforms nnUNet for detecting both PIRADS≥3 and PIRADS≥4 lesions in the majority of cases. For example, at an average of 3 false positive predictions per patient, nnDetection achieves a greater Intersection-over-Union (IoU)-based sensitivity than nnUNet for detecting PIRADS≥3 lesions: 80.78% ± 1.50% versus 60.40% ± 1.64% (p<0.01). At the voxel level, nnUNet is in general superior or comparable to nnDetection. The proposed ensemble methods achieve improved or comparable lesion-level accuracy in all tested clinical scenarios. For example, at 3 false positives, the lesion-wise ensemble method achieves 82.24% ± 1.43% sensitivity versus 80.78% ± 1.50% (nnDetection) and 60.40% ± 1.64% (nnUNet) for detecting PIRADS≥3 lesions. Consistent conclusions are also drawn from results on the external data set.
Abstract: This paper presents a novel computerized technique for the segmentation of nuclei in hematoxylin and eosin (H&E) stained histopathology images. The purpose of this study is to overcome the challenges faced in automated nuclei segmentation due to the diversity of nuclei structures that arise from differences in tissue types and staining protocols, as well as the segmentation of variable-sized and overlapping nuclei. To this end, the approach proposed in this study uses an ensemble of the UNet architecture with various Convolutional Neural Network (CNN) architectures as encoder backbones, along with stain normalization and test-time augmentation, to improve segmentation accuracy. Additionally, this paper employs a Structure-Preserving Color Normalization (SPCN) technique as a preprocessing step for stain normalization. The proposed model was trained and tested on both single-organ and multi-organ datasets, yielding an F1 score of 84.11%, mean Intersection over Union (IoU) of 81.67%, dice score of 84.11%, accuracy of 92.58% and precision of 83.78% on the multi-organ dataset, and an F1 score of 87.04%, mean IoU of 86.66%, dice score of 87.04%, accuracy of 96.69% and precision of 87.57% on the single-organ dataset. These findings demonstrate that the proposed model ensemble, coupled with the right pre-processing and post-processing techniques, enhances nuclei segmentation capabilities.
Funding: Supported by the Guangdong Basic and Applied Basic Research Foundation, No. 2021B1515130003; the Key Research and Development Plan of Hubei Province, No. 2022BCE034; and the Natural Science Foundation of Hubei Province, No. 2024AFB1054.
Abstract: BACKGROUND: Upper gastrointestinal (UGI) diseases present diagnostic challenges during endoscopy due to visual similarities, indistinct boundaries, and observer variability, which can lead to missed diagnoses and delayed treatment. Automated segmentation using deep learning (DL) models offers the potential to assist endoscopists, improve diagnostic accuracy, and reduce workload. However, multi-class UGI disease segmentation remains underexplored, with limited annotated datasets and insufficient focus on clinical validation. This study hypothesizes that comparative analysis of different DL architectures can identify models suitable for clinical application, providing actionable insights to reduce diagnostic errors and support clinical decision-making in endoscopic practice. AIM: To evaluate 17 state-of-the-art DL models for multi-class UGI disease segmentation, emphasizing clinical translation and real-world applicability. METHODS: This study evaluated 17 DL models spanning convolutional neural network (CNN)-, transformer-, and Mamba-based architectures using a self-collected dataset from two hospitals in Macao and Xiangyang (3313 images, 9 classes) and the public EDD2020 dataset (386 images, 5 classes). Models were assessed for segmentation performance and performance-efficiency trade-off. Statistical analyses were conducted to examine performance differences across architectures. Generalization capability was measured through a cross-dataset evaluation (training models on the self-collected dataset and testing on the EDD2020 dataset). RESULTS: Swin-UMamba achieved the highest segmentation performance across both datasets [intersection over union (IoU): 89.06% ± 0.20% self-collected, 77.53% ± 0.32% EDD2020], followed by SegFormer (IoU: 88.94% ± 0.38% self-collected, 77.20% ± 0.98% EDD2020) and ConvNeXt + UPerNet (IoU: 88.48% ± 0.09% self-collected, 76.90% ± 0.61% EDD2020). Statistical analyses showed no significant differences between paradigms, though hierarchical architectures with pre-trained encoders consistently outperformed simpler designs. SegFormer achieved the best balance of accuracy and computational efficiency, with a performance-efficiency trade-off score of 92.02%, making it suitable for real-time clinical use. Cross-dataset evaluation revealed significant performance drops, with generalization retention rates of 64.78% to 71.52%. Transformer-based models, particularly pyramid vision transformer v2 + efficient multi-scale convolutional decoding (IoU: 63.35% ± 1.44%), generalized better than CNN- and Mamba-based models. CONCLUSION: Hierarchical architectures like Swin-UMamba and SegFormer show promise for UGI disease segmentation, reducing missed diagnoses and improving workflows, but robust clinical validation is crucial for real-world deployment.
Abstract: Deep learning is widely used for lesion segmentation in medical images due to its breakthrough performance. Loss functions are critical in a deep learning pipeline, and they play an important role in segmentation performance. Dice loss is the most commonly used loss function in medical image segmentation, but it also has some disadvantages. In this paper, we discuss the advantages and disadvantages of the Dice loss function, and group the extensions of the Dice loss according to the purpose of each improvement. The performance of some extensions is compared according to core references. Because different loss functions perform differently on different tasks, automatic loss function selection is a potential direction for the future.
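For readers unfamiliar with the loss under discussion, the following is a minimal sketch of the commonly used soft Dice loss for a single binary class, written over flattened probability and label lists. It is illustrative only and not taken from the paper; the smoothing constant `eps` and its placement in both numerator and denominator are one common convention among several, and the function name `soft_dice_loss` is an assumption.

```python
from typing import Sequence


def soft_dice_loss(pred: Sequence[float], target: Sequence[float],
                   eps: float = 1e-6) -> float:
    """Soft Dice loss: 1 - (2 * sum(p * g) + eps) / (sum(p) + sum(g) + eps).

    pred   : predicted foreground probabilities in [0, 1]
    target : ground-truth labels (0 or 1)
    eps    : smoothing term that keeps the ratio defined when both masks
             are empty (assumed convention)
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = sum(p * g for p, g in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)
```

Because the loss is a ratio of sums over the foreground, it is insensitive to the number of background pixels, which is the usual argument for it under class imbalance; the same ratio form makes it unstable for very small targets, which is one of the disadvantages that extensions typically address.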
Funding: Supported by the National Key Research and Development Program of China (No. 2017YFB0702303).
Abstract: Microstructural classification is typically done manually by human experts, which gives rise to uncertainties due to subjectivity and reduces overall efficiency. A high-throughput characterization method is proposed based on deep learning, rapid acquisition technology, and mathematical statistics for the recognition, segmentation, and quantification of microstructure in weathering steel. The segmentation results showed that this method was accurate and efficient, and the segmentation of inclusions and the pearlite phase achieved accuracies of 89.95% and 90.86%, respectively. The time required for batch processing by MIPAR software, involving thresholding segmentation, morphological processing, and small area deletion, was 1.05 s for a single image. By comparison, our system required only 0.102 s, which is about ten times faster than the commercial software. The quantification results were extracted from large volumes of sequential image data (150 mm^(2), 62,216 images, 1024×1024 pixels), which ensures comprehensive statistics. Microstructure information, such as the three-dimensional density distribution and the frequency of the minimum spatial distance of inclusions on the 150 mm^(2) sample surface, was quantified by extracting the coordinates and sizes of individual features. The result is a refined characterization method for two-dimensional structures and spatial information that is unattainable manually or with existing software. It will be useful for understanding the properties or behaviors of weathering steel and for reducing reliance on physical testing.
Funding: This work was supported in part by the National Science Foundation Project of P.R. China under Grant Nos. 52071349 and U1906234; partially supported by the Open Project Program of the Key Laboratory of Marine Environmental Survey Technology and Application, Ministry of Natural Resource (MESTA-2020-B001); the Young and Middle-aged Talents Project of the State Ethnic Affairs Commission; the Cross-disciplinary Research Project of Minzu University of China (2020MDJC08); and the Graduate Research and Practice Projects of Minzu University of China (SZKY2021039).
Abstract: A large number of nodule minerals exist in the deep sea. Because imaging is difficult, economic costs are high, and resource assessment demands high accuracy, large-scale planned commercial mining has not yet been conducted. Only experimental mining has been carried out in areas with high mineral density and obvious benefits after mineral resource assessment. As an efficient method for deep-sea mineral resource assessment, the deep towing system is equipped with a visual system for mineral resource analysis using collected images and videos, which has become a key component of resource assessment. Therefore, high accuracy in deep-sea mineral image segmentation is the primary goal of the segmentation algorithm. In this paper, the existing deep-sea nodule mineral image segmentation algorithms are studied in depth and divided into traditional and deep learning-based segmentation methods, and the advantages and disadvantages of each are compared and summarized. The deep learning methods show great advantages in deep-sea mineral image segmentation, with a large improvement in segmentation accuracy and efficiency compared with the traditional methods. Then, the mineral image datasets and segmentation evaluation metrics are listed. Finally, possible future research topics and improvement measures are discussed for the reference of other researchers.
Abstract: Although deep learning methods have been widely applied in medical image lesion segmentation, it is still challenging to apply them to segmenting ischemic stroke lesions, which differ from brain tumors in lesion characteristics, segmentation difficulty, algorithm maturity, and segmentation accuracy. Three main stages are used to describe the manifestations of stroke. For acute ischemic stroke, the size of the lesions is similar to that of brain tumors, and current deep learning methods have been able to achieve high segmentation accuracy. For sub-acute and chronic ischemic stroke, the segmentation results of mainstream deep learning algorithms are still unsatisfactory, as lesions in these stages are small and diffuse. Using three scientific search engines (CNKI, Web of Science, and Google Scholar), this paper aims to comprehensively understand the state-of-the-art deep learning algorithms applied to segmenting ischemic stroke lesions. For the first time, this paper discusses the current situation, challenges, and development directions of deep learning algorithms applied to ischemic stroke lesion segmentation in different stages. In the future, a system that can directly identify different stroke stages and automatically select a suitable network architecture for stroke lesion segmentation needs to be proposed.
Funding: This research has been partially supported by the National Science Foundation under grant IIS-1115417; the National Natural Science Foundation of China under grants 61728205 and 61876217; the "double first-class" international cooperation and development scientific research project of Changsha University of Science and Technology (No. 2018IC25); and the Science and Technology Development Project of Suzhou under grants SZS201609 and SYG201707.
Abstract: Accurate segmentation of CT images of liver tumors is an important aid to the diagnosis and treatment of liver diseases. In recent years, thanks to great improvements in hardware, many deep learning-based methods have been proposed for automatic liver segmentation. Among them are the plain neural networks headed by FCN and the residual neural networks headed by ResNet, both of which have many variations. They have achieved a degree of success in medical image segmentation. In this paper, we first select five representative structures, i.e., FCN, U-Net, SegNet, ResNet, and DenseNet, to investigate their performance on liver segmentation. Since the original ResNet and DenseNet cannot perform image segmentation directly, we make some adjustments so that they can perform liver segmentation. Our experimental results show that DenseNet performs the best on liver segmentation, followed by ResNet. Both perform much better than SegNet, U-Net, and FCN. Among SegNet, U-Net, and FCN, U-Net performs the best, followed by SegNet; FCN performs the worst.
Funding: Supported by the Basque Government “Aids for health research projects”; publication fees were supported by the Basque Government Department of Education (eVIDA Certified Group IT905-16).
Abstract: As colon cancer is among the top causes of death, there is a growing interest in developing improved techniques for the early detection of colon polyps. Given the close relation between colon polyps and colon cancer, their detection helps avoid cancer cases. The increased availability of colorectal screening tests and the growing number of colonoscopies have increased the burden on medical personnel. In this article, the application of deep learning techniques for the detection and segmentation of colon polyps in colonoscopies is presented. Four techniques were implemented and evaluated: Mask-RCNN, PANet, Cascade R-CNN, and Hybrid Task Cascade (HTC). These were trained and tested using the CVC-Colon database, ETIS-LARIB Polyp, and a proprietary dataset. Three experiments were conducted to assess the techniques' performance: (1) training and testing using each database independently, (2) merging the databases and testing on each database independently using a merged test set, and (3) training on each dataset and testing on the merged test set. In our experiments, the PANet architecture had the best performance in polyp detection, and HTC was the most accurate at segmenting them. This approach allows us to employ deep learning techniques to assist healthcare professionals in the medical diagnosis of colon cancer. It is anticipated that this approach can be part of a framework for semi-automated polyp detection in colonoscopies.
Abstract: Every day, websites and personal archives accumulate more and more photos, and the size of these archives is immeasurable. The ease of use of these huge digital image collections contributes to their popularity. However, not all of these collections provide relevant indexing information, so it is difficult to find the data a user is interested in. To determine the significance of the data, it is therefore important to identify the contents in an informative manner. Image annotation is one of the most challenging problems in multimedia research and computer vision. Hence, in this paper, an Adaptive Convolutional Deep Learning Model (ACDLM) is developed for automatic image annotation. Initially, the databases (Corel 5K, MSRC v2) are collected from an open-source system; they consist of some labelled images (for the training phase) and some unlabeled images. After that, the images are sent to a pre-processing step that includes colour space quantization and a texture color class map. The pre-processed images are then segmented using J-image segmentation (JSEG) to support efficient labelling. The final step is automatic annotation using ACDLM, which is a combination of a Convolutional Neural Network (CNN) and the Honey Badger Algorithm (HBA). The unlabeled images are labelled by the proposed classifier. The proposed methodology is implemented in MATLAB, and its performance is evaluated with metrics such as accuracy, precision, recall, and F1_Measure.
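The four metrics named in the abstract above can be sketched for a single annotation label as follows. Treating annotation as a per-label binary decision is an assumption made here for illustration: the abstract does not state how the metrics are aggregated across labels, and the original work was implemented in MATLAB rather than Python. The function name `binary_metrics` is hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(predicted: Sequence[int], actual: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for one label (1 = label assigned)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")

    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)  # true positives
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)  # false positives
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)  # false negatives
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)  # true negatives

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

With predictions `[1, 1, 0, 1, 0, 0]` against ground truth `[1, 0, 0, 1, 1, 0]`, there are two true positives, one false positive, one false negative, and two true negatives, so all four metrics equal 2/3.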
Funding: Funded by the National Natural Science Foundation of China (Grant No. 6240072655); the Hubei Provincial Key Research and Development Program (Grant No. 2023BCB151); the Wuhan Natural Science Foundation Exploration Program (Chenguang Program, Grant No. 2024040801020202); and the Natural Science Foundation of Hubei Province of China (Grant No. 2025AFB148).
Abstract: Image segmentation is attracting increasing attention in the field of medical image analysis. Given its widespread use across various medical applications, ensuring and improving segmentation accuracy has become a crucial research topic. With advances in deep learning, researchers have developed numerous methods that combine Transformers and convolutional neural networks (CNNs) to create highly accurate models for medical image segmentation. However, efforts to further enhance accuracy by developing larger and more complex models, or by training with more extensive datasets, significantly increase computational resource consumption. To address this problem, we propose BiCLIP-nnFormer (the prefix "Bi" refers to the use of two distinct CLIP models), a virtual multimodal instrument that leverages CLIP models to enhance the segmentation performance of the medical segmentation model nnFormer. Since the two CLIP models (PMC-CLIP and CoCa-CLIP) are pre-trained on large datasets, they do not require additional training, which conserves computational resources. These models are used offline to extract image and text embeddings from medical images. The embeddings are then processed by the proposed 3D CLIP adapter, which adapts the CLIP knowledge to segmentation tasks through fine-tuning. Finally, the adapted embeddings are fused with feature maps extracted from the nnFormer encoder to generate predicted masks. This process enriches the representation capabilities of the feature maps by integrating global multimodal information, leading to more precise segmentation predictions. We demonstrate the superiority of BiCLIP-nnFormer and the effectiveness of using CLIP models to enhance nnFormer through experiments on two public datasets, namely the Synapse multi-organ segmentation dataset (Synapse) and the Automatic Cardiac Diagnosis Challenge dataset (ACDC), as well as a self-annotated lung multi-category segmentation dataset (LMCS).
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 41974150 and 42174158) and the Natural Science Basic Research Program of Shaanxi (2023-JC-YB-220).
Abstract: Microseismic monitoring is essential for understanding subsurface dynamics and optimizing oil and gas production. However, traditional methods for the automatic detection of microseismic events rely heavily on characteristic functions and human intervention, often resulting in suboptimal performance on complex and noisy data. In this study, we propose a novel approach that uses a deep learning framework to extract multiscale features from microseismic data with a TransUNet neural network. Our model integrates the advantages of the Transformer and UNet architectures to achieve high accuracy in multivariate image segmentation and precise, simultaneous picking of P-wave and S-wave first arrivals. We validate our approach using both synthetic and field microseismic datasets recorded from gas storage monitoring and roof fracturing in a coal seam. The robustness of the proposed method has been verified by testing on synthetic data with various levels of Gaussian noise and real background noise extracted from field data. Comparisons of the proposed method with UNet and SwinUNet in terms of model architecture and classification performance demonstrate that TransUNet achieves the best balance between its architecture and inference speed. With relatively low inference time and network complexity, it operates effectively in high-precision microseismic phase picking. This advancement holds significant promise for enhancing microseismic monitoring technology in hydraulic fracturing and reservoir monitoring applications.
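The abstract above does not say how arrival times are read from the segmentation output. One common convention, assumed here purely for illustration, is that the network emits a per-sample probability trace for each phase and the pick is placed at the sample with the highest probability, provided that peak clears a threshold. The function name `pick_arrival`, the default threshold, and the sampling interval in the usage note are hypothetical.

```python
from typing import Optional, Sequence


def pick_arrival(prob: Sequence[float], dt: float,
                 threshold: float = 0.5) -> Optional[float]:
    """Return the arrival time in seconds at the probability peak.

    prob      -- per-sample probability of the phase (P or S) for one trace
    dt        -- sampling interval in seconds
    threshold -- minimum peak probability required to accept a pick
    Returns None when the trace is empty or the peak is below the threshold.
    """
    if not prob:
        return None
    # Index of the largest probability; ties resolve to the earliest sample.
    peak_index = max(range(len(prob)), key=prob.__getitem__)
    if prob[peak_index] < threshold:
        return None
    return peak_index * dt
```

For example, with a 2 ms sampling interval, `pick_arrival([0.0, 0.1, 0.9, 0.3, 0.0], dt=0.002)` places the pick at sample 2, while a trace whose peak stays below the threshold yields `None`. Applying the function separately to the P and S probability traces gives both picks for one record.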