Brain tumor segmentation from Magnetic Resonance Imaging (MRI) supports neurologists and radiologists in analyzing tumors and developing personalized treatment plans, making it a crucial yet challenging task. Supervised models such as 3D U-Net perform well in this domain, but their accuracy significantly improves with appropriate preprocessing. This paper demonstrates the effectiveness of preprocessing in brain tumor segmentation by applying a pre-segmentation step based on the Generalized Gaussian Mixture Model (GGMM) to T1 contrast-enhanced MRI scans from the BraTS 2020 dataset. The Expectation-Maximization (EM) algorithm is employed to estimate parameters for four tissue classes, generating a new pre-segmented channel that enhances the training and performance of the 3D U-Net model. The proposed GGMM+3D U-Net framework achieved a Dice coefficient of 0.88 for whole tumor segmentation, outperforming both the standard multiscale 3D U-Net (0.84) and MMU-Net (0.85). It also delivered higher Intersection over Union (IoU) scores compared to models trained without preprocessing or with simpler GMM-based segmentation. These results, supported by qualitative visualizations, suggest that GGMM-based preprocessing should be integrated into brain tumor segmentation pipelines to optimize performance.
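The EM-based pre-segmentation step described above can be illustrated with a minimal sketch. Note the substitution: it fits an ordinary one-dimensional Gaussian mixture rather than the generalized Gaussian mixture used in the paper, and the function names, the initialization scheme, and the toy intensities are assumptions made for illustration only.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(values, k, iterations=50):
    """Fit a k-component 1-D Gaussian mixture with EM and label each value.

    Assumption: plain Gaussians stand in for the generalized Gaussians
    of the GGMM; this is a simplification, not the paper's model.
    """
    lo, hi = min(values), max(values)
    # Deterministic start: means spread evenly over the intensity range.
    means = [lo + (i + 0.5) * (hi - lo) / k for i in range(k)]
    variances = [((hi - lo) / k) ** 2 or 1.0] * k
    weights = [1.0 / k] * k

    def responsibilities(x):
        joint = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(joint) or 1e-300  # guard against underflow to zero
        return [j / total for j in joint]

    for _ in range(iterations):
        # E-step: posterior probability of each class for each value.
        resp = [responsibilities(x) for x in values]
        # M-step: re-estimate weight, mean, and variance of each class.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, values)) / nj
            variances[j] = max(var, 1e-6)  # variance floor for stability

    # Hard labels: the most probable class under the fitted mixture.
    labels = []
    for x in values:
        r = responsibilities(x)
        labels.append(r.index(max(r)))
    return means, variances, weights, labels


# Toy intensities forming two well-separated groups.
intensities = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
means, variances, weights, labels = em_gmm_1d(intensities, k=2)
```

In a pipeline like the one in the abstract, the label map would be computed per voxel (with four classes) and stacked as an additional input channel; that wiring is not shown here.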
Lower back pain is one of the most common medical problems in the world and it is experienced by a huge percentage of people everywhere. Due to its ability to produce a detailed view of the soft tissues, including the spinal cord, nerves, intervertebral discs, and vertebrae, Magnetic Resonance Imaging is thought to be the most effective method for imaging the spine. The semantic segmentation of vertebrae plays a major role in the diagnostic process of lumbar diseases. It is difficult to semantically partition the vertebrae in Magnetic Resonance Images from the surrounding variety of tissues, including muscles, ligaments, and intervertebral discs. U-Net is a powerful deep-learning architecture to handle the challenges of medical image analysis tasks and achieves high segmentation accuracy. This work proposes a modified U-Net architecture, namely MU-Net, consisting of the Meijering convolutional layer that incorporates the Meijering filter to perform the semantic segmentation of lumbar vertebrae L1 to L5 and sacral vertebra S1. Pseudo-colour mask images were generated and used as ground truth for training the model. The work has been carried out on 1312 images expanded from T1-weighted mid-sagittal MRI images of 515 patients in the Lumbar Spine MRI Dataset publicly available from Mendeley Data. The proposed MU-Net model for the semantic segmentation of the lumbar vertebrae gives better performance with 98.79% pixel accuracy (PA), 98.66% dice similarity coefficient (DSC), 97.36% Jaccard coefficient, and 92.55% mean Intersection over Union (mean IoU) on the mentioned dataset.
Brain tumors present significant challenges in medical diagnosis and treatment, where early detection is crucial for reducing morbidity and mortality rates. This research introduces a novel deep learning model, the Progressive Layered U-Net (PLU-Net), designed to improve brain tumor segmentation accuracy from Magnetic Resonance Imaging (MRI) scans. The PLU-Net extends the standard U-Net architecture by incorporating progressive layering, attention mechanisms, and multi-scale data augmentation. The progressive layering involves a cascaded structure that refines segmentation masks across multiple stages, allowing the model to capture features at different scales and resolutions. Attention gates within the convolutional layers selectively focus on relevant features while suppressing irrelevant ones, enhancing the model's ability to delineate tumor boundaries. Additionally, multi-scale data augmentation techniques increase the diversity of training data and boost the model's generalization capabilities. Evaluated on the BraTS 2021 dataset, the PLU-Net achieved state-of-the-art performance with a dice coefficient of 0.91, specificity of 0.92, sensitivity of 0.89, and Hausdorff95 of 2.5, outperforming other modified U-Net architectures in segmentation accuracy. These results underscore the effectiveness of the PLU-Net in improving brain tumor segmentation from MRI scans, supporting clinicians in early diagnosis, treatment planning, and the development of new therapies.
Medical image segmentation has become a cornerstone for many healthcare applications, allowing for the automated extraction of critical information from images such as Computed Tomography (CT) scans, Magnetic Resonance Imaging (MRI) scans, and X-rays. The introduction of U-Net in 2015 has significantly advanced segmentation capabilities, especially for small datasets commonly found in medical imaging. Since then, various modifications to the original U-Net architecture have been proposed to enhance segmentation accuracy and tackle challenges like class imbalance, data scarcity, and multi-modal image processing. This paper provides a detailed review and comparison of several U-Net-based architectures, focusing on their effectiveness in medical image segmentation tasks. We evaluate performance metrics such as Dice Similarity Coefficient (DSC) and Intersection over Union (IoU) across different U-Net variants including HmsU-Net, CrossU-Net, mResU-Net, and others. Our results indicate that architectural enhancements such as transformers, attention mechanisms, and residual connections improve segmentation performance across diverse medical imaging applications, including tumor detection, organ segmentation, and lesion identification. The study also identifies current challenges in the field, including data variability, limited dataset sizes, and issues with class imbalance. Based on these findings, the paper suggests potential future directions for improving the robustness and clinical applicability of U-Net-based models in medical image segmentation.
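The two metrics named above, DSC and IoU, can be computed from a pair of binary masks as in the short sketch below. The function name, the nested-list mask representation, and the convention that two empty masks score 1.0 are assumptions for illustration; they are not taken from any of the reviewed papers.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two equally sized binary masks (nested lists)."""
    flat_p = [int(bool(v)) for row in pred for v in row]
    flat_t = [int(bool(v)) for row in target for v in row]
    if len(flat_p) != len(flat_t):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p & t for p, t in zip(flat_p, flat_t))
    p_sum, t_sum = sum(flat_p), sum(flat_t)
    union = p_sum + t_sum - intersection
    # Assumed convention: two empty masks count as a perfect match.
    dice = 2 * intersection / (p_sum + t_sum) if (p_sum + t_sum) else 1.0
    iou = intersection / union if union else 1.0
    return dice, iou


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
dice, iou = dice_and_iou(pred, target)  # 2 shared pixels, 3 in each mask
```

For a single pair of masks the two scores are related by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU.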
Background: Diabetic retinopathy (DR) is one of the primary causes of visual impairment globally, resulting from microvascular abnormalities in the retina. Accurate segmentation of retinal blood vessels from fundus images plays a pivotal role in the early diagnosis, progression monitoring, and treatment planning of DR and related ocular conditions. Traditional convolutional neural networks often struggle with capturing the intricate structures of thin vessels under varied illumination and contrast conditions. Methods: In this study, we propose an improved U-Net-based framework named MSAC U-Net, which enhances feature extraction and reconstruction through multiscale and attention-based modules. Specifically, the encoder replaces standard convolutions with a Multiscale Asymmetric Convolution (MSAC) block, incorporating parallel 1×n, n×1, and n×n kernels at different scales (3×3, 5×5, 7×7) to effectively capture fine-grained vascular structures. To further refine spatial representation, skip connections are utilized, and the decoder is augmented with dual activation strategies, Squeeze-and-Excitation blocks, and Convolutional Block Attention Modules for improved contextual understanding. Results: The model was evaluated on the publicly available DRIVE dataset. It achieved an accuracy of 96.48%, sensitivity of 88.31%, specificity of 97.90%, and an AUC of 98.59%, demonstrating superior performance compared to several state-of-the-art segmentation methods. Conclusion: The proposed MSAC U-Net provides a robust and accurate approach for retinal vessel segmentation, offering substantial clinical value in the early detection and management of diabetic retinopathy. Its design contributes to enhanced segmentation reliability and may serve as a foundation for broader applications in medical image analysis.
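The accuracy, sensitivity, and specificity figures reported above all derive from pixel-level confusion counts. The sketch below shows that arithmetic on flattened binary masks; the function names and the toy data are hypothetical, and the fallback of 0.0 for an undefined ratio is an assumed convention.

```python
def confusion_counts(pred, target):
    """Count true/false positives and negatives for flat binary sequences."""
    tp = fp = tn = fn = 0
    for p, t in zip(pred, target):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif not p and t:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def vessel_metrics(pred, target):
    """Return (accuracy, sensitivity, specificity) for a vessel mask."""
    tp, fp, tn, fn = confusion_counts(pred, target)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # vessel recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # background recall
    return accuracy, sensitivity, specificity


pred = [1, 1, 0, 0, 0, 0, 1, 0]
target = [1, 0, 0, 0, 0, 0, 1, 1]
accuracy, sensitivity, specificity = vessel_metrics(pred, target)
```

Because vessel pixels are a small minority of a fundus image, accuracy and specificity tend to be dominated by background pixels, which is one reason sensitivity is reported separately.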
Nuclei segmentation is a challenging task in histopathology images. It is challenging due to the small size of objects, low contrast, touching boundaries, and complex structure of nuclei. Their segmentation and counting play an important role in cancer identification and grading. In this study, WaveSeg-UNet, a lightweight model, is introduced to segment cancerous nuclei having touching boundaries. Residual blocks are used for feature extraction. Only one feature extractor block is used in each level of the encoder and decoder. Normally, images degrade in quality and lose important information during down-sampling. To overcome this loss, discrete wavelet transform (DWT) alongside max pooling is used in the down-sampling process. Inverse DWT is used to regenerate original images during up-sampling. In the bottleneck of the proposed model, atrous spatial channel pyramid pooling (ASCPP) is used to extract effective high-level features. The ASCPP is a modified pyramid pooling module with atrous layers that enlarge the receptive field. Spatial and channel-based attention are used to focus on the location and class of the identified objects. Finally, the watershed transform is used as a post-processing technique to identify and refine touching boundaries of nuclei. Nuclei are identified and counted to assist pathologists. Same-domain transfer learning is used to retrain the model for domain adaptability. Results of the proposed model are compared with state-of-the-art models, and it outperforms the existing studies.
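The reason a wavelet transform can down-sample without discarding information can be seen in a single-level 2-D Haar transform, sketched below. The choice of the Haar wavelet, the function names, and the sub-band labels (whose naming varies between libraries) are assumptions; the abstract does not state which wavelet WaveSeg-UNet uses.

```python
def haar_dwt2(image):
    """Single-level 2-D Haar DWT of an even-sized image (nested lists).

    Returns four half-resolution sub-bands (ll, lh, hl, hh).
    """
    h, w = len(image), len(image[0])
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, h, 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for c in range(0, w, 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            row_ll.append((a + b + d + e) / 2)  # local average (approximation)
            row_lh.append((a - b + d - e) / 2)  # left-right differences
            row_hl.append((a + b - d - e) / 2)  # top-bottom differences
            row_hh.append((a - b - d + e) / 2)  # diagonal differences
        ll.append(row_ll)
        lh.append(row_lh)
        hl.append(row_hl)
        hh.append(row_hh)
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Invert haar_dwt2 and recover the full-resolution image."""
    image = []
    for r in range(len(ll)):
        top, bottom = [], []
        for c in range(len(ll[0])):
            s, x, y, z = ll[r][c], lh[r][c], hl[r][c], hh[r][c]
            top.extend([(s + x + y + z) / 2, (s - x + y - z) / 2])
            bottom.extend([(s + x - y - z) / 2, (s - x - y + z) / 2])
        image.append(top)
        image.append(bottom)
    return image


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
ll, lh, hl, hh = haar_dwt2(image)
restored = haar_idwt2(ll, lh, hl, hh)
```

Unlike max pooling, which keeps one value per 2×2 block, the four sub-bands together hold as many coefficients as the input, so the inverse transform can rebuild the original exactly.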
In this paper, we introduce an innovative method for computer-aided design (CAD) segmentation by concatenating meshes and CAD models. Many previous CAD segmentation methods have achieved impressive performance using single representations, such as meshes, CAD, and point clouds. However, existing methods cannot effectively combine different three-dimensional model types for the direct conversion, alignment, and integrity maintenance of geometric and topological information. Hence, we propose an integration approach that combines the geometric accuracy of CAD data with the flexibility of mesh representations, as well as introduce a unique hybrid representation that combines CAD and mesh models to enhance segmentation accuracy. To combine these two model types, our hybrid system utilizes advanced neural network techniques to convert CAD models into mesh models. For complex CAD models, model segmentation is crucial for model retrieval and reuse. In partial retrieval, it aims to segment a complex CAD model into several simple components. The first component of our hybrid system involves advanced mesh-labeling algorithms that harness the digitization of CAD properties to mesh models. The second component integrates labelled face features for CAD segmentation by leveraging the abundant multisemantic information embedded in CAD models. This combination of mesh and CAD not only refines the accuracy of boundary delineation but also provides a comprehensive understanding of the underlying object semantics. This study uses the Fusion 360 Gallery dataset. Experimental results indicate that our hybrid method can segment these models with higher accuracy than other methods that use single representations.
Thyroid nodules, a common disorder in the endocrine system, require accurate segmentation in ultrasound images for effective diagnosis and treatment. However, achieving precise segmentation remains a challenge due to various factors, including scattering noise, low contrast, and limited resolution in ultrasound images. Although existing segmentation models have made progress, they still suffer from several limitations, such as high error rates, low generalizability, overfitting, and limited feature learning capability. To address these challenges, this paper proposes a Multi-level Relation Transformer-based U-Net (MLRT-UNet) to improve thyroid nodule segmentation. The MLRT-UNet leverages a novel Relation Transformer, which processes images at multiple scales, overcoming the limitations of traditional encoding methods. This transformer integrates both local and global features effectively through self-attention and cross-attention units, capturing intricate relationships within the data. The approach also introduces a Co-operative Transformer Fusion (CTF) module to combine multi-scale features from different encoding layers, enhancing the model’s ability to capture complex patterns in the data. Furthermore, the Relation Transformer block enhances long-distance dependencies during the decoding process, improving segmentation accuracy. Experimental results show that the MLRT-UNet achieves high segmentation accuracy, reaching 98.2% on the Digital Database Thyroid Image (DDT) dataset, 97.8% on the Thyroid Nodule 3493 (TG3K) dataset, and 98.2% on the Thyroid Nodule3K (TN3K) dataset. These findings demonstrate that the proposed method significantly enhances the accuracy of thyroid nodule segmentation, addressing the limitations of existing models.
This study presents an advanced method for post-mortem person identification using the segmentation of skeletal structures from chest X-ray images. The proposed approach employs the Attention U-Net architecture, enhanced with gated attention mechanisms, to refine segmentation by emphasizing spatially relevant anatomical features while suppressing irrelevant details. By isolating skeletal structures, which remain stable over time compared to soft tissues, this method leverages bones as reliable biometric markers for identity verification. The model integrates custom-designed encoder and decoder blocks with attention gates, achieving high segmentation precision. To evaluate the impact of architectural choices, we conducted an ablation study comparing Attention U-Net with and without attention mechanisms, alongside an analysis of data augmentation effects. Training and evaluation were performed on a curated chest X-ray dataset, with segmentation performance measured using Dice score, precision, and loss functions, achieving over 98% precision and 94% Dice score. The extracted bone structures were further processed to derive unique biometric patterns, enabling robust and privacy-preserving person identification. Our findings highlight the effectiveness of attention mechanisms in improving segmentation accuracy and underscore the potential of chest bone-based biometrics in forensic and medical imaging. This work paves the way for integrating artificial intelligence into real-world forensic workflows, offering a non-invasive and reliable solution for post-mortem identification.
With the continuous development of artificial intelligence and machine learning techniques, effective methods have emerged to support the work of dermatologists in the field of skin cancer detection. However, accurately segmenting melanomas in dermoscopic images remains a significant challenge because objects such as bubbles and scales can interfere with human observation. To address these challenges, we propose a dual U-Net network framework for skin melanoma segmentation. In our proposed architecture, we introduce several innovative components that aim to enhance the performance and capabilities of the traditional U-Net. First, we establish a novel framework that links two simplified U-Nets, enabling more comprehensive information exchange and feature integration throughout the network. Second, after cascading the second U-Net, we introduce a skip connection between the decoder and encoder networks, and incorporate a modified receptive field block (MRFB), which is designed to capture multi-scale spatial information. Third, to further enhance the feature representation capabilities, we add a multi-path convolution block attention module (MCBAM) to the first two layers of the first U-Net encoder, and integrate a new squeeze-and-excitation (SE) mechanism with residual connections in the second U-Net. To illustrate the performance of our proposed model, we conducted comprehensive experiments on widely recognized skin datasets. On the ISIC-2017 dataset, the IoU value of our proposed model increased from 0.6406 to 0.6819 and the Dice coefficient increased from 0.7625 to 0.8023. On the ISIC-2018 dataset, the IoU value of the proposed model also improved from 0.7138 to 0.7709, while the Dice coefficient increased from 0.8285 to 0.8665. Furthermore, the generalization experiments conducted on the jaw cyst dataset from Quzhou People’s Hospital further verified the outstanding segmentation performance of the proposed model. These findings collectively affirm the potential of our approach as a valuable tool in supporting clinical decision-making in the field of skin cancer detection, as well as advancing research in medical image analysis.
Computer-vision and deep-learning techniques are widely applied to detect, monitor, and assess pavement conditions, including road crack detection. Traditional methods fail to achieve satisfactory accuracy and generalization performance in crack detection. Complex network models can generate redundant feature maps and increase computational complexity. Therefore, this paper proposes a novel model compression framework based on deep learning to detect road cracks, which can improve the detection efficiency and accuracy. A distillation loss function is proposed to compress the teacher model, followed by channel pruning. Meanwhile, a multi-dilation model is proposed to improve the accuracy of the pruned model. The proposed method is tested on the public CrackForest dataset (CFD). The experimental results show that the proposed method is more efficient and accurate than other state-of-the-art methods.
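The abstract does not give the form of its distillation loss. For orientation only, the sketch below implements the widely used soft-target formulation (temperature-scaled softmax plus KL divergence); this is a stand-in for the paper's own loss, not a reproduction of it, and the function names and the default temperature are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the maximum for stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: the generic soft-target loss, not the crack-detection
    paper's specific loss function.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2


teacher = [2.0, 0.5, -1.0]
same = distillation_loss(teacher, teacher)
different = distillation_loss([0.0, 0.0, 0.0], teacher)
```

A student whose logits match the teacher's incurs zero loss, and the loss grows as the softened distributions diverge; in practice this term is usually combined with the ordinary supervised loss on the ground-truth labels.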
Despite its remarkable performance on natural images, the segment anything model (SAM) lacks domain-specific information in medical imaging and faces the challenge of losing local multi-scale information in the encoding phase. This paper presents a medical image segmentation model based on SAM with a local multi-scale feature encoder (LMSFE-SAM) to address the issues above. Firstly, based on the SAM, a local multi-scale feature encoder is introduced to improve the representation of features within the local receptive field, thereby supplying the Vision Transformer (ViT) branch in SAM with enriched local multi-scale contextual information. At the same time, a multiaxial Hadamard product module (MHPM) is incorporated into the local multi-scale feature encoder in a lightweight manner to reduce the quadratic complexity and noise interference. Subsequently, a cross-branch balancing adapter is designed to balance the local and global information between the local multi-scale feature encoder and the ViT encoder in SAM. Finally, to obtain a smaller input image size and to mitigate overlapping in patch embeddings, the size of the input image is reduced from 1024×1024 pixels to 256×256 pixels, and a multidimensional information adaptation component is developed, which includes feature adapters, position adapters, and channel-spatial adapters. This component effectively integrates the information from small-sized medical images into SAM, enhancing its suitability for clinical deployment. The proposed model demonstrates an average enhancement ranging from 0.0387 to 0.3191 across six objective evaluation metrics on the BUSI, DDTI, and TN3K datasets compared to eight other representative image segmentation models. This significantly enhances the performance of the SAM on medical images, providing clinicians with a powerful tool in clinical diagnosis.
Objective: This study aimed to explore a novel method that integrates segmentation-guided classification and diffusion model augmentation to realize the automatic classification of tibial plateau fractures (TPFs). Methods: YOLOv8n-cls was used to construct a baseline model on the data of 3781 patients from the Orthopedic Trauma Center of Wuhan Union Hospital. Additionally, a segmentation-guided classification approach was proposed. To enhance the dataset, a diffusion model was further demonstrated for data augmentation. Results: The novel method that integrated the segmentation-guided classification and diffusion model augmentation significantly improved the accuracy and robustness of fracture classification. The average accuracy of classification for TPFs rose from 0.844 to 0.896. The comprehensive performance of the dual-stream model was also significantly enhanced after many rounds of training, with both the macro-area under the curve (AUC) and the micro-AUC increasing from 0.94 to 0.97. By utilizing diffusion model augmentation and segmentation map integration, the model demonstrated superior efficacy in identifying Schatzker Ⅰ, achieving an accuracy of 0.880. It yielded an accuracy of 0.898 for Schatzker Ⅱ and Ⅲ and 0.913 for Schatzker Ⅳ; for Schatzker Ⅴ and Ⅵ, the accuracy was 0.887; and for intercondylar ridge fracture, the accuracy was 0.923. Conclusion: The dual-stream attention-based classification network, which has been verified by many experiments, exhibited great potential in predicting the classification of TPFs. This method facilitates automatic TPF assessment and may assist surgeons in the rapid formulation of surgical plans.
Accurate and efficient brain tumor segmentation is essential for early diagnosis, treatment planning, and clinical decision-making. However, the complex structure of brain anatomy and the heterogeneous nature of tumors present significant challenges for precise anomaly detection. While U-Net-based architectures have demonstrated strong performance in medical image segmentation, there remains room for improvement in feature extraction and localization accuracy. In this study, we propose a novel hybrid model designed to enhance 3D brain tumor segmentation. The architecture incorporates a 3D ResNet encoder, known for mitigating the vanishing gradient problem, and a 3D U-Net decoder. Additionally, to enhance the model’s generalization ability, a Squeeze-and-Excitation attention mechanism is integrated. We introduce Gabor filter banks into the encoder to further strengthen the model’s ability to extract robust and transformation-invariant features from the complex and irregular shapes typical in medical imaging. This approach, which is not well explored in current U-Net-based segmentation frameworks, provides a unique advantage by enhancing texture-aware feature representation. Specifically, Gabor filters help extract distinctive low-level texture features, reducing the effects of texture interference and facilitating faster convergence during the early stages of training. Our model achieved Dice scores of 0.881, 0.846, and 0.819 for Whole Tumor (WT), Tumor Core (TC), and Enhancing Tumor (ET), respectively, on the BraTS 2020 dataset. Cross-validation on the BraTS 2021 dataset further confirmed the model’s robustness, yielding Dice scores of 0.887 for WT, 0.856 for TC, and 0.824 for ET. The proposed model outperforms several existing state-of-the-art models, particularly in accurately identifying small and complex tumor regions. Extensive evaluations suggest that integrating advanced preprocessing with an attention-augmented hybrid architecture offers significant potential for reliable and clinically valuable brain tumor segmentation.
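A Gabor filter, the texture extractor mentioned above, is a sinusoid windowed by a Gaussian envelope. The sketch below builds the real part of a 2-D kernel from the standard formula; the paper works with 3-D volumes and does not list its filter parameters, so the 2-D restriction, the function name, and every parameter value here are illustrative assumptions.

```python
import math


def gabor_kernel(size, sigma, theta, wavelength, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor kernel as a size x size nested list.

    sigma: width of the Gaussian envelope; theta: orientation in radians;
    wavelength: period of the sinusoid; gamma: aspect ratio; psi: phase.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a centre pixel")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the sinusoid runs along direction theta.
            x_r = x * math.cos(theta) + y * math.sin(theta)
            y_r = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(x_r ** 2 + (gamma * y_r) ** 2)
                                / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * x_r / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


# A small bank: four orientations at one scale (values chosen arbitrarily).
bank = [gabor_kernel(5, sigma=2.0, theta=k * math.pi / 4, wavelength=4.0)
        for k in range(4)]
```

Using such kernels as fixed or initial convolution weights gives the early encoder layers oriented, band-pass responses from the first training step, which is consistent with the faster early convergence the abstract describes.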
Background: Magnetic resonance imaging (MRI) has played an important role in the rapid growth of medical imaging diagnostic technology, especially in the diagnosis and treatment of brain tumors owing to its non-invasive characteristics and superior soft tissue contrast. However, brain tumors are characterized by high non-uniformity and non-obvious boundaries in MRI images because of their invasive and highly heterogeneous nature. In addition, the labeling of tumor areas is time-consuming and laborious. Methods: To address these issues, this study uses a residual grouped convolution module, convolutional block attention module, and bilinear interpolation upsampling method to improve the classical segmentation network U-net. The influence of network normalization, loss function, and network depth on segmentation performance is further considered. Results: In the experiments, the Dice score of the proposed segmentation model reached 97.581%, which is 12.438% higher than that of traditional U-net, demonstrating the effective segmentation of MRI brain tumor images. Conclusions: In conclusion, we use the improved U-net network to achieve a good segmentation effect on brain tumor MRI images.
Retinal blood vessel segmentation is crucial for diagnosing ocular and cardiovascular diseases. Although the introduction of U-Net in 2015 by Olaf Ronneberger significantly advanced this field, issues like limited training data, imbalanced data distribution, and inadequate feature extraction persist, hindering both segmentation performance and model generalization. To address these critical issues, the DEFFA-Unet is proposed, featuring an additional encoder to process domain-invariant pre-processed inputs, thereby providing richer feature encoding and enhancing model generalization. A feature filtering fusion module is developed to ensure precise feature filtering and robust hybrid feature fusion. In response to the task-specific need for higher precision where false positives are very costly, traditional skip connections are replaced with the attention-guided feature reconstructing fusion module. Additionally, innovative data augmentation and balancing methods are proposed to counter data scarcity and distribution imbalance, further boosting the robustness and generalization of the model. With a comprehensive suite of evaluation metrics, extensive validations on four benchmark datasets (DRIVE, CHASEDB1, STARE, and HRF) and an SLO dataset (IOSTAR) demonstrate the proposed method’s superiority over both baseline and state-of-the-art models. In particular, the proposed method significantly outperforms the compared methods in cross-validation model generalization.
Deep learning (DL), derived from the domain of Artificial Neural Networks (ANN), forms one of the most essential components of modern deep learning algorithms. DL segmentation models rely on layer-by-layer convolution-based feature representation, guided by forward and backward propagation. A critical aspect of this process is the selection of an appropriate activation function (AF) to ensure robust model learning. However, existing activation functions often fail to effectively address the vanishing gradient problem or are complicated by the need for manual parameter tuning. Most current research on activation function design focuses on classification tasks using natural image datasets such as MNIST, CIFAR-10, and CIFAR-100. To address this gap, this study proposes Med-ReLU, a novel activation function specifically designed for medical image segmentation. Med-ReLU prevents deep learning models from suffering dead neurons or vanishing gradient issues. It is a hybrid activation function that combines the properties of ReLU and Softsign. For positive inputs, Med-ReLU adopts the linear behavior of ReLU to avoid vanishing gradients, while for negative inputs, it exhibits the Softsign’s polynomial convergence, ensuring robust training and avoiding inactive neurons across the training set. The training performance and segmentation accuracy of Med-ReLU have been thoroughly evaluated, demonstrating stable learning behavior and resistance to overfitting. It consistently outperforms state-of-the-art activation functions in medical image segmentation tasks. Designed as a parameter-free function, Med-ReLU is simple to implement in complex deep learning architectures, and its effectiveness spans various neural network models and anomaly detection scenarios.
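One plausible reading of the hybrid described above is sketched here: the identity for non-negative inputs (as in ReLU) and the Softsign function x / (1 + |x|) for negative inputs. The abstract does not give the exact formula, so this piecewise definition, the treatment of x = 0, and the function names are assumptions rather than the authors' definition.

```python
def med_relu(x):
    """Hybrid activation: ReLU branch for x >= 0, Softsign branch for x < 0.

    Assumed form; the paper's exact definition may differ.
    """
    if x >= 0:
        return x                  # linear branch: gradient stays at 1
    return x / (1.0 + abs(x))     # Softsign branch: bounded in (-1, 0)


def med_relu_grad(x):
    """Derivative of med_relu as defined above."""
    if x >= 0:
        return 1.0
    # d/dx [x / (1 + |x|)] = 1 / (1 + |x|)^2, which is never exactly zero.
    return 1.0 / (1.0 + abs(x)) ** 2


samples = [-100.0, -1.0, 0.0, 2.0]
outputs = [med_relu(x) for x in samples]
gradients = [med_relu_grad(x) for x in samples]
```

Under this reading the negative-side gradient decays polynomially rather than being clipped to zero, which matches the abstract's claim about avoiding dead neurons, and the function needs no tunable parameter.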
The use of AI technologies in remote sensing (RS) tasks has been the focus of many individuals in both the professional and academic domains. This integration could provide more accessible interfaces and tools that allow people with little or no experience to interact intuitively with RS data in multiple formats. However, the use of AI and AI agents to help automate RS-related tasks is still in its infancy, with some frameworks and interfaces built on top of well-known vision language models (VLM) such as GPT-4, segment anything model (SAM), and grounding DINO. These tools show promise and offer guidance on the potential and limitations of existing solutions built on these models. In this work, state-of-the-art AI foundation models (FM) are reviewed and used in a multi-modal manner to ingest RS imagery input and perform zero-shot object detection using natural language. The natural language input is used to define the classes or labels the model should look for; both inputs are then fed to the pipeline. The pipeline presented in this work makes up for the shortcomings of the general-knowledge FMs by stacking pre-processing and post-processing applications on top of the FMs; these applications include tiling to produce uniform patches of the original image for faster detection, and outlier rejection of redundant bounding boxes using statistical and machine learning methods. The pipeline was tested with UAV, aerial, and satellite images taken over multiple areas. The accuracy of the semantic segmentation improved from the original 64% to approximately 80%-99% by utilizing the pipeline and techniques proposed in this work. GitHub repository: MohanadDiab/LangRS.
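The tiling step mentioned above can be sketched as a function that returns the pixel boxes of uniform patches covering an image. This is not code from the LangRS repository; the function name, the overlap parameter, and the policy of shifting the last tile back inside the image are assumptions chosen to keep every patch the same size.

```python
def tile_boxes(width, height, tile, overlap=0):
    """Return (x0, y0, x1, y1) boxes of tile-sized patches covering an image.

    Hypothetical helper: edge tiles are shifted back inside the image, so
    all patches are uniform whenever the image is at least one tile large.
    """
    if tile <= 0 or not 0 <= overlap < tile:
        raise ValueError("need tile > 0 and 0 <= overlap < tile")
    stride = tile - overlap

    def starts(length):
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile + 1, stride))
        if positions[-1] != length - tile:
            positions.append(length - tile)  # final tile flush with the edge
        return positions

    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in starts(height)
            for x in starts(width)]


boxes = tile_boxes(1000, 600, tile=512)
```

Detections produced per tile are then offset by each box's (x0, y0) to return to full-image coordinates; the duplicates that arise where tiles overlap are the kind of redundant boxes the abstract's post-processing stage is meant to reject.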
The current method for inspecting microholes in printed circuit boards(PCBs)involves preparing slices followed by optical microscope measurements.However,this approach suffers from low detection efficiency,poor reliab...The current method for inspecting microholes in printed circuit boards(PCBs)involves preparing slices followed by optical microscope measurements.However,this approach suffers from low detection efficiency,poor reliability,and insufficient measurement stability.Micro-CT enables the observation of the internal structures of the sample without the need for slicing,thereby presenting a promising new method for assessing the quality of microholes in PCBs.This study integrates computer vision technology with computed tomography(CT)to propose a method for detecting microhole wall roughness using a U-Net model and image processing algorithms.This study established an unplated copper PCB CT image dataset and trained an improved U-Net model.Validation of the test set demonstrated that the improved model effectively segmented microholes in the PCB CT images.Subsequently,the roughness of the holes’walls was assessed using a customized image-processing algorithm.Comparative analysis between CT detection based on various edge detection algorithms and slice detection revealed that CT detection employing the Canny algorithm closely approximates slice detection,yielding range and average errors of 2.92 and 1.64μm,respectively.Hence,the detection method proposed in this paper offers a novel approach for nondestructive testing of hole wall roughness in the PCB industry.展开更多
Data augmentation plays an important role in training deep neural models by expanding the size and diversity of the dataset. Initially, data augmentation mainly involved simple image transformations. Later, to increase the diversity and complexity of the data, more advanced methods appeared and evolved into sophisticated generative models. However, these methods require a large amount of computation for training or searching. In this paper, a novel training-free method that uses the pre-trained Segment Anything Model (SAM) as a data augmentation tool (PTSAM-DA) is proposed to generate augmented annotations for images. Without any training, it obtains prompt boxes from the original annotations and feeds them to the pre-trained SAM to generate diverse and improved annotations. In this way, annotations are augmented more ingeniously than with simple manipulations, without incurring the heavy computation of training a data augmentation model. Multiple comparative experiments are conducted on three datasets: an in-house dataset, ADE20K, and COCO2017. On the in-house dataset, the Agricultural Plot Segmentation Dataset, maximum improvements of 3.77% and 8.92% are gained in two mainstream metrics, mIoU and mAcc, respectively. Consequently, large vision models such as SAM prove promising not only in image segmentation but also in data augmentation.
Funding: Princess Nourah Bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R826), Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia; Northern Border University, Saudi Arabia, for supporting this work through project number (NBU-CRP-2025-2933).
Abstract: Brain tumor segmentation from Magnetic Resonance Imaging (MRI) supports neurologists and radiologists in analyzing tumors and developing personalized treatment plans, making it a crucial yet challenging task. Supervised models such as 3D U-Net perform well in this domain, but their accuracy improves significantly with appropriate preprocessing. This paper demonstrates the effectiveness of preprocessing in brain tumor segmentation by applying a pre-segmentation step based on the Generalized Gaussian Mixture Model (GGMM) to T1 contrast-enhanced MRI scans from the BraTS 2020 dataset. The Expectation-Maximization (EM) algorithm is employed to estimate parameters for four tissue classes, generating a new pre-segmented channel that enhances the training and performance of the 3D U-Net model. The proposed GGMM + 3D U-Net framework achieved a Dice coefficient of 0.88 for whole tumor segmentation, outperforming both the standard multiscale 3D U-Net (0.84) and MMU-Net (0.85). It also delivered higher Intersection over Union (IoU) scores than models trained without preprocessing or with simpler GMM-based segmentation. These results, supported by qualitative visualizations, suggest that GGMM-based preprocessing should be integrated into brain tumor segmentation pipelines to optimize performance.
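As a rough illustration of the pre-segmentation idea in the abstract above, the following Python sketch evaluates a generalized Gaussian density and performs a single E-step, assigning one voxel intensity to the most probable of four classes. The parametrization (location `mu`, scale `alpha`, shape `beta`), the function names, and the hand-picked parameter values are all assumptions made for illustration; the abstract does not state the paper's exact formulation, and a full EM loop would also re-estimate the parameters in an M-step.

```python
import math


def generalized_gaussian_pdf(x, mu, alpha, beta):
    """Density beta / (2*alpha*Gamma(1/beta)) * exp(-(|x - mu| / alpha) ** beta).

    beta = 2 gives a Gaussian with variance alpha**2 / 2; beta = 1 gives a Laplacian.
    """
    coef = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return coef * math.exp(-((abs(x - mu) / alpha) ** beta))


def responsibilities(x, weights, params):
    """E-step for one intensity x: posterior probability of each mixture class."""
    joint = [w * generalized_gaussian_pdf(x, mu, alpha, beta)
             for w, (mu, alpha, beta) in zip(weights, params)]
    total = sum(joint)  # assumed non-zero; a robust version would guard underflow
    return [j / total for j in joint]


# Hypothetical mixture for four classes, each given as (mu, alpha, beta).
weights = [0.25, 0.25, 0.25, 0.25]
params = [(0.10, 0.05, 2.0), (0.35, 0.08, 1.5), (0.60, 0.08, 2.5), (0.90, 0.05, 2.0)]

post = responsibilities(0.58, weights, params)
label = max(range(len(post)), key=post.__getitem__)  # index of the most probable class
```

Applying `responsibilities` to every voxel and storing the winning label (or the posterior itself) would produce the kind of extra input channel the abstract describes, under the assumptions stated here.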
Abstract: Lower back pain is one of the most common medical problems in the world and affects a large share of the population. Because it produces a detailed view of the soft tissues, including the spinal cord, nerves, intervertebral discs, and vertebrae, Magnetic Resonance Imaging (MRI) is considered the most effective method for imaging the spine. The semantic segmentation of vertebrae plays a major role in the diagnosis of lumbar diseases. It is difficult to semantically separate the vertebrae in MR images from the variety of surrounding tissues, including muscles, ligaments, and intervertebral discs. U-Net is a powerful deep-learning architecture that handles the challenges of medical image analysis and achieves high segmentation accuracy. This work proposes a modified U-Net architecture, MU-Net, built on a Meijering convolutional layer that incorporates the Meijering filter, to perform semantic segmentation of lumbar vertebrae L1 to L5 and sacral vertebra S1. Pseudo-colour mask images were generated and used as ground truth for training the model. The work was carried out on 1312 images expanded from T1-weighted mid-sagittal MRI images of 515 patients in the Lumbar Spine MRI Dataset, publicly available from Mendeley Data. On this dataset, the proposed MU-Net model achieves 98.79% pixel accuracy (PA), 98.66% dice similarity coefficient (DSC), 97.36% Jaccard coefficient, and 92.55% mean Intersection over Union (mean IoU).
Abstract: Brain tumors present significant challenges in medical diagnosis and treatment, where early detection is crucial for reducing morbidity and mortality rates. This research introduces a novel deep learning model, the Progressive Layered U-Net (PLU-Net), designed to improve brain tumor segmentation accuracy from Magnetic Resonance Imaging (MRI) scans. The PLU-Net extends the standard U-Net architecture by incorporating progressive layering, attention mechanisms, and multi-scale data augmentation. The progressive layering involves a cascaded structure that refines segmentation masks across multiple stages, allowing the model to capture features at different scales and resolutions. Attention gates within the convolutional layers selectively focus on relevant features while suppressing irrelevant ones, enhancing the model's ability to delineate tumor boundaries. Additionally, multi-scale data augmentation techniques increase the diversity of training data and boost the model's generalization capabilities. Evaluated on the BraTS 2021 dataset, the PLU-Net achieved state-of-the-art performance with a Dice coefficient of 0.91, specificity of 0.92, sensitivity of 0.89, and Hausdorff95 of 2.5, outperforming other modified U-Net architectures in segmentation accuracy. These results underscore the effectiveness of the PLU-Net in improving brain tumor segmentation from MRI scans, supporting clinicians in early diagnosis, treatment planning, and the development of new therapies.
Abstract: Medical image segmentation has become a cornerstone of many healthcare applications, allowing the automated extraction of critical information from images such as Computed Tomography (CT) scans, Magnetic Resonance Imaging (MRI) scans, and X-rays. The introduction of U-Net in 2015 significantly advanced segmentation capabilities, especially for the small datasets commonly found in medical imaging. Since then, various modifications to the original U-Net architecture have been proposed to enhance segmentation accuracy and tackle challenges such as class imbalance, data scarcity, and multi-modal image processing. This paper provides a detailed review and comparison of several U-Net-based architectures, focusing on their effectiveness in medical image segmentation tasks. We evaluate performance metrics such as Dice Similarity Coefficient (DSC) and Intersection over Union (IoU) across different U-Net variants, including HmsU-Net, CrossU-Net, mResU-Net, and others. Our results indicate that architectural enhancements such as transformers, attention mechanisms, and residual connections improve segmentation performance across diverse medical imaging applications, including tumor detection, organ segmentation, and lesion identification. The study also identifies current challenges in the field, including data variability, limited dataset sizes, and class imbalance. Based on these findings, the paper suggests potential future directions for improving the robustness and clinical applicability of U-Net-based models in medical image segmentation.
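Since DSC and IoU recur throughout these abstracts, a minimal Python sketch of both metrics for a pair of binary masks may help. The function name, the flat-list mask representation, and the convention of scoring two empty masks as 1.0 are assumptions made for illustration; individual papers may average per image or per class, which can change the reported numbers.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two equally sized binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_sum = sum(1 for p in pred if p)
    target_sum = sum(1 for t in target if t)
    union = pred_sum + target_sum - intersection
    if union == 0:
        # Both masks are empty: treated as perfect agreement (an assumed convention).
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_sum + target_sum)
    iou = intersection / union
    return dice, iou


pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
dice, iou = dice_and_iou(pred, target)
```

For a single pair of masks the two metrics are tied by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU on the same prediction.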
Funding: Supported by the Guangdong Basic and Applied Basic Research Foundation (2024A1515010987) and the Medical Scientific Research Foundation of Guangdong Province (B2024035).
Abstract: Background: Diabetic retinopathy (DR) is one of the primary causes of visual impairment globally, resulting from microvascular abnormalities in the retina. Accurate segmentation of retinal blood vessels from fundus images plays a pivotal role in the early diagnosis, progression monitoring, and treatment planning of DR and related ocular conditions. Traditional convolutional neural networks often struggle to capture the intricate structures of thin vessels under varied illumination and contrast conditions. Methods: In this study, we propose an improved U-Net-based framework named MSAC U-Net, which enhances feature extraction and reconstruction through multiscale and attention-based modules. Specifically, the encoder replaces standard convolutions with a Multiscale Asymmetric Convolution (MSAC) block, incorporating parallel 1×n, n×1, and n×n kernels at different scales (3×3, 5×5, 7×7) to effectively capture fine-grained vascular structures. To further refine spatial representation, skip connections are utilized, and the decoder is augmented with dual activation strategies, Squeeze-and-Excitation blocks, and Convolutional Block Attention Modules for improved contextual understanding. Results: The model was evaluated on the publicly available DRIVE dataset. It achieved an accuracy of 96.48%, sensitivity of 88.31%, specificity of 97.90%, and an AUC of 98.59%, demonstrating superior performance compared with several state-of-the-art segmentation methods. Conclusion: The proposed MSAC U-Net provides a robust and accurate approach for retinal vessel segmentation, offering substantial clinical value in the early detection and management of diabetic retinopathy. Its design contributes to enhanced segmentation reliability and may serve as a foundation for broader applications in medical image analysis.
Abstract: Nuclei segmentation in histopathology images is challenging because of the small size of the objects, low contrast, touching boundaries, and the complex structure of nuclei. Segmenting and counting nuclei play an important role in cancer identification and grading. In this study, WaveSeg-UNet, a lightweight model, is introduced to segment cancerous nuclei with touching boundaries. Residual blocks are used for feature extraction, with only one feature extractor block at each level of the encoder and decoder. Normally, images lose quality and important information during down-sampling. To overcome this loss, the discrete wavelet transform (DWT) is used alongside max pooling in the down-sampling process, and the inverse DWT is used to regenerate the original images during up-sampling. In the bottleneck of the proposed model, atrous spatial channel pyramid pooling (ASCPP) is used to extract effective high-level features. ASCPP is a modified pyramid pooling with atrous layers that enlarge the receptive field. Spatial and channel-based attention are used to focus on the location and class of the identified objects. Finally, the watershed transform is used as a post-processing technique to identify and refine the touching boundaries of nuclei. Nuclei are identified and counted to assist pathologists. Same-domain transfer learning is used to retrain the model for domain adaptability. The results of the proposed model are compared with those of state-of-the-art models, which it outperforms.
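To make the DWT down-sampling step above concrete, the sketch below implements a one-level 2D Haar transform and its inverse in plain Python. The Haar wavelet, the function names, and the subband names are assumptions made for illustration: the abstract does not say which wavelet WaveSeg-UNet uses or how the subbands enter the network. The point is only that the four half-resolution subbands together retain all the information that pooling alone would discard.

```python
def haar_dwt2(image):
    """One-level orthonormal 2D Haar transform of a 2D list with even height and width.

    Returns four half-resolution subbands: approximation, column-difference,
    row-difference, and diagonal detail.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must be even")
    approx, d_cols, d_rows, d_diag = [], [], [], []
    for r in range(0, rows, 2):
        a_row, c_row, r_row, g_row = [], [], [], []
        for c in range(0, cols, 2):
            tl, tr = image[r][c], image[r][c + 1]          # top-left, top-right
            bl, br = image[r + 1][c], image[r + 1][c + 1]  # bottom-left, bottom-right
            a_row.append((tl + tr + bl + br) / 2.0)
            c_row.append((tl - tr + bl - br) / 2.0)
            r_row.append((tl + tr - bl - br) / 2.0)
            g_row.append((tl - tr - bl + br) / 2.0)
        approx.append(a_row)
        d_cols.append(c_row)
        d_rows.append(r_row)
        d_diag.append(g_row)
    return approx, d_cols, d_rows, d_diag


def haar_idwt2(approx, d_cols, d_rows, d_diag):
    """Invert haar_dwt2, rebuilding the full-resolution image from the four subbands."""
    out = []
    for i in range(len(approx)):
        top, bottom = [], []
        for j in range(len(approx[0])):
            s, h, v, g = approx[i][j], d_cols[i][j], d_rows[i][j], d_diag[i][j]
            top += [(s + h + v + g) / 2.0, (s - h + v - g) / 2.0]
            bottom += [(s + h - v - g) / 2.0, (s - h - v + g) / 2.0]
        out += [top, bottom]
    return out


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
bands = haar_dwt2(image)
restored = haar_idwt2(*bands)
```

In an encoder, the approximation band plays a role similar to a pooled feature map, while the three detail bands can be kept and passed to the matching decoder level for the inverse transform.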
Funding: Supported by the National Key Research and Development Program of China (2024YFB3311703), the National Natural Science Foundation of China (61932003), and the Beijing Science and Technology Plan Project (Z221100006322003).
Abstract: In this paper, we introduce an innovative method for computer-aided design (CAD) segmentation that concatenates meshes and CAD models. Many previous CAD segmentation methods have achieved impressive performance using single representations, such as meshes, CAD, and point clouds. However, existing methods cannot effectively combine different three-dimensional model types for the direct conversion, alignment, and integrity maintenance of geometric and topological information. Hence, we propose an integration approach that combines the geometric accuracy of CAD data with the flexibility of mesh representations, and we introduce a unique hybrid representation that combines CAD and mesh models to enhance segmentation accuracy. To combine these two model types, our hybrid system uses advanced neural network techniques to convert CAD models into mesh models. For complex CAD models, model segmentation is crucial for model retrieval and reuse. In partial retrieval, it aims to segment a complex CAD model into several simple components. The first component of our hybrid system involves advanced mesh-labeling algorithms that harness the digitization of CAD properties to mesh models. The second component integrates labelled face features for CAD segmentation by leveraging the abundant multisemantic information embedded in CAD models. This combination of mesh and CAD not only refines the accuracy of boundary delineation but also provides a comprehensive understanding of the underlying object semantics. This study uses the Fusion 360 Gallery dataset. Experimental results indicate that our hybrid method can segment these models with higher accuracy than other methods that use single representations.
Abstract: Thyroid nodules, a common disorder of the endocrine system, require accurate segmentation in ultrasound images for effective diagnosis and treatment. However, achieving precise segmentation remains a challenge due to various factors, including scattering noise, low contrast, and limited resolution in ultrasound images. Although existing segmentation models have made progress, they still suffer from several limitations, such as high error rates, low generalizability, overfitting, and limited feature learning capability. To address these challenges, this paper proposes a Multi-level Relation Transformer-based U-Net (MLRT-UNet) to improve thyroid nodule segmentation. The MLRT-UNet leverages a novel Relation Transformer, which processes images at multiple scales, overcoming the limitations of traditional encoding methods. This transformer integrates both local and global features effectively through self-attention and cross-attention units, capturing intricate relationships within the data. The approach also introduces a Co-operative Transformer Fusion (CTF) module to combine multi-scale features from different encoding layers, enhancing the model's ability to capture complex patterns in the data. Furthermore, the Relation Transformer block enhances long-distance dependencies during the decoding process, improving segmentation accuracy. Experimental results show that the MLRT-UNet achieves high segmentation accuracy, reaching 98.2% on the Digital Database Thyroid Image (DDT) dataset, 97.8% on the Thyroid Nodule 3493 (TG3K) dataset, and 98.2% on the Thyroid Nodule 3K (TN3K) dataset. These findings demonstrate that the proposed method significantly enhances the accuracy of thyroid nodule segmentation, addressing the limitations of existing models.
Funding: Funded by Umm Al-Qura University, Saudi Arabia, under grant number 25UQU4300346GSSR08.
Abstract: This study presents an advanced method for post-mortem person identification using the segmentation of skeletal structures from chest X-ray images. The proposed approach employs the Attention U-Net architecture, enhanced with gated attention mechanisms, to refine segmentation by emphasizing spatially relevant anatomical features while suppressing irrelevant details. By isolating skeletal structures, which remain stable over time compared with soft tissues, this method leverages bones as reliable biometric markers for identity verification. The model integrates custom-designed encoder and decoder blocks with attention gates, achieving high segmentation precision. To evaluate the impact of architectural choices, we conducted an ablation study comparing Attention U-Net with and without attention mechanisms, alongside an analysis of data augmentation effects. Training and evaluation were performed on a curated chest X-ray dataset, with segmentation performance measured using Dice score, precision, and loss functions, achieving over 98% precision and a 94% Dice score. The extracted bone structures were further processed to derive unique biometric patterns, enabling robust and privacy-preserving person identification. Our findings highlight the effectiveness of attention mechanisms in improving segmentation accuracy and underscore the potential of chest bone-based biometrics in forensic and medical imaging. This work paves the way for integrating artificial intelligence into real-world forensic workflows, offering a non-invasive and reliable solution for post-mortem identification.
Funding: Funded by the Zhejiang Basic Public Welfare Research Project, grant number LZY24E060001; supported by Guangzhou Development Zone Science and Technology (2021GH10, 2020GH10, 2023GH02), the University of Macao (MYRG2022-00271-FST), and the Science and Technology Development Fund (FDCT) of Macao (0032/2022/A).
Abstract: With the continuous development of artificial intelligence and machine learning techniques, effective methods have emerged to support the work of dermatologists in skin cancer detection. However, significant challenges remain in accurately segmenting melanomas in dermoscopic images because of objects that can interfere with human observation, such as bubbles and scales. To address these challenges, we propose a dual U-Net network framework for skin melanoma segmentation. In the proposed architecture, we introduce several innovative components that aim to enhance the performance and capabilities of the traditional U-Net. First, we establish a novel framework that links two simplified U-Nets, enabling more comprehensive information exchange and feature integration throughout the network. Second, after cascading the second U-Net, we introduce a skip connection between the decoder and encoder networks and incorporate a modified receptive field block (MRFB), which is designed to capture multi-scale spatial information. Third, to further enhance the feature representation capabilities, we add a multi-path convolution block attention module (MCBAM) to the first two layers of the first U-Net encoder and integrate a new squeeze-and-excitation (SE) mechanism with residual connections in the second U-Net. To illustrate the performance of the proposed model, we conducted comprehensive experiments on widely recognized skin datasets. On the ISIC-2017 dataset, the IoU of the proposed model increased from 0.6406 to 0.6819 and the Dice coefficient increased from 0.7625 to 0.8023. On the ISIC-2018 dataset, the IoU of the proposed model improved from 0.7138 to 0.7709, while the Dice coefficient increased from 0.8285 to 0.8665. Furthermore, generalization experiments conducted on the jaw cyst dataset from Quzhou People's Hospital further verified the outstanding segmentation performance of the proposed model. These findings collectively affirm the potential of our approach as a valuable tool in supporting clinical decision-making in the field of skin cancer detection, as well as advancing research in medical image analysis.
Funding: Supported in part by the Jiangsu Province Construction System Science and Technology Project (No. 2024ZD056) and the Research Development Fund of Xi'an Jiaotong-Liverpool University (No. RDF-24-01-097).
Abstract: Computer-vision and deep-learning techniques are widely applied to detect, monitor, and assess pavement conditions, including road crack detection. Traditional methods fail to achieve satisfactory accuracy and generalization performance in crack detection, while complex network models can generate redundant feature maps and incur high computational complexity. Therefore, this paper proposes a novel deep-learning-based model compression framework for road crack detection, which can improve detection efficiency and accuracy. A distillation loss function is proposed to compress the teacher model, followed by channel pruning. Meanwhile, a multi-dilation model is proposed to improve the accuracy of the pruned model. The proposed method is tested on the public CrackForest dataset (CFD). The experimental results show that the proposed method is more efficient and accurate than other state-of-the-art methods.
Funding: Supported by the Natural Science Foundation Programme of Gansu Province (No. 24JRRA231), the National Natural Science Foundation of China (No. 62061023), and the Gansu Provincial Science and Technology Plan Key Research and Development Program Project (No. 24YFFA024).
Abstract: Despite its remarkable performance on natural images, the segment anything model (SAM) lacks domain-specific information in medical imaging and faces the challenge of losing local multi-scale information in the encoding phase. This paper presents a medical image segmentation model based on SAM with a local multi-scale feature encoder (LMSFE-SAM) to address these issues. First, building on SAM, a local multi-scale feature encoder is introduced to improve the representation of features within the local receptive field, thereby supplying the Vision Transformer (ViT) branch in SAM with enriched local multi-scale contextual information. At the same time, a multiaxial Hadamard product module (MHPM) is incorporated into the local multi-scale feature encoder in a lightweight manner to reduce the quadratic complexity and noise interference. Subsequently, a cross-branch balancing adapter is designed to balance the local and global information between the local multi-scale feature encoder and the ViT encoder in SAM. Finally, to obtain a smaller input image size and to mitigate overlapping in patch embeddings, the size of the input image is reduced from 1024×1024 pixels to 256×256 pixels, and a multidimensional information adaptation component is developed, which includes feature adapters, position adapters, and channel-spatial adapters. This component effectively integrates the information from small-sized medical images into SAM, enhancing its suitability for clinical deployment. The proposed model demonstrates an average enhancement ranging from 0.0387 to 0.3191 across six objective evaluation metrics on the BUSI, DDTI, and TN3K datasets compared with eight other representative image segmentation models. This significantly enhances the performance of SAM on medical images, providing clinicians with a powerful tool for clinical diagnosis.
Funding: Supported by the National Natural Science Foundation of China (Nos. 81974355 and 82172524), the Key Research and Development Program of Hubei Province (No. 2021BEA161), the National Innovation Platform Development Program (No. 2020021105012440), the Open Project Funding of the Hubei Key Laboratory of Big Data Intelligent Analysis and Application, Hubei University (No. 2024BDIAA03), and the Free Innovation Preliminary Research Fund of Wuhan Union Hospital (No. 2024XHYN047).
Abstract: Objective: This study aimed to explore a novel method that integrates segmentation-guided classification and diffusion model augmentation to achieve automatic classification of tibial plateau fractures (TPFs). Methods: YOLOv8n-cls was used to construct a baseline model on the data of 3781 patients from the Orthopedic Trauma Center of Wuhan Union Hospital. Additionally, a segmentation-guided classification approach was proposed. To enhance the dataset, a diffusion model was further used for data augmentation. Results: The novel method that integrated segmentation-guided classification and diffusion model augmentation significantly improved the accuracy and robustness of fracture classification. The average classification accuracy for TPFs rose from 0.844 to 0.896. The comprehensive performance of the dual-stream model was also significantly enhanced after many rounds of training, with both the macro-area under the curve (AUC) and the micro-AUC increasing from 0.94 to 0.97. By utilizing diffusion model augmentation and segmentation map integration, the model demonstrated superior efficacy in identifying Schatzker I, achieving an accuracy of 0.880. It yielded an accuracy of 0.898 for Schatzker II and III and 0.913 for Schatzker IV; for Schatzker V and VI, the accuracy was 0.887; and for intercondylar ridge fracture, the accuracy was 0.923. Conclusion: The dual-stream attention-based classification network, which has been verified by many experiments, exhibited great potential in predicting the classification of TPFs. This method facilitates automatic TPF assessment and may assist surgeons in the rapid formulation of surgical plans.
Funding: The National Science and Technology Council (NSTC) of the Republic of China, Taiwan, financially supported this research under Contract No. NSTC 112-2637-M-131-001.
Abstract: Accurate and efficient brain tumor segmentation is essential for early diagnosis, treatment planning, and clinical decision-making. However, the complex structure of brain anatomy and the heterogeneous nature of tumors present significant challenges for precise anomaly detection. While U-Net-based architectures have demonstrated strong performance in medical image segmentation, there remains room for improvement in feature extraction and localization accuracy. In this study, we propose a novel hybrid model designed to enhance 3D brain tumor segmentation. The architecture incorporates a 3D ResNet encoder, known for mitigating the vanishing gradient problem, and a 3D U-Net decoder. Additionally, to enhance the model's generalization ability, a Squeeze-and-Excitation attention mechanism is integrated. We introduce Gabor filter banks into the encoder to further strengthen the model's ability to extract robust and transformation-invariant features from the complex and irregular shapes typical in medical imaging. This approach, which is not well explored in current U-Net-based segmentation frameworks, provides a unique advantage by enhancing texture-aware feature representation. Specifically, Gabor filters help extract distinctive low-level texture features, reducing the effects of texture interference and facilitating faster convergence during the early stages of training. Our model achieved Dice scores of 0.881, 0.846, and 0.819 for Whole Tumor (WT), Tumor Core (TC), and Enhancing Tumor (ET), respectively, on the BraTS 2020 dataset. Cross-validation on the BraTS 2021 dataset further confirmed the model's robustness, yielding Dice scores of 0.887 for WT, 0.856 for TC, and 0.824 for ET. The proposed model outperforms several existing state-of-the-art models, particularly in accurately identifying small and complex tumor regions. Extensive evaluations suggest that integrating advanced preprocessing with an attention-augmented hybrid architecture offers significant potential for reliable and clinically valuable brain tumor segmentation.
Funding: Research Fund of Macao Polytechnic University (RP/FCSD-01/2022).
Abstract: Background: Magnetic resonance imaging (MRI) has played an important role in the rapid growth of medical imaging diagnostic technology, especially in the diagnosis and treatment of brain tumors, owing to its non-invasive characteristics and superior soft tissue contrast. However, brain tumors are characterized by high non-uniformity and non-obvious boundaries in MRI images because of their invasive and highly heterogeneous nature. In addition, the labeling of tumor areas is time-consuming and laborious. Methods: To address these issues, this study uses a residual grouped convolution module, a convolutional block attention module, and a bilinear interpolation upsampling method to improve the classical segmentation network U-net. The influence of network normalization, loss function, and network depth on segmentation performance is further considered. Results: In the experiments, the Dice score of the proposed segmentation model reached 97.581%, which is 12.438% higher than that of the traditional U-net, demonstrating effective segmentation of MRI brain tumor images. Conclusions: The improved U-net network achieves good segmentation of brain tumor MRI images.
Abstract: Retinal blood vessel segmentation is crucial for diagnosing ocular and cardiovascular diseases. Although the introduction of U-Net in 2015 by Olaf Ronneberger significantly advanced this field, issues such as limited training data, imbalanced data distribution, and inadequate feature extraction persist, hindering both segmentation performance and model generalization. To address these critical issues, DEFFA-Unet is proposed, featuring an additional encoder that processes domain-invariant pre-processed inputs, thereby providing richer feature encoding and enhanced model generalization. A feature filtering fusion module is developed to ensure precise feature filtering and robust hybrid feature fusion. In response to the task-specific need for higher precision, where false positives are very costly, traditional skip connections are replaced with an attention-guided feature reconstructing fusion module. Additionally, innovative data augmentation and balancing methods are proposed to counter data scarcity and distribution imbalance, further boosting the robustness and generalization of the model. With a comprehensive suite of evaluation metrics, extensive validations on four benchmark datasets (DRIVE, CHASEDB1, STARE, and HRF) and an SLO dataset (IOSTAR) demonstrate the proposed method's superiority over both baseline and state-of-the-art models. In particular, the proposed method significantly outperforms the compared methods in cross-validation model generalization.
Funding: The researchers would like to thank the Deanship of Graduate Studies and Scientific Research at Qassim University for financial support (QU-APC-2025).
Abstract: Deep learning (DL), derived from the domain of Artificial Neural Networks (ANN), forms one of the most essential components of modern deep learning algorithms. DL segmentation models rely on layer-by-layer convolution-based feature representation, guided by forward and backward propagation. A critical aspect of this process is the selection of an appropriate activation function (AF) to ensure robust model learning. However, existing activation functions often fail to effectively address the vanishing gradient problem or are complicated by the need for manual parameter tuning. Most current research on activation function design focuses on classification tasks using natural image datasets such as MNIST, CIFAR-10, and CIFAR-100. To address this gap, this study proposes Med-ReLU, a novel activation function specifically designed for medical image segmentation. Med-ReLU prevents deep learning models from suffering dead neurons or vanishing gradient issues. It is a hybrid activation function that combines the properties of ReLU and Softsign. For positive inputs, Med-ReLU adopts the linear behavior of ReLU to avoid vanishing gradients, while for negative inputs, it exhibits Softsign's polynomial convergence, ensuring robust training and avoiding inactive neurons across the training set. The training performance and segmentation accuracy of Med-ReLU have been thoroughly evaluated, demonstrating stable learning behavior and resistance to overfitting. It consistently outperforms state-of-the-art activation functions in medical image segmentation tasks. Designed as a parameter-free function, Med-ReLU is simple to implement in complex deep learning architectures, and its effectiveness spans various neural network models and anomaly detection scenarios.
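The abstract above describes Med-ReLU only in words: ReLU-like for positive inputs and Softsign-like for negative inputs. One plausible reading can be sketched in Python as follows. The exact formula is an assumption, not the published definition; the function names are hypothetical, and the paper may scale or shift the negative branch differently.

```python
def med_relu_sketch(x: float) -> float:
    """Hybrid activation (assumed form): identity for x > 0, softsign x / (1 + |x|) otherwise."""
    if x > 0:
        return x
    return x / (1.0 + abs(x))


def med_relu_sketch_grad(x: float) -> float:
    """Derivative of the assumed form: 1 for x > 0, 1 / (1 + |x|)**2 otherwise."""
    if x > 0:
        return 1.0
    return 1.0 / (1.0 + abs(x)) ** 2


values = [med_relu_sketch(v) for v in (-3.0, -1.0, 0.0, 2.0)]
grads = [med_relu_sketch_grad(v) for v in (-3.0, -1.0, 0.0, 2.0)]
```

Under this reading the gradient on the negative side decays polynomially, as 1 / (1 + |x|)^2, and never reaches exactly zero, which is consistent with the abstract's claim about avoiding inactive neurons; the function has no tunable parameters.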
Abstract: The use of AI technologies in remote sensing (RS) tasks has been the focus of many individuals in both the professional and academic domains. This integration offers the potential for more accessible interfaces and tools that allow people with little or no experience to interact intuitively with RS data in multiple formats. However, the use of AI and AI agents to help automate RS-related tasks is still in its infancy, with some frameworks and interfaces built on top of well-known vision language models (VLM) such as GPT-4, the Segment Anything Model (SAM), and Grounding DINO. These tools show promise and outline the potential and limitations of existing solutions built on such models. In this work, state-of-the-art AI foundation models (FM) are reviewed and used in a multi-modal manner to ingest RS imagery and perform zero-shot object detection using natural language. The natural language input defines the classes or labels the model should look for, and both inputs are then fed to the pipeline. The pipeline presented in this work makes up for the shortcomings of the general-knowledge FMs by stacking pre-processing and post-processing applications on top of them; these applications include tiling, which produces uniform patches of the original image for faster detection, and outlier rejection of redundant bounding boxes using statistical and machine learning methods. The pipeline was tested with UAV, aerial, and satellite images taken over multiple areas. The accuracy of the semantic segmentation improved from the original 64% to approximately 80%-99% with the pipeline and techniques proposed in this work. GitHub Repository: MohanadDiab/LangRS.
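The tiling step mentioned in the abstract can be sketched as follows. This is an illustrative sketch only, not the LangRS implementation: the function names, the default tile size of 512 pixels, the overlap parameter, and the choice to shift the last tile back so that every patch has the same size are all assumptions.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), with x1 and y1 exclusive


def tile_origins(length: int, tile: int, stride: int) -> List[int]:
    """Start offsets along one axis so that tiles cover [0, length)."""
    if length <= tile:
        return [0]
    origins = list(range(0, length - tile + 1, stride))
    # Shift a final tile back to the border so it keeps the full tile size.
    if origins[-1] != length - tile:
        origins.append(length - tile)
    return origins


def make_tiles(width: int, height: int, tile: int = 512, overlap: int = 0) -> List[Box]:
    """Return uniform tile boxes covering a width x height image."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    stride = tile - overlap
    xs = tile_origins(width, tile, stride)
    ys = tile_origins(height, tile, stride)
    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in ys for x in xs]


if __name__ == "__main__":
    for box in make_tiles(1000, 600, tile=512):
        print(box)
```

Detections produced on each tile would then need to be offset by the tile origin to return to full-image coordinates. Border tiles overlap their neighbours in this sketch, which is one reason a later step for rejecting redundant boxes is useful.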
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 52122510 and 52375415).
Abstract: The current method for inspecting microholes in printed circuit boards (PCBs) involves preparing slices followed by optical microscope measurements. However, this approach suffers from low detection efficiency, poor reliability, and insufficient measurement stability. Micro-CT enables observation of the internal structures of a sample without the need for slicing, thereby presenting a promising new method for assessing the quality of microholes in PCBs. This study integrates computer vision technology with computed tomography (CT) to propose a method for detecting microhole wall roughness using a U-Net model and image processing algorithms. An unplated copper PCB CT image dataset was established and an improved U-Net model was trained on it. Validation on the test set demonstrated that the improved model effectively segmented microholes in the PCB CT images. Subsequently, the roughness of the hole walls was assessed using a customized image-processing algorithm. Comparative analysis between CT detection based on various edge detection algorithms and slice detection revealed that CT detection employing the Canny algorithm closely approximates slice detection, yielding range and average errors of 2.92 and 1.64 μm, respectively. Hence, the detection method proposed in this paper offers a novel approach for nondestructive testing of hole wall roughness in the PCB industry.
Funding: Natural Science Foundation of Zhejiang Province, Grant/Award Number: LY23F020025; Science and Technology Commissioner Program of Huzhou, Grant/Award Number: 2023GZ42; Sichuan Provincial Science and Technology Support Program, Grant/Award Numbers: 2023ZHCG0005, 2023ZHCG0008.
Abstract: Data augmentation plays an important role in training deep neural models by expanding the size and diversity of the dataset. Initially, data augmentation mainly involved simple transformations of images. Later, to increase the diversity and complexity of data, more advanced methods appeared and evolved into sophisticated generative models. However, these methods require a large amount of computation for training or searching. In this paper, a novel training-free method that uses the pre-trained Segment Anything Model (SAM) as a data augmentation tool (PTSAM-DA) is proposed to generate augmented annotations for images. Without the need for training, it obtains prompt boxes from the original annotations and then feeds the boxes to the pre-trained SAM to generate diverse and improved annotations. In this way, annotations are augmented more ingeniously than with simple manipulations, without incurring the heavy computation of training a data augmentation model. Multiple comparative experiments are conducted on three datasets: an in-house dataset, ADE20K, and COCO2017. On the in-house dataset, the Agricultural Plot Segmentation Dataset, maximum improvements of 3.77% and 8.92% are gained in two mainstream metrics, mIoU and mAcc, respectively. Consequently, large vision models like SAM are shown to be promising not only in image segmentation but also in data augmentation.
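The first step of the procedure described above, obtaining a prompt box from an original annotation, can be sketched as below. The abstract does not say how the boxes are derived; this sketch assumes a binary mask annotation and takes its tight bounding box in (x_min, y_min, x_max, y_max) order with inclusive pixel coordinates. The function name `mask_to_box` is hypothetical, and the call to SAM itself is deliberately left out.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def mask_to_box(mask: List[List[int]]) -> Optional[Box]:
    """Tight bounding box of the non-zero pixels in a binary mask.

    Returns None when the mask is empty, since there is nothing
    to prompt with in that case.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for row in mask for c, value in enumerate(row) if value]
    return (min(cols), rows[0], max(cols), rows[-1])


if __name__ == "__main__":
    annotation = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 0],
    ]
    print(mask_to_box(annotation))
```

A box obtained this way would then be passed to the pre-trained model as a box prompt, and the returned mask would serve as the augmented annotation.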