Funding: supported through the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grants 341275 and CRDPJ 543894-19, and the NSERC/Energi Simulation Industrial Research Chair program.
Abstract: Fractures are critical to subsurface activities such as oil and gas extraction, geothermal energy production, and carbon storage. Hydraulic fracturing, a technique that enhances fluid production, creates complex fracture networks within rock formations containing natural discontinuities. Accurately distinguishing between hydraulically induced fractures and pre-existing discontinuities is essential for understanding hydraulic fracture mechanisms. However, this remains challenging due to the interconnected nature of fractures in three-dimensional (3D) space. Manual segmentation, while adaptive, is both labor-intensive and subjective, making it impractical for large-scale 3D datasets. This study introduces a deep learning-based progressive cross-sectional segmentation method to automate the classification of 3D fracture volumes. The proposed method was applied to a 3D hydraulic fracture network in a Montney cube sample, successfully segmenting natural fractures, parted bedding planes, and hydraulic fractures with minimal user intervention. The automated approach achieves a 99.6% reduction in manual image-processing workload while maintaining high segmentation accuracy, with test accuracy exceeding 98% and an F1-score over 84%. The approach generalizes well to Brazilian disc samples with different fracture patterns, achieving consistently high accuracy in distinguishing between bedding and non-bedding fractures. This automated fracture segmentation method offers an effective tool for enhanced quantitative characterization of fracture networks, which would contribute to a deeper understanding of hydraulic fracturing processes.
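As a rough illustration of the progressive cross-sectional idea described above — sweeping a 2D segmenter through a 3D volume slice by slice — the following sketch is offered. It is not the authors' implementation; the `segment_slice` callable and the use of the previous slice's mask as a continuity cue are assumptions.

```python
import numpy as np

def segment_volume(volume, segment_slice):
    """Sweep a 2D segmenter through a 3D volume, cross-section by cross-section.

    volume: (Z, H, W) float array of CT intensities.
    segment_slice: callable taking (slice_2d, prev_mask) and returning an
        integer label mask (H, W); a hypothetical 2D deep learning model.
    """
    depth, height, width = volume.shape
    labels = np.zeros(volume.shape, dtype=np.int32)
    prev_mask = np.zeros((height, width), dtype=np.int32)
    for z in range(depth):
        # The previous slice's labels provide continuity between adjacent cross-sections.
        prev_mask = segment_slice(volume[z], prev_mask)
        labels[z] = prev_mask
    return labels
```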
Abstract: Segmenting vegetation in color images is a complex task, especially when the background and lighting conditions of the environment are uncontrolled. This paper proposes a vegetation segmentation algorithm that combines a supervised and an unsupervised learning method to segment healthy and diseased plant images from the background. During the training stage, a Self-Organizing Map (SOM) neural network is applied to create different color groups from a set of images containing vegetation, acquired from a tomato greenhouse. The color groups are labeled as vegetation and non-vegetation and then used to create two color histogram models corresponding to vegetation and non-vegetation. In the online mode, input images are segmented by a Bayesian classifier using the two histogram models. This algorithm has provided a qualitatively better segmentation rate of images containing plants’ foliage in uncontrolled environments than the segmentation rate obtained by a color index technique, resulting in the elimination of the background and the preservation of important color information. This segmentation method will be applied in disease diagnosis of tomato plants in greenhouses as future work.
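For reference, the histogram-based Bayesian classification step described in the abstract can be sketched as below. This is a minimal illustration under assumed equal class priors and an assumed bin count, not the authors' code; the histograms are presumed to have been built from SOM-labeled training pixels.

```python
import numpy as np

def classify_pixels(image_rgb, hist_veg, hist_bg, bins=16):
    """Label pixels as vegetation (1) or background (0) with a naive Bayes rule.

    image_rgb: (H, W, 3) uint8 image.
    hist_veg, hist_bg: (bins, bins, bins) arrays of color frequencies built
        beforehand from labeled vegetation / non-vegetation pixels.
    """
    # Normalize histograms into class-conditional likelihoods P(color | class).
    p_veg = hist_veg / max(hist_veg.sum(), 1)
    p_bg = hist_bg / max(hist_bg.sum(), 1)
    idx = (image_rgb.astype(np.int64) * bins) // 256      # quantize colors to histogram bins
    likelihood_veg = p_veg[idx[..., 0], idx[..., 1], idx[..., 2]]
    likelihood_bg = p_bg[idx[..., 0], idx[..., 1], idx[..., 2]]
    # With equal priors assumed, the posterior comparison reduces to the likelihoods.
    return (likelihood_veg > likelihood_bg).astype(np.uint8)
```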
Abstract: Acting as a pilot of the Square Kilometer Array (SKA), the Five hundred meter Aperture Spherical Telescope (FAST) project puts forward many innovative ideas, among which the design of the active main reflector shows fascinating potential. The main spherical reflector is to be composed of thousands of small spherical panels, which can be adjusted to fit a paraboloid of revolution in real time. For the construction and performance, the rms of the fit must be optimized, so appropriate dimensional limits for the panels need to be determined. The issue of how to divide the spherical reflector mathematically is addressed in this paper. The advantages and drawbacks of various segmenting methods are discussed and an optimum one is suggested.
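To give a feel for why panel size limits matter, the following back-of-the-envelope sketch evaluates the axial rms deviation between a spherical panel and the target paraboloid of revolution along a meridian. The sphere radius, focal length, and panel sizes are illustrative assumptions, and this is not the segmentation scheme analyzed in the paper.

```python
import numpy as np

def panel_rms(radius=300.0, focal=0.467 * 500.0, half_angle_deg=1.0, n=200):
    """Axial rms deviation between a spherical panel and the paraboloid
    z = r**2 / (4 f), sampled along a meridian of the panel.

    radius: sphere radius (m); focal: paraboloid focal length (m);
    half_angle_deg: angular half-size of the panel. All values are
    illustrative assumptions, not the FAST design figures.
    """
    t = np.radians(np.linspace(0.0, half_angle_deg, n))
    r = radius * np.sin(t)                 # radial offset of panel points from the axis
    z_sphere = radius * (1.0 - np.cos(t))  # sag of the sphere
    z_parab = r ** 2 / (4.0 * focal)       # sag of the paraboloid
    dev = z_sphere - z_parab
    dev -= dev.mean()                      # allow a rigid axial shift of the panel
    return float(np.sqrt(np.mean(dev ** 2)))

# Larger panels deviate more from the paraboloid, motivating dimensional limits.
print(panel_rms(half_angle_deg=0.5), panel_rms(half_angle_deg=1.0))
```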
Abstract: Segmenting Arabic handwriting has been one of the subjects of research in the field of Arabic character recognition for more than 25 years. The majority of reported segmentation techniques share a critical shortcoming: over-segmentation. The aim of segmentation is to produce the letters (segments) of a handwritten word. When a resulting letter (segment) is made of more than one piece (stroke) instead of one, this is called over-segmentation. Our objective is to overcome this problem by using an Artificial Neural Network (ANN) to verify the resulting segments. We propose a set of heuristic-based rules to assemble strokes in order to report the precise segmented letters. Preprocessing phases that include normalization and feature extraction are required as a prerequisite step for the ANN system for recognition and verification. In our previous work [1], we achieved a segmentation success rate of 86%, but without recognition. In this work, our experimental results confirmed a segmentation success rate of no less than 95%.
Abstract: The wide availability, low radiation dose and short acquisition time of Cone-Beam CT (CBCT) scans make them an attractive source of data for compiling databases of anatomical structures. However, CBCT has higher noise and lower contrast than helical slice CT, which makes segmentation more challenging, and the optimal methods are not yet known. This paper evaluates several methods of segmenting airway geometries (nares, nasal cavities and pharynx) from typical dental-quality head and neck CBCT data. The nasal cavity has narrow and intricate passages and is separated from the paranasal sinuses by thin walls, making it susceptible to either over- or under-segmentation. The upper airway was split into two parts: the nasal cavity and the pharyngeal region (nasopharynx to larynx). Each part was segmented using global thresholding, multi-step level-set, and region competition methods (the latter using thresholding, clustering and classification initialisation and edge attraction techniques). The segmented 3D surfaces were evaluated against a reference manual segmentation using distance-, overlap- and volume-based metrics. Global thresholding, multi-step level-set, and region competition all gave satisfactory results for the lower part of the airway (nasopharynx to larynx). Edge attraction failed completely. A semi-automatic region-growing segmentation with multi-thresholding (or classification) initialization offered the best quality segmentation. With some minimal manual editing, it resulted in an accurate upper airway model, as judged by the similarity and volumetric indices, while being the least time-consuming of the semi-automatic methods and relying the least on the operator’s expertise.
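For concreteness, a generic sketch of global thresholding followed by an overlap comparison against a reference mask is given below. It is not the study's pipeline; the threshold value is an assumption (CBCT intensities are often not calibrated Hounsfield units), and only Dice and Jaccard are shown among the overlap metrics mentioned above.

```python
import numpy as np

def threshold_segment(volume, threshold=-400.0):
    """Global thresholding: voxels darker than the threshold (air) become airway."""
    return volume < threshold

def overlap_metrics(segmentation, reference):
    """Dice and Jaccard overlap between two boolean masks of equal shape."""
    seg = segmentation.astype(bool)
    ref = reference.astype(bool)
    intersection = np.logical_and(seg, ref).sum()
    dice = 2.0 * intersection / max(seg.sum() + ref.sum(), 1)
    jaccard = intersection / max(np.logical_or(seg, ref).sum(), 1)
    return dice, jaccard
```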
Abstract: Visual attention mechanisms allow humans to extract relevant and important information from raw input percepts. Many applications in robotics and computer vision have modeled human visual attention mechanisms using a bottom-up, data-centric approach. In contrast, recent studies in cognitive science highlight advantages of a top-down approach to attention mechanisms, especially in applications involving goal-directed search. In this paper, we propose a top-down approach for extracting salient objects/regions of space. The top-down methodology first isolates different objects in an unorganized point cloud, and then compares each object for uniqueness. A measure of saliency using the properties of geodesic distance on the object’s surface is defined. Our method works on 3D point cloud data and identifies salient objects of high curvature and unique silhouette. These, being the most unique features of a scene, are robust to clutter, occlusions and viewpoint changes. We provide the details of the proposed method and initial experimental results.
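One plausible reading of a geodesic-distance-based score is sketched below: build a k-nearest-neighbor graph over an object's points and use the mean geodesic distance per point as a toy saliency statistic. The choice of k and of the mean-distance statistic are assumptions for illustration only; this is not the authors' exact measure.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra
from scipy.spatial import cKDTree

def geodesic_saliency(points, k=8):
    """Toy per-point saliency: mean geodesic distance to all other points,
    computed on a k-nearest-neighbor graph of one segmented object's samples.

    points: (N, 3) array of a single object's 3D points.
    """
    tree = cKDTree(points)
    dists, idx = tree.query(points, k=k + 1)           # first neighbor is the point itself
    rows = np.repeat(np.arange(len(points)), k)
    graph = csr_matrix((dists[:, 1:].ravel(), (rows, idx[:, 1:].ravel())),
                       shape=(len(points), len(points)))
    geo = dijkstra(graph, directed=False)               # all-pairs geodesic distances
    geo[~np.isfinite(geo)] = 0.0                        # ignore disconnected components
    return geo.mean(axis=1)
```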
Funding: supported by the National Key R&D Program of China (2024YFD2001100, 2024YFE0214300), the National Natural Science Foundation of China (No. 62162008), Guizhou Provincial Science and Technology Projects ([2024]002, CXTD[2023]027), the Guizhou Province Youth Science and Technology Talent Project ([2024]317), the Guiyang Guian Science and Technology Talent Training Project ([2024]2-15), and the Talent Introduction Program of Guizhou University under Grant No. (2021)89.
Abstract: Segmentation of vegetation remote sensing images can minimize the interference of background, thus achieving efficient monitoring and analysis of vegetation information. The segmentation of vegetation poses a significant challenge due to the inherently complex environmental conditions. Currently, there is a growing trend of using spectral sensing combined with deep learning for field vegetation segmentation to cope with complex environments. However, two major constraints remain: the high cost of equipment required for field spectral data collection, and the limited availability of field datasets, whose annotation is time-consuming and labor-intensive. To address these challenges, we propose a weakly supervised approach for field vegetation segmentation that uses spectral reconstruction (SR) techniques as the foundation and draws on the theory of the vegetation index (VI). Specifically, to reduce the cost of data acquisition, we propose SRCNet and SRANet, based on convolution and attention structures respectively, to reconstruct multispectral images of fields. Then, borrowing from the VI principle, we aggregate the reconstructed data to establish connections between spectral bands, obtaining more salient vegetation information. Finally, we employ an adaptation strategy to segment the fused feature map using a weakly supervised method, which does not require manual labeling to obtain a field vegetation segmentation result. Our segmentation method achieves a Mean Intersection over Union (MIoU) of 0.853 on real field datasets, which outperforms existing methods. In addition, we have open-sourced a dataset of unmanned aerial vehicle (UAV) RGB-multispectral images, comprising 2358 pairs of samples, to improve the richness of remote sensing agricultural data. The code and data are available at egment_SR, and.
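As a standard example of the vegetation-index principle invoked above, the NDVI computed from two reconstructed bands is sketched below. The paper's actual band-aggregation rule is not specified here, and the mask threshold is an assumption; this is only a reference illustration.

```python
import numpy as np

def ndvi(nir, red, eps=1e-6):
    """Normalized Difference Vegetation Index from near-infrared and red bands.

    nir, red: float arrays of reflectance (e.g., reconstructed multispectral bands).
    """
    return (nir - red) / (nir + red + eps)

def vegetation_mask(nir, red, threshold=0.3):
    # Pixels with high NDVI are treated as salient vegetation (threshold assumed).
    return ndvi(nir, red) > threshold
```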
Abstract: Advanced traffic monitoring systems encounter substantial challenges in vehicle detection and classification due to the limitations of conventional methods, which often demand extensive computational resources and struggle with diverse data acquisition techniques. This research presents a novel approach for vehicle classification and recognition in aerial image sequences, integrating multiple advanced techniques to enhance detection accuracy. The proposed model begins with preprocessing using Multiscale Retinex (MSR) to enhance image quality, followed by Expectation-Maximization (EM) segmentation for precise foreground object identification. Vehicle detection is performed using the state-of-the-art YOLOv10 framework, while feature extraction incorporates Maximally Stable Extremal Regions (MSER), Dense Scale-Invariant Feature Transform (Dense SIFT), and Zernike Moments features to capture distinct object characteristics. Feature optimization is further refined through a hybrid swarm-based optimization algorithm, ensuring optimal feature selection for improved classification performance. The final classification is conducted using a Vision Transformer, leveraging its robust learning capabilities for enhanced accuracy. Experimental evaluations on benchmark datasets, including UAVDT and the Unmanned Aerial Vehicle Intruder Dataset (UAVID), demonstrate the superiority of the proposed approach, achieving an accuracy of 94.40% on UAVDT and 93.57% on UAVID. The results highlight the efficacy of the model in significantly enhancing vehicle detection and classification in aerial imagery, outperforming existing methodologies and offering a statistically validated improvement for intelligent traffic monitoring systems.
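The Multiscale Retinex preprocessing step named above has a common textbook form, sketched here for orientation: the average of log(image) minus log(Gaussian-blurred image) over several scales. The scale values and equal weights are assumed defaults, not the paper's parameters.

```python
import cv2
import numpy as np

def multiscale_retinex(image, sigmas=(15, 80, 250)):
    """Multiscale Retinex enhancement of an RGB image (uint8 in, uint8 out)."""
    img = image.astype(np.float64) + 1.0             # avoid log(0)
    result = np.zeros_like(img)
    for sigma in sigmas:
        blurred = cv2.GaussianBlur(img, (0, 0), sigma)
        result += np.log(img) - np.log(blurred)      # single-scale Retinex at this sigma
    result /= len(sigmas)
    # Stretch back to the displayable 0-255 range.
    result = (result - result.min()) / (result.max() - result.min() + 1e-6)
    return (result * 255).astype(np.uint8)
```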
Funding: supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) under the Metaverse Support Program to Nurture the Best Talents (IITP-2024-RS-2023-00254529), grant funded by the Korea government (MSIT).
Abstract: Brain tumors require precise segmentation for diagnosis and treatment planning due to their complex morphology and heterogeneous characteristics. While MRI-based automatic brain tumor segmentation technology reduces the burden on medical staff and provides quantitative information, existing methodologies and recent models still struggle to accurately capture and classify the fine boundaries and diverse morphologies of tumors. In order to address these challenges and maximize the performance of brain tumor segmentation, this research introduces a novel SwinUNETR-based model by integrating a new decoder block, the Hierarchical Channel-wise Attention Decoder (HCAD), into a powerful SwinUNETR encoder. The HCAD decoder block utilizes hierarchical features and channel-specific attention mechanisms to further fuse information at different scales transmitted from the encoder and to preserve spatial details throughout the reconstruction phase. Rigorous evaluations on the recent BraTS GLI datasets demonstrate that the proposed SwinHCAD model achieves superior segmentation accuracy on both the Dice score and HD95 metrics across all tumor subregions (WT, TC, and ET) compared to baseline models. In particular, the rationale and contribution of the model design were clarified through ablation studies that verify the effectiveness of the proposed HCAD decoder block. The results of this study are expected to contribute greatly to enhancing the efficiency of clinical diagnosis and treatment planning by increasing the precision of automated brain tumor segmentation.
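To indicate the kind of channel-specific weighting a decoder block can apply, a generic squeeze-and-excitation-style channel attention module for 3D feature maps is sketched below. It is not the HCAD block itself, whose hierarchical design is described in the paper; the reduction ratio is an assumption.

```python
import torch
import torch.nn as nn

class ChannelAttention3D(nn.Module):
    """Generic channel-wise attention for 3D feature maps (SE style)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)           # squeeze spatial dimensions
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                              # x: (B, C, D, H, W)
        b, c = x.shape[:2]
        weights = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1, 1)
        return x * weights                             # re-weight each channel
```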
Funding: provided by the Science Research Project of the Hebei Education Department under grant No. BJK2024115.
Abstract: High-resolution remote sensing images (HRSIs) are now an essential data source for gathering surface information due to advancements in remote sensing data capture technologies. However, their significant scale changes and wealth of spatial details pose challenges for semantic segmentation. While convolutional neural networks (CNNs) excel at capturing local features, they are limited in modeling long-range dependencies. Conversely, transformers utilize multi-head self-attention to integrate global context effectively, but this approach often incurs a high computational cost. This paper proposes a global-local multiscale context network (GLMCNet) to extract both global and local multiscale contextual information from HRSIs. A detail-enhanced filtering module (DEFM) is introduced at the end of the encoder to refine the encoder outputs, thereby enhancing the key details extracted by the encoder and effectively suppressing redundant information. In addition, a global-local multiscale transformer block (GLMTB) is proposed in the decoding stage to enable the modeling of rich multiscale global and local information. We also design a stair fusion mechanism to transmit deep semantic information progressively from deep to shallow layers. Finally, we propose the semantic awareness enhancement module (SAEM), which further enhances the representation of multiscale semantic features through spatial attention and covariance channel attention. Extensive ablation analyses and comparative experiments were conducted to evaluate the performance of the proposed method. Specifically, our method achieved a mean Intersection over Union (mIoU) of 86.89% on the ISPRS Potsdam dataset and 84.34% on the ISPRS Vaihingen dataset, outperforming existing models such as ABCNet and BANet.
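For reference, the mIoU metric reported above follows the standard confusion-matrix definition, sketched below. This is not the authors' evaluation script.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union from a pixel-wise confusion matrix.

    pred, target: integer label maps of equal shape with values in [0, num_classes).
    """
    mask = (target >= 0) & (target < num_classes)
    cm = np.bincount(num_classes * target[mask] + pred[mask],
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    intersection = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection
    valid = union > 0                                   # average only over classes present
    return (intersection[valid] / union[valid]).mean()
```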
Abstract: This systematic review aims to comprehensively examine and compare deep learning methods for brain tumor segmentation and classification using MRI and other imaging modalities, focusing on recent trends from 2022 to 2025. The primary objective is to evaluate methodological advancements, model performance, dataset usage, and existing challenges in developing clinically robust AI systems. We included peer-reviewed journal articles and high-impact conference papers published between 2022 and 2025, written in English, that proposed or evaluated deep learning methods for brain tumor segmentation and/or classification. Excluded were non-open-access publications, books, and non-English articles. A structured search was conducted across Scopus, Google Scholar, Wiley, and Taylor & Francis, with the last search performed in August 2025. Risk of bias was not formally quantified but was considered during full-text screening based on dataset diversity, validation methods, and availability of performance metrics. We used narrative synthesis and tabular benchmarking to compare performance metrics (e.g., accuracy, Dice score) across model types (CNN, Transformer, Hybrid), imaging modalities, and datasets. A total of 49 studies were included (43 journal articles and 6 conference papers). These studies spanned over 9 public datasets (e.g., BraTS, Figshare, REMBRANDT, MOLAB) and utilized a range of imaging modalities, predominantly MRI. Hybrid models, especially ResViT and UNetFormer, consistently achieved high performance, with classification accuracy exceeding 98% and segmentation Dice scores above 0.90 across multiple studies. Transformers and hybrid architectures showed increasing adoption post-2023. Many studies lacked external validation and were evaluated only on a few benchmark datasets, raising concerns about generalizability and dataset bias. Few studies addressed clinical interpretability or uncertainty quantification. Despite promising results, particularly for hybrid deep learning models, widespread clinical adoption remains limited due to lack of validation, interpretability concerns, and real-world deployment barriers.
Funding: supported by the National Natural Science Foundation of China [grant number 62376217], the Young Elite Scientists Sponsorship Program by CAST [grant number 2023QNRC001], and the Joint Research Project for Meteorological Capacity Improvement [grant number 24NLTSZ003].
Abstract: Deep learning-based methods have become alternatives to traditional numerical weather prediction systems, offering faster computation and the ability to utilize large historical datasets. However, the application of deep learning to medium-range regional weather forecasting with limited data remains a significant challenge. In this work, three key solutions are proposed: (1) motivated by the need to improve model performance in data-scarce regional forecasting scenarios, the authors innovatively apply semantic segmentation models to better capture spatiotemporal features and improve prediction accuracy; (2) recognizing the challenge of overfitting and the inability of traditional noise-based data augmentation methods to effectively enhance model robustness, a novel learnable Gaussian noise mechanism is introduced that allows the model to adaptively optimize perturbations for different locations, ensuring more effective learning; and (3) to address the issue of error accumulation in autoregressive prediction, as well as the challenge of learning difficulty and the lack of intermediate data utilization in one-shot prediction, the authors propose a cascade prediction approach that effectively resolves these problems while significantly improving model forecasting performance. The method achieves a competitive result in the East China Regional AI Medium Range Weather Forecasting Competition. Ablation experiments further validate the effectiveness of each component, highlighting their contributions to enhancing prediction performance.
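A minimal reading of the "learnable Gaussian noise" idea in point (2) is sketched below: a layer whose per-location noise standard deviation is a trainable parameter. The grid shape, initialization, and the decision to disable noise at inference are assumptions; the paper's actual mechanism may differ.

```python
import torch
import torch.nn as nn

class LearnableGaussianNoise(nn.Module):
    """Adds zero-mean Gaussian noise with a learnable per-location std deviation."""
    def __init__(self, height, width, init_log_sigma=-3.0):
        super().__init__()
        # One log-std value per grid cell, optimized jointly with the forecast model.
        self.log_sigma = nn.Parameter(torch.full((1, 1, height, width), init_log_sigma))

    def forward(self, x):                       # x: (B, C, H, W) weather fields
        if not self.training:
            return x                            # no perturbation at inference time
        sigma = torch.exp(self.log_sigma)       # positive std via the log parameterization
        return x + sigma * torch.randn_like(x)
```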
Funding: National Science and Technology Council, the Republic of China, under grants NSTC 113-2221-E-194-011-MY3, and the Research Center on Artificial Intelligence and Sustainability, National Chung Cheng University, under the research project grant titled "Generative Digital Twin System Design for Sustainable Smart City Development in Taiwan."
Abstract: Modern manufacturing processes have become more reliant on automation because of the accelerated transition from Industry 3.0 to Industry 4.0. Manual inspection of products on assembly lines remains inefficient, prone to errors, and lacking in consistency, emphasizing the need for a reliable and automated inspection system. Leveraging both object detection and image segmentation approaches, this research proposes a vision-based solution for detecting various kinds of tools in a toolkit using deep learning (DL) models. Two Intel RealSense D455f depth cameras were arranged in a top-down configuration to capture both RGB and depth images of the toolkits. After applying multiple constraints and enhancing the images through preprocessing and augmentation, a dataset consisting of 3300 annotated RGB-D photos was generated. Several DL models were selected through a comprehensive assessment of mean Average Precision (mAP), precision-recall equilibrium, inference latency (target ≥30 FPS), and computational burden, resulting in a preference for YOLO and Region-based Convolutional Neural Network (R-CNN) variants over ViT-based models due to the latter’s increased latency and resource requirements. YOLOv5, YOLOv8, YOLOv11, Faster R-CNN, and Mask R-CNN were trained on the annotated dataset and evaluated using key performance metrics (recall, accuracy, F1-score, and precision). YOLOv11 demonstrated balanced excellence with 93.0% precision, 89.9% recall, and a 90.6% F1-score in object detection, as well as 96.9% precision, 95.3% recall, and a 96.5% F1-score in instance segmentation, with an average inference time of 25 ms per frame (≈40 FPS), demonstrating real-time performance. Leveraging these results, a YOLOv11-based Windows application was successfully deployed in a real-time assembly line environment, where it accurately processed live video streams to detect and segment tools within toolkits, demonstrating its practical effectiveness in industrial automation. In addition to detection and segmentation, the application can precisely measure socket dimensions by applying edge detection techniques to YOLOv11 segmentation masks. This enables specification-level quality control directly on the assembly line, improving real-time inspection capability. The implementation is a significant step forward for intelligent manufacturing in the Industry 4.0 paradigm, providing a scalable, efficient, and accurate way to perform automated inspection and dimensional verification tasks.
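Measuring socket dimensions from a segmentation mask is commonly done with contour extraction and a minimum-area bounding rectangle; a generic sketch is given below. It is not the deployed application's code, and the millimeter-per-pixel scale is an assumption that would come from camera calibration in practice.

```python
import cv2
import numpy as np

def socket_dimensions(mask, mm_per_pixel=0.5):
    """Estimate the width and height of a segmented socket from its binary mask.

    mask: (H, W) binary array from an instance segmentation model.
    Returns (width_mm, height_mm) of the minimum-area bounding rectangle,
    or None if no contour is found.
    """
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    (_, _), (w, h), _ = cv2.minAreaRect(largest)   # rotated box: center, size, angle
    return w * mm_per_pixel, h * mm_per_pixel
```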
Funding: supported in part by the National Science Foundation under Grant EAR 1760582 and the Louisiana Board of Regents ITRS LEQSF(2018-21)-RD-B-03.
Abstract: We develop a novel network to segment water with significant appearance variation in videos. Unlike existing state-of-the-art video segmentation approaches that use a pre-trained feature recognition network and several previous frames to guide segmentation, we accommodate the object’s appearance variation by considering features observed from the current frame. When dealing with segmentation of objects such as water, whose appearance is non-uniform and changes dynamically, our pipeline can produce more reliable and accurate segmentation results than existing algorithms.
Funding: National Natural Science Foundation of China, Grant/Award Numbers: 12090053, 32088101; Hong Kong Innovation and Technology Fund, Grant/Award Numbers: GHP/176/21SZ, InnoHK Project CIMDA; Hong Kong Research Grants Council, Grant/Award Numbers: 11204821, HKBU12101323, HKBU12101520, HKBU12101522, N_HKBU201/18.
Abstract: Caenorhabditis elegans has been widely used as a model organism in developmental biology due to its invariant development. In this study, we developed a desktop software tool, CShaperApp, to segment fluorescence-labeled images of cell membranes and analyze cellular morphologies interactively during C. elegans embryogenesis. Based on the previously proposed framework CShaper, CShaperApp empowers biologists to automatically and efficiently extract quantitative cellular morphological data with either an existing deep learning model or one fine-tuned on their in-house dataset. Experimental results show that it takes about 30 min to process a three-dimensional time-lapse (4D) dataset, which consists of 150 image stacks at a ~1.5-min interval and covers C. elegans embryogenesis from the 4-cell to 350-cell stages. The robustness of CShaperApp is also validated with datasets from different laboratories. Furthermore, its modularized implementation increases flexibility in multi-task applications and facilitates future enhancements. As cell morphology over development has emerged as a focus of interest in developmental biology, CShaperApp is anticipated to pave the way for such studies by accelerating the high-throughput generation of systems-level quantitative data. The software can be freely downloaded from GitHub (cao13jf/CShaperApp) and is executable on Windows, macOS, and Linux operating systems.
Funding: supported in part by the National Natural Science Foundation of China under Grant 62201201 and the Foundation of Henan Educational Committee under Grant 242102211042.
Abstract: Segmenting skin lesions is critical for early skin cancer detection. Existing CNN- and Transformer-based methods face challenges such as high computational complexity and limited adaptability to variations in lesion size. To overcome these limitations, we introduce MSAMamba-UNet, a lightweight model that integrates two novel architectures: Multi-Scale Mamba (MSMamba) and the Adaptive Dynamic Gating Block (ADGB). MSMamba utilizes multi-scale decomposition and a parallel hierarchical structure to enhance the delineation of irregular lesion boundaries and sensitivity to small targets. ADGB dynamically selects convolutional kernels with varying receptive fields based on input features, improving the model’s capacity to accommodate diverse lesion textures and scales. Additionally, we introduce a Mix Attention Fusion Block (MAF) to enhance shallow feature representation by integrating parallel channel and pixel attention mechanisms. Extensive evaluation of MSAMamba-UNet on the ISIC 2016, ISIC 2017, and ISIC 2018 datasets demonstrates competitive segmentation accuracy with only 0.056 M parameters and 0.069 GFLOPs. Our experiments revealed that MSAMamba-UNet achieved IoU scores of 85.53%, 85.47%, and 82.22%, as well as DSC scores of 92.20%, 92.17%, and 90.24%, respectively. These results underscore the lightweight design and effectiveness of MSAMamba-UNet.
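The idea of dynamically selecting kernels with different receptive fields can be sketched with a selective-kernel-style gate, shown below. This is a generic illustration of input-dependent kernel gating, not the paper's ADGB module; the kernel sizes and softmax gate are assumptions.

```python
import torch
import torch.nn as nn

class DynamicKernelGate(nn.Module):
    """Mixes convolution branches with different receptive fields using
    input-dependent gates (selective-kernel style).
    """
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, k, padding=k // 2) for k in kernel_sizes]
        )
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, len(kernel_sizes), kernel_size=1),
            nn.Softmax(dim=1),                      # one weight per branch
        )

    def forward(self, x):                            # x: (B, C, H, W)
        weights = self.gate(x)                       # (B, num_branches, 1, 1)
        out = torch.zeros_like(x)
        for i, branch in enumerate(self.branches):
            out = out + weights[:, i:i + 1] * branch(x)
        return out
```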
Abstract: Detecting pavement cracks is critical for road safety and infrastructure management. Traditional methods, relying on manual inspection and basic image processing, are time-consuming and prone to errors. Recent deep-learning (DL) methods automate crack detection, but many still struggle with variable crack patterns and environmental conditions. This study aims to address these limitations by introducing the Masker Transformer, a novel hybrid deep learning model that integrates the precise localization capabilities of Mask Region-based Convolutional Neural Network (Mask R-CNN) with the global contextual awareness of the Vision Transformer (ViT). The research focuses on leveraging the strengths of both architectures to enhance segmentation accuracy and adaptability across different pavement conditions. We evaluated the performance of the Masker Transformer against other state-of-the-art models such as U-Net, Transformer U-Net (TransUNet), U-Net Transformer (UNETr), Swin U-Net Transformer (Swin-UNETr), You Only Look Once version 8 (YOLOv8), and Mask R-CNN using two benchmark datasets: Crack500 and DeepCrack. The findings reveal that the Masker Transformer significantly outperforms the existing models, achieving the highest Dice Similarity Coefficient (DSC), precision, recall, and F1-score across both datasets. Specifically, the model attained a DSC of 80.04% on Crack500 and 91.37% on DeepCrack, demonstrating superior segmentation accuracy and reliability. The high precision and recall rates further substantiate its effectiveness in real-world applications, suggesting that the Masker Transformer can serve as a robust tool for automated pavement crack detection, potentially replacing more traditional methods.
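For reference, the pixel-wise precision, recall, and F1 metrics reported above follow their standard definitions, sketched below; this is not the authors' evaluation code.

```python
import numpy as np

def precision_recall_f1(pred_mask, gt_mask):
    """Pixel-wise precision, recall, and F1 between a predicted crack mask
    and the ground-truth mask (both binary arrays of equal shape)."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-9)
    return precision, recall, f1
```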
Funding: supported by the Start-up Fund for new faculty from the Hong Kong Polytechnic University (PolyU) (A0043215) (to SA), the General Research Fund and Research Impact Fund from the Hong Kong Research Grants Council (15106018, R5032-18) (to DYT), the Research Center for SHARP Vision in PolyU (P0045843) (to SA), and the InnoHK scheme from the Hong Kong Special Administrative Region Government (to DYT).
Abstract: Retinal aging has been recognized as a significant risk factor for various retinal disorders, including diabetic retinopathy, age-related macular degeneration, and glaucoma, following a growing understanding of the molecular underpinnings of their development. This comprehensive review explores the mechanisms of retinal aging and investigates potential neuroprotective approaches, focusing on the activation of transcription factor EB. Recent meta-analyses have demonstrated promising outcomes of transcription factor EB-targeted strategies, such as exercise, calorie restriction, rapamycin, and metformin, in patients and animal models of these common retinal diseases. The review critically assesses the role of transcription factor EB in retinal biology during aging, its neuroprotective effects, and its therapeutic potential for retinal disorders. The impact of transcription factor EB on retinal aging is cell-specific, influencing metabolic reprogramming and energy homeostasis in retinal neurons through the regulation of mitochondrial quality control and nutrient-sensing pathways. In vascular endothelial cells, transcription factor EB controls important processes, including endothelial cell proliferation, endothelial tube formation, and nitric oxide levels, thereby influencing the inner blood-retinal barrier, angiogenesis, and the retinal microvasculature. Additionally, transcription factor EB affects vascular smooth muscle cells, inhibiting vascular calcification and atherogenesis. In retinal pigment epithelial cells, transcription factor EB modulates functions such as autophagy, lysosomal dynamics, and clearance of the aging pigment lipofuscin, thereby promoting photoreceptor survival and regulating the expression of vascular endothelial growth factor A involved in neovascularization. These cell-specific functions of transcription factor EB significantly impact retinal aging mechanisms encompassing proteostasis, neuronal synapse plasticity, energy metabolism, microvasculature, and inflammation, ultimately offering protection against retinal aging and diseases. The review emphasizes transcription factor EB as a potential therapeutic target for retinal diseases. It is therefore imperative to obtain well-controlled direct experimental evidence to confirm the efficacy of transcription factor EB modulation in retinal diseases while minimizing the risk of adverse effects.