Deep learning has shown impressive performance in various vision tasks such as image classification, object detection, and semantic segmentation. In particular, recent advances in deep learning techniques have brought encouraging performance to fine-grained image classification, which aims to distinguish subordinate-level categories such as bird species or dog breeds. This task is extremely challenging due to high intra-class and low inter-class variance. In this paper, we review four types of deep learning based fine-grained image classification approaches: general convolutional neural networks (CNNs), part detection based, ensemble of networks based, and visual attention based approaches. In addition, deep learning based semantic segmentation approaches are also covered in this paper; the region proposal based and fully convolutional network based approaches are introduced in turn.
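To make one of the reviewed families concrete, the sketch below (PyTorch, not taken from any of the surveyed papers) shows bilinear pooling, a classic two-stream construction often grouped with the ensemble-of-networks approaches to fine-grained classification; the VGG-16 streams, feature sizes, and class count are illustrative assumptions.

```python
# Minimal bilinear-pooling sketch for fine-grained classification (illustrative only).
import torch
import torch.nn as nn
import torchvision.models as models   # requires torchvision >= 0.13 for weights=None

class BilinearSketch(nn.Module):
    def __init__(self, num_classes=200):
        super().__init__()
        # Two (here untrained) VGG-16 feature extractors acting as the two streams.
        self.stream_a = models.vgg16(weights=None).features
        self.stream_b = models.vgg16(weights=None).features
        self.fc = nn.Linear(512 * 512, num_classes)

    def forward(self, x):
        fa = self.stream_a(x)                                   # (B, 512, H, W)
        fb = self.stream_b(x)                                   # (B, 512, H, W)
        B, C, H, W = fa.shape
        fa = fa.reshape(B, C, H * W)
        fb = fb.reshape(B, C, H * W)
        bilinear = torch.bmm(fa, fb.transpose(1, 2)) / (H * W)  # outer product, sum-pooled
        bilinear = bilinear.reshape(B, -1)
        # Signed square-root and L2 normalization, as is common for bilinear features.
        bilinear = torch.sign(bilinear) * torch.sqrt(torch.abs(bilinear) + 1e-10)
        bilinear = nn.functional.normalize(bilinear)
        return self.fc(bilinear)

if __name__ == "__main__":
    logits = BilinearSketch(num_classes=200)(torch.randn(2, 3, 224, 224))
    print(logits.shape)   # torch.Size([2, 200])
```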
Neurons can be abstractly represented as skeletons due to the filament nature of neurites. With the rapid development of imaging and image analysis techniques, an increasing amount of neuron skeleton data is being produced. In some scientific studies, it is necessary to dissect the axons and dendrites, which is typically done manually and is both tedious and time-consuming. To automate this process, we have developed a method that relies solely on neuronal skeletons, using Geometric Deep Learning (GDL). We demonstrate the effectiveness of this method on pyramidal neurons in mammalian brains, and the results are promising for its application in neuroscience studies.
The process of segmenting point cloud data into several homogeneous regions, with points in the same region sharing the same attributes, is known as 3D segmentation. Segmentation is challenging with point cloud data due to substantial redundancy, fluctuating sample density, and the lack of apparent organization. The research area has a wide range of robotics applications, including intelligent vehicles, autonomous mapping, and navigation. A number of researchers have introduced various methodologies and algorithms. Deep learning, as a prevailing AI methodology, has been successfully applied to a spectrum of 2D vision domains. However, due to the specific problems of processing point clouds with deep neural networks, deep learning on point clouds is still in its initial stages. This study examines many strategies that have been proposed for 3D instance and semantic segmentation and gives a complete assessment of current developments in deep learning based 3D segmentation. The benefits, drawbacks, and design mechanisms of these approaches are studied and addressed. This study also evaluates the competitiveness of various segmentation algorithms on publicly accessible datasets, as well as the most commonly used pipelines, their advantages and limits, insightful findings, and intriguing future research directions.
Image semantic segmentation is an important branch of computer vision with a wide variety of practical applications such as medical image analysis, autonomous driving, and virtual or augmented reality. In recent years, owing to the remarkable performance of the transformer and the multilayer perceptron (MLP) in computer vision, which is comparable to that of convolutional neural networks (CNNs), a substantial amount of image semantic segmentation work has aimed at developing different types of deep learning architectures. This survey aims to provide a comprehensive overview of deep learning methods in the field of general image semantic segmentation. Firstly, the commonly used image segmentation datasets are listed. Next, extensive pioneering works are studied in depth from multiple perspectives (e.g., network structures, feature fusion methods, attention mechanisms) and are divided into four categories according to network architecture: CNN-based, transformer-based, MLP-based, and others. Furthermore, this paper presents some common evaluation metrics and compares the respective advantages and limitations of popular techniques, both in terms of architectural design and their experimental value on the most widely used datasets. Finally, possible future research directions and challenges are discussed for the reference of other researchers.
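As a concrete reference for the evaluation metrics mentioned above, the following minimal Python sketch computes pixel accuracy and mean IoU from a confusion matrix; the class count and random labels are placeholders.

```python
# Pixel accuracy and mean IoU computed from a confusion matrix (illustrative sketch).
import numpy as np

def confusion_matrix(pred, gt, num_classes):
    """pred, gt: integer label arrays of the same shape; rows = ground truth, cols = prediction."""
    mask = (gt >= 0) & (gt < num_classes)
    return np.bincount(num_classes * gt[mask] + pred[mask],
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)

def pixel_accuracy(cm):
    return np.diag(cm).sum() / cm.sum()

def mean_iou(cm):
    intersection = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection
    iou = intersection / np.maximum(union, 1)   # classes with empty union count as 0
    return iou.mean()

if __name__ == "__main__":
    gt = np.random.randint(0, 4, size=(2, 64, 64))
    pred = np.random.randint(0, 4, size=(2, 64, 64))
    cm = confusion_matrix(pred, gt, num_classes=4)
    print(f"PA={pixel_accuracy(cm):.3f}  mIoU={mean_iou(cm):.3f}")
```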
The convolutional neural network (CNN) method based on DeepLabv3+ has several problems in the semantic segmentation of high-resolution remote sensing images, such as a fixed receptive field size during feature extraction, a lack of semantic information, a large decoder upsampling factor, and insufficient ability to retain detail. A hierarchical feature fusion network (HFFNet) was therefore proposed. Firstly, a combination of transformer and CNN architectures was employed to extract features from images at varying resolutions, and the extracted features were processed independently. Subsequently, the features from the transformer and the CNN were fused under the guidance of features from different sources; this fusion helped restore information more completely during the decoding stage. Furthermore, a spatial channel attention module was designed for the final stage of decoding to refine features and reduce the semantic gap between shallow CNN features and deep decoder features. The experimental results showed that HFFNet performed well on the UAVid, LoveDA, Potsdam, and Vaihingen datasets, outperforming DeepLabv3+ and other competing methods and showing strong generalization ability.
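The abstract does not give the exact design of HFFNet's spatial channel attention module, so the PyTorch sketch below shows a generic spatial-channel attention block (a squeeze-excitation style channel gate followed by a spatial gate) purely as an illustration of the idea; channel counts and kernel sizes are assumptions.

```python
# Generic spatial-channel attention block (not the paper's exact module).
import torch
import torch.nn as nn

class SpatialChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Channel attention: global average pool -> bottleneck MLP -> sigmoid gate.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Spatial attention: channel-wise mean/max maps -> 7x7 conv -> sigmoid gate.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel_gate(x)
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.max(dim=1, keepdim=True).values], dim=1)
        return x * self.spatial_gate(pooled)

if __name__ == "__main__":
    y = SpatialChannelAttention(64)(torch.randn(1, 64, 32, 32))
    print(y.shape)   # torch.Size([1, 64, 32, 32])
```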
Segmenting a breast ultrasound image is still challenging due to the presence of speckle noise, dependency on the operator, and variation in image quality. This paper presents the UltraSegNet architecture, which addresses these challenges through three key technical innovations: (1) a modified ResNet-50 backbone with sequential 3×3 convolutions to preserve the fine anatomical details needed for locating lesion boundaries; (2) a computationally efficient regional attention mechanism that operates on high-resolution features without a transformer's memory overhead; and (3) an adaptive feature fusion strategy that adjusts local and global features based on how the image is being used. Extensive evaluation on two distinct datasets demonstrates UltraSegNet's superior performance: on the BUSI dataset, it obtains a precision of 0.915, a recall of 0.908, and an F1 score of 0.911; on the UDAIT dataset, it achieves robust performance across the board, with a precision of 0.901 and a recall of 0.894. Importantly, these improvements are achieved at clinically feasible computation times, taking 235 ms per image on standard GPU hardware. Notably, UltraSegNet performs strongly on difficult small lesions (less than 10 mm), achieving a detection accuracy of 0.891, a substantial improvement over traditional methods, which struggle with small-scale features and typically achieve only 0.63–0.71 accuracy. This improvement in small lesion detection is particularly crucial for early-stage breast cancer identification. The results demonstrate that UltraSegNet can be practically deployed in clinical workflows to improve breast cancer screening accuracy.
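The abstract does not specify where the sequential 3×3 convolutions are inserted; one common realisation, sketched below as an assumption rather than the authors' implementation, replaces the 7×7 stem convolution of ResNet-50 with three stacked 3×3 convolutions.

```python
# Replace ResNet-50's 7x7 stem with stacked 3x3 convolutions (illustrative assumption).
import torch
import torch.nn as nn
import torchvision.models as models   # requires torchvision >= 0.13 for weights=None

def resnet50_with_3x3_stem():
    backbone = models.resnet50(weights=None)
    # The stem must still output 64 channels at stride 2 so the rest of the network fits.
    backbone.conv1 = nn.Sequential(
        nn.Conv2d(3, 32, 3, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(32), nn.ReLU(inplace=True),
        nn.Conv2d(32, 32, 3, stride=1, padding=1, bias=False),
        nn.BatchNorm2d(32), nn.ReLU(inplace=True),
        nn.Conv2d(32, 64, 3, stride=1, padding=1, bias=False),
    )
    return backbone

if __name__ == "__main__":
    model = resnet50_with_3x3_stem()
    print(model(torch.randn(1, 3, 224, 224)).shape)   # torch.Size([1, 1000])
```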
BACKGROUND: Upper gastrointestinal (UGI) diseases present diagnostic challenges during endoscopy due to visual similarities, indistinct boundaries, and observer variability, which can lead to missed diagnoses and delayed treatment. Automated segmentation using deep learning (DL) models offers the potential to assist endoscopists, improve diagnostic accuracy, and reduce workload. However, multi-class UGI disease segmentation remains underexplored, with limited annotated datasets and insufficient focus on clinical validation. This study hypothesizes that comparative analysis of different DL architectures can identify models suitable for clinical application, providing actionable insights to reduce diagnostic errors and support clinical decision-making in endoscopic practice. AIM: To evaluate 17 state-of-the-art DL models for multi-class UGI disease segmentation, emphasizing clinical translation and real-world applicability. METHODS: This study evaluated 17 DL models spanning convolutional neural network (CNN)-, transformer-, and mamba-based architectures using a self-collected dataset from two hospitals in Macao and Xiangyang (3313 images, 9 classes) and the public EDD2020 dataset (386 images, 5 classes). Models were assessed for segmentation performance and performance-efficiency trade-off. Statistical analyses were conducted to examine performance differences across architectures. Generalization capability was measured through a cross-dataset evaluation (training models on the self-collected dataset and testing on the EDD2020 dataset). RESULTS: Swin-UMamba achieved the highest segmentation performance across both datasets [intersection over union (IoU): 89.06% ± 0.20% self-collected, 77.53% ± 0.32% EDD2020], followed by SegFormer (IoU: 88.94% ± 0.38% self-collected, 77.20% ± 0.98% EDD2020) and ConvNeXt+UPerNet (IoU: 88.48% ± 0.09% self-collected, 76.90% ± 0.61% EDD2020). Statistical analyses showed no significant differences between paradigms, though hierarchical architectures with pre-trained encoders consistently outperformed simpler designs. SegFormer achieved the best balance of accuracy and computational efficiency with a performance-efficiency trade-off score of 92.02%, making it suitable for real-time clinical use. Cross-dataset evaluation revealed significant performance drops, with generalization retention rates of 64.78% to 71.52%. Transformer-based models, particularly pyramid vision transformer v2 + efficient multi-scale convolutional decoding (IoU: 63.35% ± 1.44%), generalized better than CNN- and mamba-based models. CONCLUSION: Hierarchical architectures like Swin-UMamba and SegFormer show promise for UGI disease segmentation, reducing missed diagnoses and improving workflows, but robust clinical validation is crucial for real-world deployment.
Semantic segmentation plays a foundational role in biomedical image analysis, providing precise information about cellular, tissue, and organ structures in both biological and medical imaging modalities. Traditional approaches often fail in the face of challenges such as low contrast, morphological variability, and densely packed structures. Recent advancements in deep learning have transformed segmentation capabilities through the integration of fine-scale detail preservation, coarse-scale contextual modeling, and multi-scale feature fusion. This work provides a comprehensive analysis of state-of-the-art deep learning models, including U-Net variants, attention-based frameworks, and Transformer-integrated networks, highlighting innovations that improve accuracy, generalizability, and computational efficiency. Key architectural components such as convolution operations, shallow and deep blocks, skip connections, and hybrid encoders are examined for their roles in enhancing spatial representation and semantic consistency. We further discuss the importance of hierarchical and instance-aware segmentation and annotation in interpreting complex biological scenes and multiplexed medical images. By bridging methodological developments with diverse application domains, this paper outlines current trends and future directions for semantic segmentation, emphasizing its critical role in facilitating annotation, diagnosis, and discovery in biomedical research.
Automated prostate cancer detection in magnetic resonance imaging (MRI) scans is of significant importance for cancer patient management. Most existing computer-aided diagnosis systems adopt segmentation methods, while object detection approaches have recently shown promising results. The authors have (1) carefully compared the performance of well-developed segmentation and object detection methods in localising prostate imaging reporting and data system (PIRADS)-labelled prostate lesions on MRI scans; (2) proposed an additional customised set of lesion-level localisation sensitivity and precision metrics; and (3) proposed efficient ways to ensemble the segmentation and object detection methods for improved performance. The ground-truth (GT)-perspective lesion-level sensitivity and prediction-perspective lesion-level precision are reported to quantify the ratios of true positive voxels detected by the algorithms over the number of voxels in the GT-labelled regions and the predicted regions. The two networks are trained independently on data from 549 clinical patients with PIRADS-V2 as GT labels, and tested on 161 internal and 100 external MRI scans. At the lesion level, nnDetection outperforms nnUNet for detecting both PIRADS≥3 and PIRADS≥4 lesions in the majority of cases. For example, at an average of 3 false positive predictions per patient, nnDetection achieves a greater Intersection-over-Union (IoU)-based sensitivity than nnUNet for detecting PIRADS≥3 lesions, at 80.78% ± 1.50% versus 60.40% ± 1.64% (p < 0.01). At the voxel level, nnUNet is in general superior or comparable to nnDetection. The proposed ensemble methods achieve improved or comparable lesion-level accuracy in all tested clinical scenarios. For example, at 3 false positives, the lesion-wise ensemble method achieves 82.24% ± 1.43% sensitivity versus 80.78% ± 1.50% (nnDetection) and 60.40% ± 1.64% (nnUNet) for detecting PIRADS≥3 lesions. Consistent conclusions are also drawn from results on the external data set.
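A simplified sketch of an IoU-based lesion-level sensitivity and precision computation in the spirit of the customised metrics described above is given below; the IoU threshold and the matching rule are illustrative assumptions, not the authors' exact definitions.

```python
# Lesion-level sensitivity/precision sketch with IoU-based matching (assumed rules).
import numpy as np

def iou(a, b):
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def lesion_level_metrics(gt_lesions, pred_lesions, iou_thresh=0.1):
    """gt_lesions, pred_lesions: lists of boolean masks, one per connected lesion."""
    detected = sum(any(iou(g, p) >= iou_thresh for p in pred_lesions) for g in gt_lesions)
    matched_preds = sum(any(iou(p, g) >= iou_thresh for g in gt_lesions) for p in pred_lesions)
    sensitivity = detected / len(gt_lesions) if gt_lesions else 0.0
    precision = matched_preds / len(pred_lesions) if pred_lesions else 0.0
    return sensitivity, precision

if __name__ == "__main__":
    gt = [np.zeros((32, 32), bool)]
    gt[0][8:16, 8:16] = True
    pred = [np.zeros((32, 32), bool)]
    pred[0][10:18, 10:18] = True
    print(lesion_level_metrics(gt, pred))
```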
Lower back pain is one of the most common medical problems in the world, experienced by a large proportion of people everywhere. Because of its ability to produce a detailed view of the soft tissues, including the spinal cord, nerves, intervertebral discs, and vertebrae, Magnetic Resonance Imaging is considered the most effective method for imaging the spine. The semantic segmentation of vertebrae plays a major role in the diagnostic process for lumbar diseases. It is difficult to semantically partition the vertebrae in Magnetic Resonance Images from the surrounding variety of tissues, including muscles, ligaments, and intervertebral discs. U-Net is a powerful deep learning architecture for the challenges of medical image analysis tasks and achieves high segmentation accuracy. This work proposes a modified U-Net architecture, MU-Net, consisting of a Meijering convolutional layer that incorporates the Meijering filter to perform the semantic segmentation of lumbar vertebrae L1 to L5 and the sacral vertebra S1. Pseudo-colour mask images were generated and used as ground truth for training the model. The work was carried out on 1312 images expanded from T1-weighted mid-sagittal MRI images of 515 patients in the Lumbar Spine MRI Dataset, publicly available from Mendeley Data. The proposed MU-Net model for the semantic segmentation of the lumbar vertebrae achieves 98.79% pixel accuracy (PA), a 98.66% dice similarity coefficient (DSC), a 97.36% Jaccard coefficient, and 92.55% mean Intersection over Union (mean IoU) on this dataset.
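The Meijering filter at the core of the proposed Meijering convolutional layer is available in scikit-image; the sketch below applies it as a simple preprocessing step to illustrate the kind of ridge response involved, and does not reproduce how MU-Net embeds it as a network layer.

```python
# Meijering neuriteness/ridge filter applied to a grayscale image (illustrative only).
import numpy as np
from skimage import data, filters

image = data.camera().astype(float) / 255.0          # any grayscale image; an MRI slice in practice
ridge_response = filters.meijering(image, sigmas=[1, 2, 3], black_ridges=False)

# The response highlights elongated ridge-like structures, which could then be stacked
# with the raw image as an extra input channel for a segmentation network.
stacked_input = np.stack([image, ridge_response], axis=0)
print(stacked_input.shape)    # (2, 512, 512)
```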
Automatic segmentation and recognition of the content and element information in color geological maps are of great significance for researchers analyzing the distribution of mineral resources and predicting disaster information. This article focuses on color planar raster geological maps (geological maps include planar geological maps, columnar maps, and profiles). While existing deep learning approaches are often used to segment general images, their performance on color geological maps in the geoscience domain is limited by complex elements, diverse regional features, and complicated backgrounds. To address this issue, a color geological map segmentation model is proposed that combines the Felz clustering algorithm with an improved SE-UNet deep learning network (named GeoMSeg). Firstly, a symmetrical encoder-decoder backbone network based on UNet is constructed, and the channel attention mechanism SENet is incorporated to augment the network's capacity for feature representation, enabling the model to purposefully extract map information. The SE-UNet network is employed to extract features from the geological map and obtain coarse segmentation results. Secondly, the Felz clustering algorithm is used for superpixel pre-segmentation of the geological maps, and the coarse segmentation results are refined and modified based on the superpixel pre-segmentation to obtain the final segmentation results. This study applies GeoMSeg to the constructed dataset, and the experimental results show that the proposed algorithm outperforms other mainstream map segmentation models, with an accuracy of 91.89% and an MIoU of 71.91%.
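A simplified sketch of the superpixel refinement step is shown below, assuming the "Felz clustering algorithm" refers to Felzenszwalb graph-based segmentation as implemented in scikit-image; reassigning each superpixel the majority class of the coarse prediction is only one plausible refinement rule, not necessarily the paper's.

```python
# Superpixel-based refinement of a coarse segmentation (illustrative assumption).
import numpy as np
from skimage.segmentation import felzenszwalb

def refine_with_superpixels(image, coarse_labels, scale=100, sigma=0.8, min_size=50):
    """image: HxWx3 float array in [0, 1]; coarse_labels: HxW integer class map."""
    superpixels = felzenszwalb(image, scale=scale, sigma=sigma, min_size=min_size)
    refined = coarse_labels.copy()
    for sp_id in np.unique(superpixels):
        region = superpixels == sp_id
        # Majority vote of the coarse prediction inside this superpixel.
        refined[region] = np.bincount(coarse_labels[region]).argmax()
    return refined

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((64, 64, 3))
    coarse = rng.integers(0, 4, size=(64, 64))
    print(refine_with_superpixels(img, coarse).shape)   # (64, 64)
```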
The thawing of ice-rich permafrost leads to the formation of thermokarst landforms. Precise mapping of retrogressive thaw slumps (RTSs) is imperative for assessing the degradation and carbon exchange of permafrost at both local and regional scales on the Tibetan Plateau (TP). However, previous methods for RTS mapping rely on large numbers of samples and on complex classifiers with a low level of automation or unnecessary complexity. We propose an automatic mapping network (AmRTSNet) for producing decimeter-level RTS maps from GaoFen-7 images based on deep learning. Both the quantitative metrics and the qualitative evaluations show that AmRTSNet, trained in the Beiluhe region, offers significant advantages over previous methods. Without further fine-tuning, we conducted automatic RTS mapping with AmRTSNet in the Wulanwula, Chumarhe, and Gaolinggo regions. Over 141,312 ha on the TP have been automatically mapped, comprising 926 RTS regions with a total RTS area of 2318.72 ha. The average statistics of the mapped RTSs show low roundness (0.38), moderate rectangularity (0.61), and high convexity (0.79). About 90% of the RTSs are smaller than 6 ha, and the average aspect ratio is 2.18. RTSs are unevenly distributed in belt-like aggregations with dominant density peaks. They often concentrate on hillslopes and along lateral streams, with denser areas more likely to contain larger RTSs.
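The shape statistics quoted above can be computed from a binary RTS mask with scikit-image; the sketch below uses common definitions (roundness = 4πA/P², rectangularity = extent, convexity = solidity, aspect ratio = major/minor axis length), which are assumptions rather than the authors' exact formulas.

```python
# Shape statistics for one mapped RTS polygon given as a binary mask (assumed formulas).
import numpy as np
from skimage.measure import label, regionprops

def rts_shape_stats(mask):
    props = regionprops(label(mask.astype(int)))[0]
    roundness = 4 * np.pi * props.area / props.perimeter ** 2
    rectangularity = props.extent                      # area / bounding-box area
    convexity = props.solidity                         # area / convex-hull area
    aspect_ratio = props.major_axis_length / props.minor_axis_length
    return dict(roundness=roundness, rectangularity=rectangularity,
                convexity=convexity, aspect_ratio=aspect_ratio)

if __name__ == "__main__":
    mask = np.zeros((100, 100), bool)
    mask[20:60, 30:90] = True                          # a 40 x 60 rectangle as a toy "slump"
    print(rts_shape_stats(mask))
```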
The key to the success of few-shot semantic segmentation (FSS) lies in the efficient use of a limited annotated support set to accurately segment novel classes in the query set. Because of the few samples in the support set, FSS faces challenges such as intra-class differences, background (BG) mismatches between the query and support sets, and ambiguous segmentation between the foreground (FG) and BG in the query set. To address these issues, this paper proposes a multi-module network called CAMSNet, which includes four modules: the General Information Module (GIM), the Class Activation Map Aggregation (CAMA) module, the Self-Cross Attention (SCA) Block, and the Feature Fusion Module (FFM). In CAMSNet, the GIM employs an improved triplet loss, which concatenates word embedding vectors and support prototypes as anchors and uses local support features of the FG and BG as positive and negative samples, to help solve the problem of intra-class differences. Then, for the first time, the Class Activation Map (CAM) from Weakly Supervised Semantic Segmentation (WSSS) is applied to FSS within the CAMA module; this replaces the traditional use of cosine similarity to locate query information. Subsequently, the SCA Block processes the support and query features aggregated by the CAMA module, significantly enhancing the understanding of the input information, leading to more accurate predictions and effectively addressing BG mismatch and ambiguous FG-BG segmentation. Finally, the FFM combines general class information with the enhanced query information to achieve accurate segmentation of the query image. Extensive experiments on PASCAL-5i and COCO-20i demonstrate that CAMSNet yields superior performance and sets a new state of the art.
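A rough sketch of the GIM's improved triplet loss is given below; the linear projection of the concatenated anchor to the feature dimension, and the tensor shapes, are assumptions introduced for illustration rather than the paper's exact design.

```python
# Triplet loss with a concatenated (word embedding + prototype) anchor (illustrative sketch).
import torch
import torch.nn as nn

feat_dim, word_dim = 256, 300
project = nn.Linear(word_dim + feat_dim, feat_dim)     # assumed projection of the anchor
triplet = nn.TripletMarginLoss(margin=0.5)

word_embedding = torch.randn(4, word_dim)              # one class word vector per episode
support_prototype = torch.randn(4, feat_dim)           # masked-average-pooled support feature
fg_local_feature = torch.randn(4, feat_dim)            # positive: local FG support feature
bg_local_feature = torch.randn(4, feat_dim)            # negative: local BG support feature

anchor = project(torch.cat([word_embedding, support_prototype], dim=1))
loss = triplet(anchor, fg_local_feature, bg_local_feature)
print(loss.item())
```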
Liver cancer remains a leading cause of mortality worldwide, and precise diagnostic tools are essential for effective treatment planning. Liver Tumors (LTs) vary significantly in size, shape, and location, and can present with tissues of similar intensity, making the automatic segmentation and classification of LTs from abdominal tomography images both crucial and challenging. This review examines recent advancements in Liver Segmentation (LS) and Tumor Segmentation (TS) algorithms, highlighting their strengths and limitations regarding precision, automation, and resilience. Performance metrics are used to assess key detection algorithms and analytical methods, emphasizing their effectiveness and relevance in clinical contexts. The review also addresses ongoing challenges in liver tumor segmentation and identification, such as managing high variability in patient data and ensuring robustness across different imaging conditions. By comparing popular methods, it suggests directions for future research, with insights into technological advancements that can enhance surgical planning and diagnostic accuracy. This paper contributes to a comprehensive understanding of current liver tumor detection techniques, provides a roadmap for future innovations, and, by integrating recent progress with remaining challenges, aims to improve diagnostic and therapeutic outcomes for liver cancer.
We consider an image semantic communication system in a time-varying fading Gaussian MIMO channel with a finite number of channel states. A deep learning-aided broadcast approach scheme is proposed to enable adaptive semantic transmission across the different channel states. We combine the classic broadcast approach with an image transformer to implement this adaptive joint source and channel coding (JSCC) scheme. Specifically, we utilize a neural network (NN) to jointly optimize the hierarchical image compression and the superposition code mapping within this scheme. The learned transformers and codebooks allow the receiver to recover the image with an adaptive quality and a low error rate in each channel state. The simulation results show that the proposed scheme can dynamically adapt the coding to the current channel state and outperform some existing intelligent schemes that use a fixed coding block.
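For intuition, the toy sketch below computes the layered rates of the classic two-layer broadcast (superposition) approach for a single-antenna Gaussian channel with a bad and a good state; it ignores the MIMO and learned-codebook aspects of the proposed scheme and only illustrates the power-splitting idea, with purely illustrative SNR values.

```python
# Two-layer superposition coding rates for a two-state scalar Gaussian channel (toy sketch).
import numpy as np

def layered_rates(snr_bad_db, snr_good_db, alpha):
    """alpha: fraction of power assigned to the base layer."""
    snr_bad, snr_good = 10 ** (snr_bad_db / 10), 10 ** (snr_good_db / 10)
    # Base layer: decodable in the bad state, refinement layer treated as interference.
    r_base = np.log2(1 + alpha * snr_bad / (1 + (1 - alpha) * snr_bad))
    # Refinement layer: decoded in the good state after the base layer is cancelled.
    r_refine = np.log2(1 + (1 - alpha) * snr_good)
    return r_base, r_refine

for alpha in (0.5, 0.7, 0.9):
    rb, rr = layered_rates(snr_bad_db=0.0, snr_good_db=10.0, alpha=alpha)
    print(f"alpha={alpha:.1f}  base={rb:.2f} bit/s/Hz  refinement={rr:.2f} bit/s/Hz")
```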
Automatic detection of leukemia, or blood cancer, is one of the most challenging tasks that needs to be addressed in the healthcare system. Analysis of white blood cells (WBCs) in microscopic slide images of blood or bone marrow plays a crucial part in early identification and assists medical experts. For Acute Lymphocytic Leukemia (ALL), the preferred blood or marrow samples must be analyzed by experts before the disease spreads through the whole body and the condition worsens. Researchers have done a great deal of work in this field, and to provide a comprehensive analysis, a few literature reviews have been published focusing on various artificial intelligence based techniques, such as machine and deep learning, for the detection of ALL. The systematic review presented in this article has been conducted under the PRISMA guidelines and covers the most recent advancements in the field. Different image segmentation techniques were broadly studied from online databases such as Google Scholar, ScienceDirect, and PubMed and categorized into image processing based, traditional machine and deep learning based, and advanced deep learning based models. Traditional Convolutional Neural Network (CNN) models are covered first, followed by recent CNN advancements used for the classification of ALL into its subtypes. A critical analysis of the existing methods is provided to offer clarity on the current state of the field. Finally, the paper concludes with insights and suggestions for future research, aiming to guide new researchers in the development of advanced automated systems for detecting life-threatening diseases.
Since the introduction of vision Transformers into the computer vision field, many vision tasks, such as semantic segmentation, have undergone radical changes. Although the Transformer enhances the correlation of each local feature of an image object in the hidden space through the attention mechanism, it is difficult for a segmentation head to accomplish mask prediction for dense embeddings of multi-category and multi-local features. We present the patch prototype vision Transformer (PPFormer), a Transformer architecture for semantic segmentation based on knowledge-embedded patch prototypes. 1) The hierarchical Transformer encoder can generate multi-scale and multi-layered patch features, including seamless patch projection to obtain information from multi-scale patches, and feature-clustered self-attention to enhance the interplay of multi-layered visual information with implicit position encoding. 2) PPFormer utilizes a non-parametric prototype decoder to extract region observations that represent significant parts of the objects using non-learnable patch prototypes, and then calculates the similarity between patch prototypes and pixel embeddings. The proposed contrasting patch prototype alignment module, which uses new patch prototypes to update the prototype bank, effectively maintains class boundaries for the prototypes. For different application scenarios, we provide PPFormer-S, PPFormer-M, and PPFormer-L by expanding the model scale. Experimental results demonstrate that PPFormer can outperform fully convolutional network (FCN)- and attention-based semantic segmentation models on the PASCAL VOC 2012, ADE20K, and Cityscapes datasets.
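A minimal sketch of the non-parametric prototype decoding step is shown below: pixel embeddings are matched to a bank of patch prototypes by cosine similarity and each pixel takes the class of its best-matching prototype; the tensor shapes and the max-over-prototypes rule are illustrative assumptions.

```python
# Prototype-based decoding by cosine similarity (illustrative sketch, not PPFormer itself).
import torch
import torch.nn.functional as F

def prototype_decode(pixel_embed, prototype_bank):
    """pixel_embed: (B, C, H, W); prototype_bank: (num_classes, K, C)."""
    B, C, H, W = pixel_embed.shape
    pixels = F.normalize(pixel_embed.flatten(2).transpose(1, 2), dim=-1)   # (B, HW, C)
    protos = F.normalize(prototype_bank, dim=-1)                           # (N, K, C)
    sim = torch.einsum("bpc,nkc->bpnk", pixels, protos)                    # cosine similarities
    class_scores = sim.max(dim=-1).values                                  # best prototype per class
    return class_scores.argmax(dim=-1).reshape(B, H, W)                    # (B, H, W) label map

if __name__ == "__main__":
    pred = prototype_decode(torch.randn(2, 64, 32, 32), torch.randn(21, 10, 64))
    print(pred.shape)   # torch.Size([2, 32, 32])
```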
Microseismic monitoring is essential for understanding subsurface dynamics and optimizing oil and gas production. However, traditional methods for the automatic detection of microseismic events rely heavily on characteristic functions and human intervention, often resulting in suboptimal performance when dealing with complex and noisy data. In this study, we propose a novel approach that leverages a deep learning framework to extract multiscale features from microseismic data using a TransUNet neural network. Our model integrates the advantages of the Transformer and UNet architectures to achieve high accuracy in multivariate image segmentation and precise picking of P-wave and S-wave first arrivals simultaneously. We validate our approach using both synthetic and field microseismic datasets recorded from gas storage monitoring and roof fracturing in a coal seam. The robustness of the proposed method has been verified on synthetic data with various levels of Gaussian noise and of real background noise extracted from field data. Comparisons of the proposed method with UNet and SwinUNet in terms of model architecture and classification performance demonstrate that TransUNet achieves the optimal balance between architecture and inference speed. With relatively low inference time and network complexity, it operates effectively in high-precision microseismic phase picking. This advancement holds significant promise for enhancing microseismic monitoring technology in hydraulic fracturing and reservoir monitoring applications.
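A simplified sketch of turning a per-sample phase-segmentation output into P- and S-wave first-arrival picks is shown below; the threshold rule and the class layout are assumptions, not the paper's exact post-processing.

```python
# First-arrival picking from per-sample phase probabilities (illustrative assumption).
import numpy as np

def pick_first_arrival(prob, threshold=0.5):
    """prob: (num_samples,) probability of one phase class along a single trace."""
    above = np.flatnonzero(prob >= threshold)
    return int(above[0]) if above.size else None

if __name__ == "__main__":
    n = 1000
    probs = np.zeros((n, 3))                              # columns: background, P phase, S phase
    probs[:, 0] = 1.0
    probs[300:520, 1], probs[300:520, 0] = 0.9, 0.1       # synthetic P segment
    probs[520:800, 2], probs[520:800, 0] = 0.9, 0.1       # synthetic S segment
    p_pick = pick_first_arrival(probs[:, 1])
    s_pick = pick_first_arrival(probs[:, 2])
    print(f"P arrival at sample {p_pick}, S arrival at sample {s_pick}")   # 300, 520
```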
This work reviews the existing deep learning applications for diagnosing diabetic retinopathy and retinopathy of prematurity, together with the available public retinal databases for these diseases, and applies the International Journal of Medical Informatics (IJMEDI) checklist to assess the quality of the included studies. An in-depth literature search of the Scopus, Web of Science, IEEE, and ACM databases, targeting articles from inception up to 31 January 2023, was performed by two independent reviewers. In the review, 26 out of 1476 articles, with a total of 36 models, were included. Data size and model validation were found to be challenges for most studies. Deep learning models are gaining attention in the development of medical diagnosis tools and applications. However, a critical issue with many of the published studies is that some do not include information about data sources and data sizes, which is important for verifying their performance.
As legal cases grow in complexity and volume worldwide, integrating machine learning and artificial intelligence into judicial systems has become a pivotal research focus. This study introduces a comprehensive framework for verdict recommendation that combines rule-based methods with deep learning techniques specifically tailored to the legal domain. The proposed framework comprises three core modules: legal feature extraction, semantic similarity assessment, and verdict recommendation. For legal feature extraction, a rule-based approach leverages Black's Law Dictionary and WordNet Synsets to construct feature vectors from judicial texts. Semantic similarity between cases is evaluated using a hybrid method that combines rule-based logic with an LSTM model, analyzing the feature vectors of query cases against a legal knowledge base. Verdicts are then recommended through a rule-based retrieval system, enhanced by predefined legal statutes and regulations. By merging rule-based methodologies with deep learning, this framework addresses the interpretability challenges often associated with contemporary AI models, thereby enhancing both transparency and generalizability across diverse legal contexts. The system was rigorously tested on a legal corpus of 43,000 case laws across six categories: Criminal, Revenue, Service, Corporate, Constitutional, and Civil law, ensuring its adaptability across a wide range of judicial scenarios. Performance evaluation showed that the feature extraction module achieved an average accuracy of 91.6% with an F-score of 95%. The semantic similarity module, tested using Manhattan, Euclidean, and Cosine distance metrics, achieved 88% accuracy and a 93% F-score for short queries (Manhattan), 89% accuracy and a 93.7% F-score for medium-length queries (Euclidean), and 87% accuracy with a 92.5% F-score for longer queries (Cosine). The verdict recommendation module outperformed existing methods, achieving 90% accuracy and a 93.75% F-score. This study highlights the potential of hybrid AI frameworks to improve judicial decision-making and streamline legal processes, offering a robust, interpretable, and adaptable solution for the evolving demands of modern legal systems.
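The three distance measures used by the semantic similarity module are standard; the short sketch below computes Manhattan and Euclidean distances and cosine similarity between a query-case feature vector and a small knowledge base, with the ranking step included only as an illustrative choice.

```python
# Manhattan, Euclidean, and cosine measures between case feature vectors (illustrative sketch).
import numpy as np

def manhattan(a, b):
    return float(np.abs(a - b).sum())

def euclidean(a, b):
    return float(np.linalg.norm(a - b))

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    query = rng.random(128)                      # feature vector of the query case
    knowledge_base = rng.random((5, 128))        # feature vectors of stored case laws
    ranked = sorted(range(5), key=lambda i: -cosine_similarity(query, knowledge_base[i]))
    print("cases ranked by cosine similarity:", ranked)
    print("manhattan to best match:", manhattan(query, knowledge_base[ranked[0]]))
    print("euclidean to best match:", euclidean(query, knowledge_base[ranked[0]]))
```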
Funding: supported by the National Natural Science Foundation of China (Nos. 61373121 and 61328205), the Program for Sichuan Provincial Science Fund for Distinguished Young Scholars (No. 13QNJJ0149), the Fundamental Research Funds for the Central Universities, and the China Scholarship Council (No. 201507000032).
Funding: supported by the Simons Foundation, the National Natural Science Foundation of China (No. NSFC61405038), and the Fujian provincial fund (No. 2020J01453).
Funding: this research was supported by the BB21 plus, funded by Busan Metropolitan City and the Busan Institute for Talent and Lifelong Education (BIT), and a grant from the Tongmyong University Innovated University Research Park (I-URP), funded by Busan Metropolitan City, Republic of Korea.
Funding: supported by the Major Science and Technology Project of Hainan Province (Grant No. ZDKJ2020012), the National Natural Science Foundation of China (Grant Nos. 62162024 and 62162022), Key Projects in Hainan Province (Grants ZDYF2021GXJS003 and ZDYF2020040), and the Graduate Innovation Project (Grant No. Qhys2021-187).
Funding: supported by the National Natural Science Foundation of China (No. 52374155) and the Anhui Provincial Natural Science Foundation (No. 2308085 MF218).
Funding: funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R435), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Funding: supported by the Guangdong Basic and Applied Basic Research Foundation (No. 2021B1515130003), the Key Research and Development Plan of Hubei Province (No. 2022BCE034), and the Natural Science Foundation of Hubei Province (No. 2024AFB1054).
Funding: Open Access funding provided by the National Institutes of Health (NIH); the funding for this project was provided by the NCATS Intramural Fund.
Funding: National Natural Science Foundation of China, Grant/Award Number: 62303275; International Alliance for Cancer Early Detection, Grant/Award Numbers: C28070/A30912, C73666/A31378; Wellcome/EPSRC Centre for Interventional and Surgical Sciences, Grant/Award Number: 203145Z/16/Z.
Funding: financially supported by the Natural Science Foundation of China (42301492), the Open Fund of the Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering (2022SDSJ04, 2024SDSJ03), the Opening Fund of the Key Laboratory of Geological Survey and Evaluation of the Ministry of Education (GLAB 2023ZR01, GLAB2024ZR08), and the Fundamental Research Funds for the Central Universities.
Funding: the Second Tibetan Plateau Scientific Expedition and Research (No. 2022QZKK0101) and the National Natural Science Foundation of China (No. 42271427).
Funding: supported by funding from the following sources: the National Natural Science Foundation of China (U1904119), Research Programs of the Henan Science and Technology Department (232102210033, 232102210054), the Chongqing Natural Science Foundation (CSTB2023NSCQ-MSX0070), the Henan Province Key Research and Development Project (231111212000), the Aviation Science Foundation (20230001055002), and the Henan Center for Outstanding Overseas Scientists (GZS2022011).
Funding: The "Intelligent Recognition Industry Service Center" as part of the Featured Areas Research Center Program under the Higher Education Sprout Project by the Ministry of Education (MOE) in Taiwan, and the National Science and Technology Council, Taiwan, under grants 113-2221-E-224-041 and 113-2622-E-224-002. Partial support was also provided by Isuzu Optics Corporation.
Abstract: Liver cancer remains a leading cause of mortality worldwide, and precise diagnostic tools are essential for effective treatment planning. Liver Tumors (LTs) vary significantly in size, shape, and location, and can present with tissues of similar intensities, making the automatic segmentation and classification of LTs from abdominal tomography images both crucial and challenging. This review examines recent advancements in Liver Segmentation (LS) and Tumor Segmentation (TS) algorithms, highlighting their strengths and limitations regarding precision, automation, and resilience. Performance metrics are used to assess key detection algorithms and analytical methods, emphasizing their effectiveness and relevance in clinical contexts. The review also addresses ongoing challenges in liver tumor segmentation and identification, such as managing high variability in patient data and ensuring robustness across different imaging conditions. By comparing popular methods, it suggests directions for future research and offers insights into technological advancements that can enhance surgical planning and diagnostic accuracy. This paper contributes to a comprehensive understanding of current liver tumor detection techniques, provides a roadmap for future innovations, and, by integrating recent progress with remaining challenges, aims to improve diagnostic and therapeutic outcomes for liver cancer.
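For readers comparing the surveyed algorithms, two overlap metrics widely used for liver and tumor segmentation are sketched below; the review's exact metric definitions may differ, and the masks in the usage example are synthetic.

```python
# Sketch of two common segmentation evaluation metrics for binary masks.
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2*|A & B| / (|A| + |B|) for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def iou(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Jaccard index (IoU) = |A & B| / |A | B| for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (intersection + eps) / (union + eps)

# Toy example: two overlapping square "tumor" masks.
a = np.zeros((64, 64), dtype=np.uint8); a[10:30, 10:30] = 1
b = np.zeros((64, 64), dtype=np.uint8); b[15:35, 15:35] = 1
print(dice_coefficient(a, b), iou(a, b))
```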
Funding: Supported in part by the National Key R&D Project of China under Grant 2020YFA0712300; the National Natural Science Foundation of China under Grants NSFC-62231022 and 12031011; and in part by the NSF of China under Grant 62125108.
Abstract: We consider an image semantic communication system over a time-varying fading Gaussian MIMO channel with a finite number of channel states. A deep learning-aided broadcast-approach scheme is proposed to support adaptive semantic transmission across the different channel states. We combine the classic broadcast approach with an image transformer to implement this adaptive joint source and channel coding (JSCC) scheme. Specifically, we utilize a neural network (NN) to jointly optimize the hierarchical image compression and the superposition code mapping within this scheme. The learned transformers and codebooks allow the receiver to recover the image with adaptive quality and a low error rate in each channel state. Simulation results show that the proposed scheme can dynamically adapt the coding to the current channel state and outperforms existing intelligent schemes that use a fixed coding block.
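To make the broadcast-approach idea concrete, the toy sketch below superimposes a base layer and a refinement layer with a power split and decodes them successively at different channel states. It deliberately omits the learned transformer and codebooks of the proposed scheme and uses random BPSK symbols; the power split and SNR values are illustrative only.

```python
# Toy illustration of superposition coding with successive decoding.
import numpy as np

rng = np.random.default_rng(0)
n = 4096                      # channel uses
alpha = 0.8                   # fraction of power given to the base layer
base = rng.choice([-1.0, 1.0], size=n)        # base-layer symbols (BPSK)
refine = rng.choice([-1.0, 1.0], size=n)      # refinement-layer symbols

x = np.sqrt(alpha) * base + np.sqrt(1 - alpha) * refine   # superposition mapping

for snr_db in (0, 10, 20):                    # different fading states
    noise = rng.normal(scale=10 ** (-snr_db / 20), size=n)
    y = x + noise
    # Successive decoding: detect the (stronger) base layer first...
    base_hat = np.sign(y)
    # ...then cancel it and detect the refinement layer.
    refine_hat = np.sign(y - np.sqrt(alpha) * base_hat)
    print(f"SNR {snr_db:2d} dB | base BER {np.mean(base_hat != base):.3f} "
          f"| refinement BER {np.mean(refine_hat != refine):.3f}")
```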
Funding: Supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (RS-2024-00460621, Developing BCI-Based Digital Health Technologies for Mental Illness and Pain Management).
Abstract: Automatic detection of leukemia, or blood cancer, is one of the most challenging tasks to be addressed in the healthcare system. Analysis of white blood cells (WBCs) in microscopic slide images of blood or bone marrow plays a crucial part in early identification and assists medical experts. For Acute Lymphocytic Leukemia (ALL), experts must analyze the most relevant portion of the blood or marrow before the disease spreads through the body and the condition worsens. Researchers have done substantial work in this field, and a few literature reviews have been published focusing on various artificial intelligence-based techniques, such as machine learning and deep learning, for detecting ALL. This article presents a systematic review, conducted under the PRISMA guidelines, of the most recent advancements in the field. Image segmentation techniques gathered from online databases such as Google Scholar, ScienceDirect, and PubMed were broadly studied and categorized into image processing-based, traditional machine and deep learning-based, and advanced deep learning-based models. Traditional Convolutional Neural Network (CNN) models and recent CNN advancements used for classifying ALL into its subtypes are then discussed. A critical analysis of the existing methods is provided to offer clarity on the current state of the field. Finally, the paper concludes with insights and suggestions for future research, aiming to guide new researchers in developing advanced automated systems for detecting life-threatening diseases.
Funding: Supported in part by the Gansu Haizhi Characteristic Demonstration Project (No. GSHZTS2022-2).
Abstract: Since the introduction of vision Transformers into the computer vision field, many vision tasks, such as semantic segmentation, have undergone radical changes. Although the Transformer enhances the correlation of the local features of an image object in the hidden space through the attention mechanism, it is difficult for a segmentation head to accomplish mask prediction over dense embeddings of multi-category, multi-local features. We present the patch prototype vision Transformer (PPFormer), a Transformer architecture for semantic segmentation based on knowledge-embedded patch prototypes. 1) The hierarchical Transformer encoder generates multi-scale and multi-layered patch features, including seamless patch projection to capture multiscale patch information and feature-clustered self-attention to enhance the interplay of multi-layered visual information with implicit position encoding. 2) PPFormer utilizes a non-parametric prototype decoder to extract region observations, which represent significant parts of the objects, via unlearnable patch prototypes, and then calculates the similarity between patch prototypes and pixel embeddings. The proposed contrasting patch prototype alignment module, which uses new patch prototypes to update the prototype bank, effectively maintains class boundaries for the prototypes. For different application scenarios, we provide PPFormer-S, PPFormer-M, and PPFormer-L by scaling the model. Experimental results demonstrate that PPFormer outperforms fully convolutional network (FCN)-based and attention-based semantic segmentation models on the PASCAL VOC 2012, ADE20k, and Cityscapes datasets.
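The non-parametric prototype decoding step can be pictured as a nearest-prototype lookup, as in the hypothetical sketch below (not the released PPFormer code); the tensor shapes and the cosine-similarity choice are assumptions.

```python
# Hypothetical sketch of prototype-based decoding for semantic segmentation.
import torch
import torch.nn.functional as F

def prototype_decode(pixel_embeddings, prototype_bank, prototype_labels):
    """pixel_embeddings: (H*W, D); prototype_bank: (K, D) patch prototypes;
    prototype_labels: (K,) class index of each prototype."""
    pix = F.normalize(pixel_embeddings, dim=1)
    proto = F.normalize(prototype_bank, dim=1)
    similarity = pix @ proto.t()                 # (H*W, K) cosine similarities
    best_proto = similarity.argmax(dim=1)        # nearest prototype per pixel
    return prototype_labels[best_proto]          # (H*W,) predicted class per pixel

# Toy usage: a 16x16 feature map, 64-d embeddings, 10 prototypes over 3 classes.
H, W, D, K = 16, 16, 64, 10
pred = prototype_decode(torch.randn(H * W, D),
                        torch.randn(K, D),
                        torch.randint(0, 3, (K,)))
print(pred.reshape(H, W).shape)   # per-pixel class map
```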
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 41974150 and 42174158) and the Natural Science Basic Research Program of Shaanxi (2023-JC-YB-220).
Abstract: Microseismic monitoring is essential for understanding subsurface dynamics and optimizing oil and gas production. However, traditional methods for the automatic detection of microseismic events rely heavily on characteristic functions and human intervention, often resulting in suboptimal performance on complex and noisy data. In this study, we propose a novel approach that leverages a deep learning framework to extract multiscale features from microseismic data using a TransUNet neural network. Our model integrates the advantages of the Transformer and UNet architectures to achieve high accuracy in multivariate image segmentation and to pick P-wave and S-wave first arrivals precisely and simultaneously. We validate our approach using both synthetic and field microseismic datasets recorded from gas storage monitoring and roof fracturing in a coal seam. The robustness of the proposed method has been verified on synthetic data with various levels of Gaussian noise and real background noise extracted from field data. Comparisons of the proposed method with UNet and SwinUNet in terms of model architecture and classification performance demonstrate that TransUNet achieves the optimal balance between architecture and inference speed. With relatively low inference time and network complexity, it operates effectively for high-precision microseismic phase picking. This advancement holds significant promise for enhancing microseismic monitoring technology in hydraulic fracturing and reservoir monitoring applications.
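Downstream of such a segmentation network, a first-arrival pick can be read off a per-sample phase-probability trace. The simplified sketch below (not the paper's implementation) takes the pick at the probability peak if it exceeds a threshold; the threshold, sample interval, and synthetic trace are illustrative.

```python
# Sketch of converting a phase-probability trace into a first-arrival pick.
import numpy as np

def pick_first_arrival(prob: np.ndarray, dt: float, threshold: float = 0.5):
    """prob: (n_samples,) P- or S-phase probability over time; dt: sample interval (s).
    Returns the pick time in seconds, or None if no confident arrival is found."""
    peak = int(np.argmax(prob))
    if prob[peak] < threshold:
        return None
    return peak * dt

# Toy trace: a Gaussian-shaped probability bump centred at 1.2 s (dt = 2 ms).
dt = 0.002
t = np.arange(0, 2.0, dt)
prob = np.exp(-0.5 * ((t - 1.2) / 0.02) ** 2)
print(pick_first_arrival(prob, dt))   # approximately 1.2
```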
Funding: Supported by DAAD, Google Research, and the Organization for Women in Science for the Developing World (OWSD).
Abstract: This review covers existing deep learning applications for diagnosing diabetic retinopathy and retinopathy of prematurity, surveys the available public retinal databases for these diseases, and applies the International Journal of Medical Informatics (IJMEDI) checklist to assess the quality of the included studies. Two independent reviewers performed an in-depth literature search in the Scopus, Web of Science, IEEE, and ACM databases targeting articles from inception up to 31 January 2023. In the review, 26 out of 1476 articles, covering a total of 36 models, were included. Data size and model validation were found to be challenges for most studies. Deep learning models are gaining focus in the development of medical diagnosis tools and applications. However, a critical issue remains with many of the published studies: some do not report data sources and data sizes, which is important for verifying their performance.
Funding: Funded by the Deanship of Scientific Research at Jouf University under Grant No. DSR-2022-RG-0101.
Abstract: As legal cases grow in complexity and volume worldwide, integrating machine learning and artificial intelligence into judicial systems has become a pivotal research focus. This study introduces a comprehensive framework for verdict recommendation that synergizes rule-based methods with deep learning techniques specifically tailored to the legal domain. The proposed framework comprises three core modules: legal feature extraction, semantic similarity assessment, and verdict recommendation. For legal feature extraction, a rule-based approach leverages Black's Law Dictionary and WordNet Synsets to construct feature vectors from judicial texts. Semantic similarity between cases is evaluated using a hybrid method that combines rule-based logic with an LSTM model, analyzing the feature vectors of query cases against a legal knowledge base. Verdicts are then recommended through a rule-based retrieval system, enhanced by predefined legal statutes and regulations. By merging rule-based methodologies with deep learning, this framework addresses the interpretability challenges often associated with contemporary AI models, thereby enhancing both transparency and generalizability across diverse legal contexts. The system was rigorously tested on a legal corpus of 43,000 case laws across six categories (Criminal, Revenue, Service, Corporate, Constitutional, and Civil law), ensuring its adaptability across a wide range of judicial scenarios. Performance evaluation showed that the feature extraction module achieved an average accuracy of 91.6% with an F-Score of 95%. The semantic similarity module, tested using Manhattan, Euclidean, and Cosine distance metrics, achieved 88% accuracy and a 93% F-Score for short queries (Manhattan), 89% accuracy and a 93.7% F-Score for medium-length queries (Euclidean), and 87% accuracy with a 92.5% F-Score for longer queries (Cosine). The verdict recommendation module outperformed existing methods, achieving 90% accuracy and a 93.75% F-Score. This study highlights the potential of hybrid AI frameworks to improve judicial decision-making and streamline legal processes, offering a robust, interpretable, and adaptable solution for the evolving demands of modern legal systems.
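The three distance measures reported for the semantic similarity module can be computed as in the short sketch below; the feature vectors here are toy stand-ins, whereas the paper derives them from its rule-based extractor and LSTM.

```python
# Sketch of Manhattan, Euclidean, and cosine distances between case feature vectors.
import numpy as np

def manhattan(a, b):
    return float(np.sum(np.abs(a - b)))

def euclidean(a, b):
    return float(np.linalg.norm(a - b))

def cosine_distance(a, b):
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy feature vectors for a query case and two candidate cases.
query = np.array([0.9, 0.1, 0.4, 0.0])
case_a = np.array([0.8, 0.2, 0.5, 0.1])
case_b = np.array([0.1, 0.9, 0.0, 0.7])
for name, case in (("case_a", case_a), ("case_b", case_b)):
    print(name, manhattan(query, case), euclidean(query, case), cosine_distance(query, case))
```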