Neuronal soma segmentation plays a crucial role in neuroscience applications.However,the fine structure,such as boundaries,small-volume neuronal somata and fibers,are commonly present in cell images,which pose a chall...Neuronal soma segmentation plays a crucial role in neuroscience applications.However,the fine structure,such as boundaries,small-volume neuronal somata and fibers,are commonly present in cell images,which pose a challenge for accurate segmentation.In this paper,we propose a 3D semantic segmentation network for neuronal soma segmentation to address this issue.Using an encoding-decoding structure,we introduce a Multi-Scale feature extraction and Adaptive Weighting fusion module(MSAW)after each encoding block.The MSAW module can not only emphasize the fine structures via an upsampling strategy,but also provide pixel-wise weights to measure the importance of the multi-scale features.Additionally,a dynamic convolution instead of normal convolution is employed to better adapt the network to input data with different distributions.The proposed MSAW-based semantic segmentation network(MSAW-Net)was evaluated on three neuronal soma images from mouse brain and one neuronal soma image from macaque brain,demonstrating the efficiency of the proposed method.It achieved an F1 score of 91.8%on Fezf2-2A-CreER dataset,97.1%on LSL-H2B-GFP dataset,82.8%on Thy1-EGFP-Mline dataset,and 86.9%on macaque dataset,achieving improvements over the 3D U-Net model by 3.1%,3.3%,3.9%,and 2.3%,respectively.展开更多
Convolutional neural networks(CNNs)-based medical image segmentation technologies have been widely used in medical image segmentation because of their strong representation and generalization abilities.However,due to ...Convolutional neural networks(CNNs)-based medical image segmentation technologies have been widely used in medical image segmentation because of their strong representation and generalization abilities.However,due to the inability to effectively capture global information from images,CNNs can easily lead to loss of contours and textures in segmentation results.Notice that the transformer model can effectively capture the properties of long-range dependencies in the image,and furthermore,combining the CNN and the transformer can effectively extract local details and global contextual features of the image.Motivated by this,we propose a multi-branch and multi-scale attention network(M2ANet)for medical image segmentation,whose architecture consists of three components.Specifically,in the first component,we construct an adaptive multi-branch patch module for parallel extraction of image features to reduce information loss caused by downsampling.In the second component,we apply residual block to the well-known convolutional block attention module to enhance the network’s ability to recognize important features of images and alleviate the phenomenon of gradient vanishing.In the third component,we design a multi-scale feature fusion module,in which we adopt adaptive average pooling and position encoding to enhance contextual features,and then multi-head attention is introduced to further enrich feature representation.Finally,we validate the effectiveness and feasibility of the proposed M2ANet method through comparative experiments on four benchmark medical image segmentation datasets,particularly in the context of preserving contours and textures.展开更多
Due to the necessity for lightweight and efficient network models, deploying semantic segmentation models on mobile robots (MRs) is a formidable task. The fundamental limitation of the problem lies in the training per...Due to the necessity for lightweight and efficient network models, deploying semantic segmentation models on mobile robots (MRs) is a formidable task. The fundamental limitation of the problem lies in the training performance, the ability to effectively exploit the dataset, and the ability to adapt to complex environments when deploying the model. By utilizing the knowledge distillation techniques, the article strives to overcome the above challenges with the inheritance of the advantages of both the teacher model and the student model. More precisely, the ResNet152-PSP-Net model’s characteristics are utilized to train the ResNet18-PSP-Net model. Pyramid pooling blocks are utilized to decode multi-scale feature maps, creating a complete semantic map inference. The student model not only preserves the strong segmentation performance from the teacher model but also improves the inference speed of the prediction results. The proposed method exhibits a clear advantage over conventional convolutional neural network (CNN) models, as evident from the conducted experiments. Furthermore, the proposed model also shows remarkable improvement in processing speed when compared with light-weight models such as MobileNetV2 and EfficientNet based on latency and throughput parameters. The proposed KD-SegNet model obtains an accuracy of 96.3% and a mIoU (mean Intersection over Union) of 77%, outperforming the performance of existing models by more than 15% on the same training dataset. The suggested method has an average training time that is only 0.51 times less than same field models, while still achieving comparable segmentation performance. Hence, the semantic segmentation frames are collected, forming the motion trajectory for the system in the environment. Overall, this architecture shows great promise for the development of knowledge-based systems for MR’s navigation.展开更多
Convolutional neural network(CNN)with the encoder-decoder structure is popular in medical image segmentation due to its excellent local feature extraction ability but it faces limitations in capturing the global featu...Convolutional neural network(CNN)with the encoder-decoder structure is popular in medical image segmentation due to its excellent local feature extraction ability but it faces limitations in capturing the global feature.The transformer can extract the global information well but adapting it to small medical datasets is challenging and its computational complexity can be heavy.In this work,a serial and parallel network is proposed for the accurate 3D medical image segmentation by combining CNN and transformer and promoting feature interactions across various semantic levels.The core components of the proposed method include the cross window self-attention based transformer(CWST)and multi-scale local enhanced(MLE)modules.The CWST module enhances the global context understanding by partitioning 3D images into non-overlapping windows and calculating sparse global attention between windows.The MLE module selectively fuses features by computing the voxel attention between different branch features,and uses convolution to strengthen the dense local information.The experiments on the prostate,atrium,and pancreas MR/CT image datasets consistently demonstrate the advantage of the proposed method over six popular segmentation models in both qualitative evaluation and quantitative indexes such as dice similarity coefficient,Intersection over Union,95%Hausdorff distance and average symmetric surface distance.展开更多
Quantitative analysis of clinical function parameters from MRI images is crucial for diagnosing and assessing cardiovascular disease.However,the manual calculation of these parameters is challenging due to the high va...Quantitative analysis of clinical function parameters from MRI images is crucial for diagnosing and assessing cardiovascular disease.However,the manual calculation of these parameters is challenging due to the high variability among patients and the time-consuming nature of the process.In this study,the authors introduce a framework named MultiJSQ,comprising the feature presentation network(FRN)and the indicator prediction network(IEN),which is designed for simultaneous joint segmentation and quantification.The FRN is tailored for representing global image features,facilitating the direct acquisition of left ventricle(LV)contour images through pixel classification.Additionally,the IEN incorporates specifically designed modules to extract relevant clinical indices.The authors’method considers the interdependence of different tasks,demonstrating the validity of these relationships and yielding favourable results.Through extensive experiments on cardiac MR images from 145 patients,MultiJSQ achieves impressive outcomes,with low mean absolute errors of 124 mm^(2),1.72 mm,and 1.21 mm for areas,dimensions,and regional wall thicknesses,respectively,along with a Dice metric score of 0.908.The experimental findings underscore the excellent performance of our framework in LV segmentation and quantification,highlighting its promising clinical application prospects.展开更多
Chinese named entity recognition(CNER)has received widespread attention as an important task of Chinese information extraction.Most previous research has focused on individually studying flat CNER,overlapped CNER,or d...Chinese named entity recognition(CNER)has received widespread attention as an important task of Chinese information extraction.Most previous research has focused on individually studying flat CNER,overlapped CNER,or discontinuous CNER.However,a unified CNER is often needed in real-world scenarios.Recent studies have shown that grid tagging-based methods based on character-pair relationship classification hold great potential for achieving unified NER.Nevertheless,how to enrich Chinese character-pair grid representations and capture deeper dependencies between character pairs to improve entity recognition performance remains an unresolved challenge.In this study,we enhance the character-pair grid representation by incorporating both local and global information.Significantly,we introduce a new approach by considering the character-pair grid representation matrix as a specialized image,converting the classification of character-pair relationships into a pixel-level semantic segmentation task.We devise a U-shaped network to extract multi-scale and deeper semantic information from the grid image,allowing for a more comprehensive understanding of associative features between character pairs.This approach leads to improved accuracy in predicting their relationships,ultimately enhancing entity recognition performance.We conducted experiments on two public CNER datasets in the biomedical domain,namely CMeEE-V2 and Diakg.The results demonstrate the effectiveness of our approach,which achieves F1-score improvements of 7.29 percentage points and 1.64 percentage points compared to the current state-of-the-art(SOTA)models,respectively.展开更多
In actual traffic scenarios,precise recognition of traffic participants,such as vehicles and pedestrians,is crucial for intelligent transportation.This study proposes an improved algorithm built on Mask-RCNN to enhanc...In actual traffic scenarios,precise recognition of traffic participants,such as vehicles and pedestrians,is crucial for intelligent transportation.This study proposes an improved algorithm built on Mask-RCNN to enhance the ability of autonomous driving systems to recognize traffic participants.The algorithmincorporates long and shortterm memory networks and the fused attention module(GSAM,GCT,and Spatial Attention Module)to enhance the algorithm’s capability to process both global and local information.Additionally,to increase the network’s initial operation stability,the original network activation function was replaced with Gaussian error linear unit.Experiments were conducted using the publicly available Cityscapes dataset.Comparing the test results,it was observed that the revised algorithmoutperformed the original algorithmin terms of AP_(50),AP_(75),and othermetrics by 8.7%and 9.6%for target detection and 12.5%and 13.3%for segmentation.展开更多
Techniques in deep learning have significantly boosted the accuracy and productivity of computer vision segmentation tasks.This article offers an intriguing architecture for semantic,instance,and panoptic segmentation...Techniques in deep learning have significantly boosted the accuracy and productivity of computer vision segmentation tasks.This article offers an intriguing architecture for semantic,instance,and panoptic segmentation using EfficientNet-B7 and Bidirectional Feature Pyramid Networks(Bi-FPN).When implemented in place of the EfficientNet-B5 backbone,EfficientNet-B7 strengthens the model’s feature extraction capabilities and is far more appropriate for real-world applications.By ensuring superior multi-scale feature fusion,Bi-FPN integration enhances the segmentation of complex objects across various urban environments.The design suggested is examined on rigorous datasets,encompassing Cityscapes,Common Objects in Context,KITTI Karlsruhe Institute of Technology and Toyota Technological Institute,and Indian Driving Dataset,which replicate numerous real-world driving conditions.During extensive training,validation,and testing,the model showcases major gains in segmentation accuracy and surpasses state-of-the-art performance in semantic,instance,and panoptic segmentation tasks.Outperforming present methods,the recommended approach generates noteworthy gains in Panoptic Quality:+0.4%on Cityscapes,+0.2%on COCO,+1.7%on KITTI,and+0.4%on IDD.These changes show just how efficient it is in various driving circumstances and datasets.This study emphasizes the potential of EfficientNet-B7 and Bi-FPN to provide dependable,high-precision segmentation in computer vision applications,primarily autonomous driving.The research results suggest that this framework efficiently tackles the constraints of practical situations while delivering a robust solution for high-performance tasks involving segmentation.展开更多
Fractures are critical to subsurface activities such as oil and gas extraction,geothermal energy production,and carbon storage.Hydraulic fracturing,a technique that enhances fluid production,creates complex fracture n...Fractures are critical to subsurface activities such as oil and gas extraction,geothermal energy production,and carbon storage.Hydraulic fracturing,a technique that enhances fluid production,creates complex fracture networks within rock formations containing natural discontinuities.Accurately distinguishing between hydraulically induced fractures and pre-existing discontinuities is essential for understanding hydraulic fracture mechanisms.However,this remains challenging due to the interconnected nature of fractures in three-dimensional(3D)space.Manual segmentation,while adaptive,is both labor-intensive and subjective,making it impractical for large-scale 3D datasets.This study introduces a deep learning-based progressive cross-sectional segmentation method to automate the classification of 3D fracture volumes.The proposed method was applied to a 3D hydraulic fracture network in a Montney cube sample,successfully segmenting natural fractures,parted bedding planes,and hydraulic fractures with minimal user intervention.The automated approach achieves a 99.6%reduction in manual image processing workload while maintaining high segmentation accuracy,with test accuracy exceeding 98%and F1-score over 84%.This approach generalizes well to Brazilian disc samples with different fracture patterns,achieving consistently high accuracy in distinguishing between bedding and non-bedding fractures.This automated fracture segmentation method offers an effective tool for enhanced quantitative characterization of fracture networks,which would contribute to a deeper understanding of hydraulic fracturing processes.展开更多
Imaging evaluation of lymph node metastasis and infiltration faces problems such as low artificial outline efficiency and insufficient consistency.Deep learning technology based on convolutional neural networks has gr...Imaging evaluation of lymph node metastasis and infiltration faces problems such as low artificial outline efficiency and insufficient consistency.Deep learning technology based on convolutional neural networks has greatly improved the technical effect of radiomics in lymph node pathological characteristics analysis and efficacy monitoring through automatic lymph node detection,precise segmentation and three-dimensional reconstruction algorithms.This review focuses on the automatic lymph node segmentation model,treatment response prediction algorithm and benign and malignant differential diagnosis system for multimodal imaging,in order to provide a basis for further research on artificial intelligence to assist lymph node disease management and clinical decision-making,and provide a reference for promoting the construction of a system for accurate diagnosis,personalized treatment and prognostic evaluation of lymph node-related diseases.展开更多
Accurate automatic segmentation of gliomas in various sub-regions,including peritumoral edema,necrotic core,and enhancing and non-enhancing tumor core from 3D multimodal MRI images,is challenging because of its highly...Accurate automatic segmentation of gliomas in various sub-regions,including peritumoral edema,necrotic core,and enhancing and non-enhancing tumor core from 3D multimodal MRI images,is challenging because of its highly heterogeneous appearance and shape.Deep convolution neural networks(CNNs)have recently improved glioma segmentation performance.However,extensive down-sampling such as pooling or stridden convolution in CNNs significantly decreases the initial image resolution,resulting in the loss of accurate spatial and object parts information,especially information on the small sub-region tumors,affecting segmentation performance.Hence,this paper proposes a novel multi-level parallel network comprising three different level parallel subnetworks to fully use low-level,mid-level,and high-level information and improve the performance of brain tumor segmentation.We also introduce the Combo loss function to address input class imbalance and false positives and negatives imbalance in deep learning.The proposed method is trained and validated on the BraTS 2020 training and validation dataset.On the validation dataset,ourmethod achieved a mean Dice score of 0.907,0.830,and 0.787 for the whole tumor,tumor core,and enhancing tumor core,respectively.Compared with state-of-the-art methods,the multi-level parallel network has achieved competitive results on the validation dataset.展开更多
Human pose estimation is a critical research area in the field of computer vision,playing a significant role in applications such as human-computer interaction,behavior analysis,and action recognition.In this paper,we...Human pose estimation is a critical research area in the field of computer vision,playing a significant role in applications such as human-computer interaction,behavior analysis,and action recognition.In this paper,we propose a U-shaped keypoint detection network(DAUNet)based on an improved ResNet subsampling structure and spatial grouping mechanism.This network addresses key challenges in traditional methods,such as information loss,large network redundancy,and insufficient sensitivity to low-resolution features.DAUNet is composed of three main components.First,we introduce an improved BottleNeck block that employs partial convolution and strip pooling to reduce computational load and mitigate feature loss.Second,after upsampling,the network eliminates redundant features,improving the overall efficiency.Finally,a lightweight spatial grouping attention mechanism is applied to enhance low-resolution semantic features within the feature map,allowing for better restoration of the original image size and higher accuracy.Experimental results demonstrate that DAUNet achieves superior accuracy compared to most existing keypoint detection models,with a mean PCKh@0.5 score of 91.6%on the MPII dataset and an AP of 76.1%on the COCO dataset.Moreover,real-world experiments further validate the robustness and generalizability of DAUNet for detecting human bodies in unknown environments,highlighting its potential for broader applications.展开更多
The recent advancements in vision technology have had a significant impact on our ability to identify multiple objects and understand complex scenes.Various technologies,such as augmented reality-driven scene integrat...The recent advancements in vision technology have had a significant impact on our ability to identify multiple objects and understand complex scenes.Various technologies,such as augmented reality-driven scene integration,robotic navigation,autonomous driving,and guided tour systems,heavily rely on this type of scene comprehension.This paper presents a novel segmentation approach based on the UNet network model,aimed at recognizing multiple objects within an image.The methodology begins with the acquisition and preprocessing of the image,followed by segmentation using the fine-tuned UNet architecture.Afterward,we use an annotation tool to accurately label the segmented regions.Upon labeling,significant features are extracted from these segmented objects,encompassing KAZE(Accelerated Segmentation and Extraction)features,energy-based edge detection,frequency-based,and blob characteristics.For the classification stage,a convolution neural network(CNN)is employed.This comprehensive methodology demonstrates a robust framework for achieving accurate and efficient recognition of multiple objects in images.The experimental results,which include complex object datasets like MSRC-v2 and PASCAL-VOC12,have been documented.After analyzing the experimental results,it was found that the PASCAL-VOC12 dataset achieved an accuracy rate of 95%,while the MSRC-v2 dataset achieved an accuracy of 89%.The evaluation performed on these diverse datasets highlights a notably impressive level of performance.展开更多
Object segmentation and recognition is an imperative area of computer vision andmachine learning that identifies and separates individual objects within an image or video and determines classes or categories based on ...Object segmentation and recognition is an imperative area of computer vision andmachine learning that identifies and separates individual objects within an image or video and determines classes or categories based on their features.The proposed system presents a distinctive approach to object segmentation and recognition using Artificial Neural Networks(ANNs).The system takes RGB images as input and uses a k-means clustering-based segmentation technique to fragment the intended parts of the images into different regions and label thembased on their characteristics.Then,two distinct kinds of features are obtained from the segmented images to help identify the objects of interest.An Artificial Neural Network(ANN)is then used to recognize the objects based on their features.Experiments were carried out with three standard datasets,MSRC,MS COCO,and Caltech 101 which are extensively used in object recognition research,to measure the productivity of the suggested approach.The findings from the experiment support the suggested system’s validity,as it achieved class recognition accuracies of 89%,83%,and 90.30% on the MSRC,MS COCO,and Caltech 101 datasets,respectively.展开更多
Automatic segmentation of medical images provides a reliable scientific basis for disease diagnosis and analysis.Notably,most existing methods that combine the strengths of convolutional neural networks(CNNs)and Trans...Automatic segmentation of medical images provides a reliable scientific basis for disease diagnosis and analysis.Notably,most existing methods that combine the strengths of convolutional neural networks(CNNs)and Transformers have made significant progress.However,there are some limitations in the current integration of CNN and Transformer technology in two key aspects.Firstly,most methods either overlook or fail to fully incorporate the complementary nature between local and global features.Secondly,the significance of integrating the multiscale encoder features from the dual-branch network to enhance the decoding features is often disregarded in methods that combine CNN and Transformer.To address this issue,we present a groundbreaking dual-branch cross-attention fusion network(DCFNet),which efficiently combines the power of Swin Transformer and CNN to generate complementary global and local features.We then designed the Feature Cross-Fusion(FCF)module to efficiently fuse local and global features.In the FCF,the utilization of the Channel-wise Cross-fusion Transformer(CCT)serves the purpose of aggregatingmulti-scale features,and the Feature FusionModule(FFM)is employed to effectively aggregate dual-branch prominent feature regions from the spatial perspective.Furthermore,within the decoding phase of the dual-branch network,our proposed Channel Attention Block(CAB)aims to emphasize the significance of the channel features between the up-sampled features and the features generated by the FCFmodule to enhance the details of the decoding.Experimental results demonstrate that DCFNet exhibits enhanced accuracy in segmentation performance.Compared to other state-of-the-art(SOTA)methods,our segmentation framework exhibits a superior level of competitiveness.DCFNet’s accurate segmentation of medical images can greatly assist medical professionals in making crucial diagnoses of lesion areas in advance.展开更多
In recent years,semantic segmentation on 3D point cloud data has attracted much attention.Unlike 2D images where pixels distribute regularly in the image domain,3D point clouds in non-Euclidean space are irregular and...In recent years,semantic segmentation on 3D point cloud data has attracted much attention.Unlike 2D images where pixels distribute regularly in the image domain,3D point clouds in non-Euclidean space are irregular and inherently sparse.Therefore,it is very difficult to extract long-range contexts and effectively aggregate local features for semantic segmentation in 3D point cloud space.Most current methods either focus on local feature aggregation or long-range context dependency,but fail to directly establish a global-local feature extractor to complete the point cloud semantic segmentation tasks.In this paper,we propose a Transformer-based stratified graph convolutional network(SGT-Net),which enlarges the effective receptive field and builds direct long-range dependency.Specifically,we first propose a novel dense-sparse sampling strategy that provides dense local vertices and sparse long-distance vertices for subsequent graph convolutional network(GCN).Secondly,we propose a multi-key self-attention mechanism based on the Transformer to further weight augmentation for crucial neighboring relationships and enlarge the effective receptive field.In addition,to further improve the efficiency of the network,we propose a similarity measurement module to determine whether the neighborhood near the center point is effective.We demonstrate the validity and superiority of our method on the S3DIS and ShapeNet datasets.Through ablation experiments and segmentation visualization,we verify that the SGT model can improve the performance of the point cloud semantic segmentation.展开更多
In view of the problems of multi-scale changes of segmentation targets,noise interference,rough segmentation results and slow training process faced by medical image semantic segmentation,a multi-scale residual aggreg...In view of the problems of multi-scale changes of segmentation targets,noise interference,rough segmentation results and slow training process faced by medical image semantic segmentation,a multi-scale residual aggregation U-shaped attention network structure of MAAUNet(MultiRes aggregation attention UNet)is proposed based on MultiResUNet.Firstly,aggregate connection is introduced from the original feature aggregation at the same level.Skip connection is redesigned to aggregate features of different semantic scales at the decoder subnet,and the problem of semantic gaps is further solved that may exist between skip connections.Secondly,after the multi-scale convolution module,a convolution block attention module is added to focus and integrate features in the two attention directions of channel and space to adaptively optimize the intermediate feature map.Finally,the original convolution block is improved.The convolution channels are expanded with a series convolution structure to complement each other and extract richer spatial features.Residual connections are retained and the convolution block is turned into a multi-channel convolution block.The model is made to extract multi-scale spatial features.The experimental results show that MAAUNet has strong competitiveness in challenging datasets,and shows good segmentation performance and stability in dealing with multi-scale input and noise interference.展开更多
In response to issues such as incomplete segmentation and the presence of breakpoints encountered in extracting debris-flow fans using semantic segmentation models,this paper proposes a local feature and spatial atten...In response to issues such as incomplete segmentation and the presence of breakpoints encountered in extracting debris-flow fans using semantic segmentation models,this paper proposes a local feature and spatial attention mechanism to achieve precise segmentation of debris-flow fans.Firstly,leveraging the spatial inhibition mechanism from neuroscience theory as a foundation,an energy function for the local feature and spatial attention mechanism is formulated.Subsequently,by employing optimization theory,a closed-form solution for the energy function is derived,which ensures the lightweight nature of the proposed attention mechanism algorithm.Finally,the performance of this algorithm is compared with other mainstream attention mechanism algorithms embedded in semantic segmentation models through comparative experiments.Experimental results demonstrate that the proposed method outperforms both the original models and mainstream attention mechanisms across various classic models,effectively enhancing the performance of network models in debris-flow fan segmentation tasks.展开更多
Image semantic segmentation is an essential technique for studying human behavior through image data.This paper proposes an image semantic segmentation method for human behavior research.Firstly,an end-to-end convolut...Image semantic segmentation is an essential technique for studying human behavior through image data.This paper proposes an image semantic segmentation method for human behavior research.Firstly,an end-to-end convolutional neural network architecture is proposed,which consists of a depth-separable jump-connected fully convolutional network and a conditional random field network;then jump-connected convolution is used to classify each pixel in the image,and an image semantic segmentation method based on convolu-tional neural network is proposed;and then a conditional random field network is used to improve the effect of image segmentation of hu-man behavior and a linear modeling and nonlinear modeling method based on the semantic segmentation of conditional random field im-age is proposed.Finally,using the proposed image segmentation network,the input entrepreneurial image data is semantically segmented to obtain the contour features of the person;and the segmentation of the images in the medical field.The experimental results show that the image semantic segmentation method is effective.It is a new way to use image data to study human behavior and can be extended to other research areas.展开更多
Medical image segmentation has witnessed rapid advancements with the emergence of encoder-decoder based methods.In the encoder-decoder structure,the primary goal of the decoding phase is not only to restore feature ma...Medical image segmentation has witnessed rapid advancements with the emergence of encoder-decoder based methods.In the encoder-decoder structure,the primary goal of the decoding phase is not only to restore feature map resolution,but also to mitigate the loss of feature information incurred during the encoding phase.However,this approach gives rise to a challenge:multiple up-sampling operations in the decoder segment result in the loss of feature information.To address this challenge,we propose a novel network that removes the decoding structure to reduce feature information loss(CBL-Net).In particular,we introduce a Parallel Pooling Module(PPM)to counteract the feature information loss stemming from conventional and pooling operations during the encoding stage.Furthermore,we incorporate a Multiplexed Dilation Convolution(MDC)module to expand the network's receptive field.Also,although we have removed the decoding stage,we still need to recover the feature map resolution.Therefore,we introduced the Global Feature Recovery(GFR)module.It uses attention mechanism for the image feature map resolution recovery,which can effectively reduce the loss of feature information.We conduct extensive experimental evaluations on three publicly available medical image segmentation datasets:DRIVE,CHASEDB and MoNuSeg datasets.Experimental results show that our proposed network outperforms state-of-the-art methods in medical image segmentation.In addition,it achieves higher efficiency than the current network of coding and decoding structures by eliminating the decoding component.展开更多
基金supported by the STI2030-Major-Projects(No.2021ZD0200104)the National Natural Science Foundations of China under Grant 61771437.
文摘Neuronal soma segmentation plays a crucial role in neuroscience applications.However,the fine structure,such as boundaries,small-volume neuronal somata and fibers,are commonly present in cell images,which pose a challenge for accurate segmentation.In this paper,we propose a 3D semantic segmentation network for neuronal soma segmentation to address this issue.Using an encoding-decoding structure,we introduce a Multi-Scale feature extraction and Adaptive Weighting fusion module(MSAW)after each encoding block.The MSAW module can not only emphasize the fine structures via an upsampling strategy,but also provide pixel-wise weights to measure the importance of the multi-scale features.Additionally,a dynamic convolution instead of normal convolution is employed to better adapt the network to input data with different distributions.The proposed MSAW-based semantic segmentation network(MSAW-Net)was evaluated on three neuronal soma images from mouse brain and one neuronal soma image from macaque brain,demonstrating the efficiency of the proposed method.It achieved an F1 score of 91.8%on Fezf2-2A-CreER dataset,97.1%on LSL-H2B-GFP dataset,82.8%on Thy1-EGFP-Mline dataset,and 86.9%on macaque dataset,achieving improvements over the 3D U-Net model by 3.1%,3.3%,3.9%,and 2.3%,respectively.
基金supported by the Natural Science Foundation of the Anhui Higher Education Institutions of China(Grant Nos.2023AH040149 and 2024AH051915)the Anhui Provincial Natural Science Foundation(Grant No.2208085MF168)+1 种基金the Science and Technology Innovation Tackle Plan Project of Maanshan(Grant No.2024RGZN001)the Scientific Research Fund Project of Anhui Medical University(Grant No.2023xkj122).
文摘Convolutional neural networks(CNNs)-based medical image segmentation technologies have been widely used in medical image segmentation because of their strong representation and generalization abilities.However,due to the inability to effectively capture global information from images,CNNs can easily lead to loss of contours and textures in segmentation results.Notice that the transformer model can effectively capture the properties of long-range dependencies in the image,and furthermore,combining the CNN and the transformer can effectively extract local details and global contextual features of the image.Motivated by this,we propose a multi-branch and multi-scale attention network(M2ANet)for medical image segmentation,whose architecture consists of three components.Specifically,in the first component,we construct an adaptive multi-branch patch module for parallel extraction of image features to reduce information loss caused by downsampling.In the second component,we apply residual block to the well-known convolutional block attention module to enhance the network’s ability to recognize important features of images and alleviate the phenomenon of gradient vanishing.In the third component,we design a multi-scale feature fusion module,in which we adopt adaptive average pooling and position encoding to enhance contextual features,and then multi-head attention is introduced to further enrich feature representation.Finally,we validate the effectiveness and feasibility of the proposed M2ANet method through comparative experiments on four benchmark medical image segmentation datasets,particularly in the context of preserving contours and textures.
基金funded by Hanoi University of Science and Technology(HUST)under project number T2023-PC-008.
文摘Due to the necessity for lightweight and efficient network models, deploying semantic segmentation models on mobile robots (MRs) is a formidable task. The fundamental limitation of the problem lies in the training performance, the ability to effectively exploit the dataset, and the ability to adapt to complex environments when deploying the model. By utilizing the knowledge distillation techniques, the article strives to overcome the above challenges with the inheritance of the advantages of both the teacher model and the student model. More precisely, the ResNet152-PSP-Net model’s characteristics are utilized to train the ResNet18-PSP-Net model. Pyramid pooling blocks are utilized to decode multi-scale feature maps, creating a complete semantic map inference. The student model not only preserves the strong segmentation performance from the teacher model but also improves the inference speed of the prediction results. The proposed method exhibits a clear advantage over conventional convolutional neural network (CNN) models, as evident from the conducted experiments. Furthermore, the proposed model also shows remarkable improvement in processing speed when compared with light-weight models such as MobileNetV2 and EfficientNet based on latency and throughput parameters. The proposed KD-SegNet model obtains an accuracy of 96.3% and a mIoU (mean Intersection over Union) of 77%, outperforming the performance of existing models by more than 15% on the same training dataset. The suggested method has an average training time that is only 0.51 times less than same field models, while still achieving comparable segmentation performance. Hence, the semantic segmentation frames are collected, forming the motion trajectory for the system in the environment. Overall, this architecture shows great promise for the development of knowledge-based systems for MR’s navigation.
基金National Key Research and Development Program of China,Grant/Award Number:2018YFE0206900China Postdoctoral Science Foundation,Grant/Award Number:2023M731204+2 种基金The Open Project of Key Laboratory for Quality Evaluation of Ultrasound Surgical Equipment of National Medical Products Administration,Grant/Award Number:SMDTKL-2023-1-01The Hubei Province Key Research and Development Project,Grant/Award Number:2023BCB007CAAI-Huawei MindSpore Open Fund。
文摘Convolutional neural network(CNN)with the encoder-decoder structure is popular in medical image segmentation due to its excellent local feature extraction ability but it faces limitations in capturing the global feature.The transformer can extract the global information well but adapting it to small medical datasets is challenging and its computational complexity can be heavy.In this work,a serial and parallel network is proposed for the accurate 3D medical image segmentation by combining CNN and transformer and promoting feature interactions across various semantic levels.The core components of the proposed method include the cross window self-attention based transformer(CWST)and multi-scale local enhanced(MLE)modules.The CWST module enhances the global context understanding by partitioning 3D images into non-overlapping windows and calculating sparse global attention between windows.The MLE module selectively fuses features by computing the voxel attention between different branch features,and uses convolution to strengthen the dense local information.The experiments on the prostate,atrium,and pancreas MR/CT image datasets consistently demonstrate the advantage of the proposed method over six popular segmentation models in both qualitative evaluation and quantitative indexes such as dice similarity coefficient,Intersection over Union,95%Hausdorff distance and average symmetric surface distance.
基金Hefei Municipal Natural Science Foundation,Grant/Award Number:2022009Suqian Guiding Program Project,Grant/Award Number:Z202309Suqian Traditional Chinese Medicine Science and Technology Plan,Grant/Award Number:MS202301。
文摘Quantitative analysis of clinical function parameters from MRI images is crucial for diagnosing and assessing cardiovascular disease.However,the manual calculation of these parameters is challenging due to the high variability among patients and the time-consuming nature of the process.In this study,the authors introduce a framework named MultiJSQ,comprising the feature presentation network(FRN)and the indicator prediction network(IEN),which is designed for simultaneous joint segmentation and quantification.The FRN is tailored for representing global image features,facilitating the direct acquisition of left ventricle(LV)contour images through pixel classification.Additionally,the IEN incorporates specifically designed modules to extract relevant clinical indices.The authors’method considers the interdependence of different tasks,demonstrating the validity of these relationships and yielding favourable results.Through extensive experiments on cardiac MR images from 145 patients,MultiJSQ achieves impressive outcomes,with low mean absolute errors of 124 mm^(2),1.72 mm,and 1.21 mm for areas,dimensions,and regional wall thicknesses,respectively,along with a Dice metric score of 0.908.The experimental findings underscore the excellent performance of our framework in LV segmentation and quantification,highlighting its promising clinical application prospects.
基金supported by Yunnan Provincial Major Science and Technology Special Plan Projects(Grant Nos.202202AD080003,202202AE090008,202202AD080004,202302AD080003)National Natural Science Foundation of China(Grant Nos.U21B2027,62266027,62266028,62266025)Yunnan Province Young and Middle-Aged Academic and Technical Leaders Reserve Talent Program(Grant No.202305AC160063).
文摘Chinese named entity recognition(CNER)has received widespread attention as an important task of Chinese information extraction.Most previous research has focused on individually studying flat CNER,overlapped CNER,or discontinuous CNER.However,a unified CNER is often needed in real-world scenarios.Recent studies have shown that grid tagging-based methods based on character-pair relationship classification hold great potential for achieving unified NER.Nevertheless,how to enrich Chinese character-pair grid representations and capture deeper dependencies between character pairs to improve entity recognition performance remains an unresolved challenge.In this study,we enhance the character-pair grid representation by incorporating both local and global information.Significantly,we introduce a new approach by considering the character-pair grid representation matrix as a specialized image,converting the classification of character-pair relationships into a pixel-level semantic segmentation task.We devise a U-shaped network to extract multi-scale and deeper semantic information from the grid image,allowing for a more comprehensive understanding of associative features between character pairs.This approach leads to improved accuracy in predicting their relationships,ultimately enhancing entity recognition performance.We conducted experiments on two public CNER datasets in the biomedical domain,namely CMeEE-V2 and Diakg.The results demonstrate the effectiveness of our approach,which achieves F1-score improvements of 7.29 percentage points and 1.64 percentage points compared to the current state-of-the-art(SOTA)models,respectively.
基金the National Natural Science Foundation of China(52175236)Qingdao People’s Livelihood Science and Technology Plan(19-6-1-88-nsh).
文摘In actual traffic scenarios,precise recognition of traffic participants,such as vehicles and pedestrians,is crucial for intelligent transportation.This study proposes an improved algorithm built on Mask-RCNN to enhance the ability of autonomous driving systems to recognize traffic participants.The algorithmincorporates long and shortterm memory networks and the fused attention module(GSAM,GCT,and Spatial Attention Module)to enhance the algorithm’s capability to process both global and local information.Additionally,to increase the network’s initial operation stability,the original network activation function was replaced with Gaussian error linear unit.Experiments were conducted using the publicly available Cityscapes dataset.Comparing the test results,it was observed that the revised algorithmoutperformed the original algorithmin terms of AP_(50),AP_(75),and othermetrics by 8.7%and 9.6%for target detection and 12.5%and 13.3%for segmentation.
文摘Techniques in deep learning have significantly boosted the accuracy and productivity of computer vision segmentation tasks.This article offers an intriguing architecture for semantic,instance,and panoptic segmentation using EfficientNet-B7 and Bidirectional Feature Pyramid Networks(Bi-FPN).When implemented in place of the EfficientNet-B5 backbone,EfficientNet-B7 strengthens the model’s feature extraction capabilities and is far more appropriate for real-world applications.By ensuring superior multi-scale feature fusion,Bi-FPN integration enhances the segmentation of complex objects across various urban environments.The design suggested is examined on rigorous datasets,encompassing Cityscapes,Common Objects in Context,KITTI Karlsruhe Institute of Technology and Toyota Technological Institute,and Indian Driving Dataset,which replicate numerous real-world driving conditions.During extensive training,validation,and testing,the model showcases major gains in segmentation accuracy and surpasses state-of-the-art performance in semantic,instance,and panoptic segmentation tasks.Outperforming present methods,the recommended approach generates noteworthy gains in Panoptic Quality:+0.4%on Cityscapes,+0.2%on COCO,+1.7%on KITTI,and+0.4%on IDD.These changes show just how efficient it is in various driving circumstances and datasets.This study emphasizes the potential of EfficientNet-B7 and Bi-FPN to provide dependable,high-precision segmentation in computer vision applications,primarily autonomous driving.The research results suggest that this framework efficiently tackles the constraints of practical situations while delivering a robust solution for high-performance tasks involving segmentation.
基金supported through the Natural Sciences and Engineering Research Council of Canada(NSERC)Discovery Grants 341275,CRDPJ 543894-19NSERC/Energi Simulation Industrial Research Chair program.
文摘Fractures are critical to subsurface activities such as oil and gas extraction,geothermal energy production,and carbon storage.Hydraulic fracturing,a technique that enhances fluid production,creates complex fracture networks within rock formations containing natural discontinuities.Accurately distinguishing between hydraulically induced fractures and pre-existing discontinuities is essential for understanding hydraulic fracture mechanisms.However,this remains challenging due to the interconnected nature of fractures in three-dimensional(3D)space.Manual segmentation,while adaptive,is both labor-intensive and subjective,making it impractical for large-scale 3D datasets.This study introduces a deep learning-based progressive cross-sectional segmentation method to automate the classification of 3D fracture volumes.The proposed method was applied to a 3D hydraulic fracture network in a Montney cube sample,successfully segmenting natural fractures,parted bedding planes,and hydraulic fractures with minimal user intervention.The automated approach achieves a 99.6%reduction in manual image processing workload while maintaining high segmentation accuracy,with test accuracy exceeding 98%and F1-score over 84%.This approach generalizes well to Brazilian disc samples with different fracture patterns,achieving consistently high accuracy in distinguishing between bedding and non-bedding fractures.This automated fracture segmentation method offers an effective tool for enhanced quantitative characterization of fracture networks,which would contribute to a deeper understanding of hydraulic fracturing processes.
基金Supported by Clinical Trials from the Nanjing Drum Tower Hospital,Affiliated Hospital of Medical School,Nanjing University,No.2021-LCYJ-MS-11Nanjing Drum Tower Hospital National Natural Science Foundation Youth Cultivation Project,No.2024-JCYJQP-15.
文摘Imaging evaluation of lymph node metastasis and infiltration faces problems such as low artificial outline efficiency and insufficient consistency.Deep learning technology based on convolutional neural networks has greatly improved the technical effect of radiomics in lymph node pathological characteristics analysis and efficacy monitoring through automatic lymph node detection,precise segmentation and three-dimensional reconstruction algorithms.This review focuses on the automatic lymph node segmentation model,treatment response prediction algorithm and benign and malignant differential diagnosis system for multimodal imaging,in order to provide a basis for further research on artificial intelligence to assist lymph node disease management and clinical decision-making,and provide a reference for promoting the construction of a system for accurate diagnosis,personalized treatment and prognostic evaluation of lymph node-related diseases.
基金supported by the Sichuan Science and Technology Program (No.2019YJ0356).
文摘Accurate automatic segmentation of gliomas in various sub-regions,including peritumoral edema,necrotic core,and enhancing and non-enhancing tumor core from 3D multimodal MRI images,is challenging because of its highly heterogeneous appearance and shape.Deep convolution neural networks(CNNs)have recently improved glioma segmentation performance.However,extensive down-sampling such as pooling or stridden convolution in CNNs significantly decreases the initial image resolution,resulting in the loss of accurate spatial and object parts information,especially information on the small sub-region tumors,affecting segmentation performance.Hence,this paper proposes a novel multi-level parallel network comprising three different level parallel subnetworks to fully use low-level,mid-level,and high-level information and improve the performance of brain tumor segmentation.We also introduce the Combo loss function to address input class imbalance and false positives and negatives imbalance in deep learning.The proposed method is trained and validated on the BraTS 2020 training and validation dataset.On the validation dataset,ourmethod achieved a mean Dice score of 0.907,0.830,and 0.787 for the whole tumor,tumor core,and enhancing tumor core,respectively.Compared with state-of-the-art methods,the multi-level parallel network has achieved competitive results on the validation dataset.
基金supported by the Natural Science Foundation of Hubei Province of China under grant number 2022CFB536the National Natural Science Foundation of China under grant number 62367006the 15th Graduate Education Innovation Fund of Wuhan Institute of Technology under grant number CX2023579.
文摘Human pose estimation is a critical research area in the field of computer vision,playing a significant role in applications such as human-computer interaction,behavior analysis,and action recognition.In this paper,we propose a U-shaped keypoint detection network(DAUNet)based on an improved ResNet subsampling structure and spatial grouping mechanism.This network addresses key challenges in traditional methods,such as information loss,large network redundancy,and insufficient sensitivity to low-resolution features.DAUNet is composed of three main components.First,we introduce an improved BottleNeck block that employs partial convolution and strip pooling to reduce computational load and mitigate feature loss.Second,after upsampling,the network eliminates redundant features,improving the overall efficiency.Finally,a lightweight spatial grouping attention mechanism is applied to enhance low-resolution semantic features within the feature map,allowing for better restoration of the original image size and higher accuracy.Experimental results demonstrate that DAUNet achieves superior accuracy compared to most existing keypoint detection models,with a mean PCKh@0.5 score of 91.6%on the MPII dataset and an AP of 76.1%on the COCO dataset.Moreover,real-world experiments further validate the robustness and generalizability of DAUNet for detecting human bodies in unknown environments,highlighting its potential for broader applications.
基金supported by the MSIT(Ministry of Science and ICT),Korea,under the ICAN(ICT Challenge and Advanced Network of HRD)Program(IITP-2024-RS-2022-00156326)supervised by the IITP(Institute of Information&Communications Technology Planning&Evaluation)+2 种基金The authors are thankful to the Deanship of Scientific Research at Najran University for funding this work under the Research Group Funding Program Grant Code(NU/GP/SERC/13/30)funding for this work was provided by Princess Nourah bint Abdulrahman University Researchers Supporting Project Number(PNURSP2024R410)Princess Nourah bint Abdulrahman University,Riyadh,Saudi Arabia.The authors extend their appreciation to the Deanship of Scientific Research at Northern Border University,Arar,KSA for funding this research work through the Project Number“NBU-FFR-2024-231-06”.
文摘The recent advancements in vision technology have had a significant impact on our ability to identify multiple objects and understand complex scenes.Various technologies,such as augmented reality-driven scene integration,robotic navigation,autonomous driving,and guided tour systems,heavily rely on this type of scene comprehension.This paper presents a novel segmentation approach based on the UNet network model,aimed at recognizing multiple objects within an image.The methodology begins with the acquisition and preprocessing of the image,followed by segmentation using the fine-tuned UNet architecture.Afterward,we use an annotation tool to accurately label the segmented regions.Upon labeling,significant features are extracted from these segmented objects,encompassing KAZE(Accelerated Segmentation and Extraction)features,energy-based edge detection,frequency-based,and blob characteristics.For the classification stage,a convolution neural network(CNN)is employed.This comprehensive methodology demonstrates a robust framework for achieving accurate and efficient recognition of multiple objects in images.The experimental results,which include complex object datasets like MSRC-v2 and PASCAL-VOC12,have been documented.After analyzing the experimental results,it was found that the PASCAL-VOC12 dataset achieved an accuracy rate of 95%,while the MSRC-v2 dataset achieved an accuracy of 89%.The evaluation performed on these diverse datasets highlights a notably impressive level of performance.
基金supported by the MSIT(Ministry of Science and ICT)Korea,under the ITRC(Information Technology Research Center)Support Program(IITP-2023-2018-0-01426)supervised by the IITP(Institute for Information&Communications Technology Planning&Evaluation)+1 种基金Princess Nourah bint Abdulrahman University Researchers Supporting Project Number(PNURSP2023R410),Princess Nourah bint Abdulrahman University,Riyadh,Saudi Arabiathe Deanship of Scientific Research at Najran University for funding this work under the Research Group Funding Program Grant Code(NU/RG/SERC/12/6).
文摘Object segmentation and recognition is an imperative area of computer vision andmachine learning that identifies and separates individual objects within an image or video and determines classes or categories based on their features.The proposed system presents a distinctive approach to object segmentation and recognition using Artificial Neural Networks(ANNs).The system takes RGB images as input and uses a k-means clustering-based segmentation technique to fragment the intended parts of the images into different regions and label thembased on their characteristics.Then,two distinct kinds of features are obtained from the segmented images to help identify the objects of interest.An Artificial Neural Network(ANN)is then used to recognize the objects based on their features.Experiments were carried out with three standard datasets,MSRC,MS COCO,and Caltech 101 which are extensively used in object recognition research,to measure the productivity of the suggested approach.The findings from the experiment support the suggested system’s validity,as it achieved class recognition accuracies of 89%,83%,and 90.30% on the MSRC,MS COCO,and Caltech 101 datasets,respectively.
基金supported by the National Key R&D Program of China(2018AAA0102100)the National Natural Science Foundation of China(No.62376287)+3 种基金the International Science and Technology Innovation Joint Base of Machine Vision and Medical Image Processing in Hunan Province(2021CB1013)the Key Research and Development Program of Hunan Province(2022SK2054)the Natural Science Foundation of Hunan Province(No.2022JJ30762,2023JJ70016)the 111 Project under Grant(No.B18059).
文摘Automatic segmentation of medical images provides a reliable scientific basis for disease diagnosis and analysis.Notably,most existing methods that combine the strengths of convolutional neural networks(CNNs)and Transformers have made significant progress.However,there are some limitations in the current integration of CNN and Transformer technology in two key aspects.Firstly,most methods either overlook or fail to fully incorporate the complementary nature between local and global features.Secondly,the significance of integrating the multiscale encoder features from the dual-branch network to enhance the decoding features is often disregarded in methods that combine CNN and Transformer.To address this issue,we present a groundbreaking dual-branch cross-attention fusion network(DCFNet),which efficiently combines the power of Swin Transformer and CNN to generate complementary global and local features.We then designed the Feature Cross-Fusion(FCF)module to efficiently fuse local and global features.In the FCF,the utilization of the Channel-wise Cross-fusion Transformer(CCT)serves the purpose of aggregatingmulti-scale features,and the Feature FusionModule(FFM)is employed to effectively aggregate dual-branch prominent feature regions from the spatial perspective.Furthermore,within the decoding phase of the dual-branch network,our proposed Channel Attention Block(CAB)aims to emphasize the significance of the channel features between the up-sampled features and the features generated by the FCFmodule to enhance the details of the decoding.Experimental results demonstrate that DCFNet exhibits enhanced accuracy in segmentation performance.Compared to other state-of-the-art(SOTA)methods,our segmentation framework exhibits a superior level of competitiveness.DCFNet’s accurate segmentation of medical images can greatly assist medical professionals in making crucial diagnoses of lesion areas in advance.
基金supported in part by the National Natural Science Foundation of China under Grant Nos.U20A20197,62306187the Foundation of Ministry of Industry and Information Technology TC220H05X-04.
文摘In recent years,semantic segmentation on 3D point cloud data has attracted much attention.Unlike 2D images where pixels distribute regularly in the image domain,3D point clouds in non-Euclidean space are irregular and inherently sparse.Therefore,it is very difficult to extract long-range contexts and effectively aggregate local features for semantic segmentation in 3D point cloud space.Most current methods either focus on local feature aggregation or long-range context dependency,but fail to directly establish a global-local feature extractor to complete the point cloud semantic segmentation tasks.In this paper,we propose a Transformer-based stratified graph convolutional network(SGT-Net),which enlarges the effective receptive field and builds direct long-range dependency.Specifically,we first propose a novel dense-sparse sampling strategy that provides dense local vertices and sparse long-distance vertices for subsequent graph convolutional network(GCN).Secondly,we propose a multi-key self-attention mechanism based on the Transformer to further weight augmentation for crucial neighboring relationships and enlarge the effective receptive field.In addition,to further improve the efficiency of the network,we propose a similarity measurement module to determine whether the neighborhood near the center point is effective.We demonstrate the validity and superiority of our method on the S3DIS and ShapeNet datasets.Through ablation experiments and segmentation visualization,we verify that the SGT model can improve the performance of the point cloud semantic segmentation.
基金National Natural Science Foundation of China(No.61806006)Jiangsu University Superior Discipline Construction Project。
文摘In view of the problems of multi-scale changes of segmentation targets,noise interference,rough segmentation results and slow training process faced by medical image semantic segmentation,a multi-scale residual aggregation U-shaped attention network structure of MAAUNet(MultiRes aggregation attention UNet)is proposed based on MultiResUNet.Firstly,aggregate connection is introduced from the original feature aggregation at the same level.Skip connection is redesigned to aggregate features of different semantic scales at the decoder subnet,and the problem of semantic gaps is further solved that may exist between skip connections.Secondly,after the multi-scale convolution module,a convolution block attention module is added to focus and integrate features in the two attention directions of channel and space to adaptively optimize the intermediate feature map.Finally,the original convolution block is improved.The convolution channels are expanded with a series convolution structure to complement each other and extract richer spatial features.Residual connections are retained and the convolution block is turned into a multi-channel convolution block.The model is made to extract multi-scale spatial features.The experimental results show that MAAUNet has strong competitiveness in challenging datasets,and shows good segmentation performance and stability in dealing with multi-scale input and noise interference.
基金National Natural Science Foundation of China,No.61966040.
文摘In response to issues such as incomplete segmentation and the presence of breakpoints encountered in extracting debris-flow fans using semantic segmentation models,this paper proposes a local feature and spatial attention mechanism to achieve precise segmentation of debris-flow fans.Firstly,leveraging the spatial inhibition mechanism from neuroscience theory as a foundation,an energy function for the local feature and spatial attention mechanism is formulated.Subsequently,by employing optimization theory,a closed-form solution for the energy function is derived,which ensures the lightweight nature of the proposed attention mechanism algorithm.Finally,the performance of this algorithm is compared with other mainstream attention mechanism algorithms embedded in semantic segmentation models through comparative experiments.Experimental results demonstrate that the proposed method outperforms both the original models and mainstream attention mechanisms across various classic models,effectively enhancing the performance of network models in debris-flow fan segmentation tasks.
基金Supported by the Major Consulting and Research Project of the Chinese Academy of Engineering(2020-CQ-ZD-1)the National Natural Science Foundation of China(72101235)Zhejiang Soft Science Research Program(2023C35012)。
文摘Image semantic segmentation is an essential technique for studying human behavior through image data.This paper proposes an image semantic segmentation method for human behavior research.Firstly,an end-to-end convolutional neural network architecture is proposed,which consists of a depth-separable jump-connected fully convolutional network and a conditional random field network;then jump-connected convolution is used to classify each pixel in the image,and an image semantic segmentation method based on convolu-tional neural network is proposed;and then a conditional random field network is used to improve the effect of image segmentation of hu-man behavior and a linear modeling and nonlinear modeling method based on the semantic segmentation of conditional random field im-age is proposed.Finally,using the proposed image segmentation network,the input entrepreneurial image data is semantically segmented to obtain the contour features of the person;and the segmentation of the images in the medical field.The experimental results show that the image semantic segmentation method is effective.It is a new way to use image data to study human behavior and can be extended to other research areas.
基金funded by the National Key Research and Development Program of China(Grant 2020YFB1708900)the Fundamental Research Funds for the Central Universities(Grant No.B220201044).
文摘Medical image segmentation has witnessed rapid advancements with the emergence of encoder-decoder based methods.In the encoder-decoder structure,the primary goal of the decoding phase is not only to restore feature map resolution,but also to mitigate the loss of feature information incurred during the encoding phase.However,this approach gives rise to a challenge:multiple up-sampling operations in the decoder segment result in the loss of feature information.To address this challenge,we propose a novel network that removes the decoding structure to reduce feature information loss(CBL-Net).In particular,we introduce a Parallel Pooling Module(PPM)to counteract the feature information loss stemming from conventional and pooling operations during the encoding stage.Furthermore,we incorporate a Multiplexed Dilation Convolution(MDC)module to expand the network's receptive field.Also,although we have removed the decoding stage,we still need to recover the feature map resolution.Therefore,we introduced the Global Feature Recovery(GFR)module.It uses attention mechanism for the image feature map resolution recovery,which can effectively reduce the loss of feature information.We conduct extensive experimental evaluations on three publicly available medical image segmentation datasets:DRIVE,CHASEDB and MoNuSeg datasets.Experimental results show that our proposed network outperforms state-of-the-art methods in medical image segmentation.In addition,it achieves higher efficiency than the current network of coding and decoding structures by eliminating the decoding component.