Convolutional neural networks(CNNs)-based medical image segmentation technologies have been widely used in medical image segmentation because of their strong representation and generalization abilities.However,due to ...Convolutional neural networks(CNNs)-based medical image segmentation technologies have been widely used in medical image segmentation because of their strong representation and generalization abilities.However,due to the inability to effectively capture global information from images,CNNs can easily lead to loss of contours and textures in segmentation results.Notice that the transformer model can effectively capture the properties of long-range dependencies in the image,and furthermore,combining the CNN and the transformer can effectively extract local details and global contextual features of the image.Motivated by this,we propose a multi-branch and multi-scale attention network(M2ANet)for medical image segmentation,whose architecture consists of three components.Specifically,in the first component,we construct an adaptive multi-branch patch module for parallel extraction of image features to reduce information loss caused by downsampling.In the second component,we apply residual block to the well-known convolutional block attention module to enhance the network’s ability to recognize important features of images and alleviate the phenomenon of gradient vanishing.In the third component,we design a multi-scale feature fusion module,in which we adopt adaptive average pooling and position encoding to enhance contextual features,and then multi-head attention is introduced to further enrich feature representation.Finally,we validate the effectiveness and feasibility of the proposed M2ANet method through comparative experiments on four benchmark medical image segmentation datasets,particularly in the context of preserving contours and textures.展开更多
This paper aims to develop a nonrigid registration method of preoperative and intraoperative thoracoabdominal CT images in computer-assisted interventional surgeries for accurate tumor localization and tissue visualiz...This paper aims to develop a nonrigid registration method of preoperative and intraoperative thoracoabdominal CT images in computer-assisted interventional surgeries for accurate tumor localization and tissue visualization enhancement.However,fine structure registration of complex thoracoabdominal organs and large deformation registration caused by respiratory motion is challenging.To deal with this problem,we propose a 3D multi-scale attention VoxelMorph(MAVoxelMorph)registration network.To alleviate the large deformation problem,a multi-scale axial attention mechanism is utilized by using a residual dilated pyramid pooling for multi-scale feature extraction,and position-aware axial attention for long-distance dependencies between pixels capture.To further improve the large deformation and fine structure registration results,a multi-scale context channel attention mechanism is employed utilizing content information via adjacent encoding layers.Our method was evaluated on four public lung datasets(DIR-Lab dataset,Creatis dataset,Learn2Reg dataset,OASIS dataset)and a local dataset.Results proved that the proposed method achieved better registration performance than current state-of-the-art methods,especially in handling the registration of large deformations and fine structures.It also proved to be fast in 3D image registration,using about 1.5 s,and faster than most methods.Qualitative and quantitative assessments proved that the proposed MA-VoxelMorph has the potential to realize precise and fast tumor localization in clinical interventional surgeries.展开更多
We propose a hierarchical multi-scale attention mechanism-based model in response to the low accuracy and inefficient manual classification of existing oceanic biological image classification methods. Firstly, the hie...We propose a hierarchical multi-scale attention mechanism-based model in response to the low accuracy and inefficient manual classification of existing oceanic biological image classification methods. Firstly, the hierarchical efficient multi-scale attention(H-EMA) module is designed for lightweight feature extraction, achieving outstanding performance at a relatively low cost. Secondly, an improved EfficientNetV2 block is used to integrate information from different scales better and enhance inter-layer message passing. Furthermore, introducing the convolutional block attention module(CBAM) enhances the model's perception of critical features, optimizing its generalization ability. Lastly, Focal Loss is introduced to adjust the weights of complex samples to address the issue of imbalanced categories in the dataset, further improving the model's performance. The model achieved 96.11% accuracy on the intertidal marine organism dataset of Nanji Islands and 84.78% accuracy on the CIFAR-100 dataset, demonstrating its strong generalization ability to meet the demands of oceanic biological image classification.展开更多
Whole brain functional connectivity(FC)patterns obtained from resting-state functional magnetic resonance imaging(rs-fMRI)have been widely used in the diagnosis of brain disorders such as autism spectrum disorder(ASD)...Whole brain functional connectivity(FC)patterns obtained from resting-state functional magnetic resonance imaging(rs-fMRI)have been widely used in the diagnosis of brain disorders such as autism spectrum disorder(ASD).Recently,an increasing number of studies have focused on employing deep learning techniques to analyze FC patterns for brain disease classification.However,the high dimensionality of the FC features and the interpretation of deep learning results are issues that need to be addressed in the FC-based brain disease classification.In this paper,we proposed a multi-scale attention-based deep neural network(MSA-DNN)model to classify FC patterns for the ASD diagnosis.The model was implemented by adding a flexible multi-scale attention(MSA)module to the auto-encoder based backbone DNN,which can extract multi-scale features of the FC patterns and change the level of attention for different FCs by continuous learning.Our model will reinforce the weights of important FC features while suppress the unimportant FCs to ensure the sparsity of the model weights and enhance the model interpretability.We performed systematic experiments on the large multi-sites ASD dataset with both ten-fold and leaveone-site-out cross-validations.Results showed that our model outperformed classical methods in brain disease classification and revealed robust intersite prediction performance.We also localized important FC features and brain regions associated with ASD classification.Overall,our study further promotes the biomarker detection and computer-aided classification for ASD diagnosis,and the proposed MSA module is flexible and easy to implement in other classification networks.展开更多
The goal of street-to-aerial cross-view image geo-localization is to determine the location of the query street-view image by retrieving the aerial-view image from the same place.The drastic viewpoint and appearance g...The goal of street-to-aerial cross-view image geo-localization is to determine the location of the query street-view image by retrieving the aerial-view image from the same place.The drastic viewpoint and appearance gap between the aerial-view and the street-view images brings a huge challenge against this task.In this paper,we propose a novel multiscale attention encoder to capture the multiscale contextual information of the aerial/street-view images.To bridge the domain gap between these two view images,we first use an inverse polar transform to make the street-view images approximately aligned with the aerial-view images.Then,the explored multiscale attention encoder is applied to convert the image into feature representation with the guidance of the learnt multiscale information.Finally,we propose a novel global mining strategy to enable the network to pay more attention to hard negative exemplars.Experiments on standard benchmark datasets show that our approach obtains 81.39%top-1 recall rate on the CVUSA dataset and 71.52%on the CVACT dataset,achieving the state-of-the-art performance and outperforming most of the existing methods significantly.展开更多
Bone age assessment(BAA)helps doctors determine how a child’s bones grow and develop in clinical medicine.Traditional BAA methods rely on clinician expertise,leading to time-consuming predictions and inaccurate resul...Bone age assessment(BAA)helps doctors determine how a child’s bones grow and develop in clinical medicine.Traditional BAA methods rely on clinician expertise,leading to time-consuming predictions and inaccurate results.Most deep learning-based BAA methods feed the extracted critical points of images into the network by providing additional annotations.This operation is costly and subjective.To address these problems,we propose a multi-scale attentional densely connected network(MSADCN)in this paper.MSADCN constructs a multi-scale dense connectivity mechanism,which can avoid overfitting,obtain the local features effectively and prevent gradient vanishing even in limited training data.First,MSADCN designs multi-scale structures in the densely connected network to extract fine-grained features at different scales.Then,coordinate attention is embedded to focus on critical features and automatically locate the regions of interest(ROI)without additional annotation.In addition,to improve the model’s generalization,transfer learning is applied to train the proposed MSADCN on the public dataset IMDB-WIKI,and the obtained pre-trained weights are loaded onto the Radiological Society of North America(RSNA)dataset.Finally,label distribution learning(LDL)and expectation regression techniques are introduced into our model to exploit the correlation between hand bone images of different ages,which can obtain stable age estimates.Extensive experiments confirm that our model can converge more efficiently and obtain a mean absolute error(MAE)of 4.64 months,outperforming some state-of-the-art BAA methods.展开更多
Fine-grained visual classification(FGVC)is a very challenging task due to distinguishing subcategories under the same super-category.Recent works mainly localize discriminative image regions and capture subtle inter-c...Fine-grained visual classification(FGVC)is a very challenging task due to distinguishing subcategories under the same super-category.Recent works mainly localize discriminative image regions and capture subtle inter-class differences by utilizing attention-based methods.However,at the same layer,most attention-based works only consider large-scale attention blocks with the same size as feature maps,and they ignore small-scale attention blocks that are smaller than feature maps.To distinguish subcategories,it is important to exploit small local regions.In this work,a novel multi-scale attention network(MSANet)is proposed to capture large and small regions at the same layer in fine-grained visual classification.Specifically,a novel multi-scale attention layer(MSAL)is proposed,which generates multiple groups in each feature maps to capture different-scale discriminative regions.The groups based on large-scale regions can exploit global features and the groups based on the small-scale regions can extract local subtle features.Then,a simple feature fusion strategy is utilized to fully integrate global features and local subtle features to mine information that are more conducive to FGVC.Comprehensive experiments in Caltech-UCSD Birds-200-2011(CUB),FGVC-Aircraft(AIR)and Stanford Cars(Cars)datasets show that our method achieves the competitive performances,which demonstrate its effectiveness.展开更多
Images taken in dim environments frequently exhibit issues like insufficient brightness,noise,color shifts,and loss of detail.These problems pose significant challenges to dark image enhancement tasks.Current approach...Images taken in dim environments frequently exhibit issues like insufficient brightness,noise,color shifts,and loss of detail.These problems pose significant challenges to dark image enhancement tasks.Current approaches,while effective in global illumination modeling,often struggle to simultaneously suppress noise and preserve structural details,especially under heterogeneous lighting.Furthermore,misalignment between luminance and color channels introduces additional challenges to accurate enhancement.In response to the aforementioned difficulties,we introduce a single-stage framework,M2ATNet,using the multi-scale multi-attention and Transformer architecture.First,to address the problems of texture blurring and residual noise,we design a multi-scale multi-attention denoising module(MMAD),which is applied separately to the luminance and color channels to enhance the structural and texture modeling capabilities.Secondly,to solve the non-alignment problem of the luminance and color channels,we introduce the multi-channel feature fusion Transformer(CFFT)module,which effectively recovers the dark details and corrects the color shifts through cross-channel alignment and deep feature interaction.To guide the model to learn more stably and efficiently,we also fuse multiple types of loss functions to form a hybrid loss term.We extensively evaluate the proposed method on various standard datasets,including LOL-v1,LOL-v2,DICM,LIME,and NPE.Evaluation in terms of numerical metrics and visual quality demonstrate that M2ATNet consistently outperforms existing advanced approaches.Ablation studies further confirm the critical roles played by the MMAD and CFFT modules to detail preservation and visual fidelity under challenging illumination-deficient environments.展开更多
Distributed secondary control has been proposed to maintain frequency/voltage synchronization and power sharing for distributed energy sources in AC microgrids(MGs).The cyber layer is susceptible to time delays and cy...Distributed secondary control has been proposed to maintain frequency/voltage synchronization and power sharing for distributed energy sources in AC microgrids(MGs).The cyber layer is susceptible to time delays and cyber failures and thus,a distributed resilient secondary control should be investigated.This paper proposes a distributed multi-scale attention and predictor-based control(DMAPC)strategy to address false data injection attacks and packet loss failures with time delays.The multi-scale attention mechanism enables the system to selectively focus on neighbors'states with higher confidence evaluated in different time scales,while the data-driven predictor compensates for lost neighbors'states in the nonlinear controller.The DMAPC does not impose strict limitations on the number of false communication links or upper bound for false data.Besides,the DMAPC is formulated as an uncertain system with time delays and is proven to be uniformly ultimately bounded.Extensive experiments on a hardware-in-the-loop MG testbed have validated the effectiveness of DMAPC,which successfully relaxes restrictions on cyber failures compared to existing strategies.展开更多
Shadows in document images are undesirable yet inevitable.They can decrease the clarity and readability of the images.The existing methods for removing shadows from documents still face some challenges,such as the tra...Shadows in document images are undesirable yet inevitable.They can decrease the clarity and readability of the images.The existing methods for removing shadows from documents still face some challenges,such as the traditional heuristics lack universality and the optimization goal of subnetworks is not consistent for multistage deep learning methods.In this paper,we introduce an end-to-end direct document shadow removal network(DDSR-Net),where we employ a 3-layer UNet++as the backbone to extract features from diverse scales.To further improve the performance of DDSR-Net,we integrate the multi-scale attention(MSA)blocks into each node.The MSA block allocates different weights to feature vectors at different positions,achieving automatic feature alignment and significantly enhancing the end-to-end network's ability to handle shadow processing.To verify the effectiveness of the proposed DDSR-Net,qualitative and quantitative experiments are conducted on multiple open-source document shadow removal datasets.The experimental results demonstrate that our method outperforms the existing state-of-the-art methods on these datasets.Our code and models will be released to the public.展开更多
Underwater target detection is extensively applied in domains such as underwater search and rescue,environmental monitoring,and marine resource surveys.It is crucial in enabling autonomous underwater robot operations ...Underwater target detection is extensively applied in domains such as underwater search and rescue,environmental monitoring,and marine resource surveys.It is crucial in enabling autonomous underwater robot operations and promoting ocean exploration.Nevertheless,low imaging quality,harsh underwater environments,and obscured objects considerably increase the difficulty of detecting underwater targets,making it difficult for current detection methods to achieve optimal performance.In order to enhance underwater object perception and improve target detection precision,we propose a lightweight underwater target detection method using You Only Look Once(YOLO)v8 with multi-scale cross-channel attention(MSCCA),named YOLOv8-UOD.In the proposed multiscale cross-channel attention module,multi-scale attention(MSA)augments the variety of attentional perception by extracting information from innately diverse sensory fields.The cross-channel strategy utilizes RepVGGbased channel shuffling(RCS)and one-shot aggregation(OSA)to rearrange feature map channels according to specific rules.It aggregates all features only once in the final feature mapping,resulting in the extraction of more comprehensive and valuable feature information.The experimental results show that the proposed YOLOv8-UOD achieves a mAP50 of 95.67%and FLOPs of 23.8 G on the Underwater Robot Picking Contest 2017(URPC2017)dataset,outperforming other methods in terms of detection precision and computational cost-efficiency.展开更多
The application of image super-resolution(SR)has brought significant assistance in the medical field,aiding doctors to make more precise diagnoses.However,solely relying on a convolutional neural network(CNN)for image...The application of image super-resolution(SR)has brought significant assistance in the medical field,aiding doctors to make more precise diagnoses.However,solely relying on a convolutional neural network(CNN)for image SR may lead to issues such as blurry details and excessive smoothness.To address the limitations,we proposed an algorithm based on the generative adversarial network(GAN)framework.In the generator network,three different sizes of convolutions connected by a residual dense structure were used to extract detailed features,and an attention mechanism combined with dual channel and spatial information was applied to concentrate the computing power on crucial areas.In the discriminator network,using InstanceNorm to normalize tensors sped up the training process while retaining feature information.The experimental results demonstrate that our algorithm achieves higher peak signal-to-noise ratio(PSNR)and structural similarity index measure(SSIM)compared to other methods,resulting in an improved visual quality.展开更多
Accurate traffic flow prediction has a profound impact on modern traffic management. Traffic flow has complex spatial-temporal correlations and periodicity, which poses difficulties for precise prediction. To address ...Accurate traffic flow prediction has a profound impact on modern traffic management. Traffic flow has complex spatial-temporal correlations and periodicity, which poses difficulties for precise prediction. To address this problem, a Multi-head Self-attention and Spatial-Temporal Graph Convolutional Network (MSSTGCN) for multiscale traffic flow prediction is proposed. Firstly, to capture the hidden traffic periodicity of traffic flow, traffic flow is divided into three kinds of periods, including hourly, daily, and weekly data. Secondly, a graph attention residual layer is constructed to learn the global spatial features across regions. Local spatial-temporal dependence is captured by using a T-GCN module. Thirdly, a transformer layer is introduced to learn the long-term dependence in time. A position embedding mechanism is introduced to label position information for all traffic sequences. Thus, this multi-head self-attention mechanism can recognize the sequence order and allocate weights for different time nodes. Experimental results on four real-world datasets show that the MSSTGCN performs better than the baseline methods and can be successfully adapted to traffic prediction tasks.展开更多
Multimodal image fusion plays an important role in image analysis and applications.Multimodal medical image fusion helps to combine contrast features from two or more input imaging modalities to represent fused inform...Multimodal image fusion plays an important role in image analysis and applications.Multimodal medical image fusion helps to combine contrast features from two or more input imaging modalities to represent fused information in a single image.One of the critical clinical applications of medical image fusion is to fuse anatomical and functional modalities for rapid diagnosis of malignant tissues.This paper proposes a multimodal medical image fusion network(MMIF-Net)based on multiscale hybrid attention.The method first decomposes the original image to obtain the low-rank and significant parts.Then,to utilize the features at different scales,we add amultiscalemechanism that uses three filters of different sizes to extract the features in the encoded network.Also,a hybrid attention module is introduced to obtain more image details.Finally,the fused images are reconstructed by decoding the network.We conducted experiments with clinical images from brain computed tomography/magnetic resonance.The experimental results show that the multimodal medical image fusion network method based on multiscale hybrid attention works better than other advanced fusion methods.展开更多
This paper proposes an automated detection framework for transmission facilities using a featureattention multi-scale robustness network(FAMSR-Net)with high-fidelity virtual images.The proposed framework exhibits thre...This paper proposes an automated detection framework for transmission facilities using a featureattention multi-scale robustness network(FAMSR-Net)with high-fidelity virtual images.The proposed framework exhibits three key characteristics.First,virtual images of the transmission facilities generated using StyleGAN2-ADA are co-trained with real images.This enables the neural network to learn various features of transmission facilities to improve the detection performance.Second,the convolutional block attention module is deployed in FAMSR-Net to effectively extract features from images and construct multi-dimensional feature maps,enabling the neural network to perform precise object detection in various environments.Third,an effective bounding box optimization method called Scylla-IoU is deployed on FAMSR-Net,considering the intersection over union,center point distance,angle,and shape of the bounding box.This enables the detection of power facilities of various sizes accurately.Extensive experiments demonstrated that FAMSRNet outperforms other neural networks in detecting power facilities.FAMSR-Net also achieved the highest detection accuracy when virtual images of the transmission facilities were co-trained in the training phase.The proposed framework is effective for the scheduled operation and maintenance of transmission facilities because an optical camera is currently the most promising tool for unmanned aerial vehicles.This ultimately contributes to improved inspection efficiency,reduced maintenance risks,and more reliable power delivery across extensive transmission facilities.展开更多
Semantic segmentation plays a foundational role in biomedical image analysis, providing precise information about cellular, tissue, and organ structures in both biological and medical imaging modalities. Traditional a...Semantic segmentation plays a foundational role in biomedical image analysis, providing precise information about cellular, tissue, and organ structures in both biological and medical imaging modalities. Traditional approaches often fail in the face of challenges such as low contrast, morphological variability, and densely packed structures. Recent advancements in deep learning have transformed segmentation capabilities through the integration of fine-scale detail preservation, coarse-scale contextual modeling, and multi-scale feature fusion. This work provides a comprehensive analysis of state-of-the-art deep learning models, including U-Net variants, attention-based frameworks, and Transformer-integrated networks, highlighting innovations that improve accuracy, generalizability, and computational efficiency. Key architectural components such as convolution operations, shallow and deep blocks, skip connections, and hybrid encoders are examined for their roles in enhancing spatial representation and semantic consistency. We further discuss the importance of hierarchical and instance-aware segmentation and annotation in interpreting complex biological scenes and multiplexed medical images. By bridging methodological developments with diverse application domains, this paper outlines current trends and future directions for semantic segmentation, emphasizing its critical role in facilitating annotation, diagnosis, and discovery in biomedical research.展开更多
As one of the key technologies of intelligent vehicles, traffic sign detection is still a challenging task because of the tiny size of its target object. To address the challenge, we present a novel detection network ...As one of the key technologies of intelligent vehicles, traffic sign detection is still a challenging task because of the tiny size of its target object. To address the challenge, we present a novel detection network improved from yolo-v3 for the tiny traffic sign with high precision in real-time. First, a visual multi-scale attention module(MSAM), a light-weight yet effective module, is devised to fuse the multi-scale feature maps with channel weights and spatial masks. It increases the representation power of the network by emphasizing useful features and suppressing unnecessary ones. Second, we exploit effectively fine-grained features about tiny objects from the shallower layers through modifying backbone Darknet-53 and adding one prediction head to yolo-v3. Finally, a receptive field block is added into the neck of the network to broaden the receptive field. Experiments prove the effectiveness of our network in both quantitative and qualitative aspects. The m AP@0.5 of our network reaches 0.965 and its detection speed is55.56 FPS for 512 × 512 images on the challenging Tsinghua-Tencent 100 k(TT100 k) dataset.展开更多
随着光伏发电在全球能源体系中占比不断提升,超短期光伏发电量预测对电力系统调度与安全运行至关重要。然而,光伏发电量受多因素影响,具有显著随机性与波动性。为此,提出了一种基于TCN-BiLSTM-Attention模型的超短期光伏发电量预测方法...随着光伏发电在全球能源体系中占比不断提升,超短期光伏发电量预测对电力系统调度与安全运行至关重要。然而,光伏发电量受多因素影响,具有显著随机性与波动性。为此,提出了一种基于TCN-BiLSTM-Attention模型的超短期光伏发电量预测方法。首先通过皮尔逊相关分析筛选关键特征,并利用孤立森林算法检测异常值,结合线性插值法和标准化完成数据预处理。随后,通过时间卷积网络(Temporal Convolutional Network,TCN)提取时序特征,再利用双向长短期记忆网络(Bidirectional Long Short-Term Memory,BiLSTM)网络捕获前后向时间依赖关系,并在输出端引入注意力机制聚焦关键时间步特征。最后,在Desert Knowledge Australia Solar Centre(DKASC)数据集上的对比实验表明,与传统LSTM、BiLSTM模型相比,提出的TCN-BiLSTM-Attention模型在预测精度、稳定性等方面均表现出一定优势。展开更多
With the rapid expansion of drone applications,accurate detection of objects in aerial imagery has become crucial for intelligent transportation,urban management,and emergency rescue missions.However,existing methods ...With the rapid expansion of drone applications,accurate detection of objects in aerial imagery has become crucial for intelligent transportation,urban management,and emergency rescue missions.However,existing methods face numerous challenges in practical deployment,including scale variation handling,feature degradation,and complex backgrounds.To address these issues,we propose Edge-enhanced and Detail-Capturing You Only Look Once(EHDC-YOLO),a novel framework for object detection in Unmanned Aerial Vehicle(UAV)imagery.Based on the You Only Look Once version 11 nano(YOLOv11n)baseline,EHDC-YOLO systematically introduces several architectural enhancements:(1)a Multi-Scale Edge Enhancement(MSEE)module that leverages multi-scale pooling and edge information to enhance boundary feature extraction;(2)an Enhanced Feature Pyramid Network(EFPN)that integrates P2-level features with Cross Stage Partial(CSP)structures and OmniKernel convolutions for better fine-grained representation;and(3)Dynamic Head(DyHead)with multi-dimensional attention mechanisms for enhanced cross-scale modeling and perspective adaptability.Comprehensive experiments on the Vision meets Drones for Detection(VisDrone-DET)2019 dataset demonstrate that EHDC-YOLO achieves significant improvements,increasing mean Average Precision(mAP)@0.5 from 33.2%to 46.1%(an absolute improvement of 12.9 percentage points)and mAP@0.5:0.95 from 19.5%to 28.0%(an absolute improvement of 8.5 percentage points)compared with the YOLOv11n baseline,while maintaining a reasonable parameter count(2.81 M vs the baseline’s 2.58 M).Further ablation studies confirm the effectiveness of each proposed component,while visualization results highlight EHDC-YOLO’s superior performance in detecting objects and handling occlusions in complex drone scenarios.展开更多
Brain tumors require precise segmentation for diagnosis and treatment plans due to their complex morphology and heterogeneous characteristics.While MRI-based automatic brain tumor segmentation technology reduces the b...Brain tumors require precise segmentation for diagnosis and treatment plans due to their complex morphology and heterogeneous characteristics.While MRI-based automatic brain tumor segmentation technology reduces the burden on medical staff and provides quantitative information,existing methodologies and recent models still struggle to accurately capture and classify the fine boundaries and diverse morphologies of tumors.In order to address these challenges and maximize the performance of brain tumor segmentation,this research introduces a novel SwinUNETR-based model by integrating a new decoder block,the Hierarchical Channel-wise Attention Decoder(HCAD),into a powerful SwinUNETR encoder.The HCAD decoder block utilizes hierarchical features and channelspecific attention mechanisms to further fuse information at different scales transmitted from the encoder and preserve spatial details throughout the reconstruction phase.Rigorous evaluations on the recent BraTS GLI datasets demonstrate that the proposed SwinHCAD model achieved superior and improved segmentation accuracy on both the Dice score and HD95 metrics across all tumor subregions(WT,TC,and ET)compared to baseline models.In particular,the rationale and contribution of the model design were clarified through ablation studies to verify the effectiveness of the proposed HCAD decoder block.The results of this study are expected to greatly contribute to enhancing the efficiency of clinical diagnosis and treatment planning by increasing the precision of automated brain tumor segmentation.展开更多
基金supported by the Natural Science Foundation of the Anhui Higher Education Institutions of China(Grant Nos.2023AH040149 and 2024AH051915)the Anhui Provincial Natural Science Foundation(Grant No.2208085MF168)+1 种基金the Science and Technology Innovation Tackle Plan Project of Maanshan(Grant No.2024RGZN001)the Scientific Research Fund Project of Anhui Medical University(Grant No.2023xkj122).
文摘Convolutional neural networks(CNNs)-based medical image segmentation technologies have been widely used in medical image segmentation because of their strong representation and generalization abilities.However,due to the inability to effectively capture global information from images,CNNs can easily lead to loss of contours and textures in segmentation results.Notice that the transformer model can effectively capture the properties of long-range dependencies in the image,and furthermore,combining the CNN and the transformer can effectively extract local details and global contextual features of the image.Motivated by this,we propose a multi-branch and multi-scale attention network(M2ANet)for medical image segmentation,whose architecture consists of three components.Specifically,in the first component,we construct an adaptive multi-branch patch module for parallel extraction of image features to reduce information loss caused by downsampling.In the second component,we apply residual block to the well-known convolutional block attention module to enhance the network’s ability to recognize important features of images and alleviate the phenomenon of gradient vanishing.In the third component,we design a multi-scale feature fusion module,in which we adopt adaptive average pooling and position encoding to enhance contextual features,and then multi-head attention is introduced to further enrich feature representation.Finally,we validate the effectiveness and feasibility of the proposed M2ANet method through comparative experiments on four benchmark medical image segmentation datasets,particularly in the context of preserving contours and textures.
基金supported in part by the National Natural Science Foundation of China[62301374]Hubei Provincial Natural Science Foundation of China[2022CFB804]+2 种基金Hubei Provincial Education Research Project[B2022057]the Youths Science Foundation of Wuhan Institute of Technology[K202240]the 15th Graduate Education Innovation Fund of Wuhan Institute of Technology[CX2023295].
文摘This paper aims to develop a nonrigid registration method of preoperative and intraoperative thoracoabdominal CT images in computer-assisted interventional surgeries for accurate tumor localization and tissue visualization enhancement.However,fine structure registration of complex thoracoabdominal organs and large deformation registration caused by respiratory motion is challenging.To deal with this problem,we propose a 3D multi-scale attention VoxelMorph(MAVoxelMorph)registration network.To alleviate the large deformation problem,a multi-scale axial attention mechanism is utilized by using a residual dilated pyramid pooling for multi-scale feature extraction,and position-aware axial attention for long-distance dependencies between pixels capture.To further improve the large deformation and fine structure registration results,a multi-scale context channel attention mechanism is employed utilizing content information via adjacent encoding layers.Our method was evaluated on four public lung datasets(DIR-Lab dataset,Creatis dataset,Learn2Reg dataset,OASIS dataset)and a local dataset.Results proved that the proposed method achieved better registration performance than current state-of-the-art methods,especially in handling the registration of large deformations and fine structures.It also proved to be fast in 3D image registration,using about 1.5 s,and faster than most methods.Qualitative and quantitative assessments proved that the proposed MA-VoxelMorph has the potential to realize precise and fast tumor localization in clinical interventional surgeries.
基金supported by the National Natural Science Foundation of China (Nos.61806107 and 61702135)。
文摘We propose a hierarchical multi-scale attention mechanism-based model in response to the low accuracy and inefficient manual classification of existing oceanic biological image classification methods. Firstly, the hierarchical efficient multi-scale attention(H-EMA) module is designed for lightweight feature extraction, achieving outstanding performance at a relatively low cost. Secondly, an improved EfficientNetV2 block is used to integrate information from different scales better and enhance inter-layer message passing. Furthermore, introducing the convolutional block attention module(CBAM) enhances the model's perception of critical features, optimizing its generalization ability. Lastly, Focal Loss is introduced to adjust the weights of complex samples to address the issue of imbalanced categories in the dataset, further improving the model's performance. The model achieved 96.11% accuracy on the intertidal marine organism dataset of Nanji Islands and 84.78% accuracy on the CIFAR-100 dataset, demonstrating its strong generalization ability to meet the demands of oceanic biological image classification.
基金This work was supported by the National Natural Science Foundation of China(No.61906006).
文摘Whole brain functional connectivity(FC)patterns obtained from resting-state functional magnetic resonance imaging(rs-fMRI)have been widely used in the diagnosis of brain disorders such as autism spectrum disorder(ASD).Recently,an increasing number of studies have focused on employing deep learning techniques to analyze FC patterns for brain disease classification.However,the high dimensionality of the FC features and the interpretation of deep learning results are issues that need to be addressed in the FC-based brain disease classification.In this paper,we proposed a multi-scale attention-based deep neural network(MSA-DNN)model to classify FC patterns for the ASD diagnosis.The model was implemented by adding a flexible multi-scale attention(MSA)module to the auto-encoder based backbone DNN,which can extract multi-scale features of the FC patterns and change the level of attention for different FCs by continuous learning.Our model will reinforce the weights of important FC features while suppress the unimportant FCs to ensure the sparsity of the model weights and enhance the model interpretability.We performed systematic experiments on the large multi-sites ASD dataset with both ten-fold and leaveone-site-out cross-validations.Results showed that our model outperformed classical methods in brain disease classification and revealed robust intersite prediction performance.We also localized important FC features and brain regions associated with ASD classification.Overall,our study further promotes the biomarker detection and computer-aided classification for ASD diagnosis,and the proposed MSA module is flexible and easy to implement in other classification networks.
基金National Natural Science Foundation of China,Grant/Award Number:62106177supported by the Central University Basic Research Fund of China(No.2042020KF0016)supported by the supercomputing system in the Supercomputing Center of Wuhan University.
文摘The goal of street-to-aerial cross-view image geo-localization is to determine the location of the query street-view image by retrieving the aerial-view image from the same place.The drastic viewpoint and appearance gap between the aerial-view and the street-view images brings a huge challenge against this task.In this paper,we propose a novel multiscale attention encoder to capture the multiscale contextual information of the aerial/street-view images.To bridge the domain gap between these two view images,we first use an inverse polar transform to make the street-view images approximately aligned with the aerial-view images.Then,the explored multiscale attention encoder is applied to convert the image into feature representation with the guidance of the learnt multiscale information.Finally,we propose a novel global mining strategy to enable the network to pay more attention to hard negative exemplars.Experiments on standard benchmark datasets show that our approach obtains 81.39%top-1 recall rate on the CVUSA dataset and 71.52%on the CVACT dataset,achieving the state-of-the-art performance and outperforming most of the existing methods significantly.
基金This research is partially supported by grant from the National Natural Science Foundation of China(No.72071019)grant from the Natural Science Foundation of Chongqing(No.cstc2021jcyj-msxmX0185)grant from the Chongqing Graduate Education and Teaching Reform Research Project(No.yjg193096).
文摘Bone age assessment(BAA)helps doctors determine how a child’s bones grow and develop in clinical medicine.Traditional BAA methods rely on clinician expertise,leading to time-consuming predictions and inaccurate results.Most deep learning-based BAA methods feed the extracted critical points of images into the network by providing additional annotations.This operation is costly and subjective.To address these problems,we propose a multi-scale attentional densely connected network(MSADCN)in this paper.MSADCN constructs a multi-scale dense connectivity mechanism,which can avoid overfitting,obtain the local features effectively and prevent gradient vanishing even in limited training data.First,MSADCN designs multi-scale structures in the densely connected network to extract fine-grained features at different scales.Then,coordinate attention is embedded to focus on critical features and automatically locate the regions of interest(ROI)without additional annotation.In addition,to improve the model’s generalization,transfer learning is applied to train the proposed MSADCN on the public dataset IMDB-WIKI,and the obtained pre-trained weights are loaded onto the Radiological Society of North America(RSNA)dataset.Finally,label distribution learning(LDL)and expectation regression techniques are introduced into our model to exploit the correlation between hand bone images of different ages,which can obtain stable age estimates.Extensive experiments confirm that our model can converge more efficiently and obtain a mean absolute error(MAE)of 4.64 months,outperforming some state-of-the-art BAA methods.
基金jointly supported by the National Science and Technology Major Project(2022ZD0117103)the National Natural Science Foundations of China(62272364)+2 种基金the provincial Key Research and Development Program of Shaanxi(2024GH-ZDXM-47)the Research Project on Higher Education Teaching Reform of Shaanxi Province(23JG003)the Natural Science Basic Research Program of Shaanxi(2024JC-YBQN0639).
文摘Fine-grained visual classification(FGVC)is a very challenging task due to distinguishing subcategories under the same super-category.Recent works mainly localize discriminative image regions and capture subtle inter-class differences by utilizing attention-based methods.However,at the same layer,most attention-based works only consider large-scale attention blocks with the same size as feature maps,and they ignore small-scale attention blocks that are smaller than feature maps.To distinguish subcategories,it is important to exploit small local regions.In this work,a novel multi-scale attention network(MSANet)is proposed to capture large and small regions at the same layer in fine-grained visual classification.Specifically,a novel multi-scale attention layer(MSAL)is proposed,which generates multiple groups in each feature maps to capture different-scale discriminative regions.The groups based on large-scale regions can exploit global features and the groups based on the small-scale regions can extract local subtle features.Then,a simple feature fusion strategy is utilized to fully integrate global features and local subtle features to mine information that are more conducive to FGVC.Comprehensive experiments in Caltech-UCSD Birds-200-2011(CUB),FGVC-Aircraft(AIR)and Stanford Cars(Cars)datasets show that our method achieves the competitive performances,which demonstrate its effectiveness.
基金funded by the National Natural Science Foundation of China,grant numbers 52374156 and 62476005。
文摘Images taken in dim environments frequently exhibit issues like insufficient brightness,noise,color shifts,and loss of detail.These problems pose significant challenges to dark image enhancement tasks.Current approaches,while effective in global illumination modeling,often struggle to simultaneously suppress noise and preserve structural details,especially under heterogeneous lighting.Furthermore,misalignment between luminance and color channels introduces additional challenges to accurate enhancement.In response to the aforementioned difficulties,we introduce a single-stage framework,M2ATNet,using the multi-scale multi-attention and Transformer architecture.First,to address the problems of texture blurring and residual noise,we design a multi-scale multi-attention denoising module(MMAD),which is applied separately to the luminance and color channels to enhance the structural and texture modeling capabilities.Secondly,to solve the non-alignment problem of the luminance and color channels,we introduce the multi-channel feature fusion Transformer(CFFT)module,which effectively recovers the dark details and corrects the color shifts through cross-channel alignment and deep feature interaction.To guide the model to learn more stably and efficiently,we also fuse multiple types of loss functions to form a hybrid loss term.We extensively evaluate the proposed method on various standard datasets,including LOL-v1,LOL-v2,DICM,LIME,and NPE.Evaluation in terms of numerical metrics and visual quality demonstrate that M2ATNet consistently outperforms existing advanced approaches.Ablation studies further confirm the critical roles played by the MMAD and CFFT modules to detail preservation and visual fidelity under challenging illumination-deficient environments.
基金supported by the Science and Technology Project of State Grid Corporation of China(No.5100202119559A-0-5-SF)。
文摘Distributed secondary control has been proposed to maintain frequency/voltage synchronization and power sharing for distributed energy sources in AC microgrids(MGs).The cyber layer is susceptible to time delays and cyber failures and thus,a distributed resilient secondary control should be investigated.This paper proposes a distributed multi-scale attention and predictor-based control(DMAPC)strategy to address false data injection attacks and packet loss failures with time delays.The multi-scale attention mechanism enables the system to selectively focus on neighbors'states with higher confidence evaluated in different time scales,while the data-driven predictor compensates for lost neighbors'states in the nonlinear controller.The DMAPC does not impose strict limitations on the number of false communication links or upper bound for false data.Besides,the DMAPC is formulated as an uncertain system with time delays and is proven to be uniformly ultimately bounded.Extensive experiments on a hardware-in-the-loop MG testbed have validated the effectiveness of DMAPC,which successfully relaxes restrictions on cyber failures compared to existing strategies.
基金supported by the National Natural Science Foundation of China,Youth Fund,China(No.62102318)the Basic Research Programs of Taicang,China,2023(No.TC2023JC23)+2 种基金funded in part by the National Natural Science Foundation of China(No.92267203)in part by the Science and Technology Major Project of Guangzhou,China(No.202007030006)in part by the Program for Guangdong Introducing Innovative and Entrepreneurial Teams,China(No.2019ZT08X214).
文摘Shadows in document images are undesirable yet inevitable.They can decrease the clarity and readability of the images.The existing methods for removing shadows from documents still face some challenges,such as the traditional heuristics lack universality and the optimization goal of subnetworks is not consistent for multistage deep learning methods.In this paper,we introduce an end-to-end direct document shadow removal network(DDSR-Net),where we employ a 3-layer UNet++as the backbone to extract features from diverse scales.To further improve the performance of DDSR-Net,we integrate the multi-scale attention(MSA)blocks into each node.The MSA block allocates different weights to feature vectors at different positions,achieving automatic feature alignment and significantly enhancing the end-to-end network's ability to handle shadow processing.To verify the effectiveness of the proposed DDSR-Net,qualitative and quantitative experiments are conducted on multiple open-source document shadow removal datasets.The experimental results demonstrate that our method outperforms the existing state-of-the-art methods on these datasets.Our code and models will be released to the public.
基金supported in part by the National Natural Science Foundation of China Grants 62402085,61972062,62306060the Liaoning Doctoral Research Start-Up Fund 2023-BS-078+1 种基金the Dalian Youth Science and Technology Star Project 2023RQ023the Liaoning Basic Research Project 2023JH2/101300191.
文摘Underwater target detection is extensively applied in domains such as underwater search and rescue,environmental monitoring,and marine resource surveys.It is crucial in enabling autonomous underwater robot operations and promoting ocean exploration.Nevertheless,low imaging quality,harsh underwater environments,and obscured objects considerably increase the difficulty of detecting underwater targets,making it difficult for current detection methods to achieve optimal performance.In order to enhance underwater object perception and improve target detection precision,we propose a lightweight underwater target detection method using You Only Look Once(YOLO)v8 with multi-scale cross-channel attention(MSCCA),named YOLOv8-UOD.In the proposed multiscale cross-channel attention module,multi-scale attention(MSA)augments the variety of attentional perception by extracting information from innately diverse sensory fields.The cross-channel strategy utilizes RepVGGbased channel shuffling(RCS)and one-shot aggregation(OSA)to rearrange feature map channels according to specific rules.It aggregates all features only once in the final feature mapping,resulting in the extraction of more comprehensive and valuable feature information.The experimental results show that the proposed YOLOv8-UOD achieves a mAP50 of 95.67%and FLOPs of 23.8 G on the Underwater Robot Picking Contest 2017(URPC2017)dataset,outperforming other methods in terms of detection precision and computational cost-efficiency.
文摘The application of image super-resolution(SR)has brought significant assistance in the medical field,aiding doctors to make more precise diagnoses.However,solely relying on a convolutional neural network(CNN)for image SR may lead to issues such as blurry details and excessive smoothness.To address the limitations,we proposed an algorithm based on the generative adversarial network(GAN)framework.In the generator network,three different sizes of convolutions connected by a residual dense structure were used to extract detailed features,and an attention mechanism combined with dual channel and spatial information was applied to concentrate the computing power on crucial areas.In the discriminator network,using InstanceNorm to normalize tensors sped up the training process while retaining feature information.The experimental results demonstrate that our algorithm achieves higher peak signal-to-noise ratio(PSNR)and structural similarity index measure(SSIM)compared to other methods,resulting in an improved visual quality.
基金supported by the National Natural Science Foundation of China(Grant Nos.62472149,62376089,62202147)Hubei Provincial Science and Technology Plan Project(2023BCB04100).
文摘Accurate traffic flow prediction has a profound impact on modern traffic management. Traffic flow has complex spatial-temporal correlations and periodicity, which poses difficulties for precise prediction. To address this problem, a Multi-head Self-attention and Spatial-Temporal Graph Convolutional Network (MSSTGCN) for multiscale traffic flow prediction is proposed. Firstly, to capture the hidden traffic periodicity of traffic flow, traffic flow is divided into three kinds of periods, including hourly, daily, and weekly data. Secondly, a graph attention residual layer is constructed to learn the global spatial features across regions. Local spatial-temporal dependence is captured by using a T-GCN module. Thirdly, a transformer layer is introduced to learn the long-term dependence in time. A position embedding mechanism is introduced to label position information for all traffic sequences. Thus, this multi-head self-attention mechanism can recognize the sequence order and allocate weights for different time nodes. Experimental results on four real-world datasets show that the MSSTGCN performs better than the baseline methods and can be successfully adapted to traffic prediction tasks.
基金supported by Qingdao Huanghai University School-Level ScientificResearch Project(2023KJ14)Undergraduate Teaching Reform Research Project of Shandong Provincial Department of Education(M2022328)+1 种基金National Natural Science Foundation of China under Grant(42472324)Qingdao Postdoctoral Foundation under Grant(QDBSH202402049).
文摘Multimodal image fusion plays an important role in image analysis and applications.Multimodal medical image fusion helps to combine contrast features from two or more input imaging modalities to represent fused information in a single image.One of the critical clinical applications of medical image fusion is to fuse anatomical and functional modalities for rapid diagnosis of malignant tissues.This paper proposes a multimodal medical image fusion network(MMIF-Net)based on multiscale hybrid attention.The method first decomposes the original image to obtain the low-rank and significant parts.Then,to utilize the features at different scales,we add amultiscalemechanism that uses three filters of different sizes to extract the features in the encoded network.Also,a hybrid attention module is introduced to obtain more image details.Finally,the fused images are reconstructed by decoding the network.We conducted experiments with clinical images from brain computed tomography/magnetic resonance.The experimental results show that the multimodal medical image fusion network method based on multiscale hybrid attention works better than other advanced fusion methods.
基金supported by the Korea Electric Power Corporation(R22TA14,Development of Drone Systemfor Diagnosis of Porcelain Insulators in Overhead Transmission Lines)the National Fire Agency of Korea(RS-2024-00408270,Fire Hazard Analysis and Fire Safety Standards Development for Transportation and Storage Stage of Reuse Battery)the Ministry of the Interior and Safety of Korea(RS-2024-00408982,Development of Intelligent Fire Detection and Sprinkler Facility Technology Reflecting the Characteristics of Logistics Facilities).
文摘This paper proposes an automated detection framework for transmission facilities using a featureattention multi-scale robustness network(FAMSR-Net)with high-fidelity virtual images.The proposed framework exhibits three key characteristics.First,virtual images of the transmission facilities generated using StyleGAN2-ADA are co-trained with real images.This enables the neural network to learn various features of transmission facilities to improve the detection performance.Second,the convolutional block attention module is deployed in FAMSR-Net to effectively extract features from images and construct multi-dimensional feature maps,enabling the neural network to perform precise object detection in various environments.Third,an effective bounding box optimization method called Scylla-IoU is deployed on FAMSR-Net,considering the intersection over union,center point distance,angle,and shape of the bounding box.This enables the detection of power facilities of various sizes accurately.Extensive experiments demonstrated that FAMSRNet outperforms other neural networks in detecting power facilities.FAMSR-Net also achieved the highest detection accuracy when virtual images of the transmission facilities were co-trained in the training phase.The proposed framework is effective for the scheduled operation and maintenance of transmission facilities because an optical camera is currently the most promising tool for unmanned aerial vehicles.This ultimately contributes to improved inspection efficiency,reduced maintenance risks,and more reliable power delivery across extensive transmission facilities.
基金Open Access funding provided by the National Institutes of Health(NIH)The funding for this project was provided by NCATS Intramural Fund.
文摘Semantic segmentation plays a foundational role in biomedical image analysis, providing precise information about cellular, tissue, and organ structures in both biological and medical imaging modalities. Traditional approaches often fail in the face of challenges such as low contrast, morphological variability, and densely packed structures. Recent advancements in deep learning have transformed segmentation capabilities through the integration of fine-scale detail preservation, coarse-scale contextual modeling, and multi-scale feature fusion. This work provides a comprehensive analysis of state-of-the-art deep learning models, including U-Net variants, attention-based frameworks, and Transformer-integrated networks, highlighting innovations that improve accuracy, generalizability, and computational efficiency. Key architectural components such as convolution operations, shallow and deep blocks, skip connections, and hybrid encoders are examined for their roles in enhancing spatial representation and semantic consistency. We further discuss the importance of hierarchical and instance-aware segmentation and annotation in interpreting complex biological scenes and multiplexed medical images. By bridging methodological developments with diverse application domains, this paper outlines current trends and future directions for semantic segmentation, emphasizing its critical role in facilitating annotation, diagnosis, and discovery in biomedical research.
基金supported by the National Key R&D Program of China(Grant Nos.2018YFB2101100 and 2019YFB2101600)the National Natural Science Foundation of China(Grant No.62176016)+2 种基金the Guizhou Province Science and Technology Project:Research and Demonstration of Science and Technology Big Data Mining Technology Based on Knowledge Graph(Qiankehe[2021]General 382)the Training Program of the Major Research Plan of the National Natural Science Foundation of China(Grant No.92046015)the Beijing Natural Science Foundation Program and Scientific Research Key Program of Beijing Municipal Commission of Education(Grant No.KZ202010025047)。
文摘As one of the key technologies of intelligent vehicles, traffic sign detection is still a challenging task because of the tiny size of its target object. To address the challenge, we present a novel detection network improved from yolo-v3 for the tiny traffic sign with high precision in real-time. First, a visual multi-scale attention module(MSAM), a light-weight yet effective module, is devised to fuse the multi-scale feature maps with channel weights and spatial masks. It increases the representation power of the network by emphasizing useful features and suppressing unnecessary ones. Second, we exploit effectively fine-grained features about tiny objects from the shallower layers through modifying backbone Darknet-53 and adding one prediction head to yolo-v3. Finally, a receptive field block is added into the neck of the network to broaden the receptive field. Experiments prove the effectiveness of our network in both quantitative and qualitative aspects. The m AP@0.5 of our network reaches 0.965 and its detection speed is55.56 FPS for 512 × 512 images on the challenging Tsinghua-Tencent 100 k(TT100 k) dataset.
文摘随着光伏发电在全球能源体系中占比不断提升,超短期光伏发电量预测对电力系统调度与安全运行至关重要。然而,光伏发电量受多因素影响,具有显著随机性与波动性。为此,提出了一种基于TCN-BiLSTM-Attention模型的超短期光伏发电量预测方法。首先通过皮尔逊相关分析筛选关键特征,并利用孤立森林算法检测异常值,结合线性插值法和标准化完成数据预处理。随后,通过时间卷积网络(Temporal Convolutional Network,TCN)提取时序特征,再利用双向长短期记忆网络(Bidirectional Long Short-Term Memory,BiLSTM)网络捕获前后向时间依赖关系,并在输出端引入注意力机制聚焦关键时间步特征。最后,在Desert Knowledge Australia Solar Centre(DKASC)数据集上的对比实验表明,与传统LSTM、BiLSTM模型相比,提出的TCN-BiLSTM-Attention模型在预测精度、稳定性等方面均表现出一定优势。
文摘With the rapid expansion of drone applications,accurate detection of objects in aerial imagery has become crucial for intelligent transportation,urban management,and emergency rescue missions.However,existing methods face numerous challenges in practical deployment,including scale variation handling,feature degradation,and complex backgrounds.To address these issues,we propose Edge-enhanced and Detail-Capturing You Only Look Once(EHDC-YOLO),a novel framework for object detection in Unmanned Aerial Vehicle(UAV)imagery.Based on the You Only Look Once version 11 nano(YOLOv11n)baseline,EHDC-YOLO systematically introduces several architectural enhancements:(1)a Multi-Scale Edge Enhancement(MSEE)module that leverages multi-scale pooling and edge information to enhance boundary feature extraction;(2)an Enhanced Feature Pyramid Network(EFPN)that integrates P2-level features with Cross Stage Partial(CSP)structures and OmniKernel convolutions for better fine-grained representation;and(3)Dynamic Head(DyHead)with multi-dimensional attention mechanisms for enhanced cross-scale modeling and perspective adaptability.Comprehensive experiments on the Vision meets Drones for Detection(VisDrone-DET)2019 dataset demonstrate that EHDC-YOLO achieves significant improvements,increasing mean Average Precision(mAP)@0.5 from 33.2%to 46.1%(an absolute improvement of 12.9 percentage points)and mAP@0.5:0.95 from 19.5%to 28.0%(an absolute improvement of 8.5 percentage points)compared with the YOLOv11n baseline,while maintaining a reasonable parameter count(2.81 M vs the baseline’s 2.58 M).Further ablation studies confirm the effectiveness of each proposed component,while visualization results highlight EHDC-YOLO’s superior performance in detecting objects and handling occlusions in complex drone scenarios.
基金supported by Institute of Information&Communications Technology Planning&Evaluation(IITP)under the Metaverse Support Program to Nurture the Best Talents(IITP-2024-RS-2023-00254529)grant funded by the Korea government(MSIT).
文摘Brain tumors require precise segmentation for diagnosis and treatment plans due to their complex morphology and heterogeneous characteristics.While MRI-based automatic brain tumor segmentation technology reduces the burden on medical staff and provides quantitative information,existing methodologies and recent models still struggle to accurately capture and classify the fine boundaries and diverse morphologies of tumors.In order to address these challenges and maximize the performance of brain tumor segmentation,this research introduces a novel SwinUNETR-based model by integrating a new decoder block,the Hierarchical Channel-wise Attention Decoder(HCAD),into a powerful SwinUNETR encoder.The HCAD decoder block utilizes hierarchical features and channelspecific attention mechanisms to further fuse information at different scales transmitted from the encoder and preserve spatial details throughout the reconstruction phase.Rigorous evaluations on the recent BraTS GLI datasets demonstrate that the proposed SwinHCAD model achieved superior and improved segmentation accuracy on both the Dice score and HD95 metrics across all tumor subregions(WT,TC,and ET)compared to baseline models.In particular,the rationale and contribution of the model design were clarified through ablation studies to verify the effectiveness of the proposed HCAD decoder block.The results of this study are expected to greatly contribute to enhancing the efficiency of clinical diagnosis and treatment planning by increasing the precision of automated brain tumor segmentation.