Zanthoxylum bungeanum Maxim, commonly called prickly ash, is widely grown in China. Zanthoxylum rust is the main disease affecting the growth and quality of Zanthoxylum. Traditional methods for recognizing the degree of infection of Zanthoxylum rust rely mainly on manual experience. Because the colors and shapes of rust areas are complex, manual recognition has low accuracy and is difficult to quantify. In recent years, artificial intelligence technology has been increasingly applied in agriculture. In this paper, building on the DeepLabV2 model, we propose a Zanthoxylum rust image segmentation model based on the FASPP module and enhanced features of rust areas. We also construct a fine-grained Zanthoxylum rust image dataset in which each image is segmented and labeled according to leaves, spore piles, and brown lesions. The experimental results show that the proposed segmentation method is effective: the segmentation accuracy for leaves, spore piles, and brown lesions reaches 99.66%, 85.16%, and 82.47%, respectively, while MPA reaches 91.80% and MIoU reaches 84.99%. The proposed model is also efficient, processing 22 images per minute. This article provides an intelligent method for efficiently and accurately recognizing the degree of infection of Zanthoxylum rust.
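The MPA and MIoU figures quoted in this abstract are standard segmentation metrics. The sketch below shows one common way to compute them from a pixel-level confusion matrix. It is an illustration under stated assumptions, not the authors' evaluation code: the function names are hypothetical, every class is assumed to appear at least once in the ground truth, and the paper may average over a different set of classes (for example, including background).

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are true classes, columns are predicted classes."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # Encode each (true, predicted) pair as a single index, then count.
    idx = y_true * num_classes + y_pred
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def mpa_and_miou(cm):
    """Return (mean pixel accuracy, mean intersection over union)."""
    cm = cm.astype(float)
    tp = np.diag(cm)                      # correctly labeled pixels per class
    per_class_acc = tp / cm.sum(axis=1)   # assumes every class occurs in y_true
    union = cm.sum(axis=1) + cm.sum(axis=0) - tp
    return per_class_acc.mean(), (tp / union).mean()

# Hypothetical 6-pixel example with three classes.
cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2], num_classes=3)
mpa, miou = mpa_and_miou(cm)
```

MIoU penalizes false positives as well as false negatives, which is why it is typically lower than MPA on the same predictions.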
Images taken in dim environments frequently exhibit insufficient brightness, noise, color shifts, and loss of detail, all of which pose significant challenges to dark image enhancement. Current approaches, while effective in global illumination modeling, often struggle to suppress noise and preserve structural details at the same time, especially under heterogeneous lighting. Furthermore, misalignment between luminance and color channels makes accurate enhancement harder. In response to these difficulties, we introduce a single-stage framework, M2ATNet, built on multi-scale multi-attention and a Transformer architecture. First, to address texture blurring and residual noise, we design a multi-scale multi-attention denoising module (MMAD), which is applied separately to the luminance and color channels to strengthen structural and texture modeling. Second, to solve the misalignment of the luminance and color channels, we introduce the multi-channel feature fusion Transformer (CFFT) module, which recovers dark details and corrects color shifts through cross-channel alignment and deep feature interaction. To guide the model to learn more stably and efficiently, we also combine multiple types of loss functions into a hybrid loss term. We extensively evaluate the proposed method on standard datasets, including LOL-v1, LOL-v2, DICM, LIME, and NPE. Evaluation in terms of numerical metrics and visual quality demonstrates that M2ATNet consistently outperforms existing advanced approaches. Ablation studies further confirm the critical roles of the MMAD and CFFT modules in detail preservation and visual fidelity under challenging illumination-deficient conditions.
With the rapid expansion of drone applications, accurate detection of objects in aerial imagery has become crucial for intelligent transportation, urban management, and emergency rescue missions. However, existing methods face numerous challenges in practical deployment, including scale variation, feature degradation, and complex backgrounds. To address these issues, we propose Edge-enhanced and Detail-Capturing You Only Look Once (EHDC-YOLO), a novel framework for object detection in Unmanned Aerial Vehicle (UAV) imagery. Based on the You Only Look Once version 11 nano (YOLOv11n) baseline, EHDC-YOLO introduces several architectural enhancements: (1) a Multi-Scale Edge Enhancement (MSEE) module that leverages multi-scale pooling and edge information to enhance boundary feature extraction; (2) an Enhanced Feature Pyramid Network (EFPN) that integrates P2-level features with Cross Stage Partial (CSP) structures and OmniKernel convolutions for better fine-grained representation; and (3) a Dynamic Head (DyHead) with multi-dimensional attention mechanisms for enhanced cross-scale modeling and perspective adaptability. Comprehensive experiments on the Vision meets Drones for Detection (VisDrone-DET) 2019 dataset demonstrate that EHDC-YOLO increases mean Average Precision (mAP)@0.5 from 33.2% to 46.1% (an absolute improvement of 12.9 percentage points) and mAP@0.5:0.95 from 19.5% to 28.0% (an absolute improvement of 8.5 percentage points) compared with the YOLOv11n baseline, while maintaining a reasonable parameter count (2.81 M vs. the baseline's 2.58 M). Ablation studies confirm the effectiveness of each proposed component, and visualization results highlight EHDC-YOLO's superior performance in detecting objects and handling occlusions in complex drone scenarios.
Convolutional neural network (CNN)-based technologies have been widely used in medical image segmentation because of their strong representation and generalization abilities. However, because CNNs cannot effectively capture global information from images, they can easily lose contours and textures in segmentation results. Note that the transformer model can effectively capture long-range dependencies in an image, and that combining a CNN with a transformer can extract both local details and global contextual features. Motivated by this, we propose a multi-branch and multi-scale attention network (M2ANet) for medical image segmentation, whose architecture consists of three components. In the first component, we construct an adaptive multi-branch patch module that extracts image features in parallel to reduce the information loss caused by downsampling. In the second component, we apply a residual block to the well-known convolutional block attention module to enhance the network's ability to recognize important image features and to alleviate vanishing gradients. In the third component, we design a multi-scale feature fusion module, in which adaptive average pooling and position encoding enhance contextual features and multi-head attention then further enriches the feature representation. Finally, we validate the effectiveness and feasibility of the proposed M2ANet through comparative experiments on four benchmark medical image segmentation datasets, particularly with respect to preserving contours and textures.
The fusion of infrared and visible images should emphasize the salient targets in the infrared image while preserving the textural details of the visible image. To meet these requirements, an autoencoder-based method for infrared and visible image fusion is proposed. The encoder, designed according to the optimization objective, consists of a base encoder and a detail encoder, which extract low-frequency and high-frequency information from the image, respectively. Because this extraction may miss some information, a compensation encoder is proposed to supplement it. Multi-scale decomposition is also employed to extract image features more comprehensively. The decoder combines low-frequency, high-frequency, and supplementary information to obtain multi-scale features. An attention strategy and a fusion module are then introduced to perform multi-scale fusion for image reconstruction. Experimental results on three datasets show that the fused images generated by this network effectively retain salient targets while being more consistent with human visual perception.
This paper aims to develop a nonrigid registration method for preoperative and intraoperative thoracoabdominal CT images in computer-assisted interventional surgeries, enabling accurate tumor localization and enhanced tissue visualization. However, registering the fine structures of complex thoracoabdominal organs and the large deformations caused by respiratory motion is challenging. To deal with this problem, we propose a 3D multi-scale attention VoxelMorph (MA-VoxelMorph) registration network. To alleviate the large deformation problem, a multi-scale axial attention mechanism is used, which combines residual dilated pyramid pooling for multi-scale feature extraction with position-aware axial attention to capture long-distance dependencies between pixels. To further improve large deformation and fine structure registration, a multi-scale context channel attention mechanism is employed that uses content information from adjacent encoding layers. Our method was evaluated on four public lung datasets (DIR-Lab, Creatis, Learn2Reg, and OASIS) and a local dataset. The results showed that the proposed method achieved better registration performance than current state-of-the-art methods, especially for large deformations and fine structures. It was also fast, registering a 3D image in about 1.5 s, which is faster than most methods. Qualitative and quantitative assessments showed that the proposed MA-VoxelMorph has the potential to realize precise and fast tumor localization in clinical interventional surgeries.
This paper introduces a novel method for medical image retrieval and classification that integrates a multi-scale encoding mechanism with Vision Transformer (ViT) architectures and a dynamic multi-loss function. The multi-scale encoding significantly enhances the model's ability to capture both fine-grained and global features, while the dynamic loss function adapts during training to optimize classification accuracy and retrieval performance. Our approach was evaluated on the ISIC-2018 and ChestX-ray14 datasets, yielding notable improvements. On the ISIC-2018 dataset, our method achieves an F1-score improvement of +4.84% over the standard ViT, with a precision increase of +5.46% for melanoma (MEL). On the ChestX-ray14 dataset, the method delivers an F1-score improvement of 5.3% over the conventional ViT, with precision gains of +5.0% for pneumonia (PNEU) and +5.4% for fibrosis (FIB). Experimental results demonstrate that our approach outperforms traditional CNN-based models and existing ViT variants, particularly in retrieving relevant medical cases and enhancing diagnostic accuracy. These findings highlight the potential of the proposed method for large-scale medical image analysis, offering improved tools for clinical decision-making through superior classification and case comparison.
Image super-resolution (SR) has brought significant benefits to the medical field, helping doctors make more precise diagnoses. However, relying solely on a convolutional neural network (CNN) for image SR may lead to blurry details and excessive smoothness. To address these limitations, we propose an algorithm based on the generative adversarial network (GAN) framework. In the generator network, convolutions of three different sizes connected by a residual dense structure extract detailed features, and an attention mechanism combining channel and spatial information concentrates computation on crucial areas. In the discriminator network, InstanceNorm is used to normalize tensors, which speeds up training while retaining feature information. The experimental results demonstrate that our algorithm achieves a higher peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) than other methods, resulting in improved visual quality.
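PSNR, one of the two metrics this abstract reports, follows directly from the mean squared error between a reference image and a reconstruction. The sketch below uses the standard definition. The function name and the default peak value of 255 (8-bit images) are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images: PSNR is unbounded
    return 10.0 * np.log10(max_value ** 2 / mse)

# Hypothetical 2x2 example: every pixel is off by one gray level.
value = psnr([[0, 0], [0, 0]], [[1, 1], [1, 1]])
```

Because PSNR is logarithmic in the error, halving the mean squared error raises it by about 3 dB, which is worth keeping in mind when comparing small PSNR differences between methods.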
Multi-label image classification is a challenging task due to the diverse sizes and complex backgrounds of objects in images. Obtaining precise class-specific representations at different scales is a key aspect of feature representation. However, existing methods often rely on a single-scale deep feature and neglect features from shallow and deeper layers, which makes it hard to predict objects of varying scales within the same image. Although some studies have explored multi-scale features, they rarely address the flow of information between scales or efficiently obtain precise class-specific representations for features at different scales. To address these issues, we propose a two-stage, three-branch Transformer-based framework. The first stage incorporates multi-scale image feature extraction and hierarchical scale attention. This design enables the model to consider objects at various scales while enhancing the flow of information across feature scales, improving the model's generalization to diverse object scales. The second stage includes a global feature enhancement module and a region selection module. The global feature enhancement module strengthens interconnections between different image regions, mitigating the issue of incomplete representations, while the region selection module models the cross-modal relationships between image features and labels. Together, these components enable the efficient acquisition of precise class-specific feature representations. Extensive experiments on public datasets, including COCO2014, VOC2007, and VOC2012, demonstrate the effectiveness of our proposed method. Our approach achieves consistent performance gains of 0.3%, 0.4%, and 0.2% over state-of-the-art methods on the three datasets, respectively. These results validate the reliability and superiority of our approach for multi-label image classification.
Multimodal image fusion plays an important role in image analysis and applications. Multimodal medical image fusion combines contrast features from two or more input imaging modalities to represent the fused information in a single image. One of its critical clinical applications is fusing anatomical and functional modalities for rapid diagnosis of malignant tissues. This paper proposes a multimodal medical image fusion network (MMIF-Net) based on multiscale hybrid attention. The method first decomposes the original image to obtain the low-rank and significant parts. Then, to use features at different scales, we add a multiscale mechanism that uses three filters of different sizes to extract features in the encoding network. A hybrid attention module is also introduced to obtain more image details. Finally, the fused images are reconstructed by the decoding network. We conducted experiments with clinical brain computed tomography/magnetic resonance images. The experimental results show that the proposed network works better than other advanced fusion methods.
The Pressure Sensitive Paint (PSP) technique has gained attention in recent years because of its significant benefits in measuring surface pressure on wind tunnel models. However, in the post-processing of PSP images, issues such as pressure taps, paint peeling, and contamination can cause pressure data to be lost from the image, which seriously affects the subsequent calculation and analysis of the pressure distribution. Image inpainting is therefore particularly important in PSP post-processing. Deep learning offers new methods for PSP image inpainting, but some basic characteristics of convolutional neural networks (CNNs) may limit their ability to handle restoration tasks. By contrast, the self-attention mechanism in the transformer can efficiently model nonlocal relationships among input features by generating adaptive attention scores. We therefore propose an efficient transformer network for the PSP image inpainting task, named multi-scale dilated attention transformer (D-former). The model exploits the redundancy of global dependency modeling in Vision Transformers (ViTs) to introduce multi-scale dilated attention (MDA). This mechanism effectively models the interaction between localized and sparse patches within the shifted window, achieving a better balance between computational complexity and receptive field. As a result, D-former models long-range features efficiently while using fewer parameters and lower computational cost. Experiments on two public datasets and the PSP dataset indicate that the proposed method performs better than several advanced methods. Verification in real wind tunnel tests shows that the method can accurately restore the luminescent intensity data of holes in PSP images, thereby improving the accuracy of full-field pressure data, and that it has a promising future in practical applications.
Computer-aided diagnosis (CAD) can detect tuberculosis (TB) cases, providing radiologists with more accurate and efficient diagnostic solutions. Noise of various kinds in TB chest X-ray (CXR) images is a major challenge in this classification task. This study proposes a high-performance model for TB CXR image detection, named multi-scale input mirror network (MIM-Net), which is based on CXR image symmetry and consists of a multi-scale input feature extraction network and a mirror loss. The multi-scale image input enhances feature extraction, while the mirror loss improves network performance through self-supervision. We evaluated the proposed method on a publicly available TB CXR image classification dataset via 5-fold cross-validation, obtaining accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and area under the curve (AUC) of 99.67%, 100%, 99.60%, 99.80%, 100%, and 0.9999, respectively. Compared with other models, MIM-Net performed best on all metrics. The proposed MIM-Net can therefore help the network learn more features and can be used to detect TB in CXR images, assisting doctors in diagnosis.
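Most of the metrics listed in this abstract derive from the four cells of a binary confusion matrix. The sketch below shows those standard definitions. The function name and the example counts are hypothetical and do not come from the paper, and AUC is omitted because it needs the classifier's continuous scores rather than counts.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary diagnostic metrics from confusion-matrix counts.

    tp/fn: diseased cases predicted positive/negative.
    tn/fp: healthy cases predicted negative/positive.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # recall on diseased cases
        "specificity": tn / (tn + fp),   # recall on healthy cases
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }

# Hypothetical counts chosen only to illustrate the formulas.
metrics = diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)
```

Sensitivity and NPV both reach 100% exactly when there are no false negatives, which is consistent with the two 100% figures reported together in the abstract.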
Despite its remarkable performance on natural images, the segment anything model (SAM) lacks domain-specific information in medical imaging and loses local multi-scale information in the encoding phase. To address these issues, this paper presents a medical image segmentation model based on SAM with a local multi-scale feature encoder (LMSFE-SAM). First, a local multi-scale feature encoder is added to SAM to improve the representation of features within the local receptive field, supplying the Vision Transformer (ViT) branch in SAM with enriched local multi-scale contextual information. At the same time, a multiaxial Hadamard product module (MHPM) is incorporated into the local multi-scale feature encoder in a lightweight manner to reduce the quadratic complexity and noise interference. Next, a cross-branch balancing adapter is designed to balance local and global information between the local multi-scale feature encoder and the ViT encoder in SAM. Finally, to obtain a smaller input image size and to mitigate overlapping in patch embeddings, the input image is reduced from 1024×1024 pixels to 256×256 pixels, and a multidimensional information adaptation component is developed, which includes feature adapters, position adapters, and channel-spatial adapters. This component effectively integrates the information from small medical images into SAM, enhancing its suitability for clinical deployment. Compared with eight other representative image segmentation models, the proposed model achieves an average improvement ranging from 0.0387 to 0.3191 across six objective evaluation metrics on the BUSI, DDTI, and TN3K datasets. This significantly enhances the performance of SAM on medical images, providing clinicians with a powerful tool for clinical diagnosis.
To improve image quality under low illumination, this paper proposes a novel low-light image enhancement method based on multi-illumination estimation and multi-scale fusion (MIMS). First, the illumination is processed separately by contrast-limited adaptive histogram equalization (CLAHE), an adaptive complementary gamma function (ACG), and an adaptive detail-preserving S-curve (ADPS) to obtain three components. Then, the fusion-relevant features, exposure, and color contrast are selected as the weight maps. Next, these components and weight maps are fused at multiple scales to generate the enhanced illumination. Finally, the enhanced image is obtained by multiplying the enhanced illumination by the reflectance. Compared with existing approaches, the proposed method achieves average increases of 0.81% and 2.89% in the structural similarity index measurement (SSIM) and peak signal-to-noise ratio (PSNR), and decreases of 6.17% and 32.61% in the natural image quality evaluator (NIQE) and gradient magnitude similarity deviation (GMSD), respectively.
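The last step described in this abstract, multiplying an enhanced illumination map by the reflectance, follows the Retinex-style decomposition image = illumination × reflectance. The sketch below illustrates only that decompose, enhance, and recombine structure. It makes simplifying assumptions that are not from the paper: illumination is estimated as the per-pixel maximum over the RGB channels, and a single fixed gamma curve stands in for the fused CLAHE/ACG/ADPS illumination. The function name and parameters are hypothetical.

```python
import numpy as np

def enhance_low_light(image, gamma=0.5, eps=1e-6):
    """Brighten an RGB image with values in [0, 1] and shape (H, W, 3)."""
    image = np.asarray(image, dtype=np.float64)
    # Assumed illumination estimate: brightest channel at each pixel.
    illumination = image.max(axis=2, keepdims=True)
    # Reflectance carries color and detail: image = illumination * reflectance.
    reflectance = image / (illumination + eps)
    # Stand-in for the fused enhanced illumination: a plain gamma curve.
    enhanced_illumination = np.power(illumination, gamma)
    # Recombine, as in the final step the abstract describes.
    return np.clip(reflectance * enhanced_illumination, 0.0, 1.0)

# Hypothetical single dark-gray pixel.
out = enhance_low_light([[[0.25, 0.25, 0.25]]])
```

Enhancing only the illumination and leaving the reflectance untouched is what lets this family of methods brighten an image without shifting the ratios between color channels.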
Organoids possess immense potential for unraveling the intricate functions of human tissues and facilitating preclinical disease treatment. Their applications span from high-throughput drug screening to the modeling of complex diseases, with some even achieving clinical translation. Changes in the overall size, shape, boundary, and other morphological features of organoids provide a noninvasive method for assessing organoid drug sensitivity. However, the precise segmentation of organoids in bright-field microscopy images is made difficult by the complexity of the organoid morphology and interference, including overlapping organoids, bubbles, dust particles, and cell fragments. This paper introduces the precision organoid segmentation technique (POST), which is a deep-learning algorithm for segmenting challenging organoids under simple bright-field imaging conditions. Unlike existing methods, POST accurately segments each organoid and eliminates various artifacts encountered during organoid culturing and imaging. Furthermore, it is sensitive to and aligns with measurements of organoid activity in drug sensitivity experiments. POST is expected to be a valuable tool for drug screening using organoids owing to its capability of automatically and rapidly eliminating interfering substances and thereby streamlining the organoid analysis and drug screening process.
Camouflaged Object Detection (COD) aims to identify objects that share highly similar patterns, such as texture, intensity, and color, with their surrounding environment. Due to their intrinsic resemblance to the background, camouflaged objects often exhibit vague boundaries and varying scales, making it challenging to accurately locate targets and delineate their indistinct edges. To address this, we propose a novel camouflaged object detection network called Edge-Guided and Multi-scale Fusion Network (EGMFNet), which leverages edge-guided multi-scale integration for enhanced performance. The model incorporates two innovative components: a Multi-scale Fusion Module (MSFM) and an Edge-Guided Attention Module (EGA). These designs exploit multi-scale features to uncover subtle cues between candidate objects and the background while emphasizing camouflaged object boundaries. Moreover, recognizing the rich contextual information in fused features, we introduce a Dual-Branch Global Context Module (DGCM) to refine features using extensive global context, thereby generating more informative representations. Experimental results on four benchmark datasets demonstrate that EGMFNet outperforms state-of-the-art methods across five evaluation metrics. Specifically, on COD10K, our EGMFNet-P improves F_β by 4.8 points and reduces mean absolute error (MAE) by 0.006 compared with ZoomNeXt; on NC4K, it achieves a 3.6-point increase in F_β. On CAMO and CHAMELEON, it obtains 4.5-point increases in F_β. These consistent gains substantiate the superiority and robustness of EGMFNet.
Background: Diabetic macular edema is a prevalent retinal condition and a leading cause of visual impairment among diabetic patients. Early detection of affected areas is beneficial for effective diagnosis and treatment. Traditionally, diagnosis relies on optical coherence tomography images interpreted by ophthalmologists. However, this manual image interpretation is often slow and subjective. Developing automated segmentation for macular edema images is therefore essential to improve diagnostic efficiency and accuracy. Methods: To improve clinical diagnostic efficiency and accuracy, we propose a SegNet network structure integrated with a convolutional block attention module (CBAM). The network introduces a multi-scale input module, the CBAM attention mechanism, and skip connections. The multi-scale input module enhances the network's perceptual capabilities, while the lightweight CBAM fuses relevant features across channel and spatial dimensions, allowing better learning of information at different levels. Results: Experimental results demonstrate that the proposed network achieves an IoU of 80.127% and an accuracy of 99.162%. Compared with the traditional segmentation network, the model has fewer parameters, faster training and testing speed, and superior performance on semantic segmentation tasks, indicating its practical applicability. Conclusion: The C-SegNet proposed in this study enables accurate segmentation of diabetic macular edema lesion images, which facilitates quicker diagnosis for healthcare professionals.
Recent advances in deep learning have significantly improved image deblurring; however, existing approaches still suffer from limited global context modeling, inadequate detail restoration, and poor texture or edge perception, especially under complex dynamic blur. To address these challenges, we propose the Multi-Resolution Fusion Network (MRFNet), a blind multi-scale deblurring framework that integrates progressive residual connectivity for hierarchical feature fusion. The network employs a three-stage design: (1) TransformerBlocks capture long-range dependencies and reconstruct coarse global structures; (2) Nonlinear Activation Free Blocks (NAFBlocks) enhance local detail representation and mid-level feature fusion; and (3) an optimized residual subnetwork based on gated feature modulation refines texture and edge details for high-fidelity restoration. Extensive experiments demonstrate that MRFNet achieves superior performance compared to state-of-the-art methods. On GoPro, it attains 32.52 dB Peak Signal-to-Noise Ratio (PSNR) and 0.071 Learned Perceptual Image Patch Similarity (LPIPS), outperforming MIMOWNet (32.50 dB, 0.075). On HIDE, it achieves 30.25 dB PSNR and 0.945 Structural Similarity Index Measure (SSIM), representing gains of +0.26 dB and +0.015 SSIM over MIMO-UNet (29.99 dB, 0.930). On RealBlur-J, it reaches 28.82 dB PSNR and 0.872 SSIM, surpassing MIMO-UNet (27.63 dB, 0.837) by +1.19 dB and +0.035 SSIM. These results validate the effectiveness of the proposed progressive residual fusion and hybrid attention mechanisms in balancing global context understanding and local detail recovery for blind image deblurring.
Accurate and efficient detection of building changes in remote sensing imagery is crucial for urban planning, disaster emergency response, and resource management. However, existing methods face challenges such as spectral similarity between buildings and backgrounds, sensor variations, and insufficient computational efficiency. To address these challenges, this paper proposes a novel Multi-scale Efficient Wavelet-based Change Detection Network (MewCDNet), which integrates the advantages of Convolutional Neural Networks and Transformers, balances computational costs, and achieves high-performance building change detection. The network employs EfficientNet-B4 as the backbone for hierarchical feature extraction, integrates multi-level feature maps through a multi-scale fusion strategy, and incorporates two key modules: Cross-temporal Difference Detection (CTDD) and Cross-scale Wavelet Refinement (CSWR). CTDD adopts a dual-branch architecture that combines pixel-wise differencing with semantic-aware Euclidean distance weighting to enhance the distinction between true changes and background noise. CSWR integrates the Haar-based Discrete Wavelet Transform with multi-head cross-attention mechanisms, enabling cross-scale feature fusion while significantly improving edge localization and suppressing spurious changes. Extensive experiments on four benchmark datasets demonstrate MewCDNet's superiority over comparison methods: it achieves F1 scores of 91.54% on LEVIR, 93.70% on WHUCD, and 64.96% on S2Looking for building change detection. Furthermore, MewCDNet performs best on the multi-class SYSU dataset (F1: 82.71%), highlighting its strong generalization capability.
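The Haar-based Discrete Wavelet Transform mentioned in connection with the CSWR module splits a feature map into one low-frequency band and three detail bands at half resolution. The sketch below is a generic single-level orthonormal 2D Haar transform in NumPy, shown only to make that decomposition concrete. It is not the paper's module, the function and band names are hypothetical, and the input is assumed to have even height and width.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level orthonormal 2D Haar transform of an (H, W) array.

    Returns (approx, row_detail, col_detail, diag_detail), each (H/2, W/2).
    Assumes H and W are even.
    """
    x = np.asarray(x, dtype=np.float64)
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0       # low-frequency content
    row_detail = (a + b - c - d) / 2.0   # top row minus bottom row
    col_detail = (a - b + c - d) / 2.0   # left column minus right column
    diag_detail = (a - b - c + d) / 2.0  # diagonal difference
    return approx, row_detail, col_detail, diag_detail

# Hypothetical 2x2 input: one block, so each band has a single coefficient.
approx, row_detail, col_detail, diag_detail = haar_dwt2([[1, 2], [3, 4]])
```

With the 1/2 scaling the transform preserves energy, and the detail bands are zero wherever the input is locally constant, which is why they respond mainly at edges.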
Funding: This work was supported by the Natural Science Foundation of China (Grant No. 62071098), the Sichuan Science and Technology Program (Grant Nos. 2019YFG0191, 2021YFG0307), and the Sichuan Zizhou Agricultural Science and Technology Co., Ltd. project "Internet+ smart Zanthoxylum planting weather risk warning system".
Abstract: Zanthoxylum bungeanum Maxim, generally called prickly ash, is widely grown in China. Zanthoxylum rust is the main disease affecting the growth and quality of Zanthoxylum. Traditional methods for recognizing the degree of infection of Zanthoxylum rust rely mainly on manual experience. Because of the complex colors and shapes of rust areas, manual recognition has low accuracy and is difficult to quantify. In recent years, the application of artificial intelligence technology in agriculture has gradually increased. In this paper, building on the DeepLabV2 model, we propose a Zanthoxylum rust image segmentation model based on the FASPP module and enhanced features of rust areas. We also construct a fine-grained Zanthoxylum rust image dataset in which each image is segmented and labeled into leaves, spore piles, and brown lesions. The experimental results show that the proposed segmentation method is effective: the segmentation accuracy for leaves, spore piles, and brown lesions reaches 99.66%, 85.16%, and 82.47%, respectively, MPA reaches 91.80%, and MIoU reaches 84.99%. The model is also efficient, processing 22 images per minute. This article provides an intelligent method for efficiently and accurately recognizing the degree of infection of Zanthoxylum rust.
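The MPA and MIoU figures quoted in this abstract are standard segmentation metrics. As a rough illustration only, the sketch below computes them from a confusion matrix. The function name `segmentation_metrics` and the toy pixel counts are assumptions made for this example and do not come from the paper; the sketch also assumes every class occurs at least once.

```python
def segmentation_metrics(confusion):
    """Return (per-class pixel accuracy, MPA, MIoU) from a square confusion matrix.

    confusion[i][j] = number of pixels whose true class is i and predicted class is j.
    Assumes every class has at least one true pixel (no division by zero).
    """
    n = len(confusion)
    class_acc, class_iou = [], []
    for i in range(n):
        tp = confusion[i][i]
        row_total = sum(confusion[i])                       # pixels truly in class i
        col_total = sum(confusion[r][i] for r in range(n))  # pixels predicted as class i
        class_acc.append(tp / row_total)
        class_iou.append(tp / (row_total + col_total - tp))
    mpa = sum(class_acc) / n    # mean pixel accuracy over classes
    miou = sum(class_iou) / n   # mean intersection over union over classes
    return class_acc, mpa, miou


# Hypothetical 3-class example (leaf, spore pile, brown lesion); counts are made up.
toy = [
    [90, 5, 5],
    [10, 80, 10],
    [0, 20, 80],
]
acc, mpa, miou = segmentation_metrics(toy)
print(acc, round(mpa, 4), round(miou, 4))
```

MIoU penalizes both missed pixels and false alarms for each class, which is why it is usually lower than MPA on the same predictions.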
Funding: Funded by the National Natural Science Foundation of China, grant numbers 52374156 and 62476005.
Abstract: Images taken in dim environments frequently exhibit issues such as insufficient brightness, noise, color shifts, and loss of detail. These problems pose significant challenges to dark image enhancement tasks. Current approaches, while effective in global illumination modeling, often struggle to simultaneously suppress noise and preserve structural details, especially under heterogeneous lighting. Furthermore, misalignment between luminance and color channels introduces additional challenges to accurate enhancement. In response to these difficulties, we introduce a single-stage framework, M2ATNet, built on multi-scale multi-attention and the Transformer architecture. First, to address texture blurring and residual noise, we design a multi-scale multi-attention denoising module (MMAD), which is applied separately to the luminance and color channels to enhance structural and texture modeling. Second, to solve the misalignment of the luminance and color channels, we introduce the multi-channel feature fusion Transformer (CFFT) module, which recovers dark details and corrects color shifts through cross-channel alignment and deep feature interaction. To guide the model to learn more stably and efficiently, we also fuse multiple types of loss functions into a hybrid loss term. We extensively evaluate the proposed method on various standard datasets, including LOL-v1, LOL-v2, DICM, LIME, and NPE. Evaluations in terms of numerical metrics and visual quality demonstrate that M2ATNet consistently outperforms existing advanced approaches. Ablation studies further confirm the critical roles played by the MMAD and CFFT modules in detail preservation and visual fidelity under challenging illumination-deficient environments.
Abstract: With the rapid expansion of drone applications, accurate detection of objects in aerial imagery has become crucial for intelligent transportation, urban management, and emergency rescue missions. However, existing methods face numerous challenges in practical deployment, including scale variation handling, feature degradation, and complex backgrounds. To address these issues, we propose Edge-enhanced and Detail-Capturing You Only Look Once (EHDC-YOLO), a novel framework for object detection in Unmanned Aerial Vehicle (UAV) imagery. Based on the You Only Look Once version 11 nano (YOLOv11n) baseline, EHDC-YOLO systematically introduces several architectural enhancements: (1) a Multi-Scale Edge Enhancement (MSEE) module that leverages multi-scale pooling and edge information to enhance boundary feature extraction; (2) an Enhanced Feature Pyramid Network (EFPN) that integrates P2-level features with Cross Stage Partial (CSP) structures and OmniKernel convolutions for better fine-grained representation; and (3) Dynamic Head (DyHead) with multi-dimensional attention mechanisms for enhanced cross-scale modeling and perspective adaptability. Comprehensive experiments on the Vision meets Drones for Detection (VisDrone-DET) 2019 dataset demonstrate that EHDC-YOLO achieves significant improvements, increasing mean Average Precision (mAP)@0.5 from 33.2% to 46.1% (an absolute improvement of 12.9 percentage points) and mAP@0.5:0.95 from 19.5% to 28.0% (an absolute improvement of 8.5 percentage points) compared with the YOLOv11n baseline, while maintaining a reasonable parameter count (2.81 M vs. the baseline’s 2.58 M). Further ablation studies confirm the effectiveness of each proposed component, while visualization results highlight EHDC-YOLO’s superior performance in detecting objects and handling occlusions in complex drone scenarios.
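The "@0.5" in mAP@0.5 refers to the intersection-over-union (IoU) threshold at which a predicted box counts as matching a ground-truth box. The sketch below shows only that matching test, not the full mAP computation; the helper names `box_iou` and `is_match` and the example boxes are assumptions made for illustration.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero if the boxes do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def is_match(pred, truth, threshold=0.5):
    """A prediction counts as a true-positive candidate when IoU >= threshold."""
    return box_iou(pred, truth) >= threshold


# Hypothetical boxes: the prediction overlaps half of the ground-truth box.
truth = (0, 0, 10, 10)
pred = (5, 0, 15, 10)
print(box_iou(pred, truth), is_match(pred, truth))
```

mAP@0.5:0.95 repeats the evaluation at thresholds from 0.5 to 0.95 in steps of 0.05 and averages the results, so it rewards tighter localization than mAP@0.5 alone.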
Funding: Supported by the Natural Science Foundation of the Anhui Higher Education Institutions of China (Grant Nos. 2023AH040149 and 2024AH051915), the Anhui Provincial Natural Science Foundation (Grant No. 2208085MF168), the Science and Technology Innovation Tackle Plan Project of Maanshan (Grant No. 2024RGZN001), and the Scientific Research Fund Project of Anhui Medical University (Grant No. 2023xkj122).
Abstract: Convolutional neural network (CNN)-based technologies have been widely used in medical image segmentation because of their strong representation and generalization abilities. However, because CNNs cannot effectively capture global information from images, they can easily lose contours and textures in segmentation results. Note that the transformer model can effectively capture long-range dependencies in the image, and that combining the CNN and the transformer can effectively extract both local details and global contextual features of the image. Motivated by this, we propose a multi-branch and multi-scale attention network (M2ANet) for medical image segmentation, whose architecture consists of three components. In the first component, we construct an adaptive multi-branch patch module for parallel extraction of image features to reduce the information loss caused by downsampling. In the second component, we apply a residual block to the well-known convolutional block attention module to enhance the network’s ability to recognize important image features and to alleviate gradient vanishing. In the third component, we design a multi-scale feature fusion module, in which we adopt adaptive average pooling and position encoding to enhance contextual features, and then introduce multi-head attention to further enrich the feature representation. Finally, we validate the effectiveness and feasibility of the proposed M2ANet through comparative experiments on four benchmark medical image segmentation datasets, particularly with respect to preserving contours and textures.
Funding: Supported by the Henan Province Key Research and Development Project (231111211300), the Central Government of Henan Province Guides Local Science and Technology Development Funds (Z20231811005), the Henan Province Key Research and Development Project (231111110100), the Henan Provincial Outstanding Foreign Scientist Studio (GZS2024006), and the Henan Provincial Joint Fund for Scientific and Technological Research and Development Plan (Application and Overcoming Technical Barriers) (242103810028).
Abstract: The fusion of infrared and visible images should emphasize the salient targets in the infrared image while preserving the textural details of the visible image. To meet these requirements, an autoencoder-based method for infrared and visible image fusion is proposed. The encoder, designed according to the optimization objective, consists of a base encoder and a detail encoder, which extract low-frequency and high-frequency information from the image, respectively. Because this extraction may leave some information uncaptured, a compensation encoder is proposed to supplement the missing information. Multi-scale decomposition is also employed to extract image features more comprehensively. The decoder combines low-frequency, high-frequency, and supplementary information to obtain multi-scale features. Subsequently, an attention strategy and a fusion module are introduced to perform multi-scale fusion for image reconstruction. Experimental results on three datasets show that the fused images generated by this network effectively retain salient targets while being more consistent with human visual perception.
Funding: Supported in part by the National Natural Science Foundation of China [62301374], the Hubei Provincial Natural Science Foundation of China [2022CFB804], the Hubei Provincial Education Research Project [B2022057], the Youths Science Foundation of Wuhan Institute of Technology [K202240], and the 15th Graduate Education Innovation Fund of Wuhan Institute of Technology [CX2023295].
Abstract: This paper aims to develop a nonrigid registration method for preoperative and intraoperative thoracoabdominal CT images in computer-assisted interventional surgeries, for accurate tumor localization and enhanced tissue visualization. However, fine structure registration of complex thoracoabdominal organs and large deformation registration caused by respiratory motion are challenging. To deal with this problem, we propose a 3D multi-scale attention VoxelMorph (MA-VoxelMorph) registration network. To alleviate the large deformation problem, a multi-scale axial attention mechanism is used, which combines residual dilated pyramid pooling for multi-scale feature extraction with position-aware axial attention for capturing long-distance dependencies between pixels. To further improve large deformation and fine structure registration, a multi-scale context channel attention mechanism is employed that utilizes content information from adjacent encoding layers. Our method was evaluated on four public lung datasets (DIR-Lab dataset, Creatis dataset, Learn2Reg dataset, OASIS dataset) and a local dataset. Results showed that the proposed method achieved better registration performance than current state-of-the-art methods, especially in handling the registration of large deformations and fine structures. It was also fast in 3D image registration, taking about 1.5 s, which is faster than most methods. Qualitative and quantitative assessments showed that the proposed MA-VoxelMorph has the potential to realize precise and fast tumor localization in clinical interventional surgeries.
Funding: Funded by the Deanship of Research and Graduate Studies at King Khalid University through small group research under grant number RGP1/278/45.
Abstract: This paper introduces a novel method for medical image retrieval and classification by integrating a multi-scale encoding mechanism with Vision Transformer (ViT) architectures and a dynamic multi-loss function. The multi-scale encoding significantly enhances the model’s ability to capture both fine-grained and global features, while the dynamic loss function adapts during training to optimize classification accuracy and retrieval performance. Our approach was evaluated on the ISIC-2018 and ChestX-ray14 datasets, yielding notable improvements. Specifically, on the ISIC-2018 dataset, our method achieves an F1-Score improvement of +4.84% compared to the standard ViT, with a precision increase of +5.46% for melanoma (MEL). On the ChestX-ray14 dataset, the method delivers an F1-Score improvement of 5.3% over the conventional ViT, with precision gains of +5.0% for pneumonia (PNEU) and +5.4% for fibrosis (FIB). Experimental results demonstrate that our approach outperforms traditional CNN-based models and existing ViT variants, particularly in retrieving relevant medical cases and enhancing diagnostic accuracy. These findings highlight the potential of the proposed method for large-scale medical image analysis, offering improved tools for clinical decision-making through superior classification and case comparison.
Abstract: Image super-resolution (SR) has brought significant assistance to the medical field, helping doctors make more precise diagnoses. However, relying solely on a convolutional neural network (CNN) for image SR may lead to issues such as blurry details and excessive smoothness. To address these limitations, we propose an algorithm based on the generative adversarial network (GAN) framework. In the generator network, convolutions of three different sizes connected by a residual dense structure are used to extract detailed features, and an attention mechanism combining channel and spatial information is applied to concentrate computing power on crucial areas. In the discriminator network, InstanceNorm is used to normalize tensors, which speeds up training while retaining feature information. The experimental results demonstrate that our algorithm achieves a higher peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) than other methods, resulting in improved visual quality.
Funding: Supported by the National Natural Science Foundation of China (62302167, 62477013), the Natural Science Foundation of Shanghai (No. 24ZR1456100), the Science and Technology Commission of Shanghai Municipality (No. 24DZ2305900), and the Shanghai Municipal Special Fund for Promoting High-Quality Development of Industries (2211106).
Abstract: Multi-label image classification is a challenging task due to the diverse sizes and complex backgrounds of objects in images. Obtaining class-specific precise representations at different scales is a key aspect of feature representation. However, existing methods often rely on single-scale deep features, neglecting shallow and deeper layer features, which poses challenges when predicting objects of varying scales within the same image. Although some studies have explored multi-scale features, they rarely address the flow of information between scales or efficiently obtain class-specific precise representations for features at different scales. To address these issues, we propose a two-stage, three-branch Transformer-based framework. The first stage incorporates multi-scale image feature extraction and hierarchical scale attention. This design enables the model to consider objects at various scales while enhancing the flow of information across different feature scales, improving the model’s generalization to diverse object scales. The second stage includes a global feature enhancement module and a region selection module. The global feature enhancement module strengthens interconnections between different image regions, mitigating the issue of incomplete representations, while the region selection module models the cross-modal relationships between image features and labels. Together, these components enable the efficient acquisition of class-specific precise feature representations. Extensive experiments on public datasets, including COCO2014, VOC2007, and VOC2012, demonstrate the effectiveness of our proposed method. Our approach achieves consistent performance gains of 0.3%, 0.4%, and 0.2% over state-of-the-art methods on the three datasets, respectively. These results validate the reliability and superiority of our approach for multi-label image classification.
Funding: Supported by the Qingdao Huanghai University School-Level Scientific Research Project (2023KJ14), the Undergraduate Teaching Reform Research Project of Shandong Provincial Department of Education (M2022328), the National Natural Science Foundation of China under Grant (42472324), and the Qingdao Postdoctoral Foundation under Grant (QDBSH202402049).
Abstract: Multimodal image fusion plays an important role in image analysis and applications. Multimodal medical image fusion combines contrast features from two or more input imaging modalities to represent the fused information in a single image. One of the critical clinical applications of medical image fusion is to fuse anatomical and functional modalities for rapid diagnosis of malignant tissues. This paper proposes a multimodal medical image fusion network (MMIF-Net) based on multiscale hybrid attention. The method first decomposes the original image to obtain the low-rank and significant parts. Then, to utilize features at different scales, we add a multiscale mechanism that uses three filters of different sizes to extract features in the encoding network. A hybrid attention module is also introduced to obtain more image details. Finally, the fused images are reconstructed by the decoding network. We conducted experiments with clinical brain computed tomography/magnetic resonance images. The experimental results show that the proposed multimodal medical image fusion network based on multiscale hybrid attention works better than other advanced fusion methods.
Funding: Partly supported by the National Natural Science Foundation of China under Grant 12202476, author Chunhua Wei, https://www.nsfc.gov.cn/.
Abstract: The Pressure Sensitive Paint Technique (PSP) has gained attention in recent years because of its significant benefits in measuring surface pressure on wind tunnel models. However, in the post-processing of PSP images, issues such as pressure taps, paint peeling, and contamination can lead to the loss of pressure data in the image, which seriously affects the subsequent calculation and analysis of the pressure distribution. Image inpainting is therefore particularly important in the post-processing of PSP images. Deep learning offers new methods for PSP image inpainting, but some basic characteristics of convolutional neural networks (CNNs) may limit their ability to handle restoration tasks. By contrast, the self-attention mechanism in the transformer can efficiently model nonlocal relationships among input features by generating adaptive attention scores. We therefore propose an efficient transformer network model for the PSP image inpainting task, named multi-scale dilated attention transformer (D-former). The model exploits the redundancy of global dependency modeling in Vision Transformers (ViTs) to introduce multi-scale dilated attention (MDA). This mechanism effectively models the interaction between localized and sparse patches within the shifted window, achieving a better balance between computational complexity and receptive field. As a result, D-former allows efficient modeling of long-range features while using fewer parameters and lower computational costs. Experiments on two public datasets and the PSP dataset indicate that the proposed method performs better than several advanced methods. Verification in real wind tunnel tests shows that this method can accurately restore the luminescent intensity data of holes in PSP images, thereby improving the accuracy of full-field pressure data, and that it has a promising future in practical applications.
Funding: Supported by the Joint Fund of the Ministry of Education for Equipment Pre-research (No. 8091B0203) and the National Key Research and Development Program of China (No. 2020YFC2008700).
Abstract: Computer-aided diagnosis (CAD) can detect tuberculosis (TB) cases, providing radiologists with more accurate and efficient diagnostic solutions. Noise in TB chest X-ray (CXR) images is a major challenge in this classification task. This study proposes a high-performance model for TB CXR image detection, named multi-scale input mirror network (MIM-Net), which is based on CXR image symmetry and consists of a multi-scale input feature extraction network and a mirror loss. The multi-scale image input enhances feature extraction, while the mirror loss improves network performance through self-supervision. We used a publicly available TB CXR image classification dataset to evaluate the proposed method via 5-fold cross-validation, obtaining accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and area under curve (AUC) of 99.67%, 100%, 99.60%, 99.80%, 100%, and 0.9999, respectively. Compared to other models, MIM-Net performed best on all metrics. The proposed MIM-Net can therefore help the network learn more features and can be used to detect TB in CXR images, thus assisting doctors in diagnosis.
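The metrics listed in this abstract (accuracy, sensitivity, specificity, positive and negative predictive value) all follow from the four counts of a binary confusion matrix. The sketch below is a minimal illustration of those definitions; the function name `diagnostic_metrics` and the counts are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary diagnostic metrics from confusion-matrix counts.

    tp/fn: diseased cases predicted positive/negative.
    tn/fp: healthy cases predicted negative/positive.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # share of diseased cases that are detected
        "specificity": tn / (tn + fp),  # share of healthy cases that are cleared
        "ppv": tp / (tp + fp),          # positive predictive value (precision)
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
m = diagnostic_metrics(tp=45, fp=5, tn=40, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Sensitivity and NPV both reach 100% exactly when there are no false negatives, which matches the pattern of values reported in the abstract.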
Funding: Supported by the Natural Science Foundation Programme of Gansu Province (No. 24JRRA231), the National Natural Science Foundation of China (No. 62061023), and the Gansu Provincial Science and Technology Plan Key Research and Development Program Project (No. 24YFFA024).
Abstract: Despite its remarkable performance on natural images, the segment anything model (SAM) lacks domain-specific information in medical imaging and faces the challenge of losing local multi-scale information in the encoding phase. To address these issues, this paper presents a medical image segmentation model based on SAM with a local multi-scale feature encoder (LMSFE-SAM). First, a local multi-scale feature encoder is added to SAM to improve the representation of features within the local receptive field, thereby supplying the Vision Transformer (ViT) branch in SAM with enriched local multi-scale contextual information. At the same time, a multiaxial Hadamard product module (MHPM) is incorporated into the local multi-scale feature encoder in a lightweight manner to reduce the quadratic complexity and noise interference. Subsequently, a cross-branch balancing adapter is designed to balance the local and global information between the local multi-scale feature encoder and the ViT encoder in SAM. Finally, to allow a smaller input size and to mitigate overlapping in patch embeddings, the input image is reduced from 1024×1024 pixels to 256×256 pixels, and a multidimensional information adaptation component is developed, which includes feature adapters, position adapters, and channel-spatial adapters. This component effectively integrates the information from small-sized medical images into SAM, enhancing its suitability for clinical deployment. Compared with eight other representative image segmentation models, the proposed model shows average improvements ranging from 0.0387 to 0.3191 across six objective evaluation metrics on the BUSI, DDTI, and TN3K datasets. This significantly enhances the performance of SAM on medical images, providing clinicians with a powerful tool for clinical diagnosis.
Funding: Supported by the National Key R&D Program of China (No. 2022YFB3205101) and NSAF (No. U2230116).
Abstract: To improve image quality under low illumination conditions, a novel low-light image enhancement method based on multi-illumination estimation and multi-scale fusion (MIMS) is proposed in this paper. First, the illumination is processed by contrast-limited adaptive histogram equalization (CLAHE), an adaptive complementary gamma function (ACG), and an adaptive detail preserving S-curve (ADPS), respectively, to obtain three components. Then, the fusion-relevant features, exposure, and color contrast are selected as the weight maps. Subsequently, these components and weight maps are fused at multiple scales to generate the enhanced illumination. Finally, the enhanced images are obtained by multiplying the enhanced illumination and the reflectance. Compared with existing approaches, the proposed method achieves an average increase of 0.81% and 2.89% in the structural similarity index measurement (SSIM) and peak signal-to-noise ratio (PSNR), and a decrease of 6.17% and 32.61% in the natural image quality evaluator (NIQE) and gradient magnitude similarity deviation (GMSD), respectively.
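The final step described in this abstract, multiplying an enhanced illumination by the reflectance, follows the Retinex-style model image = illumination × reflectance. The sketch below illustrates only that recombination for a single pixel. It rests on two assumptions that are not from the paper: the illumination is estimated as the maximum RGB channel, and a plain gamma curve stands in for the fused CLAHE/ACG/ADPS illumination. The function name `enhance_pixel` is hypothetical.

```python
def enhance_pixel(rgb, gamma=0.5, eps=1e-6):
    """Brighten one RGB pixel (channel values in [0, 1]) with a Retinex-style split.

    Assumptions for illustration only: max-channel illumination estimate and a
    fixed gamma curve in place of the paper's multi-scale fused illumination.
    """
    illum = max(max(rgb), eps)              # illumination estimate (avoids divide-by-zero)
    reflect = [c / illum for c in rgb]      # reflectance = image / illumination
    illum_enh = illum ** gamma              # gamma < 1 lifts dark illumination values
    return [min(1.0, r * illum_enh) for r in reflect]  # enhanced = reflectance * illumination


dark = [0.04, 0.02, 0.01]
print(enhance_pixel(dark))
```

Because only the illumination is modified, the channel ratios (and hence the hue) of the pixel are preserved while its brightness increases.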
Funding: Supported by the National Key R&D Program of China (No. 2022YFC2504403), the National Natural Science Foundation of China (No. 62172202), the Experiment Project of China Manned Space Program (No. HYZHXM01019), and the Fundamental Research Funds for the Central Universities from Southeast University (No. 3207032101C3).
Abstract: Organoids possess immense potential for unraveling the intricate functions of human tissues and facilitating preclinical disease treatment. Their applications span from high-throughput drug screening to the modeling of complex diseases, with some even achieving clinical translation. Changes in the overall size, shape, boundary, and other morphological features of organoids provide a noninvasive method for assessing organoid drug sensitivity. However, the precise segmentation of organoids in bright-field microscopy images is made difficult by the complexity of organoid morphology and by interference, including overlapping organoids, bubbles, dust particles, and cell fragments. This paper introduces the precision organoid segmentation technique (POST), a deep-learning algorithm for segmenting challenging organoids under simple bright-field imaging conditions. Unlike existing methods, POST accurately segments each organoid and eliminates various artifacts encountered during organoid culturing and imaging. Furthermore, it is sensitive to and aligns with measurements of organoid activity in drug sensitivity experiments. POST is expected to be a valuable tool for drug screening using organoids owing to its capability of automatically and rapidly eliminating interfering substances, thereby streamlining the organoid analysis and drug screening process.
Funding: Financially supported by the Chongqing University of Technology Graduate Innovation Foundation (Grant No. gzlcx20253267).
Abstract: Camouflaged Object Detection (COD) aims to identify objects that share highly similar patterns (such as texture, intensity, and color) with their surrounding environment. Due to their intrinsic resemblance to the background, camouflaged objects often exhibit vague boundaries and varying scales, making it challenging to accurately locate targets and delineate their indistinct edges. To address this, we propose a novel camouflaged object detection network called Edge-Guided and Multi-scale Fusion Network (EGMFNet), which leverages edge-guided multi-scale integration for enhanced performance. The model incorporates two innovative components: a Multi-scale Fusion Module (MSFM) and an Edge-Guided Attention Module (EGA). These designs exploit multi-scale features to uncover subtle cues between candidate objects and the background while emphasizing camouflaged object boundaries. Moreover, recognizing the rich contextual information in fused features, we introduce a Dual-Branch Global Context Module (DGCM) to refine features using extensive global context, thereby generating more informative representations. Experimental results on four benchmark datasets demonstrate that EGMFNet outperforms state-of-the-art methods across five evaluation metrics. Specifically, on COD10K, our EGMFNet-P improves F_β by 4.8 points and reduces mean absolute error (MAE) by 0.006 compared with ZoomNeXt; on NC4K, it achieves a 3.6-point increase in F_β. On CAMO and CHAMELEON, it obtains 4.5-point increases in F_β. These consistent gains substantiate the superiority and robustness of EGMFNet.
Funding: Supported by the Guangdong Pharmaceutical University 2024 Higher Education Research Projects (GKP202403, GMP202402) and the Guangdong Pharmaceutical University College Students’ Innovation and Entrepreneurship Training Programs (Grant Nos. 202504302033, 202504302034, 202504302036, and 202504302244).
Abstract: Background: Diabetic macular edema is a prevalent retinal condition and a leading cause of visual impairment among diabetic patients. Early detection of affected areas is beneficial for effective diagnosis and treatment. Traditionally, diagnosis relies on optical coherence tomography images interpreted by ophthalmologists. However, this manual image interpretation is often slow and subjective. Developing automated segmentation for macular edema images is therefore essential to improve diagnostic efficiency and accuracy. Methods: To improve clinical diagnostic efficiency and accuracy, we propose a SegNet network structure integrated with a convolutional block attention module (CBAM). This network introduces a multi-scale input module, the CBAM attention mechanism, and skip connections. The multi-scale input module enhances the network’s perceptual capabilities, while the lightweight CBAM effectively fuses relevant features across channel and spatial dimensions, allowing better learning of information at different levels. Results: Experimental results demonstrate that the proposed network achieves an IoU of 80.127% and an accuracy of 99.162%. Compared to the traditional segmentation network, this model has fewer parameters, faster training and testing speed, and superior performance on semantic segmentation tasks, indicating its practical applicability. Conclusion: The C-SegNet proposed in this study enables accurate segmentation of diabetic macular edema lesion images, which facilitates quicker diagnosis for healthcare professionals.
Funding: Funded by the Qinghai University Postgraduate Research and Practice Innovation Program, grant number 2025-GMKY-42.
Abstract: Recent advances in deep learning have significantly improved image deblurring; however, existing approaches still suffer from limited global context modeling, inadequate detail restoration, and poor texture or edge perception, especially under complex dynamic blur. To address these challenges, we propose the Multi-Resolution Fusion Network (MRFNet), a blind multi-scale deblurring framework that integrates progressive residual connectivity for hierarchical feature fusion. The network employs a three-stage design: (1) TransformerBlocks capture long-range dependencies and reconstruct coarse global structures; (2) Nonlinear Activation Free Blocks (NAFBlocks) enhance local detail representation and mid-level feature fusion; and (3) an optimized residual subnetwork based on gated feature modulation refines texture and edge details for high-fidelity restoration. Extensive experiments demonstrate that MRFNet achieves superior performance compared to state-of-the-art methods. On GoPro, it attains 32.52 dB Peak Signal-to-Noise Ratio (PSNR) and 0.071 Learned Perceptual Image Patch Similarity (LPIPS), outperforming MIMOWNet (32.50 dB, 0.075). On HIDE, it achieves 30.25 dB PSNR and 0.945 Structural Similarity Index Measure (SSIM), representing gains of +0.26 dB and +0.015 SSIM over MIMO-UNet (29.99 dB, 0.930). On RealBlur-J, it reaches 28.82 dB PSNR and 0.872 SSIM, surpassing MIMO-UNet by +1.19 dB and +0.035 SSIM (27.63 dB, 0.837). These results validate the effectiveness of the proposed progressive residual fusion and hybrid attention mechanisms in balancing global context understanding and local detail recovery for blind image deblurring.
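The PSNR values reported in this abstract are computed from the mean squared error between a restored image and its sharp reference. The sketch below shows the standard definition on flat lists of pixel values. The function name `psnr` and the sample values are assumptions for illustration; benchmark evaluations may differ in details such as color space or cropping.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 8-bit pixel values: each restored pixel is off by 10 gray levels.
reference = [100, 100, 100, 100]
restored = [110, 90, 110, 90]
print(round(psnr(reference, restored), 2))
```

Since PSNR is logarithmic, a gain of about 1.2 dB, such as the one reported on RealBlur-J, corresponds to roughly a 24% reduction in mean squared error.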
Funding: Supported by the Henan Province Key R&D Project under Grant 241111210400, the Henan Provincial Science and Technology Research Project under Grants 252102211047, 252102211062, 252102211055 and 232102210069, the Jiangsu Provincial Scheme Double Initiative Plan JSS-CBS20230474, the XJTLU RDF-21-02-008, the Science and Technology Innovation Project of Zhengzhou University of Light Industry under Grant 23XNKJTD0205, and the Higher Education Teaching Reform Research and Practice Project of Henan Province under Grant 2024SJGLX0126.
Abstract: Accurate and efficient detection of building changes in remote sensing imagery is crucial for urban planning, disaster emergency response, and resource management. However, existing methods face challenges such as spectral similarity between buildings and backgrounds, sensor variations, and insufficient computational efficiency. To address these challenges, this paper proposes a novel Multi-scale Efficient Wavelet-based Change Detection Network (MewCDNet), which integrates the advantages of Convolutional Neural Networks and Transformers, balances computational costs, and achieves high-performance building change detection. The network employs EfficientNet-B4 as the backbone for hierarchical feature extraction, integrates multi-level feature maps through a multi-scale fusion strategy, and incorporates two key modules: Cross-temporal Difference Detection (CTDD) and Cross-scale Wavelet Refinement (CSWR). CTDD adopts a dual-branch architecture that combines pixel-wise differencing with semantic-aware Euclidean distance weighting to enhance the distinction between true changes and background noise. CSWR integrates the Haar-based Discrete Wavelet Transform with multi-head cross-attention mechanisms, enabling cross-scale feature fusion while significantly improving edge localization and suppressing spurious changes. Extensive experiments on four benchmark datasets demonstrate MewCDNet’s superiority over comparison methods, with F1 scores of 91.54% on LEVIR, 93.70% on WHUCD, and 64.96% on S2Looking for building change detection. Furthermore, MewCDNet achieves the best performance on the multi-class SYSU dataset (F1: 82.71%), highlighting its strong generalization capability.
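The Haar-based Discrete Wavelet Transform mentioned for CSWR splits an image into one low-frequency band and three detail bands that respond to edges. The sketch below is a generic one-level 2D Haar transform in plain Python, not the paper's module. The function name `haar_dwt2` and the band names are choices made for this illustration, and the input is assumed to have even height and width.

```python
def haar_dwt2(image):
    """One-level orthonormal 2D Haar transform of a grayscale image (list of rows).

    Returns four half-resolution bands: approx (local average), col_diff
    (left minus right, responds to vertical edges), row_diff (top minus
    bottom, responds to horizontal edges), and diag (diagonal detail).
    Assumes even height and width.
    """
    approx, col_diff, row_diff, diag = [], [], [], []
    for r in range(0, len(image), 2):
        a_row, c_row, r_row, d_row = [], [], [], []
        for c in range(0, len(image[0]), 2):
            tl, tr = image[r][c], image[r][c + 1]          # top-left, top-right
            bl, br = image[r + 1][c], image[r + 1][c + 1]  # bottom-left, bottom-right
            a_row.append((tl + tr + bl + br) / 2)
            c_row.append((tl - tr + bl - br) / 2)
            r_row.append((tl + tr - bl - br) / 2)
            d_row.append((tl - tr - bl + br) / 2)
        approx.append(a_row)
        col_diff.append(c_row)
        row_diff.append(r_row)
        diag.append(d_row)
    return approx, col_diff, row_diff, diag


# Hypothetical 2x2 patch containing a vertical edge (dark left, bright right).
patch = [[0, 4],
         [0, 4]]
approx, col_diff, row_diff, diag = haar_dwt2(patch)
print(approx, col_diff, row_diff, diag)
```

Only the `col_diff` band is non-zero for the detail of this patch, which illustrates why wavelet detail bands are useful for localizing edges of a particular orientation.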