
Cross-scale Transformer image super-resolution reconstruction with fusion channel attention
Abstract  Objective: To address the problems that, in super-resolution tasks, the Transformer model has a single feature-extraction pattern and reconstructed images lose high-frequency detail and suffer structural distortion, a cross-scale Transformer image super-resolution reconstruction model with fused channel attention is proposed. Method: The model consists of four modules: shallow feature extraction, cross-scale deep feature extraction, multilevel feature fusion, and high-quality reconstruction. Shallow feature extraction applies convolution to the early image to obtain a more stable output; cross-scale deep feature extraction uses a cross-scale Transformer and an enhanced channel attention mechanism to enlarge the receptive field and extract features at different scales through weighted filtering for later fusion; the multilevel feature fusion module uses the enhanced channel attention mechanism to dynamically adjust the channel weights of different-scale features, helping the model learn rich contextual information and strengthening its image super-resolution capability. Result: Evaluations on the Set5, Set14, BSD100 (Berkeley segmentation dataset 100), Urban100 (urban scene 100), and Manga109 benchmark datasets show that, compared with the SwinIR super-resolution model, the proposed model improves peak signal-to-noise ratio by 0.06–0.25 dB and yields visually better reconstructions. Conclusion: By fusing convolutional features with Transformer features and using the enhanced channel attention mechanism to suppress noise and redundant information in the image, the proposed model reduces the likelihood of blurring and distortion and effectively improves super-resolution performance; tests on several public datasets verify its effectiveness.

Objective  The image super-resolution reconstruction technique refers to a method for converting low-resolution (LR) images to high-resolution (HR) images of the same scene. In recent years, this technique has been widely used in computer vision, image processing, and other fields due to its practical application value and theoretical importance. Although models based on convolutional neural networks have made remarkable progress, most super-resolution networks remain single-level, end-to-end designs built to improve reconstruction performance. This approach often overlooks multilevel feature information during reconstruction, limiting the model's performance. With the advancement of deep learning, Transformer-based network architectures have been introduced into computer vision with substantial results, and researchers have applied Transformer models to low-level vision tasks, including image super-resolution reconstruction. However, in this context, the Transformer model suffers from a single feature-extraction pattern, loss of high-frequency details in the reconstructed image, and structural distortion. A cross-scale Transformer image super-resolution reconstruction model with fusion channel attention is proposed to address these problems.

Method  The model comprises four modules: shallow feature extraction, cross-scale deep feature extraction, multilevel feature fusion, and a high-quality reconstruction module. Shallow feature extraction uses convolution to process the early image and obtain a highly stable output; the convolutional layer provides stable optimization and extraction during early visual feature processing. The cross-scale deep feature extraction module uses the cross-scale Transformer and the enhanced channel attention mechanism to acquire features at different scales. The core of the cross-scale Transformer lies in the cross-scale self-attention mechanism and the gated convolutional feedforward network. The self-attention mechanism downsamples the feature maps to different scales by scale factors and learns contextual information from image self-similarity, while the gated convolutional network, which replaces the feedforward network of the traditional Transformer, encodes the positions of spatially neighboring pixels and helps learn local image structure. An enhanced channel attention mechanism is applied after the cross-scale Transformer to enlarge the receptive field and, via weighted filtering, extract different-scale features that replace the original features and are passed to subsequent layers. Because increasing the network depth further leads to performance saturation, the number of residual cross-scale Transformer blocks is set to 3 to balance model complexity against super-resolution reconstruction performance. After stacking the different-scale features in the multilevel feature fusion module, the enhanced channel attention mechanism dynamically adjusts the channel weights of the different-scale features and learns rich contextual information, enhancing the network's reconstruction capability. In the high-quality reconstruction module, convolutional layers and pixel shuffle are used to upsample features to the dimensions of the high-resolution image. In the training phase, the model is trained on 900 HR images from the DIV2K dataset, and the corresponding LR images are generated from the HR images by bicubic downsampling (with downsampling factors of ×2, ×3, and ×4). The network is optimized with the Adam optimizer using L1 loss as the loss function.

Result  Tests are performed on five standard datasets (Set5, Set14, BSD100, Urban100, and Manga109), and the proposed model is compared with 10 state-of-the-art models: enhanced deep residual networks for single image super-resolution (EDSR), residual channel attention networks (RCAN), second-order attention network (SAN), cross-scale non-local attention (CSNLA), cross-scale internal graph neural network (IGNN), holistic attention network (HAN), non-local sparse attention (NLSA), image restoration using Swin Transformer (SwinIR), efficient long-range attention network (ELAN), and permuted self-attention (SRFormer). Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) serve as performance metrics. Because humans are very sensitive to image brightness, these metrics are computed on the Y channel of the image. Experimental results show that the proposed model obtains high PSNR and SSIM values and recovers more detailed information and more accurate textures at magnification factors of ×2, ×3, and ×4. On the Urban100 dataset, the proposed method improves PSNR by 0.13–0.25 dB over SwinIR and 0.07–0.21 dB over ELAN; on the Manga109 dataset, it improves PSNR by 0.07–0.21 dB over SwinIR and 0.06–0.19 dB over ELAN. The local attribution map (LAM) is used to further probe model behavior: the proposed model draws on a wider range of pixel information and exhibits a higher diffusion index (DI) than SwinIR, supporting its effectiveness from an interpretability viewpoint.

Conclusion  The proposed cross-scale Transformer image super-resolution reconstruction model with multilevel fusion channel attention reduces noise and redundant information in the image by fusing convolutional features with Transformer features, and its enhanced channel attention mechanism lowers the likelihood of blurring and distortion, effectively improving image super-resolution performance. Test results on numerous public experimental datasets verify the effectiveness of the proposed model, which produces reconstructed images that are sharper, closer to the real image, and contain fewer artefacts.
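The cross-scale self-attention described above keeps queries at full resolution while drawing keys and values from a downsampled copy of the feature map, so each pixel attends to coarser-scale context. The following is a minimal single-head NumPy sketch under that reading; the average-pooling downsampler, identity projections, and function name are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def cross_scale_attention(feat, s):
    """Single-head self-attention where keys/values come from an s-times
    downsampled (average-pooled) copy of the feature map, so each
    full-resolution query attends to coarser-scale context.
    feat: (H, W, C); Q/K/V projections are identity for brevity."""
    h, w, c = feat.shape
    q = feat.reshape(h * w, c)                          # queries at full scale
    pooled = feat.reshape(h // s, s, w // s, s, c).mean(axis=(1, 3))
    kv = pooled.reshape(-1, c)                          # keys/values at coarse scale
    logits = q @ kv.T / np.sqrt(c)
    logits -= logits.max(axis=1, keepdims=True)         # numerically stable softmax
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)             # rows sum to 1
    return (attn @ kv).reshape(h, w, c)

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 8, 4))
y = cross_scale_attention(x, 2)
print(y.shape)  # (8, 8, 4)
```

With scale factor s, the attention map shrinks from (HW)² to HW·(HW/s²) entries, which is one way such designs enlarge the receptive field at reduced cost.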
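The channel attention used for weighted filtering can be understood as squeeze-and-excitation-style gating: globally pool each channel, pass the descriptor through a small bottleneck MLP, and rescale channels by the resulting sigmoid weights. A minimal NumPy sketch of that generic mechanism follows; the reduction ratio, layer shapes, and names are assumptions for illustration, not the paper's "enhanced" variant.

```python
import numpy as np

def channel_attention(feat, w1, b1, w2, b2):
    """SE-style channel attention: squeeze (global average pool),
    excite (two-layer MLP + sigmoid), then rescale each channel.
    feat: (C, H, W); w1: (C//r, C); w2: (C, C//r)."""
    squeeze = feat.mean(axis=(1, 2))                    # (C,) global descriptor
    hidden = np.maximum(0.0, w1 @ squeeze + b1)         # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))    # sigmoid weights in (0, 1)
    return feat * gate[:, None, None]                   # reweight channels

rng = np.random.default_rng(0)
C, r = 8, 4
feat = rng.standard_normal((C, 6, 6))
w1, b1 = rng.standard_normal((C // r, C)), np.zeros(C // r)
w2, b2 = rng.standard_normal((C, C // r)), np.zeros(C)
out = channel_attention(feat, w1, b1, w2, b2)
print(out.shape)  # (8, 6, 6)
```

Because every gate lies in (0, 1), uninformative (e.g. noisy or redundant) channels are attenuated rather than removed, which matches the role the abstract assigns to the mechanism in both the deep feature extraction and fusion modules.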
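The pixel-shuffle upsampling in the reconstruction module is a fixed depth-to-space rearrangement: a convolution first produces C·s² channels, and the shuffle folds the channel dimension into an s× larger spatial grid. A small NumPy sketch of the standard operation (not code from the paper):

```python
import numpy as np

def pixel_shuffle(x, s):
    """Rearrange (C*s*s, H, W) -> (C, H*s, W*s): each group of s*s
    channels fills an s-by-s sub-pixel block of the upsampled output."""
    c2, h, w = x.shape
    c = c2 // (s * s)
    x = x.reshape(c, s, s, h, w)        # split channels into the sub-pixel grid
    x = x.transpose(0, 3, 1, 4, 2)      # interleave: (c, h, s, w, s)
    return x.reshape(c, h * s, w * s)

x = np.arange(4 * 2 * 2, dtype=float).reshape(4, 2, 2)  # C=1, s=2
y = pixel_shuffle(x, 2)
print(y.shape)  # (1, 4, 4)
```

Here output pixel (i, j) inside each 2×2 block is taken from channel i·s + j, so no interpolation is learned by the shuffle itself; all learning stays in the preceding convolution.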
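The PSNR figures above are computed on the luma (Y) channel. As a reference for how such numbers arise, here is a minimal NumPy sketch of PSNR together with one common BT.601 RGB-to-Y conversion for 8-bit images; the exact conversion and rounding conventions used in the paper's evaluation are not specified here, so treat the coefficients as an assumption.

```python
import numpy as np

def rgb_to_y(img):
    """One common ITU-R BT.601 (limited-range) luma conversion,
    img in [0, 255] with shape (..., 3); returns Y in roughly [16, 235]."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return 0.257 * r + 0.504 * g + 0.098 * b + 16.0

def psnr(ref, rec, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference image and a
    reconstruction; higher is better, identical images give infinity."""
    mse = np.mean((ref.astype(float) - rec.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

ref = np.zeros((4, 4))
rec = np.full((4, 4), 16.0)          # constant error of 16 -> MSE = 256
print(round(psnr(ref, rec), 2))      # approximately 24.05 dB
```

A gain of 0.1 dB corresponds to roughly a 2% reduction in mean squared error, which puts the 0.06–0.25 dB improvements reported above into perspective.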
Authors  Li Yan; Dong Shihao; Zhang Jiawei; Zhao Ru; Zheng Yuhui (School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China; School of Journalism and Communication, Northwest University, Xi'an 710127, China)
Source  Journal of Image and Graphics (《中国图象图形学报》, Peking University Core Journal), 2025, No. 3, pp. 784-797 (14 pages)
Funding  National Natural Science Foundation of China (U20B2065).
Keywords  image super-resolution; cross-scale Transformer; channel attention mechanism; feature fusion; deep learning