
Research on Image Classification Algorithm Combining Transformer with CNN
(融合Transformer与卷积神经网络的图像分类算法)
Cited by: 1
Abstract: In traditional image classification networks, the convolution operation of a convolutional neural network (CNN) requires a large number of multiply-accumulate operations, making its computational cost high. The flexible self-attention mechanism of the Transformer model requires large-scale data to reduce the risk of overfitting, giving the model a large parameter count and high computational complexity. To address these problems, this paper proposes a multi-stage image classification model, HTCNet (Hybrid Transformer-Convolution Network). In the shallow stages of the model, partial convolution is used: exploiting feature-map redundancy, only part of the channels are convolved, which reduces the model's floating point operations (FLOPs). In the deep stages, convolution is incorporated into the self-attention mechanism to build an efficient self-attention that effectively mitigates the risk of overfitting and reduces the model's dependence on data. A convolutional positional encoding (CPE) adapts to the input resolution and thereby captures richer positional information. HTCNet achieves classification accuracies of 95.4% on CIFAR-10 and 82.6% on ImageNet-1K. Experimental results show that HTCNet outperforms convolutional neural networks of comparable scale and other Transformer models.
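The abstract does not give implementation details for the partial convolution used in the shallow stages. The following is a minimal numpy sketch of the general idea, assuming a FasterNet-style split in which only the first `cp` channels are convolved and the rest are passed through unchanged; the function name, the `ratio` parameter, and the naive direct-convolution loop are illustrative assumptions, not the paper's code.

```python
import numpy as np

def partial_conv(x, weight, ratio=0.25):
    """Partial convolution sketch: convolve only the first `ratio` fraction
    of channels (exploiting feature-map redundancy) and pass the remaining
    channels through untouched. Relative to a full convolution over all C
    channels, the multiply-accumulate count shrinks roughly by ratio**2.

    x:      input feature map, shape (C, H, W)
    weight: 3x3 kernels for the convolved channels, shape (Cp, Cp, 3, 3)
    """
    c, h, w = x.shape
    cp = int(c * ratio)                      # channels that get convolved
    assert weight.shape[:2] == (cp, cp)

    head = np.pad(x[:cp], ((0, 0), (1, 1), (1, 1)))   # 'same' padding
    out_head = np.zeros((cp, h, w), dtype=float)
    for o in range(cp):                      # naive direct 3x3 convolution
        for i in range(cp):
            for dy in range(3):
                for dx in range(3):
                    out_head[o] += weight[o, i, dy, dx] * head[i, dy:dy + h, dx:dx + w]
    # untouched tail channels are concatenated back unchanged
    return np.concatenate([out_head, x[cp:]], axis=0)
```

With `ratio=0.25`, only a quarter of the channels enter the convolution, so the FLOPs of this layer drop to about 1/16 of a full convolution while the output keeps the full channel count.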
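The abstract also states that the convolutional positional encoding (CPE) adapts to the input resolution. A common way to realize this, sketched below as an assumption rather than the paper's exact design, is a depthwise 3x3 convolution over the token grid: because the kernel slides over whatever grid it is given, the encoding works at any resolution, unlike a fixed-size learned position table.

```python
import numpy as np

def conv_positional_encoding(tokens, h, w, kernel):
    """CPE sketch: reshape the token sequence to its 2-D grid, apply a
    depthwise 3x3 convolution (one kernel per channel), and add the
    result back to the tokens as a position-dependent signal.

    tokens: (N, C) token sequence with N == h * w
    kernel: (C, 3, 3) depthwise kernels
    """
    n, c = tokens.shape
    assert n == h * w
    grid = tokens.T.reshape(c, h, w)          # sequence -> channel-first grid
    padded = np.pad(grid, ((0, 0), (1, 1), (1, 1)))
    pe = np.zeros_like(grid)
    for ch in range(c):                       # depthwise: channels independent
        for dy in range(3):
            for dx in range(3):
                pe[ch] += kernel[ch, dy, dx] * padded[ch, dy:dy + h, dx:dx + w]
    return tokens + pe.reshape(c, n).T        # residual add, back to (N, C)
```

Calling the same function with a different `(h, w)` pair requires no retraining of a position table, which is the resolution-adaptivity property the abstract refers to.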
Authors: ZHU Linglong, WANG Yagang, CHEN Yi (School of Optical-Electrical & Computer Engineering, Shanghai University of Science & Technology, Shanghai 200093, China)
Source: Electronic Science and Technology (《电子科技》), 2025, No. 10, pp. 96-105.
Funding: National Key R&D Program of China (2020YFC2007502).
Keywords: image classification; convolutional neural network; Transformer; self-attention mechanism; model fusion; HTCNet; deep learning; overfitting