Abstract
In image classification, supplementing a model with beneficial semantic information helps it capture key regions efficiently and improves classification performance. To obtain such semantic information, this paper proposes SE-CMT (SE-Networks CNN Meet Transformer). SE-CMT follows the basic CNN feature-extraction pipeline. The SE-CMT Stem recalibrates the features extracted from the input image by the preceding layers, and the depthwise convolutional layers in the SE-CMT Block then enhance those features. SE-CNN (Squeeze-and-Excitation Networks-CNN) extracts low-level features and strengthens locality, while the Transformer models long-range dependencies. Fusing the SE-CNN and Transformer structures improves feature-extraction performance. Experiments on the ImageNet and CIFAR-10 datasets show that SE-CMT reaches top-1 accuracies of 85.47% and 87.16%, respectively, outperforming the baseline models CMT and Vision Transformer. The proposed SE-CMT model is therefore an effective method for image feature extraction.
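The abstract combines two mechanisms: Squeeze-and-Excitation (SE) channel recalibration, which handles local features, and Transformer self-attention, which handles long-range dependencies. The sketch below shows both mechanisms on a toy feature map with random weights, assuming NumPy. The function names `se_recalibrate` and `self_attention`, the reduction ratio `r = 2`, and the ordering (SE first, then single-head attention over the flattened spatial positions) are illustrative assumptions. The sketch is not the SE-CMT architecture, whose exact layer configuration this abstract does not give.

```python
import numpy as np

def se_recalibrate(x, w1, b1, w2, b2):
    """Squeeze-and-Excitation channel recalibration (illustrative sketch).

    x: feature map of shape (C, H, W).
    w1: (C//r, C) and w2: (C, C//r) are the two fully connected layers.
    """
    z = x.mean(axis=(1, 2))                      # squeeze: global average pooling -> (C,)
    h = np.maximum(w1 @ z + b1, 0.0)             # excitation FC1 + ReLU -> (C//r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))     # excitation FC2 + sigmoid -> (C,), each in (0, 1)
    return x * s[:, None, None], s               # scale: reweight every channel by its gate

def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over N tokens of dim D."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv   # each (N, D)
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (N, N): every token scores every other token
    scores -= scores.max(axis=-1, keepdims=True)     # subtract row max for numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)         # row-wise softmax
    return attn @ v, attn

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2                          # hypothetical toy sizes
x = rng.standard_normal((C, H, W))

# SE step: recalibrate the feature map channel by channel.
y, gates = se_recalibrate(
    x,
    rng.standard_normal((C // r, C)), np.zeros(C // r),
    rng.standard_normal((C, C // r)), np.zeros(C),
)

# Transformer step: treat each of the H*W positions as a token of dimension C.
tokens = y.reshape(C, H * W).T                   # (N = H*W, D = C)
out, attn = self_attention(tokens, *(rng.standard_normal((C, C)) for _ in range(3)))
print(out.shape, attn.shape)  # (16, 8) (16, 16)
```

The sigmoid scales each channel by a factor in (0, 1), so the SE step reweights channels relative to one another and does not mix spatial positions. The attention matrix does the opposite: it mixes information across all H×W positions. This is the long-range dependency that the abstract attributes to the Transformer branch.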
Authors
DU Ruishan (杜睿山)
ZHOU Changkun (周长坤)
XIE Hongtao (解红涛)
LI Hongjie (李宏杰)
School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, China; Key Laboratory of Oil and Gas Reservoir and Underground Gas Storage Integrity Evaluations, Northeast Petroleum University, Daqing 163318, China
Source
Journal of Harbin University of Science and Technology (哈尔滨理工大学学报), 2024, No. 6, pp. 74-81 (8 pages)
Peking University Core Journal (A Guide to the Core Journals of China)
Funding
National Key Research and Development Program of China (2022YFE0206800)
Natural Science Foundation of Heilongjiang Province (LH2021F004)