Abstract
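Objective Densely connected convolutional networks (DenseNet) do not fully account for the correlation among channel features or among features from different layers. To address this shortcoming, this paper combines DenseNet with a soft attention mechanism and proposes an end-to-end dual feature reweight DenseNet. Method The proposed network performs channel feature reweighting and interlayer feature reweighting of DenseNet at the same time. Methods for channel feature reweighting and interlayer feature reweighting in DenseNet are given, and an end-to-end dual feature reweight DenseNet is constructed in which the output feature map of each convolutional layer passes through two branches that perform channel feature reweighting and interlayer feature reweighting, respectively, after which the two reweighted feature maps are fused. Result To verify the effectiveness and adaptability of the method on different image classification datasets, experiments were conducted on the image classification datasets CIFAR-10/100 and on the face age datasets MORPH and Adience. Classification accuracy improved, and the parameter count and the training and test times were analyzed to verify the practicality of the method. Compared with DenseNet, the 40-layer and 64-layer dual feature reweight DenseNet (DFR-DenseNet) increased the parameter count on CIFAR-10 by only 1.87% and 1.23% while lowering the error rate by a relative 12% and 9.11%, respectively; on CIFAR-100, the error rate fell by 5.56% and 5.41%, respectively. Compared with the 121-layer DenseNet, the mean absolute error (MAE) on MORPH fell by 7.33%, and the age group estimation accuracy on Adience rose by 2%. Compared with the multiple feature reweight DenseNet (MFR-DenseNet), DFR-DenseNet halved the parameter count and cut the test time to about 61% of that of MFR-DenseNet. Conclusion The experimental results show that the end-to-end dual feature reweight DenseNet enhances the learning ability of the network, improves image classification accuracy, and is adaptable and practical across different image classification datasets.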
Objective Image classification is an important research area in computer vision. The development of deep learning and convolutional neural networks (CNNs) has laid the technical foundation for image classification, and in recent years image classification methods based on deep CNNs have become an important research topic. DenseNet is one of the deep CNNs widely applied in image classification; it encourages feature reuse and alleviates the vanishing gradient problem. However, this approach has clear limitations. First, each layer simply combines the feature maps obtained from preceding layers by concatenation, without considering the interdependencies between different channels. The network representation can be further improved by modeling the correlation between feature channels and recalibrating the channel features. Second, the correlation between the feature maps of different layers is not explicitly modeled. Thus, adaptively learning correlation coefficients by modeling the correlation of feature maps between layers is important. Method Conventional DenseNet networks do not adequately consider channel feature correlation and interlayer feature correlation. To address these limitations, the multiple feature reweight DenseNet (MFR-DenseNet) combines the channel feature reweight DenseNet (CFR-DenseNet) and the interlayer feature reweight DenseNet (ILFR-DenseNet) through ensemble learning, thereby improving the representation power of DenseNet by adaptively recalibrating the channel-wise feature responses and explicitly modeling the interdependencies between the features of different convolutional layers. However, MFR-DenseNet uses two independent parallel networks for image classification and is not trained end to end. First, the CFR-DenseNet and ILFR-DenseNet models must each be trained and saved, and their models and weights must then be loaded again, so MFR-DenseNet requires multiple save and load steps and the training process is cumbersome.
Second, the parameter count and computational cost are large, so training takes a long time. At test time, the final prediction of MFR-DenseNet is obtained by averaging the predictions of the two models, so the parameter count and test time are almost double those of a single channel feature reweight or interlayer feature reweight network. MFR-DenseNet therefore places high demands on the storage space and computing performance of the device in practical applications, which limits its use. To address these limitations of MFR-DenseNet, this paper proposes an end-to-end dual feature reweight DenseNet (DFR-DenseNet) based on the soft attention mechanism. The network implements both channel feature reweighting and interlayer feature reweighting of DenseNet. First, the channel feature reweighting and interlayer feature reweighting methods are integrated into DenseNet. A squeeze-and-excitation module (SEM) is introduced after each 3 × 3 convolutional layer to exploit the channel dependencies. In the SEM, each feature map of each layer obtains a weight through a squeeze operation followed by an excitation operation, and the representation of the network is improved by explicitly modeling the interdependencies between the channels. For interlayer feature reweighting, the output feature map of the convolutional layer is subjected to two squeeze-and-excitation operations, which yield a weight value for each layer. Then, DFR-DenseNet is constructed. The output feature map of each convolutional layer passes through two branches that perform channel feature reweighting and interlayer feature reweighting, respectively, and concatenation and convolution operations combine the two types of reweighted feature maps. Result First, DFR-DenseNet is compared with the serial fusion method and the parallel-addition fusion method on the image classification dataset CIFAR-10; the comparison shows that DFR-DenseNet is the most effective of the three.
Second, to demonstrate the advantage of DFR-DenseNet, we performed experiments on the image classification datasets CIFAR-10/100. To show the effectiveness of the method on high-resolution data, we conducted an age estimation experiment on the face dataset MORPH and an age group classification comparison on the unconstrained Adience dataset. The image classification accuracy was significantly improved. The 40-layer DFR-DenseNet had a 4.69% error on CIFAR-10, a relative error reduction of 12% over the 40-layer DenseNet with only 1.87% more parameters. The 64-layer DFR-DenseNet reached a 4.29% error on CIFAR-10, a relative reduction of 9.11% over the 64-layer DenseNet. On CIFAR-100, the 40-layer and 64-layer DFR-DenseNet had test errors of 24.29% and 21.86%, relative reductions of 5.56% and 5.41% over the 40-layer and 64-layer DenseNet, respectively. Age estimation from a single face image is an essential task in human-computer interaction and computer vision and has a wide range of practical applications. It falls into two categories: age classification and age regression. Adience is used for age group classification, where DFR-DenseNet obtained 58.79% accuracy. MORPH Album 2 is used for age regression, where the 121-layer DFR-DenseNet had a mean absolute error of 3.16, which is 7.33% lower than that of the 121-layer DenseNet. Compared with MFR-DenseNet, DFR-DenseNet halved the number of parameters, and its test time was shortened to approximately 61% of that of MFR-DenseNet. Conclusion The experimental results show that the end-to-end dual feature reweight DenseNet can enhance the learning ability of the network and improve the accuracy of image classification.
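The dual-branch reweighting described in the Method part can be sketched as follows. This is a minimal NumPy illustration and not the authors' implementation: the channel branch follows the standard squeeze-and-excitation pattern, while the form of the layer-level gate (one scalar per layer output), the reduction ratio `r`, the 1 × 1 fusion convolution that maps the concatenated maps back to the original channel count, and all function and parameter names are assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_reweight(x, w1, w2):
    """SE-style channel recalibration of a feature map x with shape (C, H, W)."""
    z = x.mean(axis=(1, 2))          # squeeze: global average pooling -> (C,)
    h = np.maximum(w1 @ z, 0.0)      # excitation: bottleneck + ReLU -> (C // r,)
    s = sigmoid(w2 @ h)              # one weight in (0, 1) per channel -> (C,)
    return x * s[:, None, None]


def layer_reweight(x, v1, v2):
    """ASSUMED layer-level gate: a single scalar weight for the whole layer output."""
    z = x.mean(axis=(1, 2))          # squeeze over space -> (C,)
    h = np.maximum(v1 @ z, 0.0)      # bottleneck + ReLU -> (C // r,)
    g = sigmoid(v2 @ h)              # squeeze over channels to one gate -> (1,)
    return x * g[0]


def dual_feature_reweight(x, p):
    """Run both branches, concatenate along channels, fuse with a 1x1 convolution."""
    a = channel_reweight(x, p["w1"], p["w2"])
    b = layer_reweight(x, p["v1"], p["v2"])
    cat = np.concatenate([a, b], axis=0)             # (2C, H, W)
    return np.einsum("oc,chw->ohw", p["fuse"], cat)  # 1x1 conv back to (C, H, W)


def init_params(c, r, rng):
    """Random weights for illustration only; a real network would learn these."""
    return {
        "w1": rng.standard_normal((c // r, c)) * 0.1,
        "w2": rng.standard_normal((c, c // r)) * 0.1,
        "v1": rng.standard_normal((c // r, c)) * 0.1,
        "v2": rng.standard_normal((1, c // r)) * 0.1,
        "fuse": rng.standard_normal((c, 2 * c)) * 0.1,
    }


rng = np.random.default_rng(0)
x = rng.standard_normal((12, 8, 8))   # 12 channels and an 8x8 map are arbitrary choices
y = dual_feature_reweight(x, init_params(12, 4, rng))
print(y.shape)                        # (12, 8, 8)
```

Because both branches and the fusion step live in one forward pass, a block of this kind could be trained end to end, in contrast to averaging the predictions of two separately trained networks.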
Authors
Guo Yurong (郭玉荣)
Zhang Ke (张珂)
Wang Xinsheng (王新胜)
Yuan Jinsha (苑津莎)
Zhao Zhenbing (赵振兵)
Ma Zhanyu (马占宇)
Guo Yurong; Zhang Ke; Wang Xinsheng; Yuan Jinsha; Zhao Zhenbing; Ma Zhanyu (Department of Electronic and Communication Engineering, North China Electric Power University, Baoding 071000, China; School of Information and Communication Engineering, Institute of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100086, China)
Source
Journal of Image and Graphics (《中国图象图形学报》)
CSCD
Peking University Core Journal
2020, No. 3, pp. 486-497 (12 pages)
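Funding
National Natural Science Foundation of China (61871182, 61922015, 61773071, 61302163)
Natural Science Foundation of Hebei Province (F2015502062, F2016502101, F2017502016)
Beijing Natural Science Foundation (4192055)
Fundamental Research Funds for the Central Universities (2018MS094, 2018MS095).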