
A novel optimization method for neural network training (cited by: 4)
Abstract: The exponential moving average (EMA) algorithm is commonly used to filter the noise introduced by mini-batch gradient descent and thereby improve model robustness. However, the conventional EMA algorithm cannot optimize the network parameters effectively in the late stage of prolonged training, and deep neural networks often overfit. This paper therefore proposes T-ADEMA, an EMA algorithm with dynamic decay that uses a Tanh function with a variable coefficient as the decay function, and combines it with the SGD optimizer (T-ADEMA+SGD) to train neural networks. ResNet50 is trained with the SGD optimizer on three datasets (MNIST, CIFAR_10, and CIFAR_100), and a Vision Transformer (ViT) is trained on chest X-ray images, with deep convolutional generative adversarial networks (DCGAN) used for data augmentation and t-distributed stochastic neighbor embedding (t-SNE) used for visualization. The experiments show that on the CIFAR_100 test set, the accuracy, precision, recall, and F-score of T-ADEMA+SGD are 74.15%, 74.39%, 74.15%, and 74.04%, respectively; on the three-class Kaggle COVID-19 chest X-ray images, the corresponding values are 87.94%, 91.19%, 84.43%, and 86.87%. Compared with typical algorithms, T-ADEMA+SGD adjusts the parameters dynamically according to training time and reduces noise more effectively. The proposed method achieves better generalization performance and is suitable for a variety of commonly used datasets.
Authors: CHEN Qing (陈青); YANG Jingdong (杨晶东); WANG Han (王晗); PENG Kun (彭坤) (School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)
Source: Intelligent Computer and Applications (《智能计算机与应用》), 2022, No. 9, pp. 56-64 (9 pages)
Funding: National Natural Science Foundation of China (81973749).
Keywords: exponential moving average; deep neural networks; decay; generalization performance
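
The abstract describes an EMA of network parameters whose decay rate changes during training according to a Tanh function with a variable coefficient, applied alongside SGD. The record does not give the actual decay formula, so the following is only a minimal sketch under stated assumptions: the schedule `tanh_decay`, its parameters `coeff` and `max_decay`, the direction in which the decay changes, and the toy quadratic objective are all hypothetical and are not taken from the paper.

```python
import math
import random


def tanh_decay(step, coeff=0.01, max_decay=0.999):
    """Hypothetical schedule (an assumption, not the paper's formula):
    the decay rises from 0 toward max_decay as the step count grows."""
    return max_decay * math.tanh(coeff * step)


def ema_update(shadow, params, decay):
    """Standard EMA step: shadow <- decay * shadow + (1 - decay) * params."""
    return [decay * s + (1.0 - decay) * p for s, p in zip(shadow, params)]


def sgd_step(params, grads, lr=0.1):
    """Plain SGD step without momentum."""
    return [p - lr * g for p, g in zip(params, grads)]


def train(steps=200, lr=0.1, coeff=0.01, seed=0):
    """Toy run on f(w) = 0.5 * ||w||^2 with noisy gradients that
    stand in for mini-batch noise."""
    rng = random.Random(seed)
    params = [1.0, -2.0]
    shadow = list(params)  # EMA copy of the parameters
    for step in range(1, steps + 1):
        # True gradient is w; Gaussian noise imitates mini-batch sampling.
        grads = [p + rng.gauss(0.0, 0.5) for p in params]
        params = sgd_step(params, grads, lr)
        shadow = ema_update(shadow, params, tanh_decay(step, coeff))
    return params, shadow


if __name__ == "__main__":
    raw, averaged = train()
    print("SGD parameters:", raw)
    print("EMA parameters:", averaged)
```

With this assumed schedule the averaged copy tracks the raw parameters closely early in training, when the decay is near 0, and smooths more heavily later. Whether the schedule in the paper behaves this way cannot be confirmed from the abstract alone.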