
Design of Convolutional Neural Network acceleration module based on FPGA (cited by: 4)
Abstract: To accelerate convolution on resource-constrained hardware platforms, this paper proposes an FPGA (Field Programmable Gate Array) based acceleration module for the forward inference of convolutional neural networks. By analyzing the basic structure of convolutional neural networks and the characteristics of common networks, a hardware acceleration architecture suited to common convolutional neural networks is designed. The architecture adopts hierarchical data caching and category-based data reuse to optimize the total off-chip memory access of the convolutional layers and relieve bandwidth pressure. In the computing module, computation is parallelized across input and output channels, and a high-efficiency computing array that combines multiply-add trees with a systolic array balances computing performance against resource consumption. Experimental results show that the proposed module reaches 189.03 GOPS (Giga Operations per Second) when running the VGG-16 (Visual Geometry Group) convolutional neural network, outperforms most existing solutions in DSP (Digital Signal Processor) performance efficiency, and consumes 41% less memory resources than existing solutions, making it suitable for hardware acceleration of convolutional neural networks on mobile devices.
Authors: Mei Zhiwei (梅志伟); Wang Weidong (王维东) (College of Information Science & Electronic Engineering, Zhejiang University, Hangzhou, 310013, China; ZJU-Rock Chips Joint Laboratory of Multimedia System, College of Information Science & Electronic Engineering, Zhejiang University, Hangzhou, 310013, China)
Source: Journal of Nanjing University (Natural Science) (《南京大学学报(自然科学版)》), 2020, No. 4, pp. 581-590 (10 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list (北大核心).
Keywords: Convolutional Neural Network; hardware acceleration; FPGA; parallel computation; high-efficiency multiply-add array; DSP performance efficiency