2 articles found
Design of an Energy-Efficient CNN Accelerator
1
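Authors: 喇超 (Chao La), 李淼 (Miao Li), 1 further author (not listed), 张峰 (Feng Zhang), 张翠婷 (Cuiting Zhang). 《计算机科学与探索》 (Journal of Frontiers of Computer Science and Technology), Peking University core journal, 2025, Issue 9, pp. 2520-2531 (12 pages)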
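Convolutional neural networks (CNNs) are currently widely used in image classification, object detection and recognition, natural language understanding, and other fields. As CNNs keep growing in complexity and scale, deploying them on hardware has become very challenging, especially for embedded applications that require low power and low latency, where most existing platforms suffer from high power consumption and complex control. To address this, with the goal of optimizing accelerator energy efficiency, this work analyzes the key factors that determine system energy efficiency and, taking scaled-down computational precision and a lower system frequency as its main levers, studies a unified whole-network quantization method at extremely low bit widths and designs an energy-efficient CNN accelerator, MSNAP. The accelerator is built on lightweight computing units with 1-bit weights and 4-bit activations, arranged into a 128×128 spatially parallel acceleration array; because the spatial parallelism is high, the whole system can run at a low clock frequency. It also uses a weight-stationary, feature-map-broadcast dataflow, which effectively reduces the number of weight and feature-map data transfers, lowering power consumption and improving the system's energy efficiency. Tape-out verification in a 22 nm process shows that at 20 MHz the accelerator reaches a peak throughput of 10.54 TOPS and an energy efficiency of 64.317 TOPS/W; on classification networks using the CIFAR-10 dataset, its energy efficiency is 5 times that of comparable accelerators. The deployed YOLO object detection network reaches a detection rate of 60 FPS, fully meeting the requirements of embedded applications.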
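As a quick consistency check on the reported figures, and assuming the 64.317 TOPS/W efficiency was measured at the 10.54 TOPS peak (the abstract does not state this explicitly), the implied power draw and energy per operation work out as follows (1 TOPS/W = 10^12 operations per joule):

```latex
P \approx \frac{10.54\ \text{TOPS}}{64.317\ \text{TOPS/W}} \approx 0.164\ \text{W},
\qquad
E_{\text{op}} = \frac{1}{64.317 \times 10^{12}\ \text{op/J}} \approx 15.5\ \text{fJ/op}.
```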
Keywords: accelerator; convolutional neural network (CNN); lightweight neuron computing unit (NCU); MSNAP; branch convolution quantization (BCQ)
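The abstract describes computing units with 1-bit weights and 4-bit activations that keep their weights resident (weight-stationary) while feature-map values are broadcast to them. The Python sketch below models that idea on a toy array; it is not MSNAP's actual microarchitecture. It assumes a 1-bit weight encodes -1/+1 (stored as 0/1), that activations are unsigned 4-bit values, and that each broadcast input vector feeds every column. The class `WeightStationaryArray`, its fetch counters, and the 8×4 size (instead of 128×128) are hypothetical.

```python
import numpy as np


def to_signed(weight_bits: np.ndarray) -> np.ndarray:
    """Map stored 1-bit weights {0, 1} to {-1, +1} (assumed encoding)."""
    return 2 * weight_bits.astype(np.int32) - 1


class WeightStationaryArray:
    """Toy rows x cols array: PE (r, c) keeps weight bit W[r, c] resident."""

    def __init__(self, weight_bits: np.ndarray):
        if not np.isin(weight_bits, (0, 1)).all():
            raise ValueError("weights must be 1-bit (0 or 1)")
        self.w = to_signed(weight_bits)       # loaded once, then stays in place
        self.weight_loads = weight_bits.size  # one load per PE for the whole run
        self.activation_fetches = 0

    def step(self, x: np.ndarray) -> np.ndarray:
        """Broadcast one 4-bit input vector (one value per row) to all columns."""
        if x.shape != (self.w.shape[0],):
            raise ValueError("input length must equal the number of rows")
        if x.min() < 0 or x.max() > 15:
            raise ValueError("activations must be unsigned 4-bit (0..15)")
        self.activation_fetches += x.size     # each value fetched once, then broadcast
        # Each product is +x[r] or -x[r], so hardware can add/subtract instead of multiply.
        return x.astype(np.int32) @ self.w    # column c sums x[r] * w[r, c] over rows r


rng = np.random.default_rng(0)
weights = rng.integers(0, 2, size=(8, 4))   # 8 input channels -> 4 output channels
pixels = rng.integers(0, 16, size=(10, 8))  # 10 pixel positions, 8 channels each

array = WeightStationaryArray(weights)
outputs = np.stack([array.step(p) for p in pixels])

# Naive baseline: every PE fetches its own weight and activation at every step.
naive_fetches_each = pixels.shape[0] * weights.size

print(array.weight_loads, array.activation_fetches, naive_fetches_each)  # 32 80 320
```

In this toy run the weights are loaded 32 times in total and each activation value is fetched once per pixel position (80 fetches), whereas a naive scheme where every PE fetches its own operands at each step would need 320 fetches of each kind. This illustrates the kind of data-movement saving the abstract attributes to the weight-stationary, broadcast dataflow; actual savings depend on the buffer hierarchy and layer shapes.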
Branch Convolution Quantization for Object Detection (cited by: 1)
2
Authors: Miao Li, Feng Zhang, Cuiting Zhang. Machine Intelligence Research (indexed in EI, CSCD), 2024, Issue 6, pp. 1192-1200 (9 pages)
Quantization is one of the research topics in lightweight and edge-deployed convolutional neural networks (CNNs). Usually, the activation and weight bit-widths differ between layers to ensure good CNN performance, which means that dedicated hardware has to be designed for specific layers. In this work, we explore a unified quantization method with extremely low-bit quantized weights for all layers. We use thermometer coding to convert the 8-bit RGB input images to the same bit-width as the activations of the middle layers. For quantizing the results of the last layer, we propose a branch convolution quantization (BCQ) method. Together with the extremely low-bit quantization of the weights, this makes deploying the network on circuits simpler than in other works and consistent across all layers, including the first and the last. Taking tiny_yolo_v3 and yolo_v3 on the VOC and COCO datasets as examples, we verify the feasibility of thermometer coding on input images and of branch convolution quantization on output results. Finally, tiny_yolo_v3 is deployed on an FPGA, which further demonstrates the high performance of the proposed algorithm on hardware.
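The abstract says only that thermometer coding converts 8-bit RGB inputs to the same bit-width as the middle-layer activations. The Python sketch below shows one plausible construction, not necessarily the paper's: assuming 4-bit activations, each 8-bit value is split into channels that fill up in order, each saturating at 15, so that the channels sum back to the original value. The function name `thermometer_encode` and the resulting 17-channel layout are illustrative assumptions.

```python
import numpy as np


def thermometer_encode(pixels: np.ndarray, act_bits: int = 4, in_bits: int = 8) -> np.ndarray:
    """Split unsigned in_bits values into channels of at most act_bits each.

    Channel k holds clip(v - k * q, 0, q) with q = 2**act_bits - 1, so channels
    fill up in order (a thermometer-style code) and their sum recovers v exactly.
    """
    q = (1 << act_bits) - 1                       # 15 for 4-bit activations
    v_max = (1 << in_bits) - 1                    # 255 for 8-bit RGB
    n_channels = -(-v_max // q)                   # ceil(255 / 15) = 17
    v = pixels.astype(np.int32)[..., np.newaxis]  # add a trailing channel axis
    offsets = q * np.arange(n_channels)           # 0, 15, 30, ..., 240
    return np.clip(v - offsets, 0, q)


rgb = np.array([[0, 20, 255]], dtype=np.uint8)   # one row of three 8-bit values
codes = thermometer_encode(rgb)
print(codes.shape)                               # (1, 3, 17)
print(codes[0, 1, :3])                           # [15  5  0]
```

Under these assumptions a 3-channel RGB image becomes 3 × 17 = 51 channels of 4-bit values, so the first layer can run on the same 4-bit activation datapath as the rest of the network, at the cost of a wider first-layer input.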
Keywords: branch convolution quantization; thermometer coding; extremely low-bit quantization; hardware deployment; object detection