Accelerating hybrid and compact neural networks targeting perception and control domains with coarse-grained dataflow reconfiguration

Abstract: Driven by the continuous scaling of nanoscale semiconductor technologies, recent years have witnessed the steady advancement of machine learning techniques and applications. Dedicated machine learning accelerators, especially for neural networks, have recently attracted the research interest of computer architects and VLSI designers. State-of-the-art accelerators increase performance by deploying a large number of processing elements, yet still suffer from degraded resource utilization across hybrid and non-standard algorithmic kernels. In this work, we exploit the properties of important neural network kernels for both perception and control to propose a reconfigurable dataflow processor, which adjusts the data flow patterns, the functionality of the processing elements and the on-chip storage according to the network kernels. In contrast to state-of-the-art fine-grained dataflow techniques, the proposed coarse-grained dataflow reconfiguration approach enables extensive sharing of computing and storage resources. Three hybrid networks for MobileNet, deep reinforcement learning and sequence classification are constructed and analyzed with customized instruction sets and a toolchain. A test chip has been designed and fabricated in UMC 65 nm CMOS technology, with a measured power consumption of 7.51 mW at 100 MHz on a die size of 1.8 × 1.8 mm^2.

Authors: Zheng Wang, Libing Zhou, Wenting Xie, Weiguang Chen, Jinyuan Su, Wenxuan Chen, Anhua Du, Shanliao Li, Minglan Liang, Yuejin Lin, Wei Zhao, Yanze Wu, Tianfu Sun, Wenqi Fang, Zhibin Yu

Affiliations: Shenzhen Institutes of Advanced Technology; School of Microelectronics; School of Information and Communication; Changzhou Campus of Hohai University

Source: Journal of Semiconductors (English edition; indexed in EI, CAS, CSCD), 2020, No. 2, pp. 29-41 (13 pages)

Funding: Supported by NSFC (Grant Nos. 61702493, 51707191), the Science and Technology Planning Project of Guangdong Province (Grant No. 2018B030338001), Shenzhen S&T Funding (Grant No. KQJSCX20170731163915914), the Basic Research Program (Nos. JCYJ20170818164527303, JCYJ20180507182619669), and the SIAT Innovation Program for Excellent Young Researchers (Grant No. 2017001).

Keywords: CMOS technology; digital integrated circuits; neural networks; dataflow architecture

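Classification: TP332 [Automation and Computer Technology: Computer System Architecture]; TP183 [Automation and Computer Technology: Control Theory and Control Engineering]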

