Journal Articles
3 articles found
1. An Efficient Network-on-Chip Router for Dataflow Architecture (cited: 6)
Authors: Xiao-Wei Shen, Xiao-Chun Ye, Xu Tan, Da Wang, Lunkai Zhang, Wen-Ming Li, Zhi-Min Zhang, Dong-Rui Fan, Ning-Hui Sun. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2017, No. 1, pp. 11-25 (15 pages).
Dataflow architecture has shown its advantages in many high-performance computing cases. In dataflow computing, a large amount of data is frequently transferred among processing elements through the network-on-chip (NoC). Thus the router design has a significant impact on the performance of dataflow architecture. Common routers are designed for control-flow multi-core architecture, and we find they are not suitable for dataflow architecture. In this work, we analyze and extract the features of data transfers in NoCs of dataflow architecture: multiple destinations, high injection rate, and performance sensitive to delay. Based on these three features, we propose a novel and efficient NoC router for dataflow architecture. The proposed router supports multi-destination transfer, so it can deliver data to multiple destinations in a single transfer. Moreover, the router adopts output buffering to maximize throughput and non-flit packets to minimize transfer delay. Experimental results show that the proposed router can improve the performance of dataflow architecture by 3.6x over a state-of-the-art router.
Keywords: multi-destination; router; network-on-chip; dataflow architecture; high-performance computing
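The abstract's core idea can be illustrated with a minimal sketch (not from the paper itself; the packet format is a hypothetical simplification): a conventional router injects one unicast packet per destination, while a multi-destination router injects a single packet carrying the full destination list, cutting the injection count for the same payload.

```python
# Hypothetical sketch of unicast vs. multi-destination injection.
# A "transfer" here is one packet injected into the NoC.

def unicast_transfers(payload, destinations):
    """Conventional router: one packet injected per destination."""
    return [(payload, d) for d in destinations]

def multidest_transfer(payload, destinations):
    """Multi-destination router: a single packet carries the
    whole destination list, as described in the abstract."""
    return [(payload, tuple(destinations))]

dests = [3, 7, 12]
assert len(unicast_transfers("data", dests)) == 3   # three injections
assert len(multidest_transfer("data", dests)) == 1  # one injection
```

Under a high injection rate, reducing injections per logical transfer directly relieves contention at the injection ports, which is consistent with the paper's motivation.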
2. A Non-Stop Double Buffering Mechanism for Dataflow Architecture (cited: 4)
Authors: Xu Tan, Xiao-Wei Shen, Xiao-Chun Ye, Da Wang, Dong-Rui Fan, Lunkai Zhang, Wen-Ming Li, Zhi-Min Zhang, Zhi-Min Tang. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2018, No. 1, pp. 145-157 (13 pages).
Double buffering is an effective mechanism to hide the latency of data transfers between on-chip and off-chip memory. However, in dataflow architecture, the swapping of two buffers during the execution of many tiles decreases performance because of repetitive filling and draining of the dataflow accelerator. In this work, we propose a non-stop double buffering mechanism for dataflow architecture. The proposed non-stop mechanism assigns tiles to the processing element array without stopping the execution of processing elements, by optimizing the control logic in dataflow architecture. Moreover, we propose a work-flow program to cooperate with the non-stop double buffering mechanism. After optimizations on both the control logic and the work-flow program, the filling and draining of the array need to be done only once across the execution of all tiles belonging to the same dataflow graph. Experimental results show that the proposed double buffering mechanism for dataflow architecture achieves a 16.2% average efficiency improvement over that without the optimization.
Keywords: non-stop; double buffering; dataflow architecture; high-performance computing
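The baseline mechanism the paper improves on is classic ping-pong double buffering: while the processing-element array computes on one buffer, the next tile is prefetched into the other. A minimal software sketch of that baseline (the `prefetch`/`compute` callbacks are placeholders, not the paper's interface):

```python
def double_buffered_execute(tiles, prefetch, compute):
    """Ping-pong double buffering: compute on buffer `cur` while the
    next tile is loaded into the other buffer, hiding transfer latency."""
    buffers = [None, None]
    results = []
    cur = 0
    buffers[cur] = prefetch(tiles[0])       # fill the first buffer
    for i in range(len(tiles)):
        nxt = cur ^ 1                       # index of the other buffer
        if i + 1 < len(tiles):
            buffers[nxt] = prefetch(tiles[i + 1])  # overlaps with compute
        results.append(compute(buffers[cur]))
        cur = nxt                           # swap buffers
    return results

# Toy usage: "prefetch" scales a tile, "compute" consumes the buffer.
out = double_buffered_execute([1, 2, 3],
                              prefetch=lambda t: t * 10,
                              compute=lambda b: b + 1)
assert out == [11, 21, 31]
```

The paper's contribution is removing the stall implied by the swap step in architectures where each swap would otherwise re-fill and re-drain the PE array; this sketch shows only the baseline pattern being optimized.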
3. Accelerating Hybrid and Compact Neural Networks Targeting Perception and Control Domains with Coarse-Grained Dataflow Reconfiguration
Authors: Zheng Wang, Libing Zhou, Wenting Xie, Weiguang Chen, Jinyuan Su, Wenxuan Chen, Anhua Du, Shanliao Li, Minglan Liang, Yuejin Lin, Wei Zhao, Yanze Wu, Tianfu Sun, Wenqi Fang, Zhibin Yu. Journal of Semiconductors (EI, CAS, CSCD), 2020, No. 2, pp. 29-41 (13 pages).
Driven by continuous scaling of nanoscale semiconductor technologies, the past years have witnessed the progressive advancement of machine learning techniques and applications. Recently, dedicated machine learning accelerators, especially for neural networks, have attracted the research interest of computer architects and VLSI designers. State-of-the-art accelerators increase performance by deploying a large number of processing elements, but still face the issue of degraded resource utilization across hybrid and non-standard algorithmic kernels. In this work, we exploit the properties of important neural network kernels for both perception and control to propose a reconfigurable dataflow processor, which adjusts the patterns of data flowing, the functionalities of processing elements, and the on-chip storage according to the network kernels. In contrast to state-of-the-art fine-grained dataflow techniques, the proposed coarse-grained dataflow reconfiguration approach enables extensive sharing of computing and storage resources. Three hybrid networks for MobileNet, deep reinforcement learning, and sequence classification are constructed and analyzed with customized instruction sets and toolchain. A test chip has been designed and fabricated in UMC 65 nm CMOS technology, with a measured power consumption of 7.51 mW at 100 MHz on a die size of 1.8 × 1.8 mm².
Keywords: CMOS technology; digital integrated circuits; neural networks; dataflow architecture
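"Coarse-grained" reconfiguration here means selecting a whole-array configuration per kernel type rather than rewiring individual processing elements. A hypothetical sketch of that idea (the pattern names and fields below are illustrative assumptions, not taken from the paper):

```python
# Hypothetical coarse-grained reconfiguration table: one configuration
# is applied to the entire PE array per kernel type, so compute and
# storage resources are shared across hybrid kernels.

DATAFLOW_CONFIGS = {
    "conv":      {"pattern": "weight-stationary", "pe_mode": "mac", "buffer": "ifmap"},
    "recurrent": {"pattern": "output-stationary", "pe_mode": "mac", "buffer": "state"},
    "control":   {"pattern": "streaming",         "pe_mode": "alu", "buffer": "fifo"},
}

def reconfigure(kernel_type):
    """Return the array-wide configuration for a kernel type.
    One table lookup reconfigures the whole array at once."""
    if kernel_type not in DATAFLOW_CONFIGS:
        raise ValueError(f"unsupported kernel type: {kernel_type}")
    return DATAFLOW_CONFIGS[kernel_type]

assert reconfigure("conv")["pattern"] == "weight-stationary"
assert reconfigure("control")["pe_mode"] == "alu"
```

The design choice being modeled: a small, per-kernel configuration space keeps reconfiguration cheap and lets all kernels reuse the same PEs and buffers, which is the resource-sharing benefit the abstract claims over fine-grained approaches.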