期刊文献+
共找到2篇文章
< 1 >
每页显示 20 50 100
Reinventing Memory System Design for Many-Accelerator Architecture
1
作者 王颖 张磊 +1 位作者 韩银和 李华伟 《Journal of Computer Science & Technology》 SCIE EI CSCD 2014年第2期273-280,共8页
The many-accelerator architecture, mostly composed of general-purpose cores and accelerator-like function units (FUs), becomes a great alternative to homogeneous chip multiprocessors (CMPs) for its superior power-... The many-accelerator architecture, mostly composed of general-purpose cores and accelerator-like function units (FUs), becomes a great alternative to homogeneous chip multiprocessors (CMPs) for its superior power-efficiency. However, the emerging many-accelerator processor shows a much more complicated memory accessing pattern than general purpose processors (GPPs) because the abundant on-chip FUs tend to generate highly-concurrent memory streams with distinct locality and bandwidth demand. The disordered memory streams issued by diverse accelerators exhibit a mutual- interference behavior and cannot be efficiently handled by the orthodox main memory interface that provides an inflexible data fetching mode. Unlike the traditional DRAM memory, our proposed Aggregation Memory System (AMS) can function adaptively to the characterized memory streams from different FUs, because it provides the FUs with different data fetching sizes and protects their locality in memory access by intelligently interleaving their data to memory devices through sub-rank binding. Moreover, AMS can batch the requests without sub-rank conflict into a read burst with our optimized memory scheduling policy. Experimental results from trace-based simulation show both conspicuous performance boost and energy saving brought by AMS. 展开更多
关键词 many-accelerator chip multiprocessor MEMORY general purpose processor
原文传递
A General-Purpose Many-Accelerator Architecture Based on Dataflow Graph Clustering of Applications
2
作者 陈鹏 张磊 +1 位作者 韩银和 陈云霁 《Journal of Computer Science & Technology》 SCIE EI CSCD 2014年第2期239-246,共8页
The combination of growing transistor counts and limited power budget within a silicon die leads to the utilization wall problem (a.k.a. "Dark Silicon"), that is only a small fraction of chip can run at full speed... The combination of growing transistor counts and limited power budget within a silicon die leads to the utilization wall problem (a.k.a. "Dark Silicon"), that is only a small fraction of chip can run at full speed during a period of time. Designing accelerators for specific applications or algorithms is considered to be one of the most promising approaches to improving energy-efficiency. However, most current design methods for accelerators are dedicated for certain applications or algorithms, which greatly constrains their applicability. In this paper, we propose a novel general-purpose many-accelerator architecture. Our contributions are two-fold. Firstly, we propose to cluster dataflow graphs (DFGs) of hotspot basic blocks (BBs) in applications. The DFG clusters are then used for accelerators design. This is because a DFC is the largest program unit which is not specific to a certain application. We analyze 17 benchmarks in SPEC CPU 2006, acquire over 300 DFGs hotspots by using LLVM compiler tool, and divide them into 15 clusters based on graph similarity. Secondly, we introduce a function instruction set architecture (FISC) and illustrate how DFG accelerators can be integrated with a processor core and how they can be used by applications. Our results show that the proposed DFG clustering and FISC design can speed up SPEC benchmarks 6.2X on average. 展开更多
关键词 dataflow graph many-accelerator CLUSTERING function instruction set architecture
原文传递
上一页 1 下一页 到第
使用帮助 返回顶部