
The Study on Prefetching of Remote Memory Architecture (高带宽远程内存结构中的预取研究)

Cited by: 2
Abstract: Advances in high-speed circuits and optical interconnects have greatly increased network speed and bandwidth. This makes it possible to break the traditional tight coupling between CPU and memory in high-performance computers: the coupling is no longer limited by distance, which is bound to change computer architecture. Reference [1] proposes the DSAG architecture, in which CPU and memory are physically separated. Each CPU node keeps only a small amount of memory, while the bulk of memory is placed remotely and managed centrally as a memory server, and CPU nodes and memory servers are connected by a high-speed network. This architecture offers better sharing and scalability, but it also makes the imbalance between CPU and memory harder to address. To reduce the additional memory access latency that a remote memory architecture such as DSAG introduces, we observe that normal CPU memory accesses do not fully use the network's high bandwidth, so the spare bandwidth can be used to prefetch data from remote memory. In this paper we record the addresses that miss in local memory (local as opposed to remote memory) while applications run, align them to page boundaries, and analyze the statistical characteristics of the resulting page frame streams. We argue that a prefetching mechanism based on page frame streams can reduce memory access latency and improve system performance. Finally, we use simulation to verify the feasibility and correctness of this claim, propose three prefetching policies, and compare and analyze the factors that affect prefetching effectiveness.
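The abstract's core idea, aligning local-miss addresses to pages and looking for streams of page frames that a prefetcher could follow, can be illustrated with a small sketch. Everything below is an assumption made for illustration rather than the paper's method: the abstract does not define a page frame stream precisely, so the sketch treats a stream as a run of consecutive page frame numbers; the 4 KiB page size, the function names, and the simple next-page policy are likewise hypothetical and are not one of the paper's three policies.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (not stated in the abstract)


def page_frames(miss_addresses, page_size=PAGE_SIZE):
    """Align miss addresses to pages and collapse consecutive repeats."""
    frames = []
    for addr in miss_addresses:
        pfn = addr // page_size  # page frame number of this miss
        if not frames or frames[-1] != pfn:
            frames.append(pfn)
    return frames


def sequential_streams(frames, min_length=2):
    """Return (first_frame, length) for each run of consecutive frame numbers.

    Assumption: a "stream" here means frames p, p+1, p+2, ... in order.
    """
    streams = []
    start = 0
    for i in range(1, len(frames) + 1):
        # A run ends at the end of the trace or when the next frame is not prev + 1.
        if i == len(frames) or frames[i] != frames[i - 1] + 1:
            if i - start >= min_length:
                streams.append((frames[start], i - start))
            start = i
    return streams


def simulate_next_page_prefetch(frames, depth=1):
    """Count demand fetches and prefetch hits for a hypothetical policy.

    Policy (illustrative only): when two successive misses touch frames
    p and p + 1, prefetch the next `depth` frames after p + 1.
    """
    prefetched = set()
    demand = hits = 0
    prev = None
    for pfn in frames:
        if pfn in prefetched:
            hits += 1  # the page was already brought in over spare bandwidth
            prefetched.discard(pfn)
        else:
            demand += 1  # the page must be fetched on demand from remote memory
        if prev is not None and pfn == prev + 1:
            for k in range(1, depth + 1):
                prefetched.add(pfn + k)
        prev = pfn
    return demand, hits


if __name__ == "__main__":
    # Hypothetical miss trace: two sequential runs of pages.
    trace = [10 * PAGE_SIZE, 10 * PAGE_SIZE + 64, 11 * PAGE_SIZE,
             12 * PAGE_SIZE + 8, 13 * PAGE_SIZE,
             50 * PAGE_SIZE, 51 * PAGE_SIZE, 52 * PAGE_SIZE]
    frames = page_frames(trace)
    print(sequential_streams(frames))
    print(simulate_next_page_prefetch(frames, depth=1))
```

In this toy trace the prefetcher only starts helping from the third page of each run, which hints at why stream length statistics would matter when judging whether such a mechanism pays off.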
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal (北大核心), 2005, No. 8, pp. 15-20 (6 pages)
  • Related literature

References (2)

Secondary references (8)

  • 1 Hu Weiwu. J Comput Sci Technol, 1998, 13(2): 110
  • 2 Iftode L. Proc 8th Annual ACM Symp Parallel Algorithms and Architectures, 1996: 277
  • 3 K Compton, S Hauck. Reconfigurable computing: A survey of systems and software. ACM Computing Surveys, 2002, 34(2): 171-210
  • 4 I Foster, C Kesselman, S Tuecke. The anatomy of the grid: Enabling scalable virtual organizations. International Journal of Supercomputer Applications, 2001, 15(3): 200-222
  • 5 Neil Savage. Linking with light. IEEE Spectrum, 2002, 39(8): 32-36
  • 6 William J Dally. Computer architecture is all about interconnect. The 8th Int'l Symp High Performance Computer Architecture, Boston, Massachusetts, 2002
  • 7 David Patterson, Aaron Brown et al. Recovery oriented computing (ROC): Motivation, definition, techniques, and case studies. U C Berkeley, Tech Rep: UCB/CSD-02-1175, 2002
  • 8 Dona L Crawford. Fifty years of computing at LLNL as a lens to the future. The 17th Int'l Supercomputer Conf (ISC2002), Heidelberg, Germany, 2002

Literature sharing references (31)

Co-cited literature (31)

  • 1 Liu Li, Chen Mingyu, Fan Jianping. A network memory architecture and its performance analysis. Computer Science (计算机科学), 2006, 33(7): 18-23.
  • 2 Katayama Y, Okazaki A. Optical interconnect opportunities for future server memory systems [C]//Proc of HPCA-13. Washington, DC: IEEE Computer Society, 2007: 46-50.
  • 3 Bao Yungang, Chen Mingyu, Ruan Yuan, et al. HMTT: A platform independent full-system memory trace monitoring system [C]//Proc of SIGMETRICS 08. New York: ACM, 2008: 229-240.
  • 4 Przybylski S. The performance impact of block sizes and fetch strategies [C]//Proc of the 17th Annual Int Symp on Computer Architecture. New York: ACM, 1990: 160-169.
  • 5 Ding Chen, Zhong Yutao. Predicting whole-program locality through reuse distance analysis [C]//Proc of PLDI'03. New York: ACM, 2003: 245-257.
  • 6 Mohan Tushar, Supinski Bronis R de, McKee Sally A, et al. Identifying and exploiting spatial regularity in data memory references [C]//Proc of Supercomputing Conf 2003. Washington, DC: IEEE Computer Society, 2003: 49-49.
  • 7 Smith A J. Cache memories [J]. ACM Computing Surveys, 1982, 14(3): 473-530.
  • 8 Dahlgren F, Dubois M, Stenstrom P. Sequential hardware prefetching in shared-memory multiprocessors [J]. IEEE Trans on Parallel and Distributed Systems, 1995, 6(7): 733-746.
  • 9 Jouppi N P. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers [C]//Proc of the 17th Annual Int Symp on Computer Architecture. New York: ACM, 1990: 364-373.
  • 10 Palacharla S, Kessler R E. Evaluating stream buffers as a secondary cache replacement [C]//Proc of the 21st Int Symp on Computer Architecture. New York: ACM, 1994: 24-33.

Citing literature (2)

Secondary citing literature (5)
