
A New Loop Tiling Algorithm (一个新的循环分块算法)

Cited by: 2
Abstract: Loop tiling is a loop transformation that improves the cache hit rate of loops, and the tile size is the key factor in its effectiveness. Cache miss equations (CME) are a mathematical model that precisely analyzes the cache hit rate of loops in a program. Starting from the CME model, this paper derives a loop tiling algorithm by comparing the CMEs before and after tiling and combining the result with padding. Experiments show that, compared with the classical LRW tiling algorithm, the new algorithm yields larger tile sizes while still fully eliminating cache self-conflict misses among the array references in the loop, thereby improving the efficiency of tiling.
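Authors: 舒辉 (Shu Hui), 康绯 (Kang Fei)
Source: Journal of Computer Research and Development (《计算机研究与发展》; indexed in EI, CSCD, PKU Core), 2002, No. 10, pp. 1303-1306 (4 pages)
Funding: National Natural Science Foundation of China (grant 10072077)
Keywords: loop tiling algorithm; cache hit rate; compiler optimization; mathematical model; array; CME theory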
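
The abstract turns on two ideas: a tile should not evict its own data from the cache (no self-conflict), and padding the array's row length can make a larger tile conflict-free. The C sketch below illustrates both under strong simplifying assumptions: a direct-mapped cache, a row-major 2-D array whose first element is line-aligned, and sizes counted in array elements. It is a brute-force check, not the paper's CME-based algorithm and not LRW. The names `tile_is_conflict_free` and `smallest_conflict_free_pad` are hypothetical helpers introduced only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Assumed model (not from the paper): direct-mapped cache holding
 * cache_elems array elements, line_elems elements per cache line,
 * row-major array with `stride` elements per row, tile starting at
 * element offset 0.
 *
 * Returns true when no two distinct memory lines touched by an
 * h x w tile map to the same cache line, i.e. the tile has no
 * self-conflict in this model. */
bool tile_is_conflict_free(long stride, long h, long w,
                           long cache_elems, long line_elems)
{
    assert(line_elems > 0 && cache_elems > 0);
    assert(cache_elems % line_elems == 0);

    long num_lines = cache_elems / line_elems;
    long *owner = malloc((size_t)num_lines * sizeof *owner);
    if (owner == NULL)
        return false;               /* treat allocation failure as "unknown" */

    for (long s = 0; s < num_lines; s++)
        owner[s] = -1;              /* -1 marks an unused cache line */

    bool ok = true;
    for (long i = 0; i < h && ok; i++) {
        for (long j = 0; j < w; j++) {
            long mem_line = (i * stride + j) / line_elems;
            long slot = mem_line % num_lines;
            if (owner[slot] == -1) {
                owner[slot] = mem_line;
            } else if (owner[slot] != mem_line) {
                ok = false;         /* two memory lines compete for one slot */
                break;
            }
        }
    }
    free(owner);
    return ok;
}

/* Returns the smallest pad in [0, max_pad] such that an h x w tile is
 * conflict-free when each row is lengthened to stride + pad elements,
 * or -1 when no such pad exists in that range. */
long smallest_conflict_free_pad(long stride, long h, long w,
                                long cache_elems, long line_elems,
                                long max_pad)
{
    for (long pad = 0; pad <= max_pad; pad++) {
        if (tile_is_conflict_free(stride + pad, h, w,
                                  cache_elems, line_elems))
            return pad;
    }
    return -1;
}
```

As a usage example in this model, take a cache of 1024 elements with 4-element lines and an array with 1024 elements per row. A 32 x 32 tile then conflicts with itself, because every row starts on the same cache line, while padding each row to 1056 elements spreads the 32 rows over all 256 cache lines without overlap. A real tile-size selection would also have to account for associativity, several arrays, and cross-interference, which this sketch ignores.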

References (5)

  • [1] Rafael H Saavedra. The combined effectiveness of unimodular transformation, tiling and software prefetching. In: 10th Int'l Parallel Processing Symposium. 1996. 61-79
  • [2] M E Wolf. Improving locality and parallelism in nested loops [Ph D dissertation]. Stanford University, Stanford, 1992
  • [3] Monica S Lam, Edward E Rothberg, Michael E Wolf. The cache performance and optimizations of blocked algorithms. Fourth Int'l Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS IV), Palo Alto, California, 1991. 9-13
  • [4] S Ghosh. Cache miss equations: Compiler analysis framework for tuning memory behavior [Ph D dissertation]. Princeton University, Department of Electrical Engineering, 1999
  • [5] Gabriel Rivera, Chau-Wen Tseng. Data transformations for eliminating conflict misses. The 1998 ACM SIGPLAN Conf on Programming Language Design and Implementation (PLDI '98), Montreal, Canada, 1998. 8-13

