A Pre-Execution Mechanism Based on Value Prediction and Instruction Reuse for In-Order Processors

Original title: 一种基于值预测和指令复用的按序处理器预执行机制

Cited by: 1

Abstract: To improve the performance and energy efficiency of in-order processors, this paper proposes a hardware mechanism, pre-execution based on value prediction and instruction reuse (PVPIR). Unlike conventional pre-execution schemes, when a load instruction incurs a long-latency cache miss, PVPIR predicts its data value and uses the predicted value to pre-execute the subsequent instructions that depend on that load. Dependent loads that would themselves incur long-latency misses therefore issue their memory accesses early, which improves performance. After exiting pre-execution, PVPIR reuses the valid pre-executed results and so avoids re-executing instructions that have already completed correctly, which reduces the energy overhead of pre-execution. PVPIR also implements a hybrid value predictor that combines stride prediction with address-value delta (AVD) prediction. The predictor records history only for loads that have incurred long-latency cache misses, and so obtains good prediction results at a small hardware cost. Experimental results show that, compared with Runahead-AVD and iEA, PVPIR improves performance by 7.5% and 9.2% and reduces energy consumption by 11.3% and 4.9%, thereby improving energy efficiency by 17.5% and 12.9%, respectively.
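The hybrid predictor described in the abstract can be illustrated with a minimal software sketch. The abstract does not say how PVPIR arbitrates between the stride and AVD components, how wide its confidence counters are, or how its table is organized, so everything below (the `HybridValuePredictor` class, the `Entry` fields, the `CONF_MAX` and `CONF_THRESHOLD` constants, and the "prefer the more confident component" policy) is a hypothetical assumption, not the authors' design. The one property taken from the source is that an entry is allocated only for a load that has incurred a long-latency miss. AVD is taken here as the effective address minus the loaded value.

```python
from dataclasses import dataclass
from typing import Dict, Optional

CONF_MAX = 3        # saturating-counter ceiling (assumed, not from the paper)
CONF_THRESHOLD = 2  # minimum confidence needed to predict (assumed)


@dataclass
class Entry:
    last_value: int
    stride: int = 0       # difference between the last two loaded values
    stride_conf: int = 0
    avd: int = 0          # effective address minus loaded value
    avd_conf: int = 0


def _bump(conf: int, hit: bool) -> int:
    """Saturating confidence: increment on a match, reset on a mismatch."""
    return min(conf + 1, CONF_MAX) if hit else 0


class HybridValuePredictor:
    """Toy stride + AVD predictor indexed by load PC (illustrative only)."""

    def __init__(self) -> None:
        self.table: Dict[int, Entry] = {}

    def train(self, pc: int, addr: int, value: int, long_miss: bool) -> None:
        entry = self.table.get(pc)
        if entry is None:
            # Allocate only for loads that incurred a long-latency miss.
            if long_miss:
                self.table[pc] = Entry(last_value=value, avd=addr - value)
            return
        stride = value - entry.last_value
        avd = addr - value
        entry.stride_conf = _bump(entry.stride_conf, stride == entry.stride)
        entry.avd_conf = _bump(entry.avd_conf, avd == entry.avd)
        entry.stride, entry.avd, entry.last_value = stride, avd, value

    def predict(self, pc: int, addr: int) -> Optional[int]:
        entry = self.table.get(pc)
        if entry is None:
            return None
        if max(entry.stride_conf, entry.avd_conf) < CONF_THRESHOLD:
            return None  # neither component is trusted yet
        if entry.avd_conf >= entry.stride_conf:
            return addr - entry.avd              # AVD component
        return entry.last_value + entry.stride   # stride component
```

In this sketch, a load whose value always sits at a fixed offset from its address becomes predictable through the AVD component even when its addresses follow no pattern, while a load whose successive values differ by a constant is covered by the stride component. A load that has never missed gets no entry and is never predicted.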

Authors: Dang Xianglei (党向磊), Wang Xiaoyin (王箫音), Tong Dong (佟冬), Lu Junlin (陆俊林), Yi Jiangfang (易江芳), Wang Keyi (王克义)

Affiliation: Microprocessor Research and Development Center, Peking University

Source: Acta Electronica Sinica (《电子学报》), 2011, No. 12, pp. 2880-2883 (4 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.

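Funding: National High Technology Research and Development Program of China (863 Program) (No. 2006AA010202); China Postdoctoral Science Foundation (No. 20110490208)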

Keywords: pre-execution; value prediction; instruction reuse; load latency tolerance

CLC number: TP302.7 [Automation and Computer Technology: Computer Architecture]
