Abstract
SPMD (Single Program Multiple Data) is one of the dominant execution modes in high-performance computing: neighboring cores execute the same program segment, but their instruction streams are not identical because of differences in the data they process and in their control flow. L1 ICache (instruction cache) sharing exploits this characteristic of the SPMD mode on many-core processors by sharing the L1 ICache among adjacent cores, while also easing the pressure on scarce on-chip resources. However, the shared structure introduces access conflicts, which hurt performance. This paper gives a theoretical analysis of access conflicts in the shared ICache based on a queueing network; the model is built on the cores' access behavior toward the shared ICache banks, avoiding the blurred memory-access characteristics that result from directly abstracting physical nodes. Guided by the causes of instruction-cache performance loss derived from this analysis, we design a low-conflict XOR hash function for the shared L1 ICache. The design jointly considers search cost and engineering implementation complexity, and controls the added latency and power overhead while preserving the randomizing ability of the hash over its linear space. Based on XOR operations, the hash function lowers access conflicts in the shared L1 ICache by adjusting the node-transition probabilities of the ICache queueing network model. Experimental results show that, on a four-core cluster with 32 KB of total instruction cache, the shared L1 ICache with XOR hashing outperforms a private L1 ICache by 11% on average, outperforms a shared L1 ICache with low-order interleaving by 8% on average, and outperforms a shared L1 ICache with a stride-oriented hashing policy by 3.2% on average.
Single program multiple data (SPMD) is a main execution mode in the high-performance computing domain. While processing the same program segment, each adjacent core's execution varies depending on the data it processes and its own control flow. Many-core processors have been widely used in high-performance computing for their advantages in peak performance, compute density, and energy efficiency. While delivering this performance, many-core processors place higher demands on power and area budgets, because they integrate more cores and larger-scale logic into a single chip. The SPMD execution mode can be effectively exploited by sharing the L1 instruction cache across adjacent cores, and the strain on on-chip resources is also alleviated by the shared instruction cache. However, the sharing structure has a negative impact on performance, caused by access conflicts in the shared instruction cache. In this paper, we first give a theoretical analysis of access conflicts in the shared instruction cache based on a queueing network. Rather than mapping the physical banks of the instruction cache directly to queueing nodes, we model the shared instruction cache according to the cores' instruction-fetch pattern. A queueing network reflects the steady-state performance of a system as time tends to infinity; over such a long period, the access frequencies on the individual instruction cache banks tend to become equal. In other words, a queueing network built on physical cache banks may not precisely reflect the intense conflicts on each bank. The theoretical analysis achieves a more accurate characterization of accesses in the shared instruction cache by modeling the cores' instruction-fetch pattern instead. The model of the shared instruction cache given in this paper is later verified against simulation results. Based on the causes of performance loss identified by the theoretical analysis, we then design an XOR hash function to minimize access conflicts in the shared L1 instruction cache. In the design of the XOR hash function, we accelerate the search for the hash function by leveraging the null space of the hash matrix to eliminate hash functions with identical hashing effects. Under the premise of achieving good randomization in the linear space of hash functions, search cost and hardware implementation complexity are also taken into consideration by controlling the timing and power overhead. To bound the propagation delay of the hash function, the maximum number of XOR-gate stages is restricted; we also limit the maximum load on each driver feeding the XOR gates to control the power overhead. By adjusting the transition probabilities of the instruction cache queueing network, this XOR-based hash function effectively reduces bank conflicts in the shared L1 instruction cache. Experimental results show that, in a 4-core cluster with 32 KB of instruction cache in total, the XOR hash-indexed shared instruction cache yields an 11% performance improvement over the private instruction cache and an 8% improvement over the set-interleaved shared instruction cache. It also yields a 3.2% performance improvement over the shared instruction cache using the hash function for stride-based accesses.
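The XOR hash sketched in the abstract maps a fetch address to a cache bank by XOR-folding selected address bits, which is equivalent to multiplying the address bit-vector by a binary matrix over GF(2). A minimal illustrative sketch follows; the masks here are hypothetical values chosen for demonstration, not the searched function from the paper:

```python
# Hedged sketch of XOR-based bank indexing for a 4-bank shared ICache.
# Each mask selects the address bits that are XORed together to produce
# one bit of the bank index (one row of the GF(2) hash matrix).
def xor_bank_index(line_addr: int, masks: list[int]) -> int:
    index = 0
    for i, mask in enumerate(masks):
        parity = bin(line_addr & mask).count("1") & 1  # XOR of selected bits
        index |= parity << i
    return index

# Hypothetical masks mixing higher address bits into the index.
masks = [0b0101, 0b1010]

# With plain low-order (set) interleaving, a stride-4 fetch pattern maps
# every line to the same bank; the XOR hash spreads it across all four.
print([xor_bank_index(a, masks) for a in (0, 4, 8, 12)])  # → [0, 1, 2, 3]
print([a & 3 for a in (0, 4, 8, 12)])                     # → [0, 0, 0, 0]
```

This illustrates why the choice of hash matrix matters: low-order interleaving concentrates strided fetch streams onto one bank, while an XOR hash that folds in higher bits redistributes them, reducing the conflict probability at each queueing-network node.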
Authors
刘骁
唐勇
郑方
丁亚军
LIU Xiao; TANG Yong; ZHENG Fang; DING Ya-Jun (Jiangnan Institute of Computing Technology, Wuxi, Jiangsu 214083)
Source
《计算机学报》
EI
CSCD
Peking University Core Journals (北大核心)
2019, No. 11, pp. 2499-2511 (13 pages)
Chinese Journal of Computers
Funding
Supported by the National Key R&D Program of China (2016YFB0200500)