Abstract
SPMD (Single Program Multiple Data) is one of the dominant execution modes in high-performance computing: neighboring cores execute the same program segment, but their instruction streams are not identical because of differences in the data they process and in their control flow. L1 ICache (instruction cache) sharing exploits this characteristic of the SPMD mode on many-core processors by sharing the L1 ICache among adjacent cores, while also easing the pressure on scarce on-chip resources. However, the shared structure introduces access conflicts, which hurt performance. This paper gives a theoretical analysis of access conflicts in the shared ICache based on a queueing network; the model is built on the cores' access behavior toward the shared ICache banks, avoiding the blurred memory-access characteristics that result from directly abstracting physical nodes. Guided by the causes of instruction-cache performance loss derived from this analysis, we design a low-conflict XOR hash function for the shared L1 ICache. The design jointly considers search cost and engineering implementation complexity, and controls the added latency and power overhead while preserving the randomizing ability of the hash over its linear space. Based on XOR operations, the hash function lowers access conflicts in the shared L1 ICache by adjusting the node-transition probabilities of the ICache queueing network model. Experimental results show that, on a four-core cluster with 32 KB of total instruction cache, the shared L1 ICache with XOR hashing outperforms a private L1 ICache by 11% on average, outperforms a shared L1 ICache with low-order interleaving by 8% on average, and outperforms a shared L1 ICache with a stride-oriented hashing policy by 3.2% on average.
Single program multiple data (SPMD) is a main execution mode in the high-performance computing domain. While processing the same program segment, each adjacent core's execution varies depending on the data it processes and its own control flow. Many-core processors have been widely used in high-performance computing for their advantages in peak performance, compute density, and energy efficiency. While delivering this performance, many-core processors place higher demands on power and area budgets, because they integrate more cores and larger-scale logic into a single chip. The SPMD execution mode can be effectively exploited by sharing the L1 instruction cache across adjacent cores, and the strain on on-chip resources is also alleviated by the shared instruction cache. However, the sharing structure has a negative impact on performance, caused by access conflicts in the shared instruction cache. In this paper, we first give a theoretical analysis of access conflicts in the shared instruction cache based on a queueing network. Rather than mapping the physical banks of the instruction cache directly to queueing nodes, we model the shared instruction cache according to the cores' instruction-fetch pattern. A queueing network reflects the steady-state performance of a system as time tends to infinity; over such a long period, the access frequencies on the individual instruction cache banks tend to become equal. In other words, a queueing network built on physical cache banks may not precisely reflect the intense conflicts on each bank. The theoretical analysis achieves a more accurate characterization of accesses in the shared instruction cache by modeling the cores' instruction-fetch pattern instead. The model of the shared instruction cache given in this paper is later verified against simulation results. Based on the causes of performance loss identified by the theoretical analysis, we then design an XOR hash function to minimize access conflicts in the shared L1 instruction cache. In the design of the XOR hash function, we accelerate the search for the hash function by leveraging the null space of the hash matrix to eliminate hash functions with identical hashing effects. Under the premise of achieving good randomization in the linear space of hash functions, search cost and hardware implementation complexity are also taken into consideration by controlling the timing and power overhead. To bound the propagation delay of the hash function, the maximum number of XOR-gate stages is restricted; we also limit the maximum load on each driver feeding the XOR gates to control the power overhead. By adjusting the transition probabilities of the instruction cache queueing network, this XOR-based hash function effectively reduces bank conflicts in the shared L1 instruction cache. Experimental results show that, in a 4-core cluster with 32 KB of instruction cache in total, the XOR hash-indexed shared instruction cache yields an 11% performance improvement over the private instruction cache and an 8% improvement over the set-interleaved shared instruction cache. It also yields a 3.2% performance improvement over the shared instruction cache using the hash function for stride-based accesses.
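The XOR hash sketched in the abstract maps a fetch address to a cache bank by XOR-folding selected address bits, which is equivalent to multiplying the address bit-vector by a binary matrix over GF(2). A minimal illustrative sketch follows; the masks here are hypothetical values chosen for demonstration, not the searched function from the paper:

```python
# Hedged sketch of XOR-based bank indexing for a 4-bank shared ICache.
# Each mask selects the address bits that are XORed together to produce
# one bit of the bank index (one row of the GF(2) hash matrix).
def xor_bank_index(line_addr: int, masks: list[int]) -> int:
    index = 0
    for i, mask in enumerate(masks):
        parity = bin(line_addr & mask).count("1") & 1  # XOR of selected bits
        index |= parity << i
    return index

# Hypothetical masks mixing higher address bits into the index.
masks = [0b0101, 0b1010]

# With plain low-order (set) interleaving, a stride-4 fetch pattern maps
# every line to the same bank; the XOR hash spreads it across all four.
print([xor_bank_index(a, masks) for a in (0, 4, 8, 12)])  # → [0, 1, 2, 3]
print([a & 3 for a in (0, 4, 8, 12)])                     # → [0, 0, 0, 0]
```

This illustrates why the choice of hash matrix matters: low-order interleaving concentrates strided fetch streams onto one bank, while an XOR hash that folds in higher bits redistributes them, reducing the conflict probability at each queueing-network node.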
Authors
刘骁
唐勇
郑方
丁亚军
LIU Xiao; TANG Yong; ZHENG Fang; DING Ya-Jun (Jiangnan Institute of Computing Technology, Wuxi, Jiangsu 214083)
Source
《计算机学报》
EI
CSCD
Peking University Core Journals (北大核心)
2019, No. 11, pp. 2499-2511 (13 pages)
Chinese Journal of Computers
Funding
Supported by the National Key R&D Program of China (2016YFB0200500)