Automatic Insertion Method of High-Performance Synchronization Primitives for Ascend Processors

Original title: 面向昇腾处理器的高性能同步原语自动插入方法
Abstract Instruction-level parallelism (ILP) is a classic challenge in processor architecture. Domain-specific architectures such as the Ascend processor expose more pipeline details to upper-layer software, leaving the compiler or programmer to explicitly control synchronization between pipelines in order to optimize ILP. However, the physical synchronization resources available between pipelines are limited, which constrains how far ILP can be improved. To address this problem, the paper proposes a high-performance method for automatically inserting synchronization primitives on Ascend processors. By introducing the abstraction of "virtual synchronization resources", the method decouples the insertion of synchronization primitives from the selection of physical synchronization resources. First, a heuristic algorithm inserts virtual synchronization primitives on complex control flow graphs. Then, techniques such as virtual synchronization primitive merging map the virtual synchronization resources onto the limited number of physical synchronization resources. At the same time, redundant synchronization primitives are removed according to the partial order between instructions, subject to program correctness and strict hardware resource limits. Experiments on the Ascend 910A platform with instruction-level and operator-level benchmarks show that programs with automatically inserted synchronization primitives remain correct and achieve overall performance close to or on par with that of programs whose synchronization primitives were inserted manually by expert programmers.
Authors Li Shuaijiang (李帅江); Zhang Xinyuan (张馨元); Zhao Jiacheng (赵家程); Tian Xinghui (田行辉); Shi Xiyu (石曦予); Xu Xiaoxin (徐晓忻); Cui Huimin (崔慧敏)
Affiliations Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100190; State Key Lab of Processors (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190; Huawei Technologies Co., Ltd., Hangzhou 310051
Source Journal of Computer Research and Development (《计算机研究与发展》), Peking University Core Journal, 2025, No. 8, pp. 1962-1978 (17 pages)
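Funding National Key Research and Development Program of China (2022ZD0116316); Key Program of the National Natural Science Foundation of China (62232015); Innovation Project of the Institute of Computing Technology, Chinese Academy of Sciences (E361010, E261110).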
Keywords Ascend processor; synchronization primitives; heterogeneous programming; domain-specific architecture; automatic insertion
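
The abstract mentions removing redundant synchronization primitives based on the partial order between instructions. The sketch below illustrates that general idea only; it is not the paper's algorithm. It assumes (hypothetically) that each pipeline executes its own instructions in order, that an instruction is identified by a `(pipe, index)` pair, and that a synchronization is an edge from a producer instruction to a consumer instruction. The pipeline names `"A"` and `"B"` and the functions `reachable` and `prune_redundant` are made up for illustration. A synchronization edge is treated as redundant when the ordering it enforces is already implied by program order plus the remaining edges.

```python
from collections import defaultdict


def reachable(src, dst, sync_edges, pipe_len):
    """Return True if dst is ordered after src.

    Ordering comes from two kinds of edges (both are assumptions of this sketch):
    program order inside one pipeline, (p, i) -> (p, i + 1), and the given
    cross-pipeline synchronization edges.
    """
    succ = defaultdict(list)
    for a, b in sync_edges:
        succ[a].append(b)
    stack, seen = [src], {src}
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        pipe, idx = node
        next_nodes = list(succ[node])
        if idx + 1 < pipe_len[pipe]:
            next_nodes.append((pipe, idx + 1))  # in-order execution within a pipeline
        for n in next_nodes:
            if n not in seen:
                seen.add(n)
                stack.append(n)
    return False


def prune_redundant(sync_edges, pipe_len):
    """Greedily drop every sync edge whose ordering is implied by the rest."""
    kept = list(sync_edges)
    for edge in list(sync_edges):
        rest = [e for e in kept if e != edge]
        if reachable(edge[0], edge[1], rest, pipe_len):
            kept = rest
    return kept


# Two hypothetical pipelines with two instructions each.
pipe_len = {"A": 2, "B": 2}
edges = [
    (("A", 0), ("B", 0)),
    (("A", 0), ("B", 1)),  # implied by A0 -> B0 followed by B0 -> B1 in program order
    (("A", 1), ("B", 1)),
]
pruned = prune_redundant(edges, pipe_len)
print(pruned)
```

In this toy input the middle edge is dropped and the other two are kept. A real implementation would also have to handle control flow and the hardware limit on physical synchronization resources, both of which this sketch ignores.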