Abstract
The HPL-MxP benchmark is widely used to measure the mixed-precision computing capability of supercomputers. Because of the parallel algorithm that HPL-MxP implements, choosing the matrix block size (NB) is a trade-off between matrix-multiplication efficiency and load balance. To address this problem, this study optimizes HPL-MxP on the Kunpeng 920 system and proposes a multi-level lookahead strategy. The matrix is partitioned with a small NB to achieve better load balance, while multiple rounds of trailing-matrix updates are merged to raise the equivalent NB. As a result, good load balance and high matrix-multiplication efficiency are obtained at the same time. To implement the multi-level lookahead scheme, the study restructures the Panel storage layout, designs a fine-grained pipeline of computation and communication, and extends the interfaces of the HPL-MxP source code. Mixed single/double-precision tests on a multi-node Kunpeng 920 platform show that multi-level lookahead effectively resolves the NB trade-off in HPL-MxP. Compared with the single-level lookahead strategy, it introduces no significant additional overhead.
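The NB trade-off described in the abstract can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a square N x N matrix with N a multiple of NB, a P x Q 2D block-cyclic distribution as in HPL, trailing-update cost proportional to the locally owned trailing elements times NB, and that every elimination step waits for the slowest process. The helper names `balance_efficiency` and `merged_equals_separate` are hypothetical. The first helper estimates how load balance degrades as NB grows. The second checks the algebraic identity that allows k deferred rank-NB trailing updates to be merged into one rank-(k*NB) matrix multiplication, which is what raises the equivalent NB.

```python
# Toy model (hypothetical, not the paper's code) of the NB trade-off in a
# P x Q 2D block-cyclic right-looking LU factorization.
import random


def owned_count(first_blk: int, nblk: int, nprocs: int, rank: int) -> int:
    """Number of block indices in [first_blk, nblk) mapped cyclically to `rank`."""
    return sum(1 for b in range(first_blk, nblk) if b % nprocs == rank)


def balance_efficiency(n: int, nb: int, p: int, q: int) -> float:
    """Sum over steps of average trailing-update work / sum of maximum work.

    Assumption: each step waits for the slowest process. Real HPL overlaps
    steps through lookahead, so this only approximates the imbalance.
    """
    assert n % nb == 0 and n // nb >= 2, "assumes n is a multiple of nb, >= 2 blocks"
    nblk = n // nb
    avg_total = 0.0
    max_total = 0
    for j in range(nblk - 1):
        # Trailing block rows/columns j+1 .. nblk-1 owned by each process row/column.
        rows = [owned_count(j + 1, nblk, p, r) * nb for r in range(p)]
        cols = [owned_count(j + 1, nblk, q, c) * nb for c in range(q)]
        # GEMM work of process (r, c) is proportional to rows[r] * cols[c] * nb.
        max_total += max(rows) * max(cols) * nb
        avg_total += sum(rows) * sum(cols) * nb / (p * q)
    return avg_total / max_total


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def subtract(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def merged_equals_separate(m: int, nb: int, k: int, seed: int = 0) -> bool:
    """Check A - sum_i L_i U_i == A - [L_1..L_k][U_1;..;U_k] on random integers.

    This covers only the trailing-block algebra. In a real LU, later panels
    must first receive the earlier updates; lookahead schedules that separately.
    """
    rng = random.Random(seed)

    def rand(r, c):
        return [[rng.randint(-3, 3) for _ in range(c)] for _ in range(r)]

    a = rand(m, m)
    ls = [rand(m, nb) for _ in range(k)]  # k column panels (L parts), m x nb
    us = [rand(nb, m) for _ in range(k)]  # k row panels (U parts), nb x m
    separate = a
    for l, u in zip(ls, us):  # k separate rank-nb updates
        separate = subtract(separate, matmul(l, u))
    l_cat = [[x for l in ls for x in l[i]] for i in range(m)]  # m x (k*nb)
    u_cat = [row for u in us for row in u]                     # (k*nb) x m
    merged = subtract(a, matmul(l_cat, u_cat))                 # one rank-(k*nb) update
    return merged == separate


if __name__ == "__main__":
    for nb in (32, 128, 512):
        print(nb, round(balance_efficiency(4096, nb, 4, 4), 3))
    print(merged_equals_separate(m=8, nb=2, k=3))
```

Under these assumptions, N = 4096 on a 4 x 4 grid gives efficiencies of about 0.965, 0.866, and 0.547 for NB = 32, 128, and 512. Merging k = 4 rounds at NB = 128 gives each GEMM an inner dimension of 512 while ownership granularity stays at 128. This is the intuition behind the abstract's "equivalent NB", although the model ignores panel factorization, communication, and pipelining.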
Authors
GAO Ang (高昂)
WANG Yinshan (王银山)
YAN Wen (燕雯)
SONG Changcheng (宋昌成)
WANG Long (王龙)
YAO Erlin (姚二林)
Affiliations: Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 101408, China; Huawei Technologies Co., Ltd., Hangzhou 310052, Zhejiang, China
Source
Computer Engineering (《计算机工程》)
Peking University Core Journal (北大核心)
2025, No. 8, pp. 354-363 (10 pages)
Funding
Youth Innovation Promotion Fund of the Chinese Academy of Sciences (E345060).