摘要
循环不变量外提算法是一种针对程序中循环结构的常用编译优化算法,其通过将循环体中的不变计算移动到循环外部来减少重复计算的开销,从而提高程序运行的速度。但在LLVM编译器中,传统的循环不变量外提算法会将全部循环不变量外提到循环体外部,当循环不变量达到一定数量时会导致寄存器溢出,在循环内引入额外的访存代价,对循环产生负优化效果。针对上述问题,在传统LLVM循环不变量外提算法的基础上,引入了一种循环代价分析算法,通过计算循环不变量在循环体中的运行代价和外提操作可能带来的溢出代价,评估其外提可能带来的收益,只对产生正收益的循环不变量进行外提,在有效减少循环体内重复计算的同时,规避引入额外开销的风险。在国产申威831处理器平台,使用典型用例进行优化效果测评,在千万级循环下,相较于传统循环不变量优化算法,提出的新优化算法具有17%以上的性能提升;使用SPEC CPU2017基准测试集(SPECspeed 2017 Interger套件)、Perl解释器DKbench基准测试集、Python解释器pyperformance基准测试集进行综合优化效果测评,结果表明,相较于传统循环不变量优化算法,提出的新优化算法分别具有0.4%,0.63%和1%的性能提升。
Loop-invariant code motion(LICM)is a commonly used compilation optimization algorithm for loop structures in programs.By moving the invariant calculations in the loop body to outside the loop,the algorithm reduces the overhead of duplicate calculations,thus improving program execution speed.However,in LLVM compiler,the traditional LICM algorithm hoists all loop-invariants outside the loop body,which will lead to register overflow when the number of loop-invariant reaches a certain level.It will introduce additional memory access cost in the loop,resulting in a negative optimization effect on the loop.To address this issue,a loop cost analysis algorithm is introduced based on the traditional LLVM LICM algorithm.This algorithm evaluates the running cost of loop-invariant code inside the loop and the overflow cost that may be caused by moving the code outside the loop,and assesses the benefits of moving the code outside the loop.Only the loop-invariant code that produces positive benefits is moved outside the loop,effectively reducing the overhead of duplicate calculations in the loop while avoiding the risk of introducing additional costs.The proposed new optimization algorithm achieves more than 17%performance improvement compared to the traditional LICM algorithm in typical use cases for the domestic SW831 processor platform under millions of loops.Comprehensive optimization effect evaluations are conducted using the SPEC CPU 2017 benchmark test suite(SPECspeed2017 Integer Suite),Perl interpreter DKbench benchmark test suite,and Python interpreter pyperformance benchmark test suite.The results show that compared with the traditional LICM algorithm,the proposed algorithm has 0.4%,0.63%and 1%performance improvement respectively.
作者
姜军
翟彦河
曾志恒
顾轶超
黄亮明
JIANG Jun;ZHAI Yanhe;ZENG Zhiheng;GU Yichao;HUANG Liangming(Wuxi Institute of Advanced Technology,Wuxi,Jiangsu 214122,China)
出处
《计算机科学》
北大核心
2025年第6期44-51,共8页
Computer Science
基金
科技部重点支持项目(GG20210701)。