Optimization of the ParILUT-GPU algorithm
Authors: Shaofeng Yang, Zhi Li, Yunting Wang, Xin He, Guangming Tan. CCF Transactions on High Performance Computing, 2026, Issue 2, pp. 196-209 (14 pages)
We have optimized the parallel threshold ILU algorithm (ParILUT) for GPUs. The optimizations target three building blocks: candidate search and ILU residual computation, adding and removing elements, and threshold selection. First, we fuse candidate search and ILU residual computation by modifying the ParILUT algorithm and extending the register-aware SpGEMM algorithm to compute the fused step. We also develop a GPU bin search algorithm so that the register-aware SpGEMM algorithm performs better in ParILUT. Second, we adopt a warp-row-parallel approach, instead of the thread-row-parallel approach, to add elements to the new L and U and to remove elements from the candidates, and we use efficient GPU instructions to locate the positions of elements. Third, we propose a balanced classification tree for threshold selection that balances the data across buckets when a large number of elements share the same value. Finally, we evaluate the performance of each optimization and of the whole ParILUT, and verify the correctness of the optimized ParILUT. The results indicate that the optimized ParILUT achieves an average speedup of 4.03 times over the original version, and that the speedup increases with the amount of fill-in.
Keywords: Incomplete factorization preconditioners; ParILUT; Parallel threshold ILU; GPU
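The abstract notes that bucket-based threshold selection becomes unbalanced when many elements share the same value. The following is a minimal CPU sketch in Python of that problem and one well-known remedy, equality buckets; it is not the paper's balanced classification tree or its GPU implementation. The function name `select_kth` and the parameter `num_splitters` are hypothetical and chosen only for illustration.

```python
import bisect
import random


def select_kth(values, k, num_splitters=3, seed=0):
    """Return the k-th smallest value (0-based) using bucket-based selection.

    Illustrative assumption: values equal to a splitter get their own
    "equality bucket", so a long run of duplicates cannot stall the search.
    """
    data = list(values)
    if not 0 <= k < len(data):
        raise IndexError("k out of range")
    rng = random.Random(seed)
    while True:
        if len(data) <= 16:
            # Small enough: fall back to a direct sort.
            return sorted(data)[k]
        # Pick splitters from a small sorted random sample.
        sample = sorted(rng.sample(data, 4 * num_splitters))
        step = len(sample) // (num_splitters + 1)
        splitters = sorted({sample[step * (i + 1)] for i in range(num_splitters)})
        # Even bucket 2*i holds values strictly below splitters[i] (and above
        # the previous splitter); odd bucket 2*i+1 holds values equal to it.
        buckets = [[] for _ in range(2 * len(splitters) + 1)]
        for v in data:
            i = bisect.bisect_left(splitters, v)
            if i < len(splitters) and splitters[i] == v:
                buckets[2 * i + 1].append(v)
            else:
                buckets[2 * i].append(v)
        # Walk the buckets in ascending order to find the one holding rank k.
        for b, bucket in enumerate(buckets):
            if k < len(bucket):
                if b % 2 == 1:
                    # Rank k lies inside a run of equal values: done.
                    return bucket[0]
                data = bucket
                break
            k -= len(bucket)


# 1000 equal magnitudes plus a few distinct ones.
magnitudes = [1.0] * 1000 + [0.5, 2.0, 3.0]
threshold = select_kth(magnitudes, 500)
```

Without the equality buckets, all 1000 copies of `1.0` would fall into a single range bucket on every pass and the search would make no progress; with them, the search ends as soon as the target rank falls inside the run of duplicates.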