Abstract
This article describes a concrete implementation of Cholesky factorisation on a graphics processing unit (GPU) for large real symmetric positive definite matrices. We analyse in detail the hybrid parallel algorithm proposed by Volkov for computing the Cholesky factorisation. On that basis, and taking into account the computational performance of the CPU and GPU of our own machine, we present a more reasonable three-phase hybrid scheduling strategy, which further reduces CPU idle time and prevents the GPU from becoming idle. Numerical experiments show that, when the matrix order exceeds 7000, the new hybrid scheduling algorithm achieves a speedup of more than 5 times over the standard MKL algorithm, and that it also significantly outperforms Volkov's original hybrid algorithm.
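The abstract does not spell out the algorithm's steps, so the following is only a minimal CPU-only sketch of a generic right-looking blocked Cholesky factorisation, the kind of block structure that hybrid CPU/GPU schemes usually build on. It is not the three-phase schedule from the paper and not Volkov's code. The function names, the block size `nb`, and the comments suggesting which step might run on which device are all assumptions made for illustration.

```python
import math


def cholesky_unblocked(a):
    """Return lower-triangular L with L * L^T = a (a is a small SPD list of lists)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(low[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        low[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = s / low[j][j]
    return low


def blocked_cholesky(a, nb=2):
    """Right-looking blocked Cholesky; only the lower triangle of a is read."""
    n = len(a)
    w = [row[:] for row in a]  # working copy, lower triangle overwritten by L
    for k in range(0, n, nb):
        e = min(k + nb, n)
        # Step 1: factor the small diagonal block.
        # (Assumption: in a hybrid scheme this small task would suit the CPU.)
        diag = cholesky_unblocked([row[k:e] for row in w[k:e]])
        for i in range(k, e):
            for j in range(k, e):
                w[i][j] = diag[i - k][j - k]
        # Step 2: panel solve, L21 = A21 * L11^{-T}, done row by row.
        for i in range(e, n):
            for j in range(k, e):
                s = w[i][j] - sum(w[i][p] * w[j][p] for p in range(k, j))
                w[i][j] = s / w[j][j]
        # Step 3: trailing update, A22 -= L21 * L21^T (lower triangle only).
        # (Assumption: this matrix-matrix work dominates and would suit the GPU.)
        for i in range(e, n):
            for j in range(e, i + 1):
                w[i][j] -= sum(w[i][p] * w[j][p] for p in range(k, e))
    # Zero the strict upper triangle so the result is exactly lower triangular.
    return [[w[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
```

In a scheme of this shape, step 3 accounts for most of the arithmetic at large matrix orders, so how steps 1 to 3 are overlapped across devices largely determines how much idle time each device sees.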
Source
《计算机应用与软件》
CSCD
2016, No. 9, pp. 284-287, 305 (5 pages in total)
Computer Applications and Software
Funding
Key Project of the Natural Science Foundation of Hubei Province (ZRZ2014000286)