Sparse triangular solve(SpTRSV)is one of the most important level-2 kernels in sparse basic linear algebra subprograms(BLAS).Compared to another level-2 sparse BLAS kernel sparse matrix-vector multiplication(SpMV),SpT...Sparse triangular solve(SpTRSV)is one of the most important level-2 kernels in sparse basic linear algebra subprograms(BLAS).Compared to another level-2 sparse BLAS kernel sparse matrix-vector multiplication(SpMV),SpTRSV is in general more difficult to find high parallelism on many-core processors,such as GPUs.Nowadays,much work focuses on reducing dependencies and synchronizations in the level-set and Sync-free algorithms for SpTRSV.However,there is less work that can make good use of sparse spatial structure for SpTRSV on GPUs.In this paper,we propose a tiled algorithm called TileSpTRSV for optimizing SpTRSV on GPUs through exploiting 2D spatial structure of sparse matrices.We design two algorithm implementations,i.e.,TileSpTRSV_level-set and TileSpTRSV_sync-free,for TileSpTRSV on top of level-set and Sync-free algorithms,respectively.By testing 16 representative matrices on a latest NVIDIA GPU,the experimental results show that TileSpTRSV_level-set gives on average 5.29×(up to 38.10×),5.33×(up to 21.32×)and 2.62×(up to 12.87×)speedups over cuSPARSE,Sync-free and Recblock algorithms on the 16 representative matrices,respectively.展开更多
基金supported by the National Natural Science Foundation of China under Grant No.61972415.
文摘Sparse triangular solve(SpTRSV)is one of the most important level-2 kernels in sparse basic linear algebra subprograms(BLAS).Compared to another level-2 sparse BLAS kernel sparse matrix-vector multiplication(SpMV),SpTRSV is in general more difficult to find high parallelism on many-core processors,such as GPUs.Nowadays,much work focuses on reducing dependencies and synchronizations in the level-set and Sync-free algorithms for SpTRSV.However,there is less work that can make good use of sparse spatial structure for SpTRSV on GPUs.In this paper,we propose a tiled algorithm called TileSpTRSV for optimizing SpTRSV on GPUs through exploiting 2D spatial structure of sparse matrices.We design two algorithm implementations,i.e.,TileSpTRSV_level-set and TileSpTRSV_sync-free,for TileSpTRSV on top of level-set and Sync-free algorithms,respectively.By testing 16 representative matrices on a latest NVIDIA GPU,the experimental results show that TileSpTRSV_level-set gives on average 5.29×(up to 38.10×),5.33×(up to 21.32×)and 2.62×(up to 12.87×)speedups over cuSPARSE,Sync-free and Recblock algorithms on the 16 representative matrices,respectively.