The parallel multisection method for solving algebraic eigenproblems has emerged in recent years with the development of parallel computers, but research so far has been limited to standard eigenproblems of symmetric tridiagonal matrices. The multisection method for the generalized eigenproblem, which has significant applications in many science and engineering domains, has not been studied. This paper presents the parallel region-preserving multisection method (PRM for short) for solving generalized eigenproblems of large sparse real symmetric matrices. The method not only retains the advantages of the conventional determinant search method (DS for short) but also overcomes its disadvantages, such as missed roots and failure to converge. We tested the method on the YH-1 vector computer and compared it with the parallel region-preserving determinant search method, the parallel region-preserving bisection method (PRB for short). The numerical results show that PRM achieves a higher speedup: for instance, it attains a speedup of 7.7 when the problem size is 2114 and the number of eigenpairs found is 3. PRM is also superior to PRB when the problem size is large.
The parallel multisection method for solving the algebraic eigenproblem has emerged in recent years with the development of parallel computers, but research so far has been limited to the standard eigenproblem of a symmetric tridiagonal matrix. The multisection method for the generalized eigenproblem, which has significant applications in many science and engineering domains, has not been studied. This paper presents the parallel region-preserving multisection method (PRM for short) for solving the generalized eigenproblem of a large sparse real symmetric matrix. The method not only retains the advantages of the conventional determinant search method (DS for short) but also overcomes its disadvantages, such as missed roots and failure to converge. We tested the method on the YH-1 vector computer and compared it with the parallel region-preserving determinant search method (parallel region-preserving bisection method) (PRB for short). The numerical results show that PRM achieves a higher speedup: for instance, it attains a speedup of 7.7 when the problem size is 2114 and the number of eigenpairs found is 3. PRM is also superior to PRB when the problem size is large.
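The abstracts above do not spell out the PRM algorithm, so the following is only a minimal sketch of the general multisection idea, not the authors' method. It assumes a generalized problem K x = λ M x in which K is symmetric tridiagonal and M is diagonal with positive entries (a simplification chosen so that the eigenvalue count is cheap to compute); the function names `count_below` and `multisection` are hypothetical. The count relies on Sylvester's law of inertia: the number of eigenvalues below a shift σ equals the number of negative pivots in the LDL^T factorization of K − σM.

```python
def count_below(diag, off, mass, sigma):
    """Count eigenvalues of K x = lambda M x that lie below sigma.

    Assumptions (for illustration only): K is symmetric tridiagonal with
    diagonal `diag` and off-diagonal `off`; M = diag(mass) with mass > 0.
    The count is the number of negative pivots of K - sigma*M.
    """
    count = 0
    pivot = 1.0
    for i in range(len(diag)):
        coupling = off[i - 1] ** 2 / pivot if i > 0 else 0.0
        pivot = diag[i] - sigma * mass[i] - coupling
        if pivot == 0.0:
            # Nudge a zero pivot to a tiny negative value to avoid
            # dividing by zero on the next step (a common simplification).
            pivot = -1e-30
        if pivot < 0.0:
            count += 1
    return count


def multisection(diag, off, mass, k, lo, hi, sections=4, tol=1e-10):
    """Locate the k-th smallest eigenvalue (0-based) inside [lo, hi].

    Requires count_below(lo) <= k < count_below(hi). Each pass splits the
    interval into `sections` parts; the counts at the interior points are
    independent of each other, so they could be evaluated in parallel.
    """
    while hi - lo > tol:
        step = (hi - lo) / sections
        points = [lo + j * step for j in range(1, sections)]
        counts = [count_below(diag, off, mass, s) for s in points]
        for s, c in zip(points, counts):
            if c > k:      # eigenvalue k lies to the left of s
                hi = s
                break
            lo = s         # eigenvalue k lies at or to the right of s
    return 0.5 * (lo + hi)
```

With `sections=2` this reduces to ordinary bisection; larger values trade more count evaluations per pass for fewer passes, which is where parallel hardware can help.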
Accelerating eigensolvers on GPUs is attracting growing attention because of their ubiquitous use in scientific and engineering fields. However, achieving high performance is very challenging because the intricate computational patterns cause inefficient memory access and workload imbalance on GPUs. In this work, we propose a series of optimizations for generalized dense symmetric eigenvalue problems on AMD GPUs, from both the system and the operator perspectives. First, we adjust the workload assignment between CPUs and GPUs to balance computational performance across different levels of computation. Second, we propose a multi-level pre-aggregation strategy for the symmetric matrix-vector multiplication (SYMV) and general matrix-vector multiplication (GEMV) operators to address the performance problems caused by the lack of hardware support for atomic operations. Third, we optimize Cholesky decomposition and SYR2K by adopting a better overlapping method and exploiting symmetry to reduce computation. Experiments on AMD MI60 GPUs show that our optimized eigensolver outperforms the previous state of the art with speedups of roughly 1.8x to 3.8x.
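The GPU abstract above names Cholesky decomposition as one of the optimized kernels but does not describe the full pipeline. In the textbook approach, which may or may not match the paper's implementation, a generalized problem A x = λ B x with symmetric A and symmetric positive definite B is reduced to a standard one by factoring B = L L^T and forming C = L^{-1} A L^{-T}, whose eigenvalues are the same λ. The sketch below illustrates that reduction in plain Python on small dense matrices; the helper names `cholesky`, `forward_solve` and `reduce_to_standard` are hypothetical, and no GPU-specific optimization is attempted.

```python
import math


def cholesky(b):
    """Return lower-triangular L with B = L L^T (B symmetric positive definite)."""
    n = len(b)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = b[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l


def forward_solve(l, rhs):
    """Solve L z = rhs for lower-triangular L by forward substitution."""
    z = []
    for i, row in enumerate(l):
        z.append((rhs[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z


def reduce_to_standard(a, b):
    """Form C = L^{-1} A L^{-T}, so that C y = lambda y with y = L^T x."""
    n = len(a)
    l = cholesky(b)
    # Columns of W = L^{-1} A, one triangular solve per column of A.
    w_cols = [forward_solve(l, [a[i][j] for i in range(n)]) for j in range(n)]
    # Row i of C is L^{-1} applied to row i of W (since C^T = L^{-1} W^T).
    return [forward_solve(l, [w_cols[j][i] for j in range(n)]) for i in range(n)]
```

For example, `reduce_to_standard([[2.0, 1.0], [1.0, 3.0]], [[4.0, 2.0], [2.0, 5.0]])` yields a symmetric matrix whose eigenvalues are the generalized eigenvalues of the pair; a standard symmetric eigensolver can then be applied to it.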
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2021YFB0300203 and the National Natural Science Foundation of China under Grant No. 12131005.