Journal Articles
3 articles found
1. Asynchronous Approach to Memory Management in Sparse Multifrontal Methods on Multiprocessors (Cited by: 1)
Authors: Alexander Kalinkin, Konstantin Arturov. Applied Mathematics, 2013, No. 12, pp. 33-39 (7 pages).
This research covers the Intel® Direct Sparse Solver for Clusters, software that implements a direct method for solving the equation Ax = b with a sparse symmetric matrix A on a cluster. The method, developed by Intel, is based on Cholesky decomposition and can be considered an extension of the PARDISO functionality in Intel® MKL. To achieve an efficient work balance on a large number of processes, the so-called "multifrontal" approach to Cholesky decomposition is implemented. The software parallelizes across nodes of the dependency tree using MPI, as well as within a tree node using OpenMP directives. The article provides a high-level description of the algorithm that distributes the work both between computational nodes and among cores within a single node. A series of experiments shows that this implementation causes no growth in computational time and decreases the amount of memory needed for the computations.
Keywords: Direct Solver, Distributed Data, OpenMP and MPI
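The solver described in this abstract is shipped as the Parallel Direct Sparse Solver for Clusters in Intel® MKL, exposed through the cluster_sparse_solver routine with the familiar PARDISO phase convention. The calling-sequence sketch below is a minimal illustration only: the 3x3 SPD example matrix, the single phase-13 call, and all parameter choices are assumptions rather than material from the article, and parameter semantics should be checked against the MKL documentation.

```c
/* Minimal sketch: solving Ax = b with a sparse symmetric matrix on a cluster
 * via Intel MKL's cluster_sparse_solver. The tiny SPD matrix, CSR arrays and
 * parameter choices are illustrative assumptions, not data from the article. */
#include <mpi.h>
#include <stdio.h>
#include "mkl_cluster_sparse_solver.h"

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int comm = MPI_Comm_c2f(MPI_COMM_WORLD);   /* Fortran communicator handle expected by MKL */

    /* Upper triangle of a small SPD matrix in 1-based CSR format (assumed example). */
    MKL_INT n = 3;
    MKL_INT ia[4] = {1, 3, 5, 6};
    MKL_INT ja[5] = {1, 2, 2, 3, 3};
    double  a[5]  = {4.0, 1.0, 3.0, 1.0, 2.0};
    double  b[3]  = {1.0, 2.0, 3.0}, x[3];

    void   *pt[64] = {0};                      /* internal solver handle, must start zeroed */
    MKL_INT iparm[64] = {0};                   /* iparm[0] = 0: use default solver settings */
    MKL_INT perm[3]  = {0};
    MKL_INT maxfct = 1, mnum = 1, mtype = 2;   /* mtype 2: real symmetric positive definite */
    MKL_INT nrhs = 1, msglvl = 0, error = 0, phase;

    phase = 13;  /* analysis + Cholesky factorization + solve in one call */
    cluster_sparse_solver(pt, &maxfct, &mnum, &mtype, &phase, &n,
                          a, ia, ja, perm, &nrhs, iparm, &msglvl, b, x, &comm, &error);
    if (error == 0)
        printf("x = %f %f %f\n", x[0], x[1], x[2]);

    phase = -1;  /* release internal solver memory */
    cluster_sparse_solver(pt, &maxfct, &mnum, &mtype, &phase, &n,
                          a, ia, ja, perm, &nrhs, iparm, &msglvl, b, x, &comm, &error);

    MPI_Finalize();
    return (int)error;
}
```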
2. Schur Complement Computations in Intel® Math Kernel Library PARDISO (Cited by: 2)
Authors: Alexander Kalinkin, Anton Anders, Roman Anders. Applied Mathematics, 2015, No. 2, pp. 304-311 (8 pages).
This paper describes a method of calculating the Schur complement of a sparse positive definite matrix A. The main idea of the approach is to represent matrix A in the form of an elimination tree using a reordering algorithm such as METIS, and to put the columns/rows for which the Schur complement is needed into the top node of the elimination tree. Any problem with a degenerate part of the initial matrix can be resolved with the help of iterative refinement. The proposed approach is close to the "multifrontal" one implemented by Iain Duff and others in the 1980s. The Schur complement computations described in this paper are available in Intel® Math Kernel Library (Intel® MKL). We present the algorithm for Schur complement computations, experiments that demonstrate a negligible increase in the number of elements in the factored matrix, and a comparison with existing alternatives.
Keywords: Multifrontal Method, Direct Method, Sparse Linear System, Schur Complement, HPC, Intel® MKL
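For readers unfamiliar with the quantity this abstract refers to: for a partitioned matrix A = [A11 A12; A21 A22], the Schur complement of A11 in A is S = A22 - A21 * A11^{-1} * A12. The small dense example below just evaluates that identity numerically; the 3x3 matrix and its 2+1 block split are assumptions for illustration, not the library's sparse, elimination-tree-based algorithm.

```c
/* Minimal sketch of what a Schur complement computation produces, on a tiny
 * dense example. The paper computes the same quantity for large sparse
 * matrices inside Intel MKL PARDISO. */
#include <stdio.h>

int main(void) {
    /* 3x3 SPD matrix, split into a 2x2 block A11 and a 1x1 block A22. */
    double A11[2][2] = {{4.0, 1.0}, {1.0, 3.0}};
    double A12[2]    = {0.0, 1.0};          /* A21 = A12^T by symmetry */
    double A22       = 2.0;

    /* Solve A11 * y = A12 directly (2x2 case) via the explicit inverse. */
    double det = A11[0][0] * A11[1][1] - A11[0][1] * A11[1][0];
    double y0  = ( A11[1][1] * A12[0] - A11[0][1] * A12[1]) / det;
    double y1  = (-A11[1][0] * A12[0] + A11[0][0] * A12[1]) / det;

    /* S = A22 - A21 * y  (a 1x1 Schur complement here). */
    double S = A22 - (A12[0] * y0 + A12[1] * y1);
    printf("Schur complement S = %f\n", S);  /* 2 - 4/11, about 1.6364 */
    return 0;
}
```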
3. Intel® Math Kernel Library PARDISO* for Intel® Xeon Phi™ Manycore Coprocessor
Authors: Alexander Kalinkin, Anton Anders, Roman Anders. Applied Mathematics, 2015, No. 8, pp. 1276-1281 (6 pages).
The paper describes an efficient direct method to solve the equation Ax = b, where A is a sparse matrix, on the Intel® Xeon Phi™ coprocessor. The main challenge for such a system is how to engage all available threads (about 240) and how to reduce the OpenMP* synchronization overhead, which is very expensive for hundreds of threads. The method consists of decomposing A into a product of lower-triangular, diagonal, and upper-triangular matrices, followed by solves of the resulting three subsystems. The main idea is based on the hybrid parallel algorithm used in the Intel® Math Kernel Library Parallel Direct Sparse Solver for Clusters [1]. Our implementation exploits a static scheduling algorithm during the factorization step to reduce OpenMP synchronization overhead. To effectively engage all available threads, a three-level approach to parallelization is used. Furthermore, we demonstrate that our implementation can perform up to 100 times better on the factorization step and up to 65 times better in overall performance on the 240 threads of the Intel® Xeon Phi™ coprocessor.
Keywords: Multifrontal Method, Direct Method, Sparse Linear System, HPC, OpenMP*, Intel® MKL, Intel® Xeon Phi™ Coprocessor
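The static scheduling idea mentioned in this abstract can be pictured as a level-by-level sweep over an elimination tree: nodes within one level are independent, so each level can be factored by a statically scheduled OpenMP loop with a single barrier per level instead of fine-grained synchronization. The sketch below is a simplified illustration under assumed data (the seven-node tree and the factor_node stub are hypothetical), not the implementation inside Intel® MKL PARDISO.

```c
/* Simplified sketch of level-by-level static scheduling over an elimination
 * tree with OpenMP. The tree layout and factor_node() stub are assumptions
 * for illustration only. Compile with an OpenMP flag, e.g. -fopenmp. */
#include <omp.h>
#include <stdio.h>

/* A tiny elimination tree: nodes 0..3 are leaves, 4 and 5 are their parents,
 * 6 is the root. Nodes are grouped by level so that each level can be
 * factored with one parallel loop and a single implicit barrier. */
static const int level_start[] = {0, 4, 6, 7};   /* levels: [0..3], [4..5], [6] */
static const int level_nodes[] = {0, 1, 2, 3, 4, 5, 6};
static const int nlevels = 3;

static void factor_node(int node) {
    /* Placeholder for the dense frontal-matrix factorization of one node. */
    printf("thread %d factors node %d\n", omp_get_thread_num(), node);
}

int main(void) {
    for (int lev = 0; lev < nlevels; ++lev) {
        int begin = level_start[lev], end = level_start[lev + 1];
        /* Static schedule: each thread receives a fixed chunk of independent
         * nodes, so the only synchronization is the barrier at end of level. */
        #pragma omp parallel for schedule(static)
        for (int i = begin; i < end; ++i)
            factor_node(level_nodes[i]);
    }
    return 0;
}
```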