AM  Vol.6 No.8 , July 2015
Intel® Math Kernel Library PARDISO* for Intel® Xeon PhiTM Manycore Coprocessor
ABSTRACT
The paper describes an efficient direct method to solve an equation Ax = b, where A is a sparse matrix, on the Intel® Xeon PhiTM coprocessor. The main challenge for such a system is how to engage all available threads (about 240) and how to reduce OpenMP* synchronization overhead, which is very expensive for hundreds of threads. The method consists of decomposing A into a product of lower-triangular, diagonal, and upper triangular matrices followed by solves of the resulting three subsystems. The main idea is based on the hybrid parallel algorithm used in the Intel® Math Kernel Library Parallel Direct Sparse Solver for Clusters [1]. Our implementation exploits a static scheduling algorithm during the factorization step to reduce OpenMP synchronization overhead. To effectively engage all available threads, a three-level approach of parallelization is used. Furthermore, we demonstrate that our implementation can perform up to 100 times better on factorization step and up to 65 times better in terms of overall performance on the 240 threads of the Intel® Xeon PhiTM coprocessor.

Cite this paper
Kalinkin, A. , Anders, A. and Anders, R. (2015) Intel® Math Kernel Library PARDISO* for Intel® Xeon PhiTM Manycore Coprocessor. Applied Mathematics, 6, 1276-1281. doi: 10.4236/am.2015.68121.
References
[1]   Intel Math Kernel Library (Intel MKL).
https://software.intel.com/en-us/intel-mkl

[2]   OpenMP.
http://openmp.org/wp/

[3]   Duff, I.S. and Reid, J.K. (1983) The Multifrontal Solution of Indefinite Sparse Symmetric Linear. ACM Transactions on Mathematical Software, 9, 302-325.
http://dx.doi.org/10.1145/356044.356047

[4]   Liu, W.H.D. (1992) The Multifronal Method for Sparse Matrix Solution: Theory and Practice. SIAM Review, 34, 82-109.
http://dx.doi.org/10.1137/1034004

[5]   Karypis, G. and Kumar, V. (1996) Parallel Multilevel Graph Partitioning. Processing of 10th International Parallel Symposium, 314-319.

[6]   Karypis, G. and Kumar, V. (1998) A Parallel Algorithm for Multilevel Graph Partitioning and Sparse Matrix Ordering. Journal of Parallel and Distributed Computing, 48, 71-85.
http://dx.doi.org/10.1006/jpdc.1997.1403

[7]   Amestoy, P.R., Duff, I.S., Pralet, S. and Voemel, C. (2003) Adapting a Parallel Sparse Direct Solver to SMP Architectures. Parallel Computing, 29, 1645-1668.
http://dx.doi.org/10.1016/j.parco.2003.05.010

[8]   Kalinkin, A. (2013) Sparse Linear Algebra Support in Intel Math Kernel Library, Sparse Linear Algebra Solvers for High Performance Computing Workshop, Scarman House, University of Warwick, 8-9 July 2013.

[9]   METIS.
http://glaros.dtc.umn.edu/gkhome/views/metis

[10]   Kalinkin, A. (2013) Intel Direct Sparse Solver for Clusters, a Research Project for Solving Large Sparse Systems of Linear Algebraic Equations on Clusters. Sparse Days Meeting 2013 at CERFACS, Toulouse, 17-18 June 2013.

[11]   Kalinkin, A., Anders, A. and Anders, R. (2014) Intel Math Kernel Library Parallel Direct Sparse Solver for Clusters, EAGE Workshop on High Performance Computing for Upstream, September 2014.

[12]   Davis, T.A. and Hu, Y. (2011) The University of Florida Sparse Matrix Collection. ACM Transactions on Mathematical Software, 38, 1:1-1:25.
http://www.cise.ufl.edu/research/sparse/matrices

[13]   Kalinkin, A., Anders, A. and Anders, R. (2015) Schur Complement Computations in Intel® Math Kernel Library PARDISO. Applied Mathematics, 6, 304-311.
http://dx.doi.org/10.4236/am.2015.62028

 
 
Top