Using Hybrid CPU-GPU Platforms to Accelerate the Computation of the Matrix Sign Function

Peter Benner 1, Pablo Ezzatti 2, Enrique S. Quintana-Ortí 3, and Alfredo Remón 3

1 Fakultät für Mathematik, Chemnitz University of Technology, D-09107 Chemnitz, Germany
benner@mathematik.tu-chemnitz.de
2 Centro de Cálculo–Instituto de la Computación, Universidad de la República, 11.300–Montevideo, Uruguay
pezzatti@fing.edu.uy
3 Depto. de Ingeniería y Ciencia de Computadores, Universidad Jaume I, 12.071–Castellón, Spain
{quintana,remon}@icc.uji.es

Abstract. We investigate the numerical computation of the matrix sign function of large-scale dense matrices, a common task in various application areas. The main computational work in Newton's iteration for the matrix sign function consists of matrix inversion. Therefore, we investigate the performance of two approaches for matrix inversion, one based on Gaussian elimination (LU factorization) and the other on Gauss-Jordan elimination. The target architecture is a current general-purpose multi-core processor connected to a graphics processor. Parallelism is extracted in both processors by linking sequential versions of the codes with multi-threaded implementations of BLAS. Our results on a system with two Intel Quad-Core processors and an NVIDIA Tesla C1060 illustrate the performance and scalability attained by the codes on this system.

Keywords: Matrix sign function, hybrid platforms, GPUs, multi-core processors, linear algebra, high performance computing.

1 Introduction

Consider a matrix $A \in \mathbb{R}^{n \times n}$ with no eigenvalues on the imaginary axis, and let

\[
A = T^{-1} \begin{bmatrix} J^{-} & 0 \\ 0 & J^{+} \end{bmatrix} T, \tag{1}
\]

be its Jordan decomposition, where the eigenvalues of $J^{-} \in \mathbb{R}^{j \times j}$ all have negative real parts and those of $J^{+} \in \mathbb{R}^{(n-j) \times (n-j)}$ all have positive real parts [1]. The matrix sign function of $A$ is then defined as

\[
\operatorname{sign}(A) = T^{-1} \begin{bmatrix} -I_{j} & 0 \\ 0 & I_{n-j} \end{bmatrix} T, \tag{2}
\]

H.X. Lin et al. (Eds.): Euro-Par 2009 Workshops, LNCS 6043, pp. 132–139, 2010. © Springer-Verlag Berlin Heidelberg 2010