IEEE TRANSACTIONS ON MAGNETICS, VOL. 51, NO. 3, MARCH 2015, 7001404

Multi-GPU Acceleration of Algebraic Multi-Grid Preconditioners for Elliptic Field Problems

Christian Richter 1, Sebastian Schöps 2, and Markus Clemens 1

1 Chair of Electromagnetic Theory, Faculty of Electrical Engineering, Information Technology, Media Technology, Bergische Universität Wuppertal, Wuppertal 42119, Germany
2 Institut für Theorie Elektromagnetischer Felder, Graduate School of Computational Engineering, Technische Universität Darmstadt, Darmstadt 64285, Germany

In this contribution, a multi-graphic processing unit (GPU) implementation of Krylov sub-space methods with algebraic multi-grid preconditioners is proposed. It is used to solve large linear systems stemming from finite element or finite difference discretizations of elliptic problems as they occur, e.g., in electrostatics. The distribution of data across multiple GPUs and its effects on memory and speed are discussed for an approach that preserves the benefits of fine-grained parallelism with shared memory on a single GPU while distributing data across multiple GPUs with minimal communication effort.

Index Terms— Algebraic multi-grid (AMG), conjugate gradients (CGs), CUDA, multi-graphic processing unit (GPU), sparse matrix vector multiplication (SpMV).

I. INTRODUCTION

The simulation of realistic electromagnetic field problems requires the numerical solution of partial differential equations. To this end, finite elements (FEs) are very popular, in particular for static and low-frequency field simulations. After applying space and time discretization and possibly a non-linear solver, e.g., the Newton–Raphson scheme, the problem has become a large linear algebraic system of equations.
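To make the origin of such systems concrete, the following minimal sketch assembles a finite-difference discretization of a constant-coefficient elliptic problem (a Python/SciPy illustration; the helper name `poisson_2d` is hypothetical and not part of the authors' code, which uses high-order FEM instead):

```python
import numpy as np
import scipy.sparse as sp

def poisson_2d(n):
    """Assemble the 5-point finite-difference discretization of a
    constant-coefficient elliptic problem on an n x n interior grid
    with homogeneous Dirichlet boundary conditions.
    (Illustrative helper only; not part of MEQSICO.)"""
    h = 1.0 / (n + 1)                              # grid spacing
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
    I = sp.identity(n)
    A = (sp.kron(I, T) + sp.kron(T, I)) / h**2     # 2-D Laplacian stencil
    return A.tocsr()

# the resulting system matrix is sparse, symmetric, and positive definite,
# which is the setting assumed by the CG and AMG methods discussed below
A = poisson_2d(16)
b = np.ones(A.shape[0])   # discretized source term f
```

For realistic 3-D models the dimension of this system quickly reaches millions of unknowns, which motivates the iterative solvers and GPU acceleration discussed next.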
In this paper, we discuss the acceleration of the solution procedure for these systems by sophisticated preconditioning of conjugate gradient (CG) methods, i.e., using algebraic multi-grid (AMG) methods based on smoothed aggregation on graphic processing units (GPUs).

For solving large linear systems, Krylov sub-space methods are a common approach [1]. Nowadays, the use of multi-core systems is standard to solve such problems in acceptable time. Recent research focuses on the use of GPUs as hardware accelerators. It has been shown that sparse matrix and vector operations can be implemented very efficiently on GPUs for general-purpose applications [2] and in particular for electromagnetic applications [3]. The advantages of GPU acceleration of electromagnetic problems have been demonstrated for systems arising from finite difference schemes [4] as well as from the finite element method (FEM) [5]–[7]. These implementations, however, typically relied on simple preconditioning schemes such as the Jacobi preconditioner or incomplete LU and Cholesky factorizations.

GPUs can speed up calculations but have limited local memory, which determines the maximal size of the problem solvable without the need for swapping. Therefore, the use of multiple GPUs becomes mandatory for large problems exceeding the memory of a single GPU. In addition, further acceleration can be achieved for CGs on multiple GPUs [8], even when considering the influence of communication between the GPUs and the host [9].

Manuscript received May 24, 2014; revised August 4, 2014; accepted September 1, 2014. Date of current version April 22, 2015. Corresponding author: C. Richter (e-mail: christian.richter@uni-wuppertal.de). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TMAG.2014.2357332
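The structure of the preconditioned CG iteration explains why the sparse matrix–vector product (SpMV) and the preconditioner application dominate the runtime and are therefore the operations offloaded to the GPU(s). A generic sketch (minimal NumPy illustration; `pcg`, `A_mul`, and `M_inv` are illustrative names, not part of CUSP or MEQSICO):

```python
import numpy as np

def pcg(A_mul, b, M_inv, tol=1e-8, maxit=500):
    """Preconditioned conjugate gradients for an SPD system.
    A_mul(v) performs the SpMV and M_inv(r) applies the preconditioner
    (e.g., Jacobi or an AMG V-cycle); these two kernels are the
    candidates for GPU offloading."""
    x = np.zeros_like(b)
    r = b - A_mul(x)          # initial residual
    z = M_inv(r)              # preconditioned residual
    p = z.copy()
    rz = r @ z
    for _ in range(maxit):
        Ap = A_mul(p)                     # the dominant SpMV kernel
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = M_inv(r)                      # preconditioner application
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# usage: 1-D Laplacian with Jacobi (diagonal) preconditioning
n = 100
A = np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1) \
    - np.diag(np.ones(n - 1), -1)
d = np.diag(A)
x = pcg(lambda v: A @ v, np.ones(n), lambda r: r / d)
```

All vector updates inside the loop are embarrassingly parallel; only the SpMV and the reductions (dot products, norms) require communication when the data are distributed across several GPUs.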
The use of AMG schemes [11] has become a common approach for solving large-scale electromagnetic problems due to their practical efficiency and theoretical (near-)optimal asymptotic complexity. These schemes are often used as preconditioners for Krylov sub-space methods, e.g., preconditioned CGs (PCGs) [1].

This paper is structured as follows. In Sections I-A and I-B, the problem formulation is introduced and implementational aspects of AMG are discussed. In Section II, the new multi-GPU CUSP [12] add-on is described in detail. Section III shows the effects and limitations of taking multiple GPUs into account for solving electromagnetic problems. Finally, the conclusions are drawn in Section IV.

A. Problem Formulation

This paper is focused on elliptic boundary value problems that are solved on a computational domain Ω, that is,

    −∇ · (ε(r) ∇φ(r)) = f(r)    (1)

where ε is a spatially distributed material parameter, f is a given field source, and φ is the field solution. The problem is completed by adequate boundary conditions, in most cases a homogeneous Dirichlet constraint φ(r) = 0 for r ∈ ∂Ω. Space discretization turns the problem above into a linear system of equations with a positive definite matrix. This is implemented in the in-house simulation code MEQSICO [13], which is capable of solving static and quasi-static electric and magnetic field problems as well as coupled multi-physical problems with high-order FEM ansatz functions [14]. As a test example, an electrostatic problem is shown in Fig. 1, which is described by (1), where φ is the electrical potential and f is given by a prescribed electrical charge density or a boundary condition.

B. AMG Method

In this paper, the AMG method is employed as a preconditioner for an iterative CG solver. Thus, the overall process

0018-9464 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
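The smoothed-aggregation construction behind such an AMG preconditioner can be illustrated with a single coarse level (a Python/SciPy sketch under simplifying assumptions: pairwise aggregation, one damped-Jacobi smoothing step, and a two-grid cycle; the names `two_level_sa` and `M_inv` are hypothetical, and CUSP's smoothed-aggregation preconditioner builds a full multi-level hierarchy instead):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def two_level_sa(A, omega=2.0 / 3.0):
    """Build a two-grid smoothed-aggregation preconditioner for an SPD
    matrix A: aggregate consecutive pairs of unknowns, smooth the
    tentative prolongator with one damped Jacobi step, and return a
    symmetric V(1,1)-type two-grid cycle as M_inv(r)."""
    n = A.shape[0]
    nc = (n + 1) // 2
    # tentative (piecewise-constant) prolongator: aggregate {2i, 2i+1}
    rows = np.arange(n)
    P_tent = sp.csr_matrix((np.ones(n), (rows, rows // 2)), shape=(n, nc))
    Dinv = sp.diags(1.0 / A.diagonal())
    P = P_tent - omega * (Dinv @ (A @ P_tent))   # smoothed prolongator
    Ac = (P.T @ A @ P).tocsc()                   # Galerkin coarse operator
    Ac_solve = spla.factorized(Ac)               # direct solve on coarse level

    def M_inv(r):
        # pre-smooth, coarse-grid correction, post-smooth
        x = omega * Dinv @ r
        x = x + P @ Ac_solve(P.T @ (r - A @ x))
        x = x + omega * Dinv @ (r - A @ x)
        return x

    return M_inv

# usage with SciPy's CG on a 1-D Poisson matrix
n = 200
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)).tocsr()
M = spla.LinearOperator((n, n), matvec=two_level_sa(A))
x, info = spla.cg(A, np.ones(n), M=M)
```

The symmetric cycle (identical pre- and post-smoothing) keeps the preconditioner symmetric positive definite, which CG requires; the same constraint applies to the GPU implementation discussed in Section II.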