IEEE TRANSACTIONS ON MAGNETICS, VOL. 51, NO. 3, MARCH 2015 7001404
Multi-GPU Acceleration of Algebraic Multi-Grid
Preconditioners for Elliptic Field Problems
Christian Richter¹, Sebastian Schöps², and Markus Clemens¹
¹Chair of Electromagnetic Theory, Faculty of Electrical Engineering, Information Technology,
Media Technology, Bergische Universität Wuppertal, Wuppertal 42119, Germany
²Institut für Theorie Elektromagnetischer Felder, Graduate School of Computational Engineering,
Technische Universität Darmstadt, Darmstadt 64285, Germany
In this contribution, a multi-graphic processing unit (GPU) implementation of Krylov sub-space methods with algebraic multi-grid
preconditioners is proposed. It is used to solve large linear systems stemming from finite element or finite difference discretizations
of elliptic problems as they occur, e.g., in electrostatics. The distribution of data across multiple GPUs and its effects
on memory and speed are discussed for an approach that preserves the benefits of fine-grained parallelism with shared
memory on each GPU while distributing data across multiple GPUs with minimal communication effort.
Index Terms— Algebraic multi-grid (AMG), conjugate gradients (CGs), CUDA, multi-graphic processing unit (GPU), sparse matrix
vector multiplication (SpMV).
I. INTRODUCTION
THE simulation of realistic electromagnetic field problems
requires the numerical solution of partial differential
equations. To this end, finite elements (FEs) are very popular,
in particular for static and low frequency field simulations.
After applying space and time discretization and possibly a
non-linear solver, e.g., the Newton–Raphson scheme, the
problem reduces to a large linear algebraic system of equations.
In this paper, we discuss the acceleration of the solution proce-
dure to these systems by sophisticated preconditioning of con-
jugate gradient (CG) methods, i.e., using algebraic multi-grid
(AMG) methods based on smoothed aggregation on graphic
processing units (GPUs). For solving large linear systems,
Krylov sub-space methods are a common approach [1]. Nowa-
days the use of multi-core systems is standard to solve the
problems in acceptable time. Recent research focuses on the
use of GPUs as hardware accelerators. It has been shown that
sparse matrix and vector operations can be implemented very
efficiently on GPUs for general purpose applications [2] and
in particular for electromagnetic applications [3]. The advan-
tages of GPU acceleration of electromagnetic problems have
been demonstrated for systems arising from finite differences
schemes [4] as well as finite element method (FEM) [5]–[7].
These implementations, however, typically relied on simple
preconditioning schemes such as Jacobi, incomplete LU, or
incomplete Cholesky factorizations. GPUs can speed up
calculations, but their limited local memory determines the
maximum problem size that can be solved without the need
for swapping processes. Therefore, the use of multiple
GPUs becomes mandatory for large problems exceeding the
memory of a single GPU. In addition, further acceleration
can be achieved for CGs on multiple GPUs [8], even when
considering the influence of communication between GPUs
and the host [9].
Manuscript received May 24, 2014; revised August 4, 2014; accepted
September 1, 2014. Date of current version April 22, 2015. Corresponding
author: C. Richter (e-mail: christian.richter@uni-wuppertal.de).
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TMAG.2014.2357332
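The workhorse of all such solvers is the sparse matrix–vector product (SpMV). The paper's implementation runs on CUDA via CUSP; purely as an illustration of why SpMV maps well to GPUs, the following NumPy sketch computes a CSR SpMV row by row — each row's dot product is independent, which is exactly the parallelism a GPU kernel exploits (all names here are illustrative, not from the paper's code).

```python
import numpy as np

def spmv_csr(data, indices, indptr, x):
    """y = A @ x for a CSR matrix. Each row is processed
    independently; GPU kernels assign rows (or row chunks)
    to threads to exploit this fine-grained parallelism."""
    n = len(indptr) - 1
    y = np.zeros(n)
    for row in range(n):
        start, end = indptr[row], indptr[row + 1]
        y[row] = np.dot(data[start:end], x[indices[start:end]])
    return y

# CSR storage of the 3x3 matrix [[2,-1,0],[-1,2,-1],[0,-1,2]]
data = np.array([2., -1., -1., 2., -1., -1., 2.])
indices = np.array([0, 1, 0, 1, 2, 1, 2])
indptr = np.array([0, 2, 5, 7])
x = np.array([1., 1., 1.])
print(spmv_csr(data, indices, indptr, x))  # [1. 0. 1.]
```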
The use of AMG schemes [11] has become a common
approach for solving large-scale electromagnetic problems due
to their practical efficiency and theoretical (near-) optimal
asymptotic complexity. These schemes are often used as
preconditioners for Krylov sub-space methods, e.g.,
preconditioned CGs (PCGs) [1].
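In this setting, one AMG V-cycle plays the role of the preconditioner application M⁻¹r inside the PCG iteration. The iteration itself can be sketched as follows; a simple Jacobi apply stands in for the AMG V-cycle here, since a full smoothed-aggregation setup is beyond a short sketch (names and structure are illustrative, not the paper's implementation).

```python
import numpy as np

def pcg(A, b, apply_prec, tol=1e-10, maxit=500):
    """Preconditioned conjugate gradients. apply_prec(r)
    approximates M^{-1} r; in the paper's setting this is
    one AMG V-cycle, here a Jacobi stand-in."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_prec(r)
    p = z.copy()
    rz = r @ z
    for _ in range(maxit):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = apply_prec(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

A = np.array([[4., 1.], [1., 3.]])   # small SPD test matrix
b = np.array([1., 2.])
jacobi = lambda r: r / np.diag(A)    # diagonal (Jacobi) preconditioner
x = pcg(A, b, jacobi)
print(x)  # ≈ [0.0909, 0.6364], i.e., [1/11, 7/11]
```

Note that each PCG iteration needs only SpMV, vector updates, and one preconditioner apply, which is why an efficient multi-GPU SpMV dominates overall performance.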
This paper is structured as follows. In Sections I-A and I-B,
the problem formulation is introduced and implementational
aspects of AMG are discussed. In Section II, the new multi-
GPU CUSP [12] add-on is described in detail. Section III
shows effects and limitations of taking into account multiple
GPUs for solving electromagnetic problems. Finally, the con-
clusions are drawn in Section IV.
A. Problem Formulation
This paper is focused on elliptic boundary value problems
that are solved on a computational domain Ω, that is
−∇ · (ε(r)∇φ(r)) = f(r) (1)
where ε is a spatially distributed material parameter, f is a
given field source, and φ is the field solution, together with
adequate boundary conditions, in most cases a homogeneous
Dirichlet constraint φ(r) = 0 for r ∈ ∂Ω. Space discretization turns
the problem above into a linear system of equations with a
positive definite matrix. This is implemented in the in-house
simulation code MEQSICO [13] that is capable of solving
static and quasi-static electric and magnetic field problems and
coupled multi-physical problems with high-order FEM ansatz
functions [14]. As a test example, an electrostatic problem is
shown in Fig. 1, which is described by (1), where φ is the
electrical potential and f is given by a prescribed electrical
charge density or a boundary condition.
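The actual solver MEQSICO uses (high-order) FEM; purely as a minimal stand-in that produces the same kind of symmetric positive definite system, the following sketch discretizes (1) with a 5-point finite-difference stencil on the unit square (constant ε, homogeneous Dirichlet boundaries, uniform source) and solves it with SciPy's CG (hypothetical example, not the paper's code).

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

def poisson_2d(n, eps=1.0):
    """5-point finite-difference discretization of
    -div(eps * grad(phi)) = f on the unit square with
    homogeneous Dirichlet boundaries and constant eps.
    Returns a symmetric positive definite sparse matrix."""
    h = 1.0 / (n + 1)
    T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n))
    I = sp.identity(n)
    return eps * (sp.kron(I, T) + sp.kron(T, I)) / h**2

n = 20
A = poisson_2d(n).tocsr()     # 400 x 400 SPD system
f = np.ones(n * n)            # uniform charge density as source
phi, info = cg(A, f)          # unpreconditioned CG for brevity
assert info == 0              # 0 means CG converged
```

For larger problems the iteration count of plain CG grows with mesh refinement, which is precisely what the AMG preconditioner of the following section counteracts.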
B. AMG Method
In this paper, the AMG method is employed as a precon-
ditioner for an iterative CG solver. Thus, the overall process
0018-9464 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.