Auto-Generation of Parallel Finite-Differencing Code for MPI, TBB and CUDA

D. P. Playne
Computer Science, IIMS
Massey University
Auckland, New Zealand
d.p.playne@massey.ac.nz

K. A. Hawick
Computer Science, IIMS
Massey University
Auckland, New Zealand
k.a.hawick@massey.ac.nz

Abstract—Finite-difference methods can be useful for solving certain partial differential equations (PDEs) in the time domain. Compiler technologies can be used to parse an application domain-specific representation of these PDEs and build an abstract representation of both the equation and the desired solver. This abstract representation can be used to generate a language-specific implementation. We show how this framework can be used to generate software for several parallel platforms: Message Passing Interface (MPI), Threading Building Blocks (TBB) and Compute Unified Device Architecture (CUDA). We present performance data for the automatically generated parallel code and discuss the implications of the generator in terms of code portability, development time and maintainability.

Keywords—automatic code generation; MPI; TBB; CUDA; accelerators; portability; multi-platform.

I. INTRODUCTION

Finite-difference methods are a well-known technique for solving partial differential equations [1]. Although finite-element and other matrix-formulated methods are very popular for irregular-mesh problems [2], finite-difference methods continue to find use in computational simulations and are generally straightforward to parallelise using geometric stencil methods of decomposition, which attain good computational speedup [3]–[5].
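As a concrete illustration of the kind of update such stencil decompositions parallelise, the following minimal sketch (not taken from the paper; the function and parameter names are illustrative assumptions) performs one explicit Euler time step of the one-dimensional heat equation u_t = alpha * u_xx, using a second-order central difference on a periodic grid:

```cpp
#include <vector>

// Illustrative sketch (not from the paper): one explicit Euler time step
// of the 1D heat equation u_t = alpha * u_xx, discretised with a
// second-order central difference on a periodic grid of n points.
void heat_step(std::vector<double>& u, double alpha, double dt, double dx) {
    const int n = static_cast<int>(u.size());
    std::vector<double> next(n);
    for (int i = 0; i < n; ++i) {
        const int l = (i - 1 + n) % n;          // periodic left neighbour
        const int r = (i + 1) % n;              // periodic right neighbour
        const double uxx = (u[l] - 2.0 * u[i] + u[r]) / (dx * dx);
        next[i] = u[i] + alpha * dt * uxx;      // explicit Euler update
    }
    u.swap(next);                               // advance to the new time level
}
```

Each grid point reads only its two immediate neighbours, which is why a geometric decomposition with a one-cell halo region suffices to parallelise this loop across MPI ranks, TBB threads or CUDA thread blocks.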
Although finite-difference methods are quite feasible to hand-parallelise for low-order stencils, where only a small number of neighbouring cells is required for each calculation, in cases where higher-order calculus operations are employed [6] the codes become very complex, hard to implement manually, and very difficult to verify, since a small programming error in a data index may still lead to a numerically plausible solution that is hard to spot as being wrong. It is therefore very attractive to be able to apply automatic code generation [7] to this problem [8], [9].

In this paper we address two key issues associated with finite-difference methods in parallel computing. The first is the use of automatic code-generation methods to specify partial differential equations as a high-level problem using calculus terminology, and the construction of a software tool-set to generate the programming-language source code that implements the solvers. The second concerns the long-standing problem of portability across parallel programming platforms.

Figure 1. An example three-dimensional Cahn-Hilliard system with a size of 256^3. Ray-traced rendering from a simulation using automatically generated code.

At the time of writing, parallel computing is once again attracting great interest and attention as the world faces up to the problems of power consumption of CPUs and the non-continuance of the previous development trend [10] in CPU clock frequencies, which used to double approximately every eighteen months in accordance with Moore's law [11]. It is possible to bring parallelism to bear on many problems using hybrids of cluster-computing approaches; accelerator technologies such as general-purpose graphical processing units (GP-GPUs); and the use of many threads within a conventional multi-core CPU.
These are typified by software technologies such as the open-standard Message Passing Interface (MPI) [12], [13]; NVIDIA's Compute Unified Device Architecture (CUDA) [14]–[16] for GPUs; and Intel's Threading Building Blocks (TBB) [17], [18] software for multi-threaded programming of multi-core devices, respectively. It is however tedious, error prone and non-