Parallel 5 point SOR for solving the Convection Diffusion equation using graphics processing units

Y. Cotronis, E. Konstantinidis, M. A. Louka, N. M. Missirlis

Department of Informatics and Telecommunications, University of Athens, Panepistimiopolis, 15784, Athens, Greece

Abstract

In this paper we study a parallel form of the SOR method for the numerical solution of the Convection Diffusion equation, suitable for GPUs using CUDA. To exploit the parallelism offered by GPUs we adopt the fine grain parallelism model, which is achieved by using the local relaxation version of SOR. More specifically, we use SOR with red black ordering and two sets of parameters, $\omega_{ij}$ and $\omega'_{ij}$, for the 5 point stencil. The parameter $\omega_{ij}$ is associated with each red ($i+j$ even) grid point $(i,j)$, whereas the parameter $\omega'_{ij}$ is associated with each black ($i+j$ odd) grid point $(i,j)$. The use of a parameter for each grid point avoids the global communication required in the adaptive determination of the best value of $\omega$ and also increases the convergence rate of the SOR method [2]. We present our strategy and the results of our effort to exploit the computational capabilities of GPUs under the CUDA environment. Additionally, a parallel program utilizing manual SSE2 (Streaming SIMD Extensions 2) vectorization for the CPU was developed as a performance reference. The optimizations applied to the GPU version were also considered for the CPU version. Significant performance improvement was achieved with the three developed GPU kernel variations.

Keywords: Iterative methods, SOR, R/B SOR, GPU computing, CUDA

Subject classification: AMS(MOS), 65F10, 65N20, CR:5.13.

1. Introduction

Although the CPUs of computer systems have typically been used to solve computational problems, the current trend is to offload heavy computations to accelerators, particularly graphics processing units (GPUs).
As commodity GPUs offer greater computational power than modern CPUs, they have been proposed as a more efficient compute unit for solving scientific problems with a large computational burden. Consequently, application programming environments have been developed, such as the proprietary CUDA (Compute Unified Device Architecture) by NVIDIA [17, 13] and OpenCL (Open Computing Language) [22], which is supported by many hardware vendors, including NVIDIA. The CUDA environment is rapidly evolving, and a constantly increasing number of researchers are adopting it in order to exploit GPU capabilities. It provides an elegant way of writing GPU parallel programs, using syntax and rules based on the C/C++ language, without involving graphics APIs.

In this paper we use GPUs for the numerical solution of partial differential equations. In particular, we consider the solution of the second order convection diffusion equation

$$\Delta u - f(x, y)\, u_x - g(x, y)\, u_y = 0 \tag{1}$$

Corresponding author
Email addresses: cotronis@di.uoa.gr (Y. Cotronis), ekondis@di.uoa.gr (E. Konstantinidis), mlouka@di.uoa.gr (M. A. Louka), nmis@di.uoa.gr (N. M. Missirlis)
Preprint submitted to Elsevier, October 30, 2012