Technical Report CSTN-073

Data Parallel Three-Dimensional Cahn-Hilliard Field Equation Simulation on GPUs with CUDA

D.P. Playne and K.A. Hawick
Institute of Information and Mathematical Sciences
Massey University – Albany, North Shore 102-904, Auckland, New Zealand
Email: {d.p.playne,k.a.hawick}@massey.ac.nz
Tel: +64 9 414 0800 Fax: +64 9 441 8181

February 2009

Abstract

Computational scientific simulations have long used parallel computers to increase their performance. Recently, graphics cards have been utilised to provide this functionality. GPGPU APIs such as NVidia's CUDA can be used to harness the power of GPUs for purposes other than computer graphics. GPUs are designed for processing two-dimensional data. In previous work we presented several two-dimensional Cahn-Hilliard simulations that each utilise different CUDA memory types and compared their results. In this paper we extend these ideas to three dimensions. As GPUs are not intended for processing three-dimensional data arrays, the performance of the memory optimisations is expected to change. Here we present several three-dimensional Cahn-Hilliard simulations to explore the challenges and the performance of the different memory types in three dimensions. The results show that the simulation design with the best performance in three dimensions uses a different memory type from the optimal two-dimensional simulation.

Keywords: GPGPU; CUDA; Cahn-Hilliard; Data-Parallel.

1 Introduction

With the release of several GPGPU APIs in recent years, utilising Graphical Processing Units or GPUs for scientific simulations has become increasingly popular [1–9]. GPUs provide a great deal of computational power compared to a traditional CPU. Programs that execute on a GPU can operate many times faster than their CPU counterparts. However, this performance gain depends heavily on the ability to decompose the program into many threads that can execute independently of one another.
High speed-up factors are also strongly affected by the memory type and access patterns used [10]. This article extends our previous research on GPU methods for simulating the Cahn-Hilliard (CH) equation [11] by extending the simulation to three dimensions.

In Sections 2 and 3 we discuss the capabilities and architecture of GPUs and the role of NVidia's CUDA [5], with specific focus on the GPU memory types it exposes. Sections 4 and 5 provide a brief introduction to the Cahn-Hilliard field equation and the basic method of decomposing a CH simulation onto a GPU. Section 6 describes five kernel designs (A–E) for simulating the Cahn-Hilliard equation in three dimensions. Finally, a discussion of the simulations and their results and a summary are presented in Sections 7 and 8.

2 Graphical Processing Units

Graphical Processing Units or GPUs are specialised processors designed for calculating the complex graphics pipeline required for three-dimensional computer graphics in real time. GPUs contain several multiprocessors, each of which contains many stream processors (at the time of writing, a typical high-end graphics card contains 240 stream processors). These stream processors can execute instructions in parallel and provide GPUs with many times the computational power of a traditional CPU.

GPUs also contain several different types of memory that are optimised for specific tasks within the graphics pipeline. GPGPU libraries can utilise these types of memory for purposes other than those they were initially intended for. Choosing the memory type with