INTERNATIONAL JOURNAL OF NUMERICAL ANALYSIS AND MODELING, SERIES B
Volume 4, Number 4, Pages 394–412
© 2013 Institute for Scientific Computing and Information

GPU COMPUTING FOR MESHFREE PARTICLE METHOD

M. PANCHATCHARAM, S. SUNDAR, V. VETRIVEL, A. KLAR, AND S. TIWARI

Abstract. Graphics Processing Units (GPUs), originally developed for computer games, now provide computational power for scientific applications. We present a study comparing the computational speed-up and efficiency of a GPU with those of a CPU for the Finite Pointset Method (FPM), a numerical tool in Computational Fluid Dynamics (CFD). Because FPM is based on a point cloud, it becomes very expensive when the number of particles reaches the millions. We demonstrate the application of the FPM on a single GPU (NVIDIA Tesla M2050) and on an Intel CPU (dual Xeon). The FPM benefits substantially from the GPU, which yields a computational speed-up of 70× for the Poisson equation with various boundary conditions.

Key words. Finite Pointset Method (FPM), CUDA, GPU, Bi-CGSTAB.

Received by the editors October 31, 2012 and, in revised form, August 31, 2013.
2000 Mathematics Subject Classification. 65Y05, 65Y20, 35Q30, 76D05.
This research was supported by DAAD.

1. Introduction

Today, computational methods and the hardware they run on are inseparable: progress in hardware architecture determines which numerical methods can be used at a reasonable computational cost. To increase the computational capability of CPUs, large and expensive caches have been integrated and many-core designs have been adopted [10]. Small-scale CFD problems can be solved on a PC with a multi-core CPU using shared-memory parallel programming. For large-scale problems, however, a PC with a few cores cannot offer enough computational capability, and a cluster with many CPUs (or cores) is needed. Even then, the memory bottleneck, which appears as limited bandwidth and fetch latency, has restricted the performance of many-core systems. Meanwhile, Graphics Processing Units (GPUs), which have recently become general-purpose programmable units, offer a different solution to the memory access problem.

Initially driven by the gaming market, GPUs have recently become suitable for high-performance computing applications. A GPU is a multi-threaded, many-core processor that was originally developed for graphics processing. In recent years, however, general-purpose computing on GPUs (GPGPU) has been widely used in many fields because GPUs offer high computing capability at a relatively low cost. Their main advantage is the ability to perform significantly more floating-point operations per second (FLOPS) than a CPU. NVIDIA, one of the market leaders, developed a parallel computing architecture called CUDA (Compute Unified Device Architecture) [7]. CUDA is an extension of the C language that makes NVIDIA GPUs relatively easy to program. Besides NVIDIA's CUDA, there are several other ways to realize GPGPU computing: computer graphics with OpenGL [6], OpenCL [12], and Stream (ATI Corporation) [1]. However, according to Du et al. [5], at the time of writing CUDA is more efficient on the GPU than OpenCL. Hence, we chose an NVIDIA GPU with the CUDA platform for our study, also because CUDA is used in a variety of scientific computing fields, such as graphics, biology, linear algebra, PDE solvers, and computational physics. Especially in