GPU Accelerated Fast FEM Deformation Simulation

Youquan Liu 1,3, Shaohui Jiao 3, Wen Wu 2,5, Suvranu De 4

1 Faculty of Science and Technology, University of Macau, Macau, China, youquanliu@hotmail.com
2 Faculty of Information Technology, Macau University of Science and Technology, Macau, China, wwu@must.edu.mo
3 State Key Lab of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China, jsh@ios.ac.cn
4 Department of Mechanical, Aerospace and Nuclear Engineering, Rensselaer Polytechnic Institute, Troy, NY, USA, des@rpi.edu
5 Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China, wwu1@cse.cuhk.edu.hk

Abstract—In this paper we present a general FEM (Finite Element Method) solution that enables fast dynamic deformation simulation on newly available GPU (Graphics Processing Unit) hardware with the Compute Unified Device Architecture (CUDA) from NVIDIA. CUDA-enabled GPUs such as the GeForce 8800 GTX offer 128 processors for data-parallel computation. Compared with earlier GPGPU programming, CUDA is significantly more flexible and provides a C language interface. We not only implement FEM deformation computation algorithms with CUDA but also analyze their performance in detail. Our test results indicate that the GPU with CUDA gives a speedup of about 4 times for FEM deformation computation on an Intel(R) Core 2 Quad 2.0GHz machine with a GeForce 8800 GTX.

I. INTRODUCTION

In the graphics community, physically based deformation simulation dates back to the 1980s, with pioneering work such as [1]. The area remains active: despite excellent progress, open problems remain. In particular, the tradeoff between performance and precision is a persistent difficulty. For recent surveys of deformation methods in computer graphics, readers can refer to [2, 3].

The introduction of the Graphics Processing Unit (GPU) provided a means for massive data-parallel computation on the PC.
Besides traditional graphics rendering, it became possible to program GPUs for general-purpose computation (GPGPU) in a variety of data-intensive applications [4]. For deformation problems, James et al. [5] used the vertex processor to compute modal synthesis, and Ranzuglia et al. [6, 7] used the pixel processor to accelerate a mass-spring deformation framework. However, harnessing the power of the GPU remained tricky, since the GPU could only be programmed through a graphics API such as OpenGL or D3D, which imposed the overhead of an ill-suited API on floating-point applications. Moreover, while GPU programs could gather data from any part of the DRAM, they could not scatter data to arbitrary locations as freely, which made the GPU less flexible than the CPU.

To overcome these problems, NVIDIA unveiled the Compute Unified Device Architecture (CUDA) [8] in November 2006, which allows algorithms to be written in the C programming language and executed on the GPU. CUDA-enabled GPUs include a parallel data cache, which lets the processor cores (up to 128 in the GeForce 8 Series GPUs) share data. By opening up the GPU architecture, CUDA provides an ideal environment for developing computation-intensive tasks that can take advantage of the massive parallelism now available in the G8X series GPUs.

This paper presents a general FEM (Finite Element Method) deformation solution implemented with CUDA to obtain performance gains on PCs, and analyzes the bottlenecks of the whole simulation in detail. Compared with the mass-spring system, another popular deformation method, FEM is more sophisticated and more faithful to the underlying physics, but it is also much slower. In addition, FEM provides more precise results for engineering problems such as structural analysis.

In Section II the deformation implementation details are given, and comparisons and analysis between the CPU and GPU versions are presented in Section III.
Finally, Section IV presents our conclusions and future work.

Support was provided by NIH R01 EB005807 and the National Grant Fundamental Research of Science and Technology (973 Project: 2002CB312102).
978-1-4244-2342-2/08/$25.00 ©2008 IEEE.

II. GPU-ACCELERATED FEM DEFORMATION

A. Dynamic FEM Deformation

For dynamic problems, the motion of an object obeys the following law:

$$M\ddot{u} + D\dot{u} + Ku = F \tag{1}$$

where u is the 3n-dimensional nodal displacement vector and n is the total number of nodes in the object; M is the mass matrix; D is the damping matrix, here we apply Rayleigh damping