GPU Accelerated Fast FEM Deformation Simulation
Youquan Liu (1,3), Shaohui Jiao (3), Wen Wu (2,5), Suvranu De (4)
(1) Faculty of Science and Technology, University of Macau, Macau, China, youquanliu@hotmail.com
(2) Faculty of Information Technology, Macau University of Science and Technology, Macau, China, wwu@must.edu.mo
(3) State Key Lab of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China, jsh@ios.ac.cn
(4) Department of Mechanical, Aerospace, and Nuclear Engineering, Rensselaer Polytechnic Institute, Troy, NY, USA, des@rpi.edu
(5) Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China, wwu1@cse.cuhk.edu.hk
Abstract—In this paper we present a general FEM (Finite
Element Method) solution that enables fast dynamic
deformation simulation on the newly available GPU (Graphics
Processing Unit) hardware with compute unified device
architecture (CUDA) from NVIDIA. CUDA-enabled GPUs
harness the power of 128 processors for data-parallel
computation. Compared to previous GPGPU approaches, CUDA is
significantly more flexible and offers a C language interface. We not
only implement FEM deformation computation algorithms with
CUDA but also analyze their performance in detail. Our test
results indicate that the GPU with CUDA yields a speedup of about
4 times for FEM deformation computation on an Intel(R) Core
2 Quad 2.0GHz machine with a GeForce 8800 GTX.
I. INTRODUCTION
In the graphics community, physically based deformation
simulation dates back to the 1980s, with pioneering work such
as [1]. After many years the area is still active, because
open problems remain despite excellent progress. The tradeoff
between performance and precision is a perennial problem. For
recent surveys of deformation methods in computer
graphics, readers can refer to [2, 3].
The introduction of the Graphics Processing Unit (GPU)
provided a means for massive data-parallel computation on the
PC. Besides traditional graphics rendering, it became possible
to use GPUs for general-purpose computation (GPGPU) in a variety of
data-intensive applications [4]. For deformation problems,
James et al. [5] used the vertex processor to calculate modal
synthesis, and Ranzuglia et al. [6, 7] used the pixel processor to
accelerate a mass-spring deformation framework. However,
harnessing the power of the GPU remained tricky because the
GPU could only be programmed through a graphics API, such
as OpenGL or D3D, which added the overhead of an ill-suited
API to floating-point applications. Moreover, while GPU programs could
gather data from any part of DRAM, they could not
scatter data to arbitrary locations as freely, which made
the GPU less flexible than the CPU.
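To make the gather/scatter distinction concrete, the following is a minimal sketch in plain CPU-side C, not GPU code and not taken from the implementation described in this paper. The function names `gather` and `scatter` and the index array `idx` are illustrative assumptions. In a gather, each output element reads from an arbitrary input location; in a scatter, each input element writes to an arbitrary output location.

```c
#include <stddef.h>

/* Gather: output element i READS from an arbitrary input location idx[i].
   Each iteration writes only out[i], so the iterations are independent. */
void gather(const float *in, const size_t *idx, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = in[idx[i]];
}

/* Scatter: input element i WRITES to an arbitrary output location idx[i].
   If two entries of idx are equal, the later iteration overwrites the
   earlier one, so the write order matters. */
void scatter(const float *in, const size_t *idx, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[idx[i]] = in[i];
}
```

With `in = {10, 20, 30, 40}` and `idx = {1, 2, 3, 0}`, `gather` produces `{20, 30, 40, 10}` while `scatter` produces `{40, 10, 20, 30}`. The scatter form needs the ability to write to a computed address, which is the capability the text describes as limited under the graphics-API programming model.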
To overcome these problems, NVIDIA unveiled the
Compute Unified Device Architecture (CUDA) [8] in
November 2006, which allows algorithms written in the C
programming language to execute on the GPU. CUDA-enabled
GPUs include a data-parallel cache alongside the 128
processor cores in the GeForce 8 Series GPUs. By opening up
the GPU architecture, CUDA provides an ideal environment
for developing computation-intensive tasks that can
take advantage of the massive parallelism now available
in the G8X series GPUs.
This paper presents a general solution for the FEM
deformation algorithm, implemented with CUDA to obtain
performance gains on PCs. It also analyzes the bottleneck
of the whole simulation in detail.
Compared to another popular deformation method, the mass-spring
system, FEM (Finite Element Method) is more
sophisticated and closer to the underlying physics, but
it is also much slower. Moreover, FEM can provide
more precise results for engineering problems, such as
structural analysis.
Section II gives the implementation details of the deformation
computation, Section III compares and analyzes the CPU
and GPU implementations, and Section IV presents
our conclusions and future work.
II. GPU-ACCELERATED FEM DEFORMATION
A. Dynamic FEM Deformation
For dynamic problems, the motion of an object obeys the
following law:
$$M\ddot{u} + D\dot{u} + Ku = F \tag{1}$$
where u is the 3n-dimensional nodal displacement vector, n is
the total number of nodes in the object; M is the mass matrix;
D is the damping matrix, here we apply Rayleigh damping
Support was provided by NIH R01 EB005807 & the National Grant
Fundamental Research of Science and Technology (973 Project:
2002CB312102)
606 978-1-4244-2342-2/08/$25.00 ©2008 IEEE.