Comparison between Parallel and Distributed
Molecular Dynamics Simulations of Lennard-Jones
Systems
Vlad Baja*, Dorian Gorgan*, Titus Beu**
*Computer Science Department, Technical University of Cluj-Napoca, Cluj-Napoca, Romania
vlad.baja@gmail.com, dorian.gorgan@cs.utcluj.ro
**Faculty of Physics, University Babes-Bolyai, Cluj-Napoca, Romania
titus.beu@phys.ubbcluj.ro
Abstract—This paper is mainly concerned with parallel and distributed
implementations of molecular dynamics simulations of
the Lennard-Jones potential model. The reported research
studies and experiments with different algorithms and parallelization
techniques for shared memory and message passing architectures;
the programs are executed on single-core processors, multi-core
processors, a GPU, and a GPU cluster. The solution based on
efficient versions of the neighbor list algorithm and the space division
technique is discussed further. The speedups obtained on the multi-core
processor, GPU, and GPU cluster, relative to the single-core
processor implementation of the program, are analyzed, and the
advantages of the algorithms are highlighted.
I. INTRODUCTION
Simulations are used to estimate the evolution of systems
that are too complex for an analytical solution. One application
of computer simulations in physics is molecular dynamics
simulations. They study the physical movements of atoms and
molecules that interact with each other. Because the number
of particles in such systems is very large, it is impossible
to find analytical solutions for the properties of the system.
In molecular dynamics, numerical methods are used to solve
this problem. However, long simulations generate cumulative
numerical integration errors. The errors can be minimized by
proper selection of the integration algorithm, parameters of
simulation algorithms and data representation, but they cannot
be eliminated entirely.
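The effect of the time step on the accumulated integration error can be illustrated with a minimal sketch. It assumes the velocity Verlet integrator, which is widely used in molecular dynamics; this section does not state which integrator LJSimulator uses. A one-dimensional unit harmonic oscillator is chosen only because its exact solution, x(t) = cos(t), is known. The function names and parameters are hypothetical.

```python
import math

def velocity_verlet(x, v, force, dt, steps, mass=1.0):
    """Advance one particle in 1D with velocity Verlet; return the list of positions."""
    xs = [x]
    f = force(x)
    for _ in range(steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        xs.append(x)
    return xs

def max_position_error(dt, total_time):
    """Largest deviation from the exact solution cos(t) of a unit harmonic
    oscillator started at x = 1, v = 0 (an illustrative test system)."""
    steps = int(round(total_time / dt))
    xs = velocity_verlet(1.0, 0.0, lambda x: -x, dt, steps)
    return max(abs(x - math.cos(k * dt)) for k, x in enumerate(xs))
```

Under these assumptions, `max_position_error(0.1, 100.0)` is larger than `max_position_error(0.1, 10.0)`, because the error accumulates over time, and larger than `max_position_error(0.01, 100.0)`, because a smaller time step reduces the error without removing it.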
Molecular dynamics requires the definition of a potential function,
which describes the model
of interaction between the particles (atoms and molecules).
The Lennard-Jones potential is one such function; it approximates
the interaction between a pair of neutral atoms or
molecules. This model can be combined with other models
to study a more complex and realistic system with greater
accuracy.
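For reference, the standard 12-6 form of the Lennard-Jones potential is V(r) = 4ε[(σ/r)^12 - (σ/r)^6], where ε is the depth of the potential well and σ is the distance at which the potential is zero. The sketch below evaluates the potential and the corresponding radial force; the function names and the default reduced units (ε = σ = 1) are assumptions made for illustration and are not taken from LJSimulator.

```python
def lj_potential(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair potential V(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lj_force(r, epsilon=1.0, sigma=1.0):
    """Radial force F(r) = -dV/dr; positive values are repulsive."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r
```

The potential is zero at r = σ and reaches its minimum of -ε at r = 2^(1/6) σ, where the force vanishes; the pair repels at shorter distances and attracts at longer ones.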
The objective of this work is to study and find an efficient
algorithm for high fidelity molecular dynamics (MD) simula-
tions of Lennard-Jones systems and to compare the execution
time and speedup on different types of execution units: single-
core processor, multi-core processor, graphics processing unit
(GPU) and a GPU cluster. The program is called LJSimulator.
Section II briefly discusses similar programs;
Section III details the implementation of the algorithms
and the parallelization techniques used in the LJSimulator
program. Section IV describes the testing and
the results obtained. Section V presents the conclusions.
II. RELATED WORKS
The work started from a simple molecular dynamics simulation
program presented in [1], which implements the Lennard-Jones
potential model and uses a simple version of neighbor
lists. A data-parallel version of the algorithm is described
in [2]; it uses a space division technique to reduce the
complexity of the algorithm to O(n).
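The idea behind space division can be sketched as follows. This is a minimal serial illustration, not the data-parallel algorithm of [2]: it assumes a non-periodic box, cubic cells whose edge equals the cutoff radius, and hypothetical function names. Because two particles closer than the cutoff always lie in the same cell or in adjacent cells, each particle is compared only with the particles in the 27 surrounding cells, which gives roughly O(n) work when the particle density is bounded.

```python
import itertools
from collections import defaultdict

def squared_distance(p, q):
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def brute_force_pairs(positions, cutoff):
    """O(n^2) reference: test every pair of particles against the cutoff."""
    c2 = cutoff * cutoff
    return {(i, j)
            for i, j in itertools.combinations(range(len(positions)), 2)
            if squared_distance(positions[i], positions[j]) < c2}

def cell_list_pairs(positions, cutoff):
    """Space division: bin particles into cubic cells of edge `cutoff`,
    then search only the 27 cells around each particle."""
    c2 = cutoff * cutoff
    cells = defaultdict(list)
    for idx, p in enumerate(positions):
        cells[tuple(int(x // cutoff) for x in p)].append(idx)
    offsets = list(itertools.product((-1, 0, 1), repeat=3))
    pairs = set()
    for cell, members in cells.items():
        for off in offsets:
            neighbor = tuple(c + o for c, o in zip(cell, off))
            # .get() does not create missing cells, so `cells` is not modified
            for j in cells.get(neighbor, ()):
                for i in members:
                    if i < j and squared_distance(positions[i], positions[j]) < c2:
                        pairs.add((i, j))
    return pairs
```

Both functions return the same set of index pairs (i, j) with i < j; the brute-force version serves only as a correctness check for the cell-based search.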
Anderson et al. describe in [3] a general purpose molecular
dynamics simulator fully implemented in CUDA for graphics
processing units (GPUs), which is very similar in its ideas to
the one in this work, but is implemented to run only on
a single GPU. The simulator was written from scratch in
order to optimize the data structures and operations for the
GPU, and it simulates N particles contained in a finite box with
periodic boundary conditions. The neighbor list algorithm and
its implementation are described in detail, pointing out the
optimizations that are done for the GPU. Several techniques to
sort the particles in order to maximize memory performance
are enumerated and their advantages and disadvantages are
pointed out. They tested the performance and measured the
average time for a simulation step as a function of the number
of particles.
Watanabe et al. present in [4] efficient implementations of
an MD simulator for Lennard-Jones systems. Optimizations
for specific CPU architectures are also discussed. The maxi-
mum number of particles that were simulated is 4.1 billion,
using 8192 MPI processes. The processor computation power
is compared with the memory bandwidth and latency, and
with the communication latency.
III. IMPLEMENTATION CONSIDERATIONS
The research aims to obtain high fidelity simulations of
a single molecular model, the Lennard-Jones potential model.
This model was chosen because it is simple to understand,
978-1-4673-2952-1/12/$31.00 ©2012 IEEE 349