Comparison between Parallel and Distributed Molecular Dynamics Simulations of Lennard-Jones Systems

Vlad Baja*, Dorian Gorgan*, Titus Beu**
*Computer Science Department, Technical University of Cluj-Napoca, Cluj-Napoca, Romania
vlad.baja@gmail.com, dorian.gorgan@cs.utcluj.ro
**Faculty of Physics, University Babes-Bolyai, Cluj-Napoca, Romania
titus.beu@phys.ubbcluj.ro

Abstract—This paper is mainly concerned with parallel and distributed implementations of molecular dynamics simulations based on the Lennard-Jones potential model. The reported research studies and experiments with different algorithms and parallelization techniques for shared-memory and message-passing architectures, and the programs are executed on single-core processors, multi-core processors, a GPU, and a GPU cluster. The solution based on efficient versions of the neighbor list algorithm and the space division technique is discussed further. The speedups obtained on the multi-core processor, the GPU, and the GPU cluster, relative to the single-core implementation of the program, are analyzed, and the advantages of the algorithms are highlighted.

I. INTRODUCTION

Simulations are used to estimate the evolution of systems that are too complex for an analytical solution. One application of computer simulation in physics is molecular dynamics, which studies the physical movements of interacting atoms and molecules. Because the number of particles in such systems is very large, analytical solutions for the properties of the system cannot be found, so molecular dynamics relies on numerical methods instead. However, long simulations accumulate numerical integration errors. These errors can be reduced by a proper choice of the integration algorithm, the simulation parameters, and the data representation, but they cannot be eliminated entirely.

In molecular dynamics, a potential function must be defined.
The potential function describes the model of interaction between the particles (atoms and molecules). The Lennard-Jones potential is one such function; it approximates the interaction between a pair of neutral atoms or molecules. This model can be combined with other models to study more complex and realistic systems with greater accuracy.

The objective of this work is to find an efficient algorithm for high-fidelity molecular dynamics (MD) simulations of Lennard-Jones systems and to compare the execution time and speedup on different types of execution units: a single-core processor, a multi-core processor, a graphics processing unit (GPU), and a GPU cluster. The program is called LJSimulator.

Section II briefly discusses other similar programs. Section III details the algorithms and parallelization techniques implemented in LJSimulator. Section IV describes the tests and the results obtained, and Section V presents the conclusions.

II. RELATED WORKS

The work started from a simple molecular dynamics simulation program presented in [1], which implements the Lennard-Jones potential model and uses a simple version of neighbor lists. A data-parallel version of the algorithm is described in [2]; it uses a space division technique to reduce the complexity of the algorithm to O(n).

Anderson et al. describe in [3] a general-purpose molecular dynamics simulator fully implemented in CUDA for graphics processing units (GPUs). Its ideas are very similar to those of this work, but it is implemented to run on a single GPU only. The simulator was written from scratch in order to optimize the data structures and operations for the GPU, and it simulates N particles contained in a finite box with periodic boundary conditions. The neighbor list algorithm and its implementation are described in detail, with emphasis on the optimizations made for the GPU.
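This section gives no formulas, so the following is a minimal Python sketch of two ideas it refers to, written under our own assumptions rather than taken from LJSimulator: the 12-6 Lennard-Jones pair potential V(r) = 4ε[(σ/r)^12 - (σ/r)^6], in reduced units (ε = σ = 1 by default), and a space-division (cell list) search for all particle pairs closer than a cut-off r_c in a cubic periodic box. The function names `lj_pair`, `min_image_dist2`, `cell_list_pairs` and `brute_force_pairs` are hypothetical, chosen only for illustration.

```python
import itertools
import random


def lj_pair(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones pair potential V(r) and radial force F(r) = -dV/dr.

    A positive F means repulsion. Reduced units (epsilon = sigma = 1) by default.
    """
    sr6 = (sigma / r) ** 6
    v = 4.0 * epsilon * (sr6 * sr6 - sr6)
    f = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r
    return v, f


def min_image_dist2(a, b, box):
    """Squared distance between points a and b in a cubic periodic box of side `box`."""
    d2 = 0.0
    for ai, bi in zip(a, b):
        d = ai - bi
        d -= box * round(d / box)  # minimum-image convention
        d2 += d * d
    return d2


def brute_force_pairs(pos, box, rc):
    """O(n^2) reference: all pairs (i, j), i < j, closer than rc."""
    rc2 = rc * rc
    return {(i, j) for i, j in itertools.combinations(range(len(pos)), 2)
            if min_image_dist2(pos[i], pos[j], box) < rc2}


def cell_list_pairs(pos, box, rc):
    """Space division: bin particles into cells of side >= rc, then compare each
    particle only with particles in its own cell and the 26 neighboring cells."""
    m = int(box // rc)  # number of cells per dimension
    if m < 3:
        raise ValueError("box must be at least 3 * rc wide")
    cell_size = box / m
    cells = {}
    for idx, p in enumerate(pos):
        # Wrap into [0, box) and clamp so rounding can never yield index m.
        key = tuple(min(int((c % box) / cell_size), m - 1) for c in p)
        cells.setdefault(key, []).append(idx)
    rc2 = rc * rc
    pairs = set()
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
            neighbor = cells.get(((cx + dx) % m, (cy + dy) % m, (cz + dz) % m), ())
            for i in members:
                for j in neighbor:
                    if i < j and min_image_dist2(pos[i], pos[j], box) < rc2:
                        pairs.add((i, j))
    return pairs


if __name__ == "__main__":
    random.seed(1)
    box, rc = 10.0, 2.5
    pos = [tuple(random.uniform(0.0, box) for _ in range(3)) for _ in range(300)]
    print(cell_list_pairs(pos, box, rc) == brute_force_pairs(pos, box, rc))
    print(lj_pair(2.0 ** (1.0 / 6.0)))  # approximately (-1.0, 0.0): the potential minimum
```

Because every cell is at least r_c wide, any pair within the cut-off lies in the same or an adjacent cell, so at fixed density each particle is compared with a bounded number of others and the search costs O(n) instead of the O(n²) of the brute-force loop; this is the kind of complexity reduction attributed to [2]. A neighbor list could be built on top of this search (for example with a radius slightly larger than r_c, rebuilt only every few steps), but that extension is an assumption and is not shown here.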
Anderson et al. also enumerate several techniques for sorting the particles to maximize memory performance, and point out their advantages and disadvantages. They tested the performance by measuring the average time per simulation step as a function of the number of particles.

Watanabe et al. present in [4] efficient implementations of an MD simulator for Lennard-Jones systems and also discuss optimizations for specific CPU architectures. The largest simulation they report contains 4.1 billion particles and uses 8192 MPI processes. The computational power of the processor is compared with the memory bandwidth and latency, and with the communication latency.

III. IMPLEMENTATION CONSIDERATIONS

The research aims to obtain high-fidelity simulations of a single molecular model, the Lennard-Jones potential model. This model was chosen because it is simple to understand,

978-1-4673-2952-1/12/$31.00 ©2012 IEEE 349