Journal of Computational Physics 321 (2016) 556–570 Contents lists available at ScienceDirect Journal of Computational Physics www.elsevier.com/locate/jcp Efficient implementation of the many-body Reactive Bond Order (REBO) potential on GPU Przemysław Tr˛ edak a, , Witold R. Rudnicki b,c , Jacek A. Majewski a a Faculty of Physics, University of Warsaw, ul. Pasteura 5, 02-093 Warsaw, Poland b Institute of Informatics, University of Białystok, ul. Konstantego Ciołkowskiego 1M, 15-245 Białystok, Poland c Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, ul. Pawi´ nskiego 5a, 02-106 Warsaw, Poland a r t i c l e i n f o a b s t r a c t Article history: Received 20 October 2015 Received in revised form 24 April 2016 Accepted 29 May 2016 Available online 2 June 2016 Keywords: GPU Reactive bond order potentials Molecular dynamics Efficient thread synchronization The second generation Reactive Bond Order (REBO) empirical potential is commonly used to accurately model a wide range hydrocarbon materials. It is also extensible to other atom types and interactions. REBO potential assumes complex multi-body interaction model, that is difficult to represent efficiently in the SIMD or SIMT programming model. Hence, despite its importance, no efficient GPGPU implementation has been developed for this potential. Here we present a detailed description of a highly efficient GPGPU implementation of molecular dynamics algorithm using REBO potential. The presented algorithm takes advantage of rarely used properties of the SIMT architecture of a modern GPU to solve difficult synchronizations issues that arise in computations of multi-body potential. Techniques developed for this problem may be also used to achieve efficient solutions of different problems. The performance of proposed algorithm is assessed using a range of model systems. It is compared to highly optimized CPU implementation (both single core and OpenMP) available in LAMMPS package. These experiments show up to 6x improvement in forces computation time using single processor of the NVIDIA Tesla K80 compared to high end 16-core Intel Xeon processor. © 2016 Elsevier Inc. All rights reserved. 1. Introduction Molecular dynamics (MD) is a class of algorithms that is widely used to study atomic scale properties of molecular sys- tems. It is used for example to study structure and dynamics of biological macromolecules such as proteins and nucleic acids or to understand how macroscopic properties of materials arise from their atomic structure. The general algorithm of MD is straightforward [1]. A system of interest is represented with atomic resolution each atom is represented by a material point. Atomic interactions are described by means of an interaction potential. Most often the interaction potential is classi- cal, without any direct consideration of the quantum effects. Therefore, the equations of motions are then derived using the classical or Langevin mechanics approach. Initial positions of atoms are assigned according to some experimental knowl- edge of the structure of the system. Atomic velocities are then assigned randomly in agreement with Maxwell–Boltzman distribution for a given temperature. Then, the equations of motion of the system are integrated numerically and interesting observables are collected along the trajectory of the system. The first applications of MD for realistic physical systems were studies of the Van der Waals fluids and solids in 1960s [2–4]. They were followed in 1970s by studies of water [5], small * Corresponding author. E-mail address: przemyslaw.tredak@fuw.edu.pl (P. Tr˛ edak). http://dx.doi.org/10.1016/j.jcp.2016.05.061 0021-9991/© 2016 Elsevier Inc. All rights reserved.