Routine Microsecond Molecular Dynamics Simulations with AMBER on GPUs. 2. Explicit Solvent Particle Mesh Ewald

Romelia Salomon-Ferrer, Andreas W. Götz, Duncan Poole, Scott Le Grand, and Ross C. Walker*

San Diego Supercomputer Center, University of California, San Diego, 9500 Gilman Drive MC0505, La Jolla, California 92093, United States
NVIDIA Corporation, 2701 San Tomas Expressway, Santa Clara, California 95050, United States
Department of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman Drive MC0505, La Jolla, California 92093, United States

ABSTRACT: We present an implementation of explicit solvent all-atom classical molecular dynamics (MD) within the AMBER program package that runs entirely on CUDA-enabled GPUs. First released publicly in April 2010 as part of version 11 of the AMBER MD package, and further improved and optimized over the last two years, this implementation supports the three most widely used statistical mechanical ensembles (NVE, NVT, and NPT), uses particle mesh Ewald (PME) for the long-range electrostatics, and runs entirely on CUDA-enabled NVIDIA graphics processing units (GPUs). It provides results that are statistically indistinguishable from those of the traditional CPU version of the software, with performance that exceeds that achievable by the CPU version of AMBER running on any conventional CPU-based cluster or supercomputer. We briefly discuss three different precision models developed specifically for this work (SPDP, SPFP, and DPDP) and highlight the technical details of the approach as it extends beyond previously reported work [Götz et al., J. Chem. Theory Comput. 2012, DOI: 10.1021/ct200909j; Le Grand et al., Comput. Phys. Commun. 2013, DOI: 10.1016/j.cpc.2012.09.022]. We highlight the substantial improvements in performance over traditional CPU-only machines and provide validation of our implementation and precision models.
We also provide evidence supporting our decision to deprecate the previously described fully single precision (SPSP) model from the latest release of the AMBER software package.

1. INTRODUCTION

Classical molecular dynamics (MD) has been used extensively in atomistic studies of biological and chemical phenomena, including the study of biological ensembles of proteins, amino acids, lipid bilayers, and carbohydrates.1−13 With the development of new algorithms and the emergence of new hardware platforms, MD simulations have dramatically increased in size, complexity, and simulation length. In particular, graphics processing units (GPUs) have emerged as an economical and powerful alternative to traditional CPUs for scientific computation.14−17 GPUs are present in most modern high-end desktops and are now appearing in the latest generation of supercomputers. When programmed correctly, software running on GPUs can significantly outperform software running on CPUs, owing to a combination of high computational power, in terms of peak floating point operations, and high memory bandwidth. This combination makes GPUs an ideal platform for mathematically intense algorithms that can be expressed in a highly parallel way. On the downside, the inherently parallel nature of the GPU architecture entails a decrease in flexibility and an increase in programming complexity in comparison to CPUs.

The success of and high demand for GPUs in the gaming and 3D image rendering industries has fueled sustained GPU development for over two decades, leading to extremely cost-effective hardware for scientific computation. The first GPU with features specifically targeted at scientific computation was released by NVIDIA in 2007, with a subsequent generation following a year later that provided the first support for double precision floating point arithmetic. At the time of writing, NVIDIA's latest generation of GPUs is based on the Kepler GK104 and GK110 chips.
These two chip designs, like earlier models, provide very different ratios of single to double precision performance. The GK104 is targeted at algorithms that rely extensively on single precision, while the GK110 offers more extensive double precision performance. As discussed later, it is necessary to carefully tune the use of single and double precision floating point, and ultimately fixed precision arithmetic, to achieve high performance across these different hardware designs while not compromising the integrity of the underlying mathematics.

A large number of scientific software packages have been successfully ported to run on GPUs.12,13,18 In the molecular dynamics field there have been attempts to port major MD packages to GPUs; for a review of this progress, the reader is referred to the review article in ref 12. A number of widely used MD packages designed for the simulation of condensed phase biological systems feature varying degrees of GPU support, including NAMD,19,20 AMBER,21,22

Received: April 17, 2013
pubs.acs.org/JCTC | © XXXX American Chemical Society | dx.doi.org/10.1021/ct400314y | J. Chem. Theory Comput. XXXX, XXX, XXX−XXX