Routine Microsecond Molecular Dynamics Simulations with AMBER on GPUs. 2. Explicit Solvent Particle Mesh Ewald

Romelia Salomon-Ferrer, Andreas W. Götz, Duncan Poole, Scott Le Grand, and Ross C. Walker*

San Diego Supercomputer Center, University of California, San Diego, 9500 Gilman Drive MC0505, La Jolla, California 92093, United States
NVIDIA Corporation, 2701 San Tomas Expressway, Santa Clara, California 95050, United States
Department of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman Drive MC0505, La Jolla, California 92093, United States

ABSTRACT: We present an implementation of explicit solvent all-atom classical molecular dynamics (MD) within the AMBER program package that runs entirely on CUDA-enabled GPUs. First released publicly in April 2010 as part of version 11 of the AMBER MD package, and further improved and optimized over the last two years, this implementation supports the three most widely used statistical mechanical ensembles (NVE, NVT, and NPT), uses particle mesh Ewald (PME) for the long-range electrostatics, and runs entirely on CUDA-enabled NVIDIA graphics processing units (GPUs). It provides results that are statistically indistinguishable from those of the traditional CPU version of the software, with performance that exceeds that achievable by the CPU version of AMBER running on any conventional CPU-based cluster or supercomputer. We briefly discuss three different precision models developed specifically for this work (SPDP, SPFP, and DPDP) and highlight the technical details of the approach as it extends beyond previously reported work [Götz et al., J. Chem. Theory Comput. 2012, DOI: 10.1021/ct200909j; Le Grand et al., Comput. Phys. Commun. 2013, DOI: 10.1016/j.cpc.2012.09.022]. We highlight the substantial improvements in performance over traditional CPU-only machines and provide validation of our implementation and precision models.
We also provide evidence supporting our decision to deprecate the previously described fully single precision (SPSP) model from the latest release of the AMBER software package.

1. INTRODUCTION

Classical molecular dynamics (MD) has been used extensively in atomistic studies of biological and chemical phenomena, including the study of biological ensembles of proteins, amino acids, lipid bilayers, and carbohydrates.1−13 With the development of new algorithms and the emergence of new hardware platforms, MD simulations have dramatically increased in size, complexity, and simulation length. In particular, graphics processing units (GPUs) have emerged as an economical and powerful alternative to traditional CPUs for scientific computation.14−17 GPUs are present in most modern high-end desktops and are now appearing in the latest generation of supercomputers. When programmed correctly, software running on GPUs can significantly outperform software running on CPUs, owing to a combination of high computational power, in terms of peak floating point operations, and high memory bandwidth. This combination makes GPUs an ideal platform for mathematically intense algorithms that can be expressed in a highly parallel way. On the downside, the inherently parallel nature of the GPU architecture entails a decrease in flexibility and an increase in programming complexity in comparison to CPUs.

The success of and high demand for GPUs in the gaming and 3D image rendering industries has fueled sustained GPU development for over two decades, leading to extremely cost-effective hardware for scientific computation. The first GPU with features specifically targeted at scientific computation was released by NVIDIA in 2007, with a subsequent generation following a year later that provided the first support for double precision floating point arithmetic. At the time of writing, NVIDIA's latest generation of GPUs is based on the Kepler GK104 and GK110 chips.
These two chip designs, like earlier models, provide very different ratios of single to double precision performance. The GK104 is targeted at algorithms that rely extensively on single precision, while the GK110 offers more extensive double precision performance. As discussed later, it is necessary to carefully tune the use of single and double precision floating point, and ultimately fixed precision arithmetic, to achieve high performance across these different hardware designs while not compromising the integrity of the underlying mathematics.

A large number of scientific software packages have been successfully ported to run on GPUs.12,13,18 In the molecular dynamics field there have been attempts to port major MD packages to GPUs; for a review of this progress, the reader is referred to the review article in ref 12. A number of widely used MD packages designed for the simulation of condensed phase biological systems feature varying degrees of GPU support, including NAMD,19,20 AMBER,21,22

Received: April 17, 2013
pubs.acs.org/JCTC | © XXXX American Chemical Society | dx.doi.org/10.1021/ct400314y | J. Chem. Theory Comput. XXXX, XXX, XXX−XXX