Performance of Direct GPU Computation of Small Angle Scattering Profile

Konstantin Berlin *†‡ , Nail A. Gumerov , Ramani Duraiswami , David Fushman *†

* Department of Chemistry and Biochemistry, Center for Biomolecular Structure and Organization, University of Maryland, College Park, MD 20742, USA
Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA
Contact Email: kberlin@umd.edu

Abstract: Small Angle Scattering (SAS) of X-rays or neutrons is an experimental technique that provides valuable structural information for biological macromolecules under physiological conditions and with no limitation on the molecular size. In order to refine a molecular structure against experimental SAS data, the ab initio prediction of the scattering profile must be recomputed hundreds of thousands of times, which involves the computation of the sinc kernel over all pairs of atoms in a molecule. The quadratic computational complexity of predicting the SAS profile limits the size of the molecules and has been a major impediment to the integration of SAS data into structure refinement protocols. In order to significantly speed up prediction of the SAS profile, we present a general-purpose graphics processing unit (GPU) algorithm, written in OpenCL, for the summation of the sinc kernel (Debye summation) over all pairs of atoms. This program is an order of magnitude faster than a parallel CPU algorithm, and faster than an FMM-like approximation method for certain input domains. We show that our algorithm is currently the fastest method for performing SAS computation for small and medium-sized molecules (around 50,000 atoms or fewer). This algorithm is critical for quick and accurate SAS profile computation of elongated structures, such as DNA, RNA, and sparsely spaced pseudo-atom molecules.
INTRODUCTION

Accurate characterization of biomolecular structures in solution is required for understanding their biological function and for therapeutic applications. Small-angle scattering (SAS) of X-rays and neutrons indirectly measures the distribution of interatomic distances in a molecule [1]. Unlike high-resolution techniques, such as X-ray crystallography and solution NMR, SAS allows the study of molecules and their interactions under native physiological conditions and with essentially no limitation on the size of the system under investigation. Although it provides a powerful and unique set of experimental structural constraints, SAS is limited in resolution and by the ambiguity in deconvolving interatomic distances from the experimental data. It does not provide enough data to derive a high-resolution molecular structure, but due to the ease of collecting SAS data under various conditions and the unique scale of atomic distance information, it is an extremely promising complement to the high-resolution techniques [2].

Solution SAS studies have become increasingly popular, with applications covering a broad range, including structure refinement of biological macromolecules and their complexes [3], [4], [5], [6], and analysis of conformational ensembles and flexibility in solution [7], [8], [9], [10]. Finally, SAS is starting to be used in high-throughput biological applications [11], [12].

In order to use SAS for structural refinement, the scattering profile of the SAS experiment needs to be predicted ab initio from the molecular structure, which requires computing all-pairs interactions of the atoms in the molecule (also referred to as an N-body problem). In addition, the profile must be recomputed hundreds of thousands of times in an iterative structure refinement algorithm or ensemble analysis [3], [13], or for thousands of different structures in a high-throughput method.
For such applications, serial computation of the scattering profile becomes prohibitively expensive, even for smaller molecules. Several approximation algorithms have been proposed to speed up this computation [14], [15], [13], [16]. However, depending on the diameter of the molecule, the approximations can introduce significant errors (see [17] for theoretical limitations of these algorithms). Recently, a hierarchical harmonic expansion method based on the fast multipole method (FMM) has been shown to have asymptotic performance superior to that of all previously proposed approximation methods [17], while maintaining any prescribed accuracy. However, the algorithm is very complex, which makes it difficult to implement and parallelize.

Here we describe a parallelization of the direct computation of the SAS profile on readily affordable graphics processing units (GPUs) that provides a dramatic improvement in the efficiency of scattering profile prediction. While CUDA [18] GPU parallelization has been proposed for the similar problem of powder diffraction [19], we extend GPU parallelization to SAXS/WAXS/SANS using an open GPU programming standard, OpenCL [20], which makes our software compatible with most modern desktop and laptop systems. We show that our GPU implementation is an order of magnitude faster than the parallelized CPU version, and faster than the parallelized CPU version of the hierarchical harmonic expansion.
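
To make the quadratic cost of the direct method concrete, the following is a minimal serial sketch of the Debye summation, I(q) = sum over i, j of f_i f_j sin(q r_ij)/(q r_ij), written in plain Python rather than OpenCL. It is an illustration only, not the implementation evaluated in this paper: the function names `sinc` and `debye_profile` are hypothetical, and the form factors are assumed to be constant per atom, whereas realistic form factors depend on q.

```python
import math


def sinc(x):
    """Unnormalized sinc, sin(x)/x, with the limiting value 1 at x = 0."""
    return 1.0 if abs(x) < 1e-12 else math.sin(x) / x


def debye_profile(coords, form_factors, q_values):
    """Direct O(N^2) Debye summation for each q in q_values.

    coords       : list of (x, y, z) atom positions
    form_factors : one constant form factor per atom (an assumption of
                   this sketch; real form factors vary with q)
    q_values     : scattering-vector magnitudes at which to evaluate I(q)
    """
    n = len(coords)

    # Pair weights f_i * f_j and distances r_ij for i < j, computed once
    # and reused for every q. This stores N(N-1)/2 entries, so it is only
    # suitable for small illustrative inputs.
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            weight = form_factors[i] * form_factors[j]
            pairs.append((weight, math.dist(coords[i], coords[j])))

    # Diagonal terms (i == j): sinc(0) = 1, so each contributes f_i^2.
    self_term = sum(f * f for f in form_factors)

    profile = []
    for q in q_values:
        # The kernel is symmetric in i and j, so each i < j pair counts twice.
        cross = sum(w * sinc(q * r) for w, r in pairs)
        profile.append(self_term + 2.0 * cross)
    return profile
```

For example, two unit scatterers one distance unit apart give I(0) = 4, since all four terms of the double sum equal 1. The work grows as N(N-1)/2 kernel evaluations per q value, and every pair term is independent of the others, which is the property that makes the direct summation a natural candidate for GPU parallelization.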