Int J Parallel Prog
DOI 10.1007/s10766-014-0319-4

Exploiting GPUs with the Super Instruction Architecture

Nakul Jindal · Victor Lotrich · Erik Deumens · Beverly A. Sanders

Received: 21 June 2013 / Accepted: 2 August 2014
© Springer Science+Business Media New York 2014

N. Jindal · B. A. Sanders (corresponding author)
Department of Computer and Information Science, University of Florida, Gainesville, FL, USA
e-mail: sanders@cise.ufl.edu

V. Lotrich · E. Deumens
Department of Chemistry, University of Florida, Gainesville, FL, USA

Abstract The Super Instruction Architecture (SIA) is a parallel programming environment designed for problems in computational chemistry involving complicated expressions defined in terms of tensors. Tensors are represented by multidimensional arrays, which are typically very large. The SIA consists of a domain-specific programming language, the Super Instruction Assembly Language (SIAL), and its runtime system, the Super Instruction Processor. An important feature of SIAL is that algorithms are expressed in terms of blocks (or tiles) of multidimensional arrays rather than individual floating-point numbers. In this paper, we describe how the SIA was enhanced to exploit GPUs, obtaining speedups ranging from 2× to nearly 4× for computational chemistry calculations and thus saving hours of elapsed time on large-scale computations. The results provide evidence that the “programming-with-blocks” approach embodied in the SIA will remain successful in modern, heterogeneous computing environments.

Keywords Parallel programming · Tensors · GPU · Domain-specific language

1 Introduction

A holy grail of parallel computing is to find ways to allow application programmers to express their algorithms at a convenient level of abstraction and still obtain good performance when the program is executed. In addition, it is desirable to be able to port applications easily to new architectures as they become available. Of particular current interest are heterogeneous systems with accelerators.
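As a rough illustration of the “programming-with-blocks” idea, the sketch below writes a two-index tensor contraction, C(i,j) = Σ_k A(i,k)·B(k,j), so that the outer algorithm iterates over blocks (fixed-size segments of each index) and every inner operation acts on a whole block rather than on individual numbers. It is plain Python, not SIAL syntax, and not the authors' implementation; the function names (`make_matrix`, `get_block`, `block_contract_accumulate`, `put_block`, `blocked_matmul`) and the segment-size parameter `seg` are hypothetical and chosen only for this example.

```python
# Hypothetical sketch of block-oriented tensor contraction (not SIAL syntax).
# C(i,j) = sum_k A(i,k) * B(k,j), computed block by block.

def make_matrix(rows, cols, f):
    """Dense row-major matrix (list of lists) with entries f(r, c)."""
    return [[f(r, c) for c in range(cols)] for r in range(rows)]

def get_block(m, r0, c0, nr, nc):
    """Copy the nr x nc block whose top-left corner is (r0, c0)."""
    return [row[c0:c0 + nc] for row in m[r0:r0 + nr]]

def block_contract_accumulate(acc, a, b):
    """acc += a @ b for small dense blocks: the per-block 'super instruction'."""
    for i in range(len(a)):
        for k in range(len(b)):
            aik = a[i][k]
            for j in range(len(b[0])):
                acc[i][j] += aik * b[k][j]

def put_block(m, r0, c0, blk):
    """Write block blk into m with its top-left corner at (r0, c0)."""
    for i, row in enumerate(blk):
        m[r0 + i][c0:c0 + len(row)] = row

def blocked_matmul(A, B, seg):
    """Contract A (n x m) with B (m x p), segmenting every index into blocks of size seg."""
    n, m, p = len(A), len(B), len(B[0])
    C = make_matrix(n, p, lambda r, c: 0)
    for i0 in range(0, n, seg):            # loop over blocks of index i
        ni = min(seg, n - i0)
        for j0 in range(0, p, seg):        # loop over blocks of index j
            nj = min(seg, p - j0)
            acc = [[0] * nj for _ in range(ni)]
            for k0 in range(0, m, seg):    # sum over blocks of the contracted index k
                nk = min(seg, m - k0)
                a = get_block(A, i0, k0, ni, nk)
                b = get_block(B, k0, j0, nk, nj)
                block_contract_accumulate(acc, a, b)
            put_block(C, i0, j0, acc)
    return C

if __name__ == "__main__":
    A = make_matrix(5, 7, lambda r, c: r + 2 * c)
    B = make_matrix(7, 3, lambda r, c: r - c)
    print(blocked_matmul(A, B, seg=3))
```

The design point this is meant to convey: once the algorithm is phrased in terms of whole blocks, the per-block kernel (`block_contract_accumulate` here) is a self-contained unit of work that a runtime can schedule, distribute, or hand to an accelerator without the algorithm-level loops changing.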