Research Article – SACJ No. 53, August 2014

Determining the difficulty of accelerating problems on a GPU

Dale Tristram, Karen Bradshaw
Department of Computer Science, Rhodes University, P. O. Box 94, Grahamstown, South Africa

ABSTRACT

General-purpose computation on graphics processing units (GPGPU) has great potential to accelerate many scientific models and algorithms. However, since some problems are considerably more difficult to accelerate than others, ascertaining the effort required to accelerate a particular problem is challenging. Through the acceleration of three typical scientific problems, seven problem attributes have been identified to assist in the evaluation of the difficulty of accelerating a problem on a GPU. These attributes are inherent parallelism, branch divergence, problem size, required computational parallelism, memory access pattern regularity, data transfer overhead, and thread cooperation. Using these attributes as difficulty indicators, an initial problem difficulty classification framework has been created that aids in evaluating GPU acceleration difficulty. The difficulty estimates obtained by applying the classification framework to the three case studies correlate well with the actual effort expended in accelerating each problem.

KEYWORDS: GPGPU, OpenCL, problem difficulty classification
CATEGORIES: D.1.3, D.2.8

1 INTRODUCTION

Accelerating scientific problems on graphics processing units (GPUs) can result in orders of magnitude speedup over CPU-based solutions. With Cray's Titan, the world's second fastest supercomputer¹, making extensive use of GPUs, more scientists are likely to be interested in using these devices to accelerate their models and algorithms. However, because of the way in which GPUs have been designed, some problems are considerably harder to accelerate than others.
For scientists unfamiliar with the architecture and programming of GPUs, the distinction between easily accelerated problems and those that are very difficult (yet possible) to accelerate is likely to be unclear. For this reason, novice GPU programmers may be discouraged if the performance achieved does not meet their expectations. One way of addressing this would be to create a problem difficulty classification system that could provide users with information on the level of problem difficulty and the kind of knowledge and optimisations necessary to achieve satisfactory speedup on a GPU. However, in order to create such a system, we first need to identify the problem attributes that are important in distinguishing the different levels of difficulty. This paper sets out to determine some of these attributes through the acceleration of three different problems, and then to validate them by applying the classification framework to the problems accelerated.

Email: Dale Tristram d.tristram@boost.za.net, Karen Bradshaw k.bradshaw@ru.ac.za
¹ http://www.top500.org/lists/2013/06/

Section 2 provides a brief overview of GPU computing. Sections 3, 4, and 5 detail the acceleration of a hydrological model, k-difference string matching, and a radix sort, respectively. Section 6 discusses the creation of the classification framework and its application to the accelerated problems. Finally, Section 7 concludes.

2 GPU COMPUTING

There are a few key differences between the architectures of common GPUs and CPUs that must be understood in the context of general-purpose computation on graphics processing units (GPGPU). A brief overview of the processing and memory models of a GPU is presented for some insight into these differences, as well as a high-level overview of the GPU computing framework used in this study.

2.1 Graphics Processing Units

Although a number of different GPU architectures exist, modern GPUs all share certain architectural similarities [1].
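The core similarity is the data-parallel execution model: the same kernel body is executed by many work-items, each identified by a global ID. As a rough intuition, the following Python sketch emulates that model sequentially (a hypothetical illustration only; it is not code from this study, which uses OpenCL, and `saxpy_kernel` and `launch` are invented names):

```python
def saxpy_kernel(gid, a, x, y, out):
    # Each "work-item" processes exactly one element,
    # selected by its global ID (gid).
    out[gid] = a * x[gid] + y[gid]

def launch(kernel, global_size, *args):
    # A GPU runtime schedules all work-items concurrently across
    # many cores; this sequential loop merely stands in for that.
    for gid in range(global_size):
        kernel(gid, *args)

x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * 4
launch(saxpy_kernel, 4, 2.0, x, y, out)
print(out)  # [12.0, 24.0, 36.0, 48.0]
```

Because every work-item runs the same code, problems that map cleanly onto this one-element-per-thread pattern are the easiest to accelerate; the difficulty attributes identified in this paper capture the ways in which real problems depart from it.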
The AMD Radeon HD7970, hereafter referred to as the HD7970, is used as the reference GPU when explaining the general GPU processing and memory model.

2.1.1 GPU Processing Model

One of the fundamental differences between GPUs and CPUs is the kind of processing that is prioritised, and consequently their respective numbers of processing units. Modern CPUs typically have between two and eight cores, and have been designed to maximise the