Research Article – SACJ No. 53, August 2014

Determining the difficulty of accelerating problems on a GPU

Dale Tristram, Karen Bradshaw
Department of Computer Science, Rhodes University, P. O. Box 94, Grahamstown, South Africa

ABSTRACT

General-purpose computation on graphics processing units (GPGPU) has great potential to accelerate many scientific models and algorithms. However, since some problems are considerably more difficult to accelerate than others, ascertaining the effort required to accelerate a particular problem is challenging. Through the acceleration of three typical scientific problems, seven problem attributes have been identified to assist in the evaluation of the difficulty of accelerating a problem on a GPU. These attributes are inherent parallelism, branch divergence, problem size, required computational parallelism, memory access pattern regularity, data transfer overhead, and thread cooperation. Using these attributes as difficulty indicators, an initial problem difficulty classification framework has been created that aids in evaluating GPU acceleration difficulty. The difficulty estimates obtained by applying the classification framework to the three case studies correlate well with the actual effort expended in accelerating each problem.

KEYWORDS: GPGPU, OpenCL, problem difficulty classification
CATEGORIES: D.1.3, D.2.8

1 INTRODUCTION

Accelerating scientific problems on graphics processing units (GPUs) can result in orders of magnitude speedup over CPU-based solutions. With Cray's Titan, the world's second fastest supercomputer¹, making extensive use of GPUs, more scientists are likely to be interested in using these devices to accelerate their models and algorithms. However, because of the way in which GPUs have been designed, some problems are considerably harder to accelerate than others.
For scientists unfamiliar with the architecture and programming of GPUs, the distinction between easily accelerated problems and those that are very difficult (yet possible) to accelerate is likely to be unclear. For this reason, novice GPU programmers may be discouraged if the performance achieved does not meet their expectations. One way of addressing this would be to create a problem difficulty classification system that could provide users with information on the level of problem difficulty and the kind of knowledge and optimisations necessary to achieve satisfactory speedup on a GPU. However, in order to create such a system, we first need to identify the problem attributes that are important in distinguishing the different levels of difficulty. This paper sets out to determine some of these attributes through the acceleration of three different problems, and then to validate them by applying the classification framework to the problems accelerated.

Email: Dale Tristram d.tristram@boost.za.net, Karen Bradshaw k.bradshaw@ru.ac.za
¹ http://www.top500.org/lists/2013/06/

Section 2 provides a brief overview of GPU computing. Sections 3, 4, and 5 detail the acceleration of a hydrological model, k-difference string matching, and a radix sort, respectively. Section 6 discusses the creation of the classification framework and its application to the accelerated problems. Finally, Section 7 concludes.

2 GPU COMPUTING

There are a few key differences between the architectures of common GPUs and CPUs that must be understood in the context of general-purpose computation on graphics processing units (GPGPU). A brief overview of the processing and memory models of a GPU is presented for some insight into these differences, as well as a high-level overview of the GPU computing framework used in this study.

2.1 Graphics Processing Units

Although a number of different GPU architectures exist, modern GPUs all share certain architectural similarities [1].
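The core similarity is the data-parallel execution model: the same kernel body is executed by many work-items, each identified by a global ID. As a rough intuition, the following Python sketch emulates that model sequentially (a hypothetical illustration only; it is not code from this study, which uses OpenCL, and `saxpy_kernel` and `launch` are invented names):

```python
def saxpy_kernel(gid, a, x, y, out):
    # Each "work-item" processes exactly one element,
    # selected by its global ID (gid).
    out[gid] = a * x[gid] + y[gid]

def launch(kernel, global_size, *args):
    # A GPU runtime schedules all work-items concurrently across
    # many cores; this sequential loop merely stands in for that.
    for gid in range(global_size):
        kernel(gid, *args)

x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * 4
launch(saxpy_kernel, 4, 2.0, x, y, out)
print(out)  # [12.0, 24.0, 36.0, 48.0]
```

Because every work-item runs the same code, problems that map cleanly onto this one-element-per-thread pattern are the easiest to accelerate; the difficulty attributes identified in this paper capture the ways in which real problems depart from it.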
The AMD Radeon HD7970, hereafter referred to as the HD7970, is used as the reference GPU when explaining the general GPU processing and memory model.

2.1.1 GPU Processing Model

One of the fundamental differences between GPUs and CPUs is the kind of processing that is prioritised, and consequently their respective numbers of processing units. Modern CPUs typically have between two and eight cores, and have been designed to maximise the