Comparing Many–Core Accelerator Frameworks *

G. Haase, A. Kucher, and M. Liebmann
Institute for Mathematics and Scientific Computing, University of Graz, 8061 Graz, Austria
Feb. 15, 2012

Abstract

GPUs are already well adopted as general purpose processors in scientific and high performance computing. Their steadily increasing success has led vendors other than GPU manufacturers to work on many–core processors as hardware accelerators. With CUDA and OpenCL, two frameworks are available for GPU programming. Apart from potential compatibility problems with upcoming hardware, both frameworks share a common disadvantage: it is hard to write efficient code with them, and it can be even harder to maintain that code within existing large applications. PGI Accelerator and HMPP Workbench are two frameworks with an abstract programming model, similar to OpenMP, that allow existing sequential codes to be ported by means of preprocessor directives. A code generator uses these directives to produce code for the target architecture, i.e., the hardware accelerator. In this technical report we present these frameworks and evaluate them in terms of performance and applicability. It turns out that PGI Accelerator and HMPP Workbench give similar performance results. The code generator of PGI Accelerator can apply a number of optimization strategies automatically, but HMPP Workbench is more sophisticated regarding the spectrum of target architectures and the applicability to already existing codes.

1 Introduction

In recent years, GPUs have evolved significantly from processing units dedicated to computer graphics into general purpose GPUs (gpGPUs) that are well applicable to many problems in scientific computing. As of Nov. 2011, three of the top five supercomputer systems worldwide were already equipped with gpGPUs dedicated to high performance computing.
1 http://www.top500.org/

* This research was funded by the RP-7 (Cleansky) call: SP1-JTI-CS-2010-1-GRA-02-008 and by the Austrian Science Fund (FWF): F32-N18, by the RP-7 program: SP1-JTI-CS-2010-1

There are two major and well established programming interfaces for gpGPUs, the proprietary Compute Unified Device Architecture (CUDA) and the Open