Int J Parallel Prog
DOI 10.1007/s10766-016-0425-6
Automatic CPU/GPU Generation of Multi-versioned
OpenCL Kernels for C++ Scientific Applications
Rafael Sotomayor¹ · Luis Miguel Sanchez¹ · Javier Garcia Blas¹ · Javier Fernandez¹ · J. Daniel Garcia¹
Received: 3 September 2015 / Accepted: 31 March 2016
© Springer Science+Business Media New York 2016
Abstract Parallelism has become one of the most widespread paradigms for improving
performance. However, it forces software developers to adapt applications and coding
mechanisms to exploit the available computing devices. Legacy source code needs to
be rewritten to take advantage of multi-core and many-core computing devices. Writing
parallel applications in the traditional way is hard, expensive, and time-consuming.
Furthermore, there is often more than one possible transformation or optimization that
can be applied to a single piece of legacy code. Therefore, many parallel versions of
the same original sequential code need to be considered. In this paper, we describe an
automatic parallel source code generation workflow (REWORK) for parallel heterogeneous
platforms. REWORK automatically identifies promising kernels in legacy
C++ source code and generates multiple specific versions of each kernel to improve
C++ applications, selecting the most suitable version based on both static source code
and target platform characteristics.
Keywords OpenCL · C++ · Multi-versioning · Code generation
1 Introduction
Heterogeneous parallel platforms are composed of traditional multi-core processors
and computing accelerator devices, such as GPUs and FPGAs. Those accelerators
✉ Luis Miguel Sanchez
lmsanche@inf.uc3m.es
Javier Garcia Blas
fjblas@inf.uc3m.es
J. Daniel Garcia
josedaniel.garcia@uc3m.es
¹ University Carlos III of Madrid, Av. de la Universidad, 30, 28911 Leganes, Madrid, Spain