ACCELERATING SEMBLANCE COMPUTATIONS ON HETEROGENEOUS DEVICES USING OPENCL
E. Borin, H. Cardoso da Silva, J.H. Faccipieri Jr., and M. Tygel
email: edson@ic.unicamp.br, hercules.cardoso.silva1@gmail.com, jorge.faccipieri@gmail.com and tygel@ime.unicamp.br
keywords: Semblance, CRS, CMP, GPU, OpenCL, HPC
ABSTRACT
The core of several seismic processing methods, such as the CRS and the CMP methods, is the computation of the traveltime and semblance functions. In this work we investigate the use of OpenCL to accelerate these computations on multicore CPUs, GPUs, and other hardware accelerators. Our experiments indicate that the OpenCL code is highly portable among different computing devices, and the performance results suggest that GPUs are promising computing devices to accelerate the seismic processing methods that rely on high volumes of semblance computations.
INTRODUCTION
Several seismic processing techniques demand large amounts of data transfer, depending on the size of the data to be considered, and also intensive computational power, depending on the complexity of the operations involved. In particular, imaging methods based on multiparametric traveltime stacking, such as the Common-Reflection-Surface (CRS) method, suffer from both difficulties. In fact, because the CRS traveltime depends on three parameters in the 2D case and eight in the 3D case, the computational cost associated with the CRS method renders its application infeasible on the large-scale seismic datasets routinely acquired by the oil industry.
The estimation of the CRS parameters relies on the computation of traveltime surfaces and associated semblance functions, which represent almost 100% of the computing time. Since these methods require a large amount of computation when processing real data, they are typically coded to be executed in parallel on clusters with multiple machines.
Recent trends indicate that future computing systems will be composed of heterogeneous computing devices, including multicore Central Processing Units (CPUs), Graphics Processing Units (GPUs) and other hardware accelerators, such as the Intel Xeon Phi and Field Programmable Gate Arrays (FPGAs). However, in order to use the computing power available on these heterogeneous devices, existing programs will need to be adapted or, in some cases, completely rewritten using new programming frameworks.
Ni and Yang (2012) used CUDA to accelerate the so-called 3D Output Imaging Scheme (CRS-OIS) method on GPUs and report that the code running on the GPU can be 10 to 220 times faster than the CPU when processing synthetic data. When processing real data, the GPU (model C1060) is roughly 35 times faster than a CPU using only one of its cores. Marchetti et al. (2011) accelerated the search for the eight parameters in the CRS method using OpenCL to run the semblance and traveltime operations on a GPU. The authors reported that the GPU (Radeon HD 5870) is roughly 60 times faster than the CPU when processing a 3 GB seismic dataset. Marchetti et al. (2010) used the Maxeler MaxCompiler tool to accelerate the CRS method using FPGAs. The authors reported that their solution is 200 to 230 times faster than the CPU when processing seismic data from a land survey.
In this work, we investigate how we can accelerate the computation of semblance operations using OpenCL, a parallel application programming interface (API) designed to enable the same code to be executed