Programming CUDA-based GPUs to simulate two-layer shallow water flows

Marc de la Asunción (1), José M. Mantas (1), and Manuel J. Castro (2)

(1) Dpto. Lenguajes y Sistemas Informáticos, Universidad de Granada
(2) Dpto. Análisis Matemático, Universidad de Málaga

Abstract. The two-layer shallow water system is used as the numerical model to simulate several phenomena related to geophysical flows, such as the steady exchange of two different water flows, as occurs in the Strait of Gibraltar, or the tsunamis generated by underwater landslides. The numerical solution of this model for realistic domains imposes great demands on computing power, and modern Graphics Processing Units (GPUs) have proved to be powerful accelerators for this kind of computationally intensive simulation. This work describes an accelerated implementation of a first order well-balanced finite volume scheme for 2D two-layer shallow water systems using GPUs that support the CUDA (Compute Unified Device Architecture) programming model and double precision arithmetic. The implementation uses the CUDA framework to efficiently exploit the fine-grain data parallelism of the numerical algorithm. Two versions of the GPU solver are implemented and studied: one using both single and double precision, and another using only double precision. Numerical experiments show the efficiency of this CUDA solver on several GPUs, and a comparison with an efficient multicore CPU implementation of the solver is also reported.

1 Introduction

The two-layer shallow water system of partial differential equations governs the flow of two superposed shallow layers of immiscible fluids with different constant densities. This mathematical model is used to simulate several phenomena related to stratified geophysical flows, such as the steady exchange of two different water flows, as occurs in the Strait of Gibraltar [4], or the tsunamis generated by underwater landslides [14].
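The fine-grain data parallelism mentioned above comes from the per-volume nature of explicit finite volume updates. The following C sketch is illustrative only (a scalar upwind update with hypothetical names, not the two-layer scheme of this work): it shows the access pattern of the loop whose iterations a CUDA kernel would map to one thread per finite volume.

```c
#include <assert.h>
#include <math.h>

/* Illustrative first-order upwind update for a scalar conservation law,
 * standing in for the (much more involved) two-layer shallow water update.
 * The key point is that u_new[i] depends only on the OLD values u[i-1] and
 * u[i], so every cell update is independent of the others and can be
 * assigned to its own GPU thread. */
void update_cells(const double *u, double *u_new, int n,
                  double a, double dt, double dx)
{
    for (int i = 1; i < n; ++i)              /* iterations are independent */
        u_new[i] = u[i] - a * (dt / dx) * (u[i] - u[i - 1]);
    u_new[0] = u[0];                         /* crude inflow boundary */
}
```

In a CUDA version, the loop body becomes the kernel and the loop index becomes the thread index; note that a spatially constant state is reproduced exactly by this update, a much weaker analogue of the well-balanced property that the actual scheme guarantees for nontrivial steady states.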
The numerical resolution of two-layer or multilayer shallow water systems has been the object of intense research in recent years: see for instance [1–4, 14]. The numerical solution of these equations in realistic applications, where large domains are simulated over long time intervals, is computationally very expensive. This cost, together with the high degree of parallelism these numerical schemes exhibit, motivates the design of parallel versions of the schemes for parallel machines, in order to solve and analyze these problems in reasonable execution times. In this paper, we tackle the acceleration of a finite volume numerical scheme to solve two-layer shallow water systems. This scheme has been parallelized and