mathematics

Article

Cross-Platform GPU-Based Implementation of Lattice Boltzmann Method Solver Using ArrayFire Library

Michal Takáč and Ivo Petráš *

Faculty BERG, Technical University of Košice, Němcovej 3, 042 00 Košice, Slovakia; michal.takac@tuke.sk
* Correspondence: ivo.petras@tuke.sk; Tel.: +421-55-602-5194

Citation: Takáč, M.; Petráš, I. Cross-Platform GPU-Based Implementation of Lattice Boltzmann Method Solver Using ArrayFire Library. Mathematics 2021, 9, 1793. https://doi.org/10.3390/math9151793

Academic Editor: Panagiota Tsompanopoulou

Received: 31 May 2021; Accepted: 26 July 2021; Published: 28 July 2021

Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Abstract: This paper deals with the design and implementation of a cross-platform lattice Boltzmann method solver (D2Q9-BGK and D3Q27-MRT) for 2D and 3D flows, developed with the ArrayFire library for high-performance computing. The solver leverages ArrayFire's just-in-time compilation engine to compile high-level code into optimized kernels for both the CUDA and OpenCL GPU backends. We also provide C++ and Rust implementations and show that it is possible to produce fast cross-platform lattice Boltzmann method simulations with minimal code, effectively fewer than 90 lines. Illustrative benchmarks (lid-driven cavity and Kármán vortex street) for single- and double-precision floating-point simulations on four different GPUs are provided.

Dataset License: MIT

Keywords: lattice Boltzmann method (LBM); computational fluid dynamics (CFD); parallel computing; graphics processing unit (GPU) computing; ArrayFire library; numerical analysis

1. Introduction

The popularity of the lattice Boltzmann method (LBM) has steadily grown since its inception from lattice gas automata [1] more than three decades ago. Lattice gas automata are a type of cellular automaton used to simulate fluid flows, and they were the precursor to the LBM. From lattice gas automata it is possible to derive the macroscopic Navier–Stokes equations. A disadvantage of the lattice gas automata method is its statistical noise; another problem is the difficulty of extending the model to the 3D case. For these reasons, the LBM began to rise in the early 1990s as an alternative procedure [2]. As a mesoscopic method, filling the gap between macroscopic Navier–Stokes solvers and microscopic molecular dynamics, it has been an important tool for numerical simulations of multi-component, multiphase flows [3–5], flows in porous media [6,7], turbulent flows [8], and lately for more complex fluid flows, such as Bose–Einstein condensates [9], the interaction of (2+1)-dimensional solitons [10], or the modeling of viscous quasi-incompressible flows [11]. Thanks to the computational simplicity of the LBM and its spatial and temporal locality, it is naturally suited to parallel computing [12]. Recently, the increase in computational power and advances in general-purpose computing on GPUs (GPGPU) have opened the door to real-time and interactive computational fluid dynamics (CFD) simulations [13–17]. Given the performance and speed of the LBM, it is now possible to compute more than several hundred iterations per second, which makes interaction with a simulation in progress possible [18]. Instant feedback on changes to simulation parameters gives researchers the ability to iterate faster toward an accurate model, a better understanding of the underlying phenomena, or the use of simulation within the control of industrial systems.
It is, therefore, desirable to push the limits of execution speed of LBM simulations. Even though GPUs provide high memory bandwidth, developers have to be careful with memory limitations, as LBM algorithms tend to consume large amounts of memory for storing the data. GPU architecture is designed for high