Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). ICST-2020

Computing ψ-Caputo Fractional Derivative Values Using CUDA 10

Vsevolod Bohaienko [0000-0002-3317-9022]
V.M. Glushkov Institute of Cybernetics of NAS of Ukraine, Kyiv, Ukraine
sevab@ukr.net

Abstract. The paper addresses the efficient computation of ψ-Caputo fractional derivative values on NVIDIA GPUs with compute capability 7.5 using the CUDA 10 SDK, with implementations in both the CUDA and OpenCL languages. We consider a three-dimensional time-fractional diffusion equation solved by a locally one-dimensional finite difference scheme. To compute the non-local part of the derivative, a rectangle rule quadrature is used. We consider a summation algorithm of linear computational complexity along with an approximating algorithm of constant complexity order based on the expansion of the integral kernel into series. For the approximating algorithm we present a computational scheme that uses the tensor cores of NVIDIA GPUs. For both algorithms, we study the influence of the scalar and vector data types used on performance and accuracy. For the summation algorithm, compared with the 64-bit double-precision floating-point data type, computations were ~2 times faster with the 32-bit single-precision data type and ~3 times faster with the 16-bit half-precision data type, without significant loss of accuracy. For the approximating algorithm, which was up to 5 times faster than the summation algorithm, the use of low-precision data types only slightly influences performance while reducing accuracy in long-term simulations. The use of vectorized operations in the approximating algorithm allowed a speed-up of up to 6-19% compared with non-vectorized implementations for the single-precision data type.
The use of tensor cores, which operate on the half-precision data type, made calculations 12% faster than the implementation that used the same data type without tensor cores.

Keywords: GPU algorithms, finite-difference method, diffusion equation, ψ-Caputo fractional derivative, tensor cores, data types, CUDA, OpenCL.

1 Introduction

Memory effects in diffusion processes can be efficiently simulated using time-fractional differential equations [1-3]. Such equations contain so-called fractional derivatives, which are integro-differential operators. The need to numerically calculate integrals while solving time-fractional differential equations increases the order of computational complexity compared with traditional differential equations.
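To illustrate where the extra cost comes from, the following is a minimal CPU-side C++ sketch, not the GPU implementation studied in this paper. It assumes the standard definition of the ψ-Caputo derivative of order 0 < α < 1, namely (1/Γ(1-α)) ∫₀ᵗ (ψ(t)-ψ(s))^(-α) f'(s) ds with an increasing function ψ, and it assumes a midpoint variant of the rectangle rule on a uniform grid with step tau. The function name psi_caputo_rectangle and its parameters are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: approximates the psi-Caputo derivative of order alpha
// (0 < alpha < 1) at t_n = n * tau from the samples f[j] = f(j * tau), j = 0..n.
// Assumption: psi is strictly increasing, so psi(t_n) - psi(s) > 0 for s < t_n.
double psi_caputo_rectangle(const std::vector<double>& f, double tau, double alpha,
                            const std::function<double(double)>& psi) {
    assert(alpha > 0.0 && alpha < 1.0);
    assert(f.size() >= 2 && tau > 0.0);
    const std::size_t n = f.size() - 1;
    const double psi_tn = psi(n * tau);
    double sum = 0.0;
    // One pass over the whole history: the cost of a single evaluation grows
    // linearly with the number of time steps n already taken.
    for (std::size_t j = 0; j < n; ++j) {
        const double mid = (j + 0.5) * tau;                 // midpoint of [t_j, t_{j+1}]
        const double kernel = std::pow(psi_tn - psi(mid), -alpha);
        sum += (f[j + 1] - f[j]) * kernel;                  // f'(mid) * tau ~ f[j+1] - f[j]
    }
    return sum / std::tgamma(1.0 - alpha);
}
```

Because the loop revisits every stored time level, evaluating the derivative at step n costs O(n) operations per grid node, whereas a classical first-order time derivative needs only the previous level. For ψ(t) = t and f(t) = t the sum approaches the known value t^(1-α)/Γ(2-α) as tau decreases, although convergence is slow because of the kernel singularity at s = t.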