Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons
License Attribution 4.0 International (CC BY 4.0). ICST-2020
Computing ψ-Caputo Fractional Derivative Values Using
CUDA 10
Vsevolod Bohaienko
[0000-0002-3317-9022]
V.M. Glushkov Institute of Cybernetics of NAS of Ukraine, Kyiv, Ukraine
sevab@ukr.net
Abstract. The paper addresses the efficient GPU implementation of
ψ-Caputo fractional derivative computation on NVIDIA GPUs with compute
capability 7.5, using the CUDA 10 SDK in both the CUDA and OpenCL languages.
We consider a three-dimensional time-fractional diffusion equation solved by
a locally one-dimensional finite-difference scheme. To compute the non-local
part of the derivative, a rectangle-rule quadrature is used; a summation
algorithm of linear computational complexity is considered along with an
approximating algorithm of constant complexity order based on expansion of
the integral kernel into a series. For the approximating algorithm, we
present a computational scheme that uses the tensor cores of NVIDIA GPUs.
For both algorithms, we study the influence of the scalar and vector data
types used on performance and accuracy. For the summation algorithm,
compared with the 64-bit double-precision floating-point data type,
computations were ~2 times faster with the 32-bit single-precision data type
and ~3 times faster with the 16-bit half-precision data type, without
significant loss of accuracy. For the approximating algorithm, which was up
to 5 times faster than the summation algorithm, the use of low-precision
data types slightly influences performance while reducing accuracy in
long-term simulations. The use of vectorized operations in the approximating
algorithm gave a 6-19% speed-up over non-vectorized implementations for the
single-precision data type. The use of tensor cores, which operate on the
half-precision data type, made calculations 12% faster than computations
with the same data type performed without tensor cores.
Keywords: GPU algorithms, finite-difference method, diffusion equation,
ψ-Caputo fractional derivative, tensor cores, data types, CUDA, OpenCL.
1 Introduction
Memory effects in diffusion processes can be efficiently simulated using time-
fractional differential equations [1-3].
Such equations contain so-called fractional derivatives, which are
integro-differential operators.
The need to evaluate integrals numerically when solving time-fractional
differential equations increases the order of computational complexity
compared with traditional differential equations.
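To make the source of this extra cost concrete, the block below sketches a standard form of the ψ-Caputo derivative of order 0 < α < 1 together with a simple left-rectangle discretization on a uniform time grid. The notation (f, α, ψ, τ, t_j, f_j) is generic and assumed here for illustration only; it may differ from the notation and from the exact quadrature used in the rest of the paper.

```latex
% psi-Caputo derivative of order 0 < \alpha < 1, with \psi increasing and
% differentiable (generic notation, assumed for illustration)
{}^{C}D_{t}^{\alpha,\psi} f(t)
  = \frac{1}{\Gamma(1-\alpha)}
    \int_{0}^{t} \bigl(\psi(t)-\psi(s)\bigr)^{-\alpha} f'(s)\,ds .

% Left-rectangle rule on a uniform grid t_j = j\tau, f_j = f(t_j),
% with f'(t_j)\,\tau approximated by f_{j+1}-f_j (illustrative sketch)
{}^{C}D_{t}^{\alpha,\psi} f(t_n)
  \approx \frac{1}{\Gamma(1-\alpha)}
    \sum_{j=0}^{n-1} \bigl(\psi(t_n)-\psi(t_j)\bigr)^{-\alpha}
    \bigl(f_{j+1}-f_j\bigr).
```

The sum contains n terms, and because the kernel depends on t_n, all of them must be re-evaluated at every new time step. The work per step therefore grows linearly with the step number, whereas an integer-order time derivative needs only a fixed number of previous time levels.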