ISSN 0361-7688, Programming and Computer Software, 2019, Vol. 45, No. 3, pp. 121–132. © Pleiades Publishing, Ltd., 2019.
Russian Text © The Author(s), 2019, published in Programmirovanie, 2019, Vol. 45, No. 3.
DVM-Approach to the Automation of the Development of Parallel Programs for Clusters

V. A. Bakhtin (a,b,*) and V. A. Krukov (a,b,**)

(a) Keldysh Institute of Applied Mathematics, Russian Academy of Sciences, Moscow, 125047 Russia
(b) Moscow State University, Moscow, 119991 Russia
* e-mail: bakhtin@keldysh.ru
** e-mail: krukov@keldysh.ru

Received January 11, 2019; revised January 11, 2019; accepted January 11, 2019
Abstract—The DVM-approach to the development of parallel programs for heterogeneous computer clusters with accelerators is described. The basic capabilities of DVM and SAPFOR, which automate the parallelization of applications, are discussed.
DOI: 10.1134/S0361768819030034
1. INTRODUCTION
The supercomputers in current use can be divided
into four types [1]:
− General purpose supercomputers. They are intended for solving problems with good or medium spatial–temporal locality of memory accesses. Spatial locality implies that the data the application will use shortly are located in memory close (in terms of addresses) to the data already in use. Temporal locality implies that the data currently in use will shortly be used again. For many problems solved using supercomputers, locality is decreasing; for example, work with densely filled matrices is being replaced by work with sparse matrices.
− Capacity bandwidth supercomputers. They are intended for solving problems with poor spatial–temporal locality and intensive memory access. Typical problems are Big Data processing, simulation of the operation of complex products and systems, and artificial intelligence.
− Reduced memory supercomputers. They have low-capacity, low-latency memory and improved performance compared with general purpose supercomputers. They are intended for signal and image processing, data security, and deep learning.
− Compute oriented supercomputers. They are intended for performing computations with good spatial–temporal locality of memory accesses. They should have large cache memory and may have a low ratio of memory bandwidth to performance. Typical problems involve dense matrices. The performance of such supercomputers is well measured by the Linpack benchmark (the Top500 list [2]).
Each type of supercomputer uses specific hardware components. General purpose supercomputers are usually clusters whose nodes use multicore processors. Capacity bandwidth supercomputers typically use mass-multithreaded and vector microprocessors produced by Cray, NEC, or NUDT. Reduced memory supercomputers are based on programmable logic integrated circuits (FPGAs) produced by Xilinx and Altera and on application-specific integrated circuits. Compute oriented supercomputers typically use GPUs manufactured by NVIDIA or AMD and Intel Xeon Phi manycore processors.
The main problems in using such diverse supercomputers are the difficulty of application software development, the portability of software across supercomputers of different types, the reliability of software executed on thousands of nodes, tens of thousands of processor cores, and accelerators, and the efficiency of parallel software.
Presently, the following programming models are used for the development of programs for high-performance computing on modern clusters: MPI (for mapping a program to cluster nodes), POSIX Threads (for mapping a program to processor cores), and CUDA and OpenCL (for mapping a program to GPU cores). All these models require low-level programming.
To map a program to all parallelization levels, the
developer has to use a combination of these models,
e.g., MPI+POSIX Threads+CUDA. Technically, it is
easier to combine low-level programming models
implemented using libraries than to combine high-
level models implemented in high-level languages and
compilers. However, such programs are much more
difficult to develop, debug, maintain, and port to