ISSN 0361-7688, Programming and Computer Software, 2019, Vol. 45, No. 3, pp. 121–132. © Pleiades Publishing, Ltd., 2019. Russian Text © The Author(s), 2019, published in Programmirovanie, 2019, Vol. 45, No. 3.

DVM-Approach to the Automation of the Development of Parallel Programs for Clusters

V. A. Bakhtin a,b,* and V. A. Krukov a,b,**

a Keldysh Institute of Applied Mathematics, Russian Academy of Sciences, Moscow, 125047 Russia
b Moscow State University, Moscow, 119991 Russia
* e-mail: bakhtin@keldysh.ru
** e-mail: krukov@keldysh.ru

Received January 11, 2019; revised January 11, 2019; accepted January 11, 2019

Abstract—The DVM-approach to the development of parallel programs for heterogeneous computer clusters with accelerators is described. The basic capabilities of the DVM and SAPFOR systems, which automate the parallelization of applications, are discussed.

DOI: 10.1134/S0361768819030034

1. INTRODUCTION

The supercomputers in current use can be divided into four types [1]:

− General purpose supercomputers. They are intended for solving problems with good or medium spatial–temporal locality of memory accesses. Spatial locality means that the data the application will use shortly are located in memory close (in terms of addresses) to the data already in use. Temporal locality means that the data currently in use will shortly be used again. For many problems solved on supercomputers, locality is decreasing; for example, work with densely filled matrices is being replaced by work with sparse matrices.

− Capacity bandwidth supercomputers. They are intended for solving problems with poor spatial–temporal locality and intensive memory access. Typical problems are Big Data processing, simulation of the operation of complex products and systems, and artificial intelligence.

− Reduced memory supercomputers. They have low-capacity memory with low latency and improved performance compared with general purpose supercomputers.
They are intended for signal and image processing, data security, and deep learning.

− Compute oriented supercomputers. They are intended for computations with good spatial–temporal locality of memory accesses. They should have a large cache memory and may have a low memory bandwidth to performance ratio. Typical problems involve work with dense matrices. The performance of such supercomputers is well measured by the Linpack benchmark (the Top500 rating [2]).

Each type of supercomputer uses specific hardware components. General purpose supercomputers are usually clusters whose nodes use multicore processors. Capacity bandwidth supercomputers typically use massively multithreaded and vector microprocessors produced by Cray, NEC, or NUDT. Reduced memory supercomputers are based on field-programmable gate arrays produced by Xilinx and Altera and on application-specific integrated circuits. Compute oriented supercomputers typically use GPUs manufactured by NVIDIA or AMD and massively multicore Intel Xeon Phi processors.

The main problems in using such diverse supercomputers are the difficulty of application software development, the portability of software between supercomputers of different types, the reliability of software executed on thousands of nodes, tens of thousands of processor cores, and accelerators, and the efficiency of parallel software.

Presently, the following programming models are used to develop programs for high performance computing on modern clusters: MPI (for mapping a program to cluster nodes), POSIX Threads (for mapping a program to processor cores), and CUDA and OpenCL (for mapping a program to GPU cores). All these models require low-level programming. To map a program to all parallelization levels, the developer has to use a combination of these models, e.g., MPI + POSIX Threads + CUDA.
Technically, it is easier to combine low-level programming models implemented as libraries than to combine high-level models implemented in high-level languages and compilers. However, such programs are much more difficult to develop, debug, maintain, and port to