International Journal of Electrical and Computer Engineering (IJECE)
Vol. 6, No. 2, April 2016, pp. 887~894
ISSN: 2088-8708, DOI: 10.11591/ijece.v6i2.9575
Journal homepage: http://iaesjournal.com/online/index.php/IJECE

Exascale Message Passing Interface based Program Deadlock Detection

Raed Al Dhubhani*, Fathy Eassa*, Faisal Saeed**
* Faculty of Computing and Information Technology, King Abdul-Aziz University, KSA
** Faculty of Computing, Universiti Teknologi Malaysia, Malaysia

Article history: Received Oct 4, 2015; Revised Dec 27, 2015; Accepted Jan 16, 2016

ABSTRACT

Deadlock detection is one of the main issues of software testing in High Performance Computing (HPC), and it will also be an issue in exascale computing in the near future. Developing and testing programs for machines which have millions of cores is not an easy task. An HPC program consists of thousands (or millions) of parallel processes which need to communicate with each other at runtime. The Message Passing Interface (MPI) is a standard library which provides this communication capability, and it is frequently used in HPC. Exascale programs are expected to be developed using the MPI standard library. For parallel programs, deadlock is one of the expected problems. In this paper, we discuss deadlock detection for exascale MPI-based programs, where scalability and efficiency are critical issues. The proposed method detects and flags the processes and communication operations which can potentially cause deadlocks, in a scalable and efficient manner. MPI benchmark programs were used to test the proposed method.

Keywords: Deadlock detection; Exascale systems; Message Passing Interface

Copyright © 2016 Institute of Advanced Engineering and Science. All rights reserved.

Corresponding Author: Raed Al Dhubhani, Faculty of Computing and Information Technology, King Abdul-Aziz University, KSA. Email: raedsaeed@gmail.com

1.
INTRODUCTION

Exascale computing is considered one of the recent research topics in the HPC area. Exascale computing refers to the capability to process 1 exaflop (10^18 floating point operations per second). The computation capability of current supercomputers is at the petaflop level, where 1 petaflop is equivalent to 10^15 floating point operations per second. Building machines with this ambitious computation capability depends on using hundreds of millions of cores, and such machines are expected to be in operation in 2020 [1]. Scientific and big data processing applications are planned to run on these machines. So, one of the challenges is how to develop reliable applications for this parallel computation environment.

MPI is a standard library which is frequently used in HPC, and it is considered by [2] as the de facto standard for parallel programming in HPC. According to [3], MPI provides a set of functions or commands which are used by parallel programs to facilitate communication between processes at runtime. In the simplest scenario, one process uses the MPI_Send operation to send a message to another process in the same program, and the destination process receives the message using the MPI_Recv operation. To provide a rich communication environment for applications, the MPI library provides two types of communication: blocking and non-blocking. In blocking communication, the sender and receiver must wait for the communication operations to match each other before they can proceed to execute the next instruction. In non-blocking communication, the sender and receiver can issue the communication operations (MPI_Isend and MPI_Irecv) and proceed directly to execute the next
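To make the blocking-communication semantics concrete, the matching of send and receive operations can be simulated outside of MPI. The following sketch is hypothetical illustration code, not the detection method proposed in this paper: it models each process's blocking point-to-point operations as a queue and flags head operations that can never be matched, such as two processes that each begin with a blocking send to the other.

```python
# Hypothetical sketch (not the authors' method): simulate the matching of
# blocking MPI_Send/MPI_Recv operations and flag operations that can
# never match, i.e. a potential deadlock.
from collections import deque

def find_potential_deadlock(traces):
    """traces: dict mapping rank -> list of ("send" | "recv", peer_rank).
    Repeatedly match the head send of one process against the head recv
    of its peer; when no further matching is possible, any remaining head
    operation is blocked forever and is returned as a potential deadlock."""
    queues = {rank: deque(ops) for rank, ops in traces.items()}
    progress = True
    while progress:
        progress = False
        for rank, q in queues.items():
            if not q:
                continue
            kind, peer = q[0]
            peer_q = queues.get(peer)
            # A blocking send matches only when the peer's next
            # operation is a receive from this rank.
            if kind == "send" and peer_q and peer_q[0] == ("recv", rank):
                q.popleft()
                peer_q.popleft()
                progress = True
    # Anything still at the head of a queue can never complete.
    return {rank: q[0] for rank, q in queues.items() if q}

# Two processes that each post a blocking send first: classic deadlock.
deadlocked = find_potential_deadlock({
    0: [("send", 1), ("recv", 1)],
    1: [("send", 0), ("recv", 0)],
})
print(deadlocked)  # {0: ('send', 1), 1: ('send', 0)}

# Reordering rank 1 to receive first lets every operation match.
ok = find_potential_deadlock({
    0: [("send", 1), ("recv", 1)],
    1: [("recv", 0), ("send", 0)],
})
print(ok)  # {}
```

In real MPI the first trace hangs only when MPI_Send cannot buffer the message, which is exactly why such deadlocks are hard to reproduce by testing alone and why static flagging of unmatched operations is useful.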