International Journal of Electrical and Computer Engineering (IJECE)
Vol. 6, No. 2, April 2016, pp. 887~894
ISSN: 2088-8708, DOI: 10.11591/ijece.v6i2.9575
Journal homepage: http://iaesjournal.com/online/index.php/IJECE
Exascale Message Passing Interface based Program Deadlock
Detection
Raed Al Dhubhani*, Fathy Eassa*, Faisal Saeed**
* Faculty of Computing and Information Technology, King Abdul-Aziz University, KSA
** Faculty of Computing, Universiti Teknologi Malaysia, Malaysia
Article Info
Article history:
Received Oct 4, 2015
Revised Dec 27, 2015
Accepted Jan 16, 2016
ABSTRACT
Deadlock detection is one of the main issues of software testing in High Performance Computing (HPC) and, in the near future, in exascale computing. Developing and testing programs for machines with millions of cores is not an easy task. An HPC program consists of thousands (or millions) of parallel processes which need to communicate with each other at runtime. The Message Passing Interface (MPI) is a standard library which provides this communication capability and is frequently used in HPC. Exascale programs are expected to be developed using the MPI standard library. For parallel programs, deadlock is one of the expected problems. In this paper, we discuss deadlock detection for exascale MPI-based programs, where scalability and efficiency are critical issues. The proposed method detects and flags the processes and communication operations which can potentially cause deadlocks, in a scalable and efficient manner. MPI benchmark programs were used to test the proposed method.
Keywords:
Deadlock detection
Exascale systems
Message Passing Interface
Copyright © 2016 Institute of Advanced Engineering and Science.
All rights reserved.
Corresponding Author:
Raed Al Dhubhani,
Faculty of Computing and Information Technology,
King Abdul-Aziz University, KSA
Email: raedsaeed@gmail.com
1. INTRODUCTION
Exascale computing is considered one of the recent research topics in the HPC area. Exascale computing refers to the capability to process 1 exaflop (10^18 floating point operations per second). The computation capability of current supercomputers is at the petaflops level, where 1 petaflop is equivalent to 10^15 floating point operations per second. Manufacturing machines with this ambitious computation capability depends on using hundreds of millions of cores to achieve that computational target, and such machines are expected to be in operation in 2020 [1]. Scientific and big data processing applications are planned to run on these machines. So, one of the challenges is how to develop reliable applications for this parallel-based computation environment.
MPI is a standard library which is frequently used in HPC; it is considered by [2] as the de facto standard for parallel programming in HPC. According to [3], MPI provides a set of functions or commands which are used by parallel programs to facilitate communication between processes at runtime. In the simple scenario of using the MPI library, one process in a parallel program uses the MPI_Send operation to send a message to another process in the same program, and the destination process receives the message using the MPI_Recv operation.
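This simple send/receive scenario can be sketched as a minimal MPI program in C. This is an illustrative sketch, not code from the paper: the rank numbers, message value, and tag are arbitrary choices, and the program must be compiled with an MPI compiler wrapper (e.g. mpicc) and launched with at least two processes (e.g. mpirun -np 2).

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        /* Blocking send: returns once the send buffer may be reused. */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Blocking receive: waits until the matching send is delivered. */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("Process 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```

Note that if process 1 posted an MPI_Recv for which no matching MPI_Send is ever issued, the blocking call would never return, which is exactly the kind of situation the deadlock detection discussed in this paper targets.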
To provide a rich communication environment for the applications, MPI library provides two different types
of communication: blocking and non-blocking communication. In the blocking communication, the sender
and receiver must wait for the communication operations to match each other before theycan proceed to
execute the next instruction.In the non-blocking communication, the sender and receiver can issue the
operations of the communication –MPI_Isend and MPI_Irecv- and proceed directly to execute the next