1st International Conference "From Scientific Computing to Computational Engineering"
1st IC-SCCE
Athens, 8-10 September, 2004
© IC-SCCE

MATRIX-VECTOR MULTIPLICATION ON A CLUSTER OF WORKSTATIONS

Theodoros Typou, Vasilis Stefanidis, Panagiotis Michailidis and Konstantinos Margaritis

Parallel and Distributed Processing Laboratory (PDP Lab)
University of Macedonia
P.O. BOX 1591, 54006 Thessaloniki, Greece
e-mail: {typou, bstefan, panosm, kmarg}@uom.gr, web page: http://zeus.it.uom.gr/pdp/

Keywords: Matrix-vector multiplication, Cluster of workstations, Message Passing Interface, Performance prediction model.

Abstract. The multiplication of a matrix by a vector is the kernel operation in many algorithms used in scientific computation. This paper presents four parallel matrix-vector multiplication implementations on a cluster of workstations, all based on the dynamic master-worker paradigm. The implementations are evaluated experimentally using the Message Passing Interface (MPI) library on two kinds of high performance clusters of workstations: homogeneous and heterogeneous. We also propose a general analytical model that predicts the performance of the matrix-vector implementations on both kinds of cluster. The model has been validated and shown to predict the parallel performance accurately.

1 INTRODUCTION

Matrix-vector multiplication is one of the most important computational kernels in scientific computing. Versions for serial computers have long been based on optimized primitives embodied in the kernels of standard software packages such as LINPACK [3]. A stable and fairly uniform set of appropriate kernels, well-suited to most serial machines, makes these implementations hard to beat.
This is not yet the case for parallel computers, where the set of primitives can change so readily from one machine to another, but the block algorithms of LAPACK [1] and ScaLAPACK [2] are one step in this direction.

In this paper we study implementations of matrix-vector multiplication on a cluster of workstations and develop a performance prediction model for these implementations. While some studies of distributed matrix-vector multiplication have been made [4,5], we felt that a separate, focused effort was required, for several reasons. First, we develop four parallel matrix-vector multiplication implementations based on the dynamic master-worker paradigm, and we execute them using the Message Passing Interface (MPI) [6,7] library on two kinds of high performance clusters of workstations: homogeneous and heterogeneous. Second, we develop a heterogeneous performance model for the four implementations that is general enough to cover the performance evaluation of both homogeneous and heterogeneous computations in a dedicated cluster of workstations; homogeneous computing is thus treated as a special case of heterogeneous computing.

The rest of this paper is organized as follows: Section 2 briefly presents the heterogeneous computing model and the associated metrics. Section 3 presents a performance prediction model for estimating the performance of the four matrix-vector implementations on a cluster of workstations. Section 4 discusses the validation of the performance prediction model against the experimental results. Finally, Section 5 contains our conclusions.

2 HETEROGENEOUS COMPUTING MODEL

A heterogeneous network (HN) can be abstracted as a connected graph HN(M,C), where M = {M1, M2, ..., Mp} is the set of heterogeneous workstations (p is the number of workstations). The computation capacity of each workstation is determined by the power of its CPU, its I/O and its memory access speed.
C is a standard interconnection network for workstations, such as Fast Ethernet or an ATM network, in which the communication links between any pair of workstations have the same bandwidth. Based on the above definition, if a cluster consists of a set of identical workstations, the cluster is homogeneous.

2.1 Metrics

Metrics help to compare and characterize parallel computer systems. The metrics cited in this section are defined and published in a previous paper [8]. They can be roughly divided into characterization metrics and performance