1st International Conference "From Scientific Computing to Computational Engineering" (1st IC-SCCE)
Athens, 8-10 September, 2004
© IC-SCCE
MATRIX – VECTOR MULTIPLICATION ON A CLUSTER OF WORKSTATIONS
Theodoros Typou, Vasilis Stefanidis, Panagiotis Michailidis and Konstantinos Margaritis
Parallel and Distributed Processing Laboratory (PDP Lab)
University of Macedonia
P.O. BOX 1591, 54006 Thessaloniki, Greece
e-mail: {typou, bstefan, panosm, kmarg}@uom.gr, web page: http://zeus.it.uom.gr/pdp/
Keywords: Matrix – vector multiplication, Cluster of workstations, Message Passing Interface, Performance
prediction model.
Abstract. The multiplication of a vector by a matrix is the kernel operation in many algorithms used in
scientific computation. This paper outlines four parallel matrix – vector multiplication implementations on a
cluster of workstations. These parallel implementations are based on the dynamic master – worker paradigm.
Furthermore, the parallel implementations are analyzed experimentally using the Message Passing Interface
(MPI) library on two kinds of high performance clusters of workstations: homogeneous and heterogeneous. We
also propose a general analytical model that can be used to predict the performance of the matrix – vector
implementations on both kinds of cluster (homogeneous and heterogeneous). The developed performance
model has been validated and shown to predict the parallel performance accurately.
1 INTRODUCTION
The matrix-vector multiplication is one of the most important computational kernels in scientific computing.
Versions for serial computers have long been based on optimized primitives embodied in the kernels of
standard software packages, such as LINPACK [3]. A stable and fairly uniform set of appropriate kernels well-
suited to most serial machines makes these implementations hard to beat. This is not yet the case for parallel
computers, where the set of primitives can so readily change from one machine to another, but the block
algorithms of LAPACK [1] and ScaLAPACK [2] are one step in this direction. In this paper we study
implementations for matrix – vector multiplication on a cluster of workstations and develop a performance
prediction model for these implementations.
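For reference, the serial kernel that the parallel implementations distribute can be sketched as follows (a minimal illustration, not code from the paper):

```python
# Serial matrix-vector multiplication kernel: y = A * x.
# A is an n x m matrix stored as a list of rows; x is a vector of length m.
def mat_vec(A, x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

A = [[1, 2], [3, 4], [5, 6]]
x = [1, 1]
print(mat_vec(A, x))  # [3, 7, 11]
```

Each output element y[i] is an independent inner product, which is what makes row-wise distribution across workstations natural.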
While some studies of distributed matrix – vector multiplication exist [4,5], we felt that a separate,
focused effort was warranted for several reasons. First, we develop four parallel matrix – vector
multiplication implementations based on the dynamic master – worker paradigm. These implementations
were executed using the Message Passing Interface (MPI) [6,7] library on two kinds of high performance
clusters of workstations: homogeneous and heterogeneous. Second, we develop a heterogeneous performance
model for the four implementations that is general enough to cover performance evaluation of both
homogeneous and heterogeneous computations on a dedicated cluster of workstations. We thus treat
homogeneous computing as a special case of heterogeneous computing.
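The dynamic master – worker paradigm can be sketched as follows. This is an illustrative sequential simulation under assumptions of ours, not the paper's MPI code: the master hands out row blocks on demand, and the round-robin worker selection here stands in for whichever worker reports idle first (in the actual implementations this exchange would use MPI send/receive calls).

```python
from collections import deque

def dynamic_master_worker(A, x, num_workers, block=1):
    """Simulate dynamic master-worker scheduling of y = A * x.

    The master keeps a queue of row-block tasks and assigns the next
    block to the next available worker until the queue is empty.
    """
    n = len(A)
    y = [0] * n
    tasks = deque(range(0, n, block))   # queue of row-block start indices
    rows_done = [0] * num_workers       # rows processed per worker
    w = 0                               # index of the worker being served
    while tasks:
        start = tasks.popleft()         # master assigns the next block
        rows = range(start, min(start + block, n))
        for i in rows:                  # "worker" computes its partial product
            y[i] = sum(A[i][j] * x[j] for j in range(len(x)))
        rows_done[w] += len(rows)
        w = (w + 1) % num_workers       # round-robin stand-in for "first idle"
    return y, rows_done

y, rows_done = dynamic_master_worker([[1, 2], [3, 4], [5, 6]], [1, 1], num_workers=2)
print(y)  # [3, 7, 11]
```

Because blocks are assigned on demand rather than in advance, faster workstations naturally process more blocks, which is what makes the dynamic scheme attractive on a heterogeneous cluster.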
The rest of this paper is organized as follows: Section 2 briefly presents heterogeneous computing model and
the metrics. Section 3 presents a performance prediction model for estimating the performance of the four matrix
– vector implementations on a cluster of workstations. Section 4 discusses the validation of the performance
prediction model with the experimental results. Finally, Section 5 contains our conclusions.
2 HETEROGENEOUS COMPUTING MODEL
A heterogeneous network (HN) can be abstracted as a connected graph HN(M, C), where:
• M = {M1, M2, …, Mp} is a set of heterogeneous workstations (p is the number of workstations). The
computation capacity of each workstation is determined by the power of its CPU, I/O and memory access speed.
• C is a standard interconnection network for workstations, such as Fast Ethernet or an ATM network, where
the communication links between any pair of workstations have the same bandwidth.
Based on the above definition, if a cluster consists of a set of identical workstations, the cluster is homogeneous.
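Under this model, a static workload split would assign rows in proportion to each workstation's relative computation capacity. A minimal sketch of such a capacity-weighted partition (the capacity values are illustrative assumptions, not measurements from the paper):

```python
def capacity_partition(n_rows, capacities):
    """Split n_rows among p workstations in proportion to their relative
    computation capacities; in the homogeneous special case all
    capacities are equal and the split is (near-)uniform."""
    total = sum(capacities)
    shares = [n_rows * c // total for c in capacities]
    # hand any leftover rows to the fastest workstations first
    rest = n_rows - sum(shares)
    order = sorted(range(len(capacities)), key=lambda i: -capacities[i])
    for i in order[:rest]:
        shares[i] += 1
    return shares

print(capacity_partition(9, [1, 1, 1]))  # [3, 3, 3]
```

Treating equal capacities as just another input is what lets a single heterogeneous model cover the homogeneous cluster as a special case.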
2.1 Metrics
Metrics help to compare and characterize parallel computer systems. The metrics cited in this section are
defined and published in a previous paper [8]. They can be roughly divided into characterization metrics and performance