Future Generation Computer Systems 18 (2001) 55–67
Modeling and improving locality for the sparse-matrix–vector
product on cache memories
D.B. Heras ∗, V. Blanco, J.C. Cabaleiro, F.F. Rivera
Departamento de Electrónica y Computación, Univ. Santiago de Compostela, 15706 Santiago de Compostela, Spain
Abstract
A model for representing and improving the locality exhibited by the execution of sparse irregular problems is developed
in this work. We focus on the product of a sparse matrix by a dense vector (SpM × V). We consider the cache memory as
a representative level of the memory hierarchy. Locality is evaluated through four functions based on two parameters called
entry matches and line matches. In order to increase the locality, two algorithms are applied: one based on the construction of
minimum spanning trees and the other on the nearest-neighbor heuristic. These techniques were tested and compared with
some standard ordering algorithms. © 2001 Elsevier Science B.V. All rights reserved.
Keywords: Sparse matrix; Memory hierarchy; Cache memory; Locality; Ordering algorithms
1. Introduction
Current computer architectures use a memory
hierarchy to improve data access and consequently
to mitigate the increasing gap between processor and
memory speeds. The characteristics that determine the
behavior of a program on a particular memory
hierarchy are its temporal and spatial locality.
Thus, analyzing data locality and seeking to
increase it are fundamental when trying to
improve performance [17].
It has been demonstrated that the use of a memory
hierarchy to increase the speed at which data are
provided to the CPU is effective in the execution of
dense numerical codes. However, the memory hierarchy
is generally considered inefficient for irregular codes
because of their indirect addressing [21]. The reason is
the unpredictable behavior of the irregular accesses in
such codes, which implies a lot of work in the
development of special implementations of the codes.
In addition, the task of evaluating and improving
their locality is especially difficult on multiprocessors.
On these machines there is an additional level of
memory hierarchy that must be considered:
the one created by the memory accesses to data placed
in remote memories [13,22,23].
∗ Corresponding author.
E-mail addresses: dora@dec.usc.es (D.B. Heras),
vicente.blanco@dec.usc.es (V. Blanco), caba@dec.usc.es
(J.C. Cabaleiro), fran@dec.usc.es (F.F. Rivera).
In this work we focus on the behavior of sparse
linear algebra algorithms. In particular, we closely
examine the product of a sparse matrix by a dense
vector (SpM × V), as this is one of the basic kernels in
many more complex sparse algebra codes, for example,
in the solution of linear systems through iterative
methods [2].
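To make the indirect addressing in this kernel concrete, the following is a minimal sketch of SpM × V. It assumes the matrix is stored in compressed sparse row (CSR) form, which is an illustrative choice and is not stated in this section; the names spmv_csr, row_ptr, col_idx and values are hypothetical.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A * x for a sparse matrix A held in CSR form (assumed format)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # The nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect access: the entry of x that is read depends on col_idx[k],
            # so the access pattern on x follows the sparsity pattern of A.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Small 3x3 example:
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]
y = spmv_csr(row_ptr, col_idx, values, x)
```

In this sketch values and col_idx are traversed sequentially, whereas the reads of x jump according to the column indices. These reads of x are the irregular accesses whose locality depends on the sparsity pattern.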
A large number of algorithms for evaluating and
optimizing data locality can be found in the litera-
ture. In the case of dense codes, most approaches are
based on decreasing conflict misses by using blocking,
strip-mining or other restructuring techniques [24].
Some of these techniques have been applied to
particular irregular codes, for example, to different