Future Generation Computer Systems 18 (2001) 55–67

Modeling and improving locality for the sparse-matrix–vector product on cache memories

D.B. Heras*, V. Blanco, J.C. Cabaleiro, F.F. Rivera

Departamento de Electrónica y Computación, Univ. Santiago de Compostela, 15706 Santiago de Compostela, Spain

* Corresponding author.
E-mail addresses: dora@dec.usc.es (D.B. Heras), vicente.blanco@dec.usc.es (V. Blanco), caba@dec.usc.es (J.C. Cabaleiro), fran@dec.usc.es (F.F. Rivera).

0167-739X/01/$ – see front matter. PII: S0167-739X(00)00075-3

Abstract

A model for representing and improving the locality exhibited by the execution of sparse irregular problems is developed in this work. We focus on the product of a sparse matrix by a dense vector (SpM × V). We consider the cache memory as a representative level of the memory hierarchy. Locality is evaluated through four functions based on two parameters called entry matches and line matches. To increase locality, two algorithms are applied: one based on the construction of minimum spanning trees and the other on the nearest-neighbor heuristic. These techniques were tested and compared with some standard ordering algorithms. © 2001 Elsevier Science B.V. All rights reserved.

Keywords: Sparse matrix; Memory hierarchy; Cache memory; Locality; Ordering algorithms

1. Introduction

Current computer architectures use a memory hierarchy to improve data access and thereby reduce the growing gap between processor and memory speeds. The characteristics that determine the behavior of a program on a particular memory hierarchy are its temporal and spatial locality. Thus, analyzing data locality and finding ways to increase it are fundamental when trying to improve performance [17].

It has been demonstrated that using a memory hierarchy to increase the speed at which data are provided to the CPU is effective in the execution of dense numerical codes. However, because irregular codes rely on indirect addressing, the memory hierarchy is generally considered inefficient for them [21]. The reason is the unpredictable behavior of the irregular accesses in such codes, which implies a great deal of work in developing special implementations of them. Moreover, evaluating and improving their locality is especially difficult on multiprocessors. On these machines there is an additional level of the memory hierarchy that must be considered: the one created by accesses to data placed in remote memories [13,22,23].

In this work we focus on the behavior of sparse linear algebra algorithms. In particular, we closely examine the product of a sparse matrix by a dense vector (SpM × V), as this is one of the basic kernels in many more complex sparse algebra codes, for example, in the solution of linear systems through iterative methods [2].

A large number of algorithms for evaluating and optimizing data locality can be found in the literature. In the case of dense codes, most approaches are based on decreasing conflict misses by using blocking, strip-mining or other restructuring techniques [24]. Some of these techniques have been applied to particular irregular codes, for example, to different