CONFLICT-FREE STRIDES FOR VECTORS IN MATCHED MEMORIES
MATEO VALERO, TOMÁS LANG, JOSÉ M. LLABERÍA,
MONTSE PEIRON, JUAN J. NAVARRO and EDUARD AYGUADÉ
Departament d’Arquitectura de Computadors, Universitat Politècnica de Catalunya,
Gran Capità s/n Mòdul D4, 08034 Barcelona, Spain
E-mail: mateo@ac.upc.es
ABSTRACT
Address transformation schemes, such as skewing and linear transformations, have been proposed to
achieve conflict-free access to one family of strides in vector processors with matched memories.
In this paper, we extend these schemes to achieve conflict-free access for several families. The
basic idea is to perform an out-of-order access to vectors of fixed length, equal to that of the vector
registers of the processor. The hardware required is similar to that for in-order access.
Keywords: Conflict-free Access, Out-of-order Access, Parallel Memory Architectures, Storage
Schemes, Temporal Distribution, Vector Access.
1. Introduction
To provide sufficient memory bandwidth, the memory of vector processors is organized
as several modules that can be accessed simultaneously. The memory is matched if the
number of memory modules is equal to the ratio between the memory cycle and the
processor cycle, since in this case the peak memory throughput is one word per processor cycle.
However, to obtain this peak throughput, the request sequence has to be such that there are
no conflicts in the accesses. This is achieved, for example, with standard memory
interleaving for vectors with odd strides. It is not achieved for other strides, which has
motivated the proposal of other addressing schemes.
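The claim about odd strides can be checked with a short Python sketch. It is an illustration under stated assumptions, not part of the original paper: the function names are hypothetical, the number of modules is taken to be a power of two, one request is issued per processor cycle, and a module stays busy for as many processor cycles as there are modules (the matched case). Under these assumptions, an access is conflict-free when every window of M consecutive requests addresses M distinct modules.

```python
def interleaved_module(address, num_modules):
    """Standard (low-order) interleaving: module = address mod M."""
    return address % num_modules


def is_conflict_free(stride, num_modules, length, base=0):
    """Return True if no module is requested again while still busy.

    Assumption: one request per processor cycle and a memory cycle of
    num_modules processor cycles (matched memory), so a module used by
    request i is free again only for request i + num_modules or later.
    """
    last_use = {}  # module number -> index of the last request sent to it
    for i in range(length):
        module = interleaved_module(base + i * stride, num_modules)
        if module in last_use and i - last_use[module] < num_modules:
            return False
        last_use[module] = i
    return True


# Strides 1..16 that are conflict-free with 8 modules and 64-element vectors.
conflict_free_strides = [s for s in range(1, 17) if is_conflict_free(s, 8, 64)]
```

With 8 modules, `conflict_free_strides` contains exactly the odd strides 1, 3, ..., 15. Every even stride revisits a module within fewer than 8 requests.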
The two main address transformation schemes used to achieve conflict-free access to
other strides are skewing and linear transformations. These schemes were initially
proposed for array processors [1, 2] and later for multiprocessors [3], vector processors [4, 5],
and VLIW processors [6]. For vectors, they can provide conflict-free access to one family
of strides, where the family defined by x is the set of strides σ·2^x with σ odd [7]. Moreover,
for the case in which different vectors are accessed with different strides, dynamic schemes
based on skewing [7] and on linear transformations [5] were proposed. Linear
transformations have the advantage over skewing that the module number is simpler to compute.
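The notion of a stride family can be made concrete with a small Python sketch. The decomposition of a stride into σ·2^x follows the definition above. The address mapping `block_module` is a hypothetical stand-in chosen for brevity: it takes address bits x to x+m-1 as the module number, and it is neither the skewing scheme nor the linear transformation of the cited works. It is used here only to illustrate what "conflict-free for one family" means, under the same matched-memory assumptions (2^m modules, one request per cycle, memory cycle of 2^m processor cycles).

```python
def stride_family(stride):
    """Return (sigma, x) such that stride == sigma * 2**x with sigma odd."""
    if stride <= 0:
        raise ValueError("stride must be a positive integer")
    x = 0
    while stride % 2 == 0:
        stride //= 2
        x += 1
    return stride, x


def block_module(address, x, m):
    """Hypothetical mapping: address bits x .. x+m-1 select the module."""
    return (address >> x) % (1 << m)


def conflict_free(stride, x, m, length, base=0):
    """True if no module is reused within 2**m consecutive requests."""
    num_modules = 1 << m
    last_use = {}  # module number -> index of the last request sent to it
    for i in range(length):
        module = block_module(base + i * stride, x, m)
        if module in last_use and i - last_use[module] < num_modules:
            return False
        last_use[module] = i
    return True


# With 8 modules (m = 3) and a mapping tuned to family x = 2,
# list the strides in 1..16 that are accessed without conflicts.
good = [s for s in range(1, 17) if conflict_free(s, x=2, m=3, length=64)]
```

Here `good` is `[4, 12]`, that is, exactly the strides in the range with `stride_family(s)[1] == 2`. Strides of any other family conflict under this mapping, which is the limitation that motivates extending such schemes to several families.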
Although outside the scope of this paper, it is worth mentioning that techniques have
also been proposed to improve efficiency for the cases in which conflict-free access is not
achieved. For the skewing and linear schemes mentioned above, peak memory throughput
can be obtained for families x’ < x, for long vectors, by the use of buffers [4]. Moreover, schemes
based on linear transformations have been proposed to randomly distribute the modules
corresponding to consecutive addresses, so that the various strides do not produce
clustering in the memory modules [6, 8, 9]. Recently, an analytic model has been proposed [10]
that can be used to make comparisons among these methods. For both schemes, most
of the evaluations performed consider long vectors, so that the initial transient is not
significant and the throughput is determined for the steady state. This throughput is evaluated as
Electronic version of an article published as:
Parallel Processing Letters, vol. 1, no. 2, 1991, pp. 95-102. DOI 10.1142/S0129626491000045
© World Scientific Publishing Company. http://www.worldscientific.com/doi/abs/10.1142/S0129626491000045