CONFLICT-FREE STRIDES FOR VECTORS IN MATCHED MEMORIES

MATEO VALERO, TOMÁS LANG, JOSÉ M. LLABERÍA, MONTSE PEIRON, JUAN J. NAVARRO and EDUARD AYGUADÉ

Departament d’Arquitectura de Computadors, Universitat Politècnica de Catalunya, Gran Capità s/n Mòdul D4, 08034 Barcelona, Spain
E-mail: mateo@ac.upc.es

ABSTRACT

Address transformation schemes, such as skewing and linear transformations, have been proposed to achieve conflict-free access to one family of strides in vector processors with matched memories. In this paper, we extend these schemes to achieve conflict-free access for several families. The basic idea is to perform an out-of-order access to vectors of fixed length, equal to that of the vector registers of the processor. The hardware required is similar to that required for in-order access.

Keywords: Conflict-free Access, Out-of-order Access, Parallel Memory Architectures, Storage Schemes, Temporal Distribution, Vector Access.

1. Introduction

To provide sufficient memory bandwidth, the memory of vector processors is organized as several modules that can be accessed simultaneously. The memory is matched if the number of memory modules is equal to the ratio between the memory cycle and the processor cycle, since in this case the peak memory throughput is one word per processor cycle. However, to obtain this peak throughput, the request sequence has to be such that there are no conflicts in the accesses. This is achieved, for example, with standard memory interleaving for vectors with odd strides. However, this is not the case for other strides, which has motivated the proposal of other addressing schemes.

The two main address transformation schemes used to achieve conflict-free access to other strides are skewing and linear transformations. These schemes were initially proposed for array processors [1, 2] and later for multiprocessors [3], vector processors [4, 5], and VLIW processors [6].
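As an illustration of the conflict condition for a matched memory, the following Python sketch checks whether a constant-stride request stream is conflict-free under standard interleaving. It is only a sketch under stated assumptions: one request is issued per processor cycle, a module stays busy for M processor cycles (M being the number of modules), and M is a power of two. The function names, the value M = 8, and the vector length of 64 are hypothetical choices made for this example and are not taken from the paper.

```python
def module_interleaved(address: int, num_modules: int) -> int:
    """Standard low-order interleaving: module = address mod M."""
    return address % num_modules


def is_conflict_free(start: int, stride: int, length: int, num_modules: int) -> bool:
    """Return True if a constant-stride stream never hits a busy module.

    Assumed model (not from the paper): one request per processor cycle,
    and a module is busy for num_modules cycles after each access, which
    is the matched-memory condition.
    """
    last_used = {}  # module number -> cycle of its most recent access
    for cycle in range(length):
        mod = module_interleaved(start + cycle * stride, num_modules)
        # Conflict: the module was accessed fewer than M cycles ago.
        if mod in last_used and cycle - last_used[mod] < num_modules:
            return False
        last_used[mod] = cycle
    return True


if __name__ == "__main__":
    M = 8        # hypothetical number of modules (power of two assumed)
    LENGTH = 64  # hypothetical vector length
    for stride in (1, 3, 5, 2, 4, 8):
        result = is_conflict_free(0, stride, LENGTH, M)
        print(f"stride {stride}: conflict-free = {result}")
```

Under these assumptions, the odd strides 1, 3 and 5 are reported as conflict-free, whereas the strides 2, 4 and 8 are not, since they revisit a module after 4, 2 and 1 requests respectively.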
For vectors, skewing and linear transformations can provide conflict-free access to one family of strides, where the family defined by x is the set of strides σ·2^x with σ odd [7]. Moreover, for the case in which different vectors are accessed with different strides, dynamic schemes based on skewing [7] and on linear transformations [5] were proposed. Linear transformations have the advantage over skewing that the module number is simpler to compute.

Although it is outside the scope of this paper, it is worth mentioning that techniques have also been proposed to improve efficiency in the cases in which conflict-free access is not achieved. For the skewing and linear schemes mentioned above, peak memory throughput can be obtained for families x' < x for long vectors by the use of buffers [4]. Moreover, schemes based on linear transformations have been proposed to randomly distribute the modules corresponding to consecutive addresses, so that the various strides do not produce clustering in memory modules [6, 8, 9]. Recently, an analytic model has been proposed [10] that can be used to compare these methods.

Electronic version of an article published as: Parallel Processing Letters, vol. 1, no. 2, 1991, pp. 95-102. DOI 10.1142/S0129626491000045 © World Scientific Publishing Company. http://www.worldscientific.com/doi/abs/10.1142/S0129626491000045

For both schemes, most of the evaluations performed consider long vectors, so that the initial transient is not significant and the throughput is determined for the steady state. This throughput is evaluated as