Journal of VLSI Signal Processing, 4, 7-25 (1992)
© 1992 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.

A Method for Implementation of One-Dimensional Systolic Algorithms with Data Contraflow Using Pipelined Functional Units

MIGUEL VALERO-GARCÍA, JUAN J. NAVARRO, JOSÉ M. LLABERIA, MATEO VALERO AND TOMAS LANG
Dept. Arquitectura de Computadores, Univ. Politècnica de Catalunya (UPC), Gran Capità s/n, Mòdul D4, 08034 Barcelona, Spain

Received July 9, 1991; Revised September 16, 1991.

Abstract. In this paper we present a method to implement one-dimensional Systolic Algorithms with data contraflow using Pipelined Functional Units. Some procedures are proposed which permit the systematic application of the method. The paper includes an example of application of the method to a one-dimensional systolic algorithm with data contraflow for QR decomposition.

I. Introduction

During the last ten years, considerable attention has been paid to the problem of automatic design of Systolic Algorithms (SAs) [1]-[5]. A method for automatic design takes a specification of the computation to be performed and produces a specification of the SA. Since the final objective is to implement the SA in hardware, aspects such as limitations in the number of processing elements (PEs), fault tolerance or communication bandwidth limitations should be taken into account in the design procedure.

Within this context, this paper proposes a method to transform SAs for efficient implementation using Pipelined Functional Units (PFUs). PFUs improve the throughput of a processor because a new operation can be initiated before the previous one has completed, provided there are no conflicts in the use of the stages of the pipeline [6]. Therefore, PFUs are attractive from the point of view of the hardware implementation of any kind of algorithm and, in particular, of SAs.
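The condition on stage conflicts can be illustrated with a small sketch. It assumes the usual reservation-table description of a pipelined unit, in which each stage is marked busy in certain cycles after an operation starts. The table `example_table`, the stage names and the function names are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def forbidden_latencies(reservation_table):
    """Return the latencies at which two operations would collide.

    reservation_table maps a stage name to the set of cycles, counted from
    the start of an operation, in which that stage is busy. Two marks in the
    same stage that are d cycles apart forbid starting a second operation
    d cycles after the first.
    """
    forbidden = set()
    for busy_cycles in reservation_table.values():
        for earlier, later in combinations(sorted(busy_cycles), 2):
            forbidden.add(later - earlier)
    return forbidden


def can_follow(reservation_table, latency):
    """True if ONE operation may start `latency` cycles after another."""
    return latency > 0 and latency not in forbidden_latencies(reservation_table)


def sustainable(reservation_table, latency):
    """True if operations may be started every `latency` cycles indefinitely.

    With a constant initiation interval, any two operations are separated by
    a multiple of `latency`, so no forbidden latency may be such a multiple.
    """
    if latency <= 0:
        return False
    return all(f % latency != 0 for f in forbidden_latencies(reservation_table))


# Hypothetical unit (an assumption for illustration): stage S1 is used in
# cycles 0 and 3 of every operation, stages S2 and S3 once each.
example_table = {"S1": {0, 3}, "S2": {1}, "S3": {2}}

if __name__ == "__main__":
    print("forbidden latencies:", sorted(forbidden_latencies(example_table)))
    for latency in range(1, 5):
        print(latency, can_follow(example_table, latency),
              sustainable(example_table, latency))
```

In this hypothetical table a second operation can start one cycle after the first, but starting one operation every cycle cannot be sustained, because the operation issued three cycles later would need stage S1 in the same cycle.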
The method proposed in this paper for implementation of SAs using PFUs is based on the transformation of the original SA to adapt it to the hardware to be used. Although the method is general and can be applied to any kind of SA, the procedures to determine the required transformations have been particularized in this paper for the case of one-dimensional (1D) SAs with data contraflow.

This work is supported by the Ministry of Education of Spain (CICYT TIC 299/89).

[Figure 1: panel (a) shows the linear array of cells with the input stream of matrix elements a_ij; panel (b) shows reservation tables RT1 to RT4 and the timing constraints l(0,1)=4, l(1,2)=4, l(2,3)=3, l(3,4)=2, l(2,1)=3, l(3,2)=5, l(4,3)=4, l(5,4)=4.]

Fig. 1. (a) General structure of a 1D SA with data contraflow and simple synchronization, and (b) reservation tables and timing constraints for a given implementation of SA using PFUs.

Figure 1a shows the general structure of a 1D SA with data contraflow. Every cell is connected with its two neighbors in both directions, and a delay of one systolic cycle is associated with every link between cells. Therefore, the results of the operation performed by cell i in cycle k are available in cells i - 1 and i + 1 in cycle k + 1. The delays between cells are represented by black rectangles in Figure 1a. Since every cell performs one useful operation every 2 cycles, the SA is said to be 2-slow [7]. Figure 1a also shows