Non-Consistent Dual Register Files to Reduce Register Pressure *

Josep Llosa, Mateo Valero and Eduard Ayguade
Departament d’Arquitectura de Computadors
Universitat Politècnica de Catalunya
Barcelona, SPAIN

Abstract

The continuous growth in instruction-level parallelism offered by microprocessors requires a large register file and a large number of ports to access it. This paper presents the non-consistent dual register file, an alternative implementation and management of the register file. Non-consistent dual register files support the bandwidth demands and the high register requirements without penalizing either access time or implementation cost. The proposal is evaluated for software pipelined loops and compared against a unified register file. Empirical results show improvements in performance and a noticeable reduction in the density of memory traffic, owing to a reduction of the spill code. Spill code can, in general, increase the minimum initiation interval and decrease loop performance. Additional improvements can be obtained when the operations are scheduled with the register file organization proposed in this paper in mind.

Keywords: VLIW and superscalar processors, software pipelining, register file organization, register allocation, spill code.

1 Introduction

Current high-performance floating-point microprocessors try to maximize the exploitable parallelism by either heavily pipelining functional units [1][2] or by making aggressive use of parallelism [3][4]. It is expected that future high-performance microprocessors will make extensive use of both techniques. To exploit this available parallelism effectively, new processor organizations and scheduling techniques are required.

Software pipelining [5] is a loop scheduling technique that extracts parallelism from loops by overlapping the execution of several consecutive iterations.
Finding the optimal solution is an NP-complete problem, and several works propose and evaluate different heuristic strategies to perform software pipelining [6][7].

The drawback of aggressive scheduling techniques such as software pipelining is that they increase register requirements compared to less aggressive and less effective scheduling techniques. In addition, increasing either the number of stages of the functional units or the number of functional units, which are the current trends in microprocessor design, tends to increase the number of registers required by software pipelined loops [8][9]. When the number of registers required in a loop is larger than the number of registers available, spill code has to be introduced to reduce register usage. This spill code increases memory traffic and can reduce performance.

* This work was supported by the Ministry of Education of Spain under the contract TIC 880/92, by ESPRIT 6634 Basic Research Action (APPARC) and by the CEPBA (European Center for Parallelism of Barcelona).

0-8186-6445-2/95 $04.00 © 1995 IEEE

Usually, registers are organized in a multiported register file as shown in Figure 1a. Each port of each functional unit has access to all the registers of the multiported register file. This register file organization can be expensive and can increase processor cycle time when a large number of registers and ports are required. In order to reduce the complexity of the register file, some microprocessors, such as the Power 2 RI, implement the register file with two register subfiles that have the same number of registers and the same number of write ports, but half the number of read ports into each register subfile (see Figure 1b). This implementation, which we name consistent dual register file, is totally transparent to the user/compiler because both register subfiles are consistent, i.e., both store exactly the same value in the same registers.
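The consistent dual register file described above can be illustrated with a small toy model. The following Python sketch is only an illustration under stated assumptions: the class name, its methods, and the sizes used (32 registers, 8 read ports, 4 write ports) are hypothetical and do not come from the paper or from any real processor. It shows that every write is broadcast to both subfiles, that each subfile keeps all the write ports but only half of the read ports, and that a read may therefore be served by either subfile.

```python
class ConsistentDualRegisterFile:
    """Toy model (hypothetical): two subfiles that always hold the same values."""

    def __init__(self, num_registers, read_ports, write_ports):
        if read_ports % 2 != 0:
            raise ValueError("read ports must split evenly across the two subfiles")
        # Both subfiles have the same number of registers.
        self.subfiles = [[0] * num_registers for _ in range(2)]
        # Each subfile serves half of the read ports ...
        self.read_ports_per_subfile = read_ports // 2
        # ... but must accept every write, so it keeps all the write ports.
        self.write_ports_per_subfile = write_ports

    def write(self, reg, value):
        # A write updates the same register in both subfiles.
        for subfile in self.subfiles:
            subfile[reg] = value

    def read(self, subfile_index, reg):
        # Either subfile can serve the read because their contents are identical.
        return self.subfiles[subfile_index][reg]

    def is_consistent(self):
        return self.subfiles[0] == self.subfiles[1]


# Illustrative sizes only (assumed, not taken from the paper).
rf = ConsistentDualRegisterFile(num_registers=32, read_ports=8, write_ports=4)
rf.write(5, 3.14)
```

Because the two subfiles are kept identical, the duplication is invisible to software: the compiler sees a single register file, and only the hardware benefits from the reduced number of read ports per subfile.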
In this paper we modify the consistent dual register file organization so that each subfile can be accessed independently of the other and can store different values; this organization is shown in Figure 1c and is named non-consistent dual register file organization. Due to computational requirements, some values will be copied into both register subfiles, as in the consistent dual register file organization; other values will be stored in just one of the register subfiles. In order to reduce the number of values stored in both register subfiles, and to balance the number of registers required in each subfile, we also evaluate the effectiveness of swapping operations. To evaluate the register file organization we have used a set of loops from the Perfect Club Benchmark suite [10].

The outline of the paper is as follows. Section 2 introduces the architecture we are assuming and gives a brief overview of software pipelining, register allocation and the terminology associated with them. Section 3 presents the observations that motivated our proposal. In Section 4 the non-consistent register file organization is presented; an example loop is scheduled and used to show how our organization can reduce register requirements. Section 5 presents the experiments performed in order to evaluate the proposal and