An Implementation of Loop Fusion Using Simple-SUIF

John Bent
University of Wisconsin, 1210 West Dayton Street, Madison, WI 53703
johnbent@cs.wisc.edu

January 23, 2001

Abstract

Loop fusion is a compiler optimization that merges the bodies of multiple loops into a single loop. In a serial program, doing so can reduce loop overhead, improve data locality, and increase opportunities for better cache utilization. Although not explored in this paper, loop fusion has also been shown to be beneficial in parallel programs [11].

1 Introduction

This project has been split evenly between an implementation and a survey of some of the available literature on loop fusion. Although the original intent was an implementation-only project, I have done reading as necessary to learn more about the topic. I was originally motivated by a frustration I experienced during the implementation of the third project. While implementing invariant code motion and dead code elimination, I would occasionally construct test programs that contained so much redundancy and loop invariance that the resulting program would execute an empty loop. My original desire was to eliminate these empty loops from the program entirely. However, because empty loops are infrequent in actual programs, I have broadened my focus and added loop fusion, as well as the elimination of empty loops, to my original project three. My current compiler now correctly implements dead code elimination, useless assignment elimination, loop-invariant code motion, dead loop elimination, and some opportunities for loop fusion.

Although using Simple-SUIF [5] greatly simplifies the development of the compiler, we shall show that loop fusion remains a tricky problem. After discussing the implementation of dead loop elimination and of loop fusion, we will briefly consider some performance profiling using James Larus' QPT tool [8].
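To make the transformation concrete, the following is a minimal sketch in C of two adjacent loops and their fused equivalent. The function names and the array length N are hypothetical, chosen only for illustration; they are not taken from the compiler described in this paper. The example also assumes the simplest legal case: both loops have identical bounds, and the second loop does not read anything the first loop writes.

```c
#define N 8  /* hypothetical array length, chosen for illustration */

/* Before fusion: two loops with identical bounds, so the index is
 * incremented and compared twice per element and a[] is traversed twice. */
static void unfused(const int *a, int *b, int *c)
{
    int i;
    for (i = 0; i < N; i++)
        b[i] = a[i] * 2;
    for (i = 0; i < N; i++)
        c[i] = a[i] + 1;
}

/* After fusion: one loop, so one increment and one compare per element,
 * and a[i] is reused immediately after it is first loaded. */
static void fused(const int *a, int *b, int *c)
{
    int i;
    for (i = 0; i < N; i++) {
        b[i] = a[i] * 2;
        c[i] = a[i] + 1;
    }
}

/* Returns 1 if both versions compute identical results, 0 otherwise. */
int fusion_preserves_results(void)
{
    int a[N], b1[N], c1[N], b2[N], c2[N];
    int i;

    for (i = 0; i < N; i++)
        a[i] = i;

    unfused(a, b1, c1);
    fused(a, b2, c2);

    for (i = 0; i < N; i++)
        if (b1[i] != b2[i] || c1[i] != c2[i])
            return 0;
    return 1;
}
```

Calling fusion_preserves_results() compares the two versions on a small input. The fusion is safe here only because the two loop bodies are independent of one another; a dependence between the loops could make the same rewrite illegal.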
Although we will see the desired performance benefit to be gained by fusing loops, we will also consider why loop fusion should occasionally be avoided.

2 Dead Loop Elimination

Although found only infrequently in real programs (with apologies to those programmers who use empty loops as an imprecise sleep), empty loops can occasionally be created by compiler optimizations such as useless code removal and loop-invariant code motion. In these cases, the loop condition becomes essentially useless. A data flow analysis of variable faintness [3] might be able to eliminate these useless instructions when the index is not live after the loop. However, if the index variable is live after the loop, faint variable analysis alone cannot eliminate the loop. Eliminating these useless loop condition instructions requires both the ability to identify empty loops and knowledge of the final value of any live index variables.

2.1 Identifying Empty Loops

To decide whether a loop is empty (only natural loops [2] are considered), my algorithm first identifies all instructions that belong to the conditional portion of the loop. For a loop to be considered fusable, it must have an index with an initial value that is incremented on each iteration and compared against a flag value. To find the loop condition instructions, I rely on the fact that the compare instruction must be found at the back edge of the only node in the loop body that has a successor outside the loop. From this compare instruction, I find both the flag and the index variables for the loop. To distinguish between the flag and the index variable, I note that only the index should be defined within the loop.

In order to be a candidate for loop fusion, the index variable of the loop condition should not be defined within the non-conditional body of the loop. Flowing backwards