Exploring the Structure of the Space of Compilation Sequences Using Randomized Search Algorithms †

Keith D. Cooper, Alexander Grosul, Timothy J. Harvey, Steve Reeves, Devika Subramanian, Linda Torczon, Todd Waterman
Rice University
Houston, Texas, USA

Abstract

Modern optimizing compilers apply a fixed sequence of optimizations, which we call a compilation sequence, to each program that they compile. These compilers let the user modify their behavior in a small number of specified ways, using command-line flags (e.g., -O1, -O2, ...). For five years, we have been working with compilers that automatically select an appropriate compilation sequence for each input program. These adaptive compilers discover a good compilation sequence tailored to the input program, the target machine, and a user-chosen objective function. We have shown, as have others, that program-specific sequences can produce better results than any single universal sequence [1, 23, 7, 10, 21].

Our adaptive compiler looks for compilation sequences in a large and complex search space. Its typical compilation sequence includes 10 passes (with possible repeats) chosen from the 16 available; there are 16^10, or 1,099,511,627,776, such sequences. To learn about the properties of such spaces, we have studied subspaces that consist of 10 passes drawn from a set of 5 (5^10, or 9,765,625, sequences). These 10-of-5 subspaces are small enough that we can analyze them thoroughly but large enough to reflect important properties of the full spaces. This paper reports, in detail, on our analysis of several of these subspaces and on the consequences of those observed properties for the design of search algorithms.

1 Compilation Sequences

Compilers operate by applying a fixed sequence of optimizations, called a compilation sequence, to all programs.
The compiler writer must select ten to twenty optimizations from the hundreds that have been proposed in the literature; then the compiler writer must select an order in which they should execute. Choosing the right optimizations and the right order for one specific program is hard. The compiler writer must choose a set that works well for all programs.

† This work has been supported by the Los Alamos Computer Science Institute and by the National Science Foundation through grant CCR-0205303.

The compiler writer must choose a limited set of techniques to include in the default compilation sequence. A given optimization only improves programs exhibiting the inefficiencies that it targets. For one program, the problem of optimization choice can be solved: pick techniques that address that code’s inefficiencies. The compiler’s “universal” sequence, however, must work well for all programs. Thus, these sequences tend to include optimizations that are broadly applicable rather than high-payoff techniques with a narrower focus.¹ As Robison observes: “Compile-time program optimizations are similar to poetry: more are written than are actually published in commercial compilers” [19].

The compiler writer must also pick an order in which to execute the optimizations. We have little theoretical understanding of the effect of a given compilation sequence on the properties of the compiled code that it produces. The interactions and interdependences between optimizations are complex and uncharacterized. Transformation a may create opportunities for later application of b; alternatively, it may eliminate opportunities for another transformation c [18, 23, 6]. In fact, this behavior is also program specific; a’s ability to create or eliminate opportunities depends on the presence of specific features in the code being compiled.

To address these problems (compilation choice and compilation order), we have developed a new compiler structure.
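The order dependence described above can be illustrated on a very small scale. The sketch below is not the authors' compiler: the toy program, the two passes (`const_prop`, `dead_code_elim`), the one-letter pass names, and the static instruction count used as the objective are all assumptions made for illustration. It enumerates every sequence of a given length over the pass set, in the spirit of exhaustively exploring a small subspace, and records the objective for each sequence.

```python
from itertools import product

# Hypothetical toy program: each instruction is (dest, expr), where expr is an
# int constant or a tuple ("add", lhs, rhs) whose operands are ints or names.
PROGRAM = [
    ("a", 4),
    ("b", ("add", "a", 1)),
    ("out", ("add", "b", "b")),
]
LIVE_OUT = {"out"}  # names whose values are observable after the program runs


def const_prop(code):
    """Propagate known constants forward and fold additions of two constants."""
    known, result = {}, []
    for dest, expr in code:
        if isinstance(expr, tuple):
            _, lhs, rhs = expr
            lhs, rhs = known.get(lhs, lhs), known.get(rhs, rhs)
            if isinstance(lhs, int) and isinstance(rhs, int):
                expr = lhs + rhs
            else:
                expr = ("add", lhs, rhs)
        if isinstance(expr, int):
            known[dest] = expr
        else:
            known.pop(dest, None)  # dest no longer holds a known constant
        result.append((dest, expr))
    return result


def dead_code_elim(code):
    """Remove assignments whose results are never used (backward liveness)."""
    live, kept = set(LIVE_OUT), []
    for dest, expr in reversed(code):
        if dest not in live:
            continue  # value is never read later: drop the instruction
        live.discard(dest)
        if isinstance(expr, tuple):
            live.update(op for op in expr[1:] if isinstance(op, str))
        kept.append((dest, expr))
    kept.reverse()
    return kept


# Assumed pass set; the one-letter names are labels for this sketch only.
PASSES = {"p": const_prop, "d": dead_code_elim}


def evaluate(sequence, code=PROGRAM):
    """Apply the passes in order; the assumed objective is instruction count."""
    for name in sequence:
        code = PASSES[name](code)
    return len(code)


def explore(length):
    """Evaluate every sequence of the given length (len(PASSES)**length total)."""
    return {"".join(seq): evaluate(seq) for seq in product(PASSES, repeat=length)}


for seq, size in explore(2).items():
    print(seq, size)
```

Of the four length-2 sequences, only "pd" (propagation, then elimination) shrinks the program to a single instruction; the other three leave all three instructions in place, because elimination finds nothing to remove until propagation has turned the operands into constants. The same exhaustive loop over 10 passes drawn from 5 would visit 5^10 = 9,765,625 sequences, and over 10 drawn from 16 it would visit 16^10 = 1,099,511,627,776.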
¹ In practice, every compiler has economic limits: compile time, developer effort, calendar time before release. These constraints limit the number of optimizations that will be implemented. In this constrained environment, more general techniques are the safe strategy.

Our adaptive compiler uses a program-