Learning Heuristics for the Superblock Instruction Scheduling Problem

Tyrel Russell, Abid M. Malik, Michael Chase, and Peter van Beek
Cheriton School of Computer Science
University of Waterloo, Waterloo, Canada

Abstract—Modern processors have multiple pipelined functional units and can issue more than one instruction per clock cycle. This places a burden on the compiler to schedule the instructions to take maximum advantage of the underlying hardware. Superblocks—straight-line sequences of code with a single entry point and multiple possible exit points—are a commonly used scheduling region within compilers. Superblock scheduling is NP-complete, and is done sub-optimally in production compilers using a greedy algorithm coupled with a heuristic. The heuristic is usually hand-crafted, a potentially time-consuming process. In this paper, we show that supervised machine learning techniques can be used to semi-automate the construction of heuristics for superblock scheduling. In our approach, labeled training data was produced using an optimal superblock scheduler. A decision tree learning algorithm was then used to induce a heuristic from the training data. The automatically constructed decision tree heuristic was compared against the best previously proposed, hand-crafted heuristics for superblock scheduling on the SPEC 2000 and MediaBench benchmark suites. On these benchmark suites, the decision tree heuristic reduced the number of superblocks that were not optimally scheduled by up to 38%, and led to improved performance on some architectural models and competitive performance on others.

Index Terms—Pipeline processors, compilers, heuristics design, machine learning, constraint satisfaction

I. INTRODUCTION

Modern processors are pipelined and can issue more than one instruction per clock cycle.
A challenge for the compiler is to find an order of the instructions that takes maximum advantage of the underlying hardware without violating dependency and resource constraints. Depending upon the scope, there are two types of instruction scheduling: local and global. In local instruction scheduling, the scheduling is done within a basic block.¹ Performing only local instruction scheduling can lead to under-utilization of the processor. This has stimulated substantial research effort in global instruction scheduling, where instructions are allowed to move across basic blocks. Figure 1 shows a control flow graph (CFG)—an abstract data structure used in compilers to represent a program—consisting of five basic blocks. Instructions in basic block B4 are independent of the instructions in basic blocks B2, B3, and B5. We can increase the efficiency of the code and the utilization of the processor by inserting instructions from B4 into the free slots available in B2, B3, and B5. This is only possible if we schedule instructions in all basic blocks at the same time. Many regions have been proposed for performing global instruction scheduling.

¹ See Section II for detailed definitions and explanations of terms in instruction scheduling.

Fig. 1. A control flow graph (CFG) with five basic blocks. Control enters through e0 and can leave through e1, e2, e3, or e5. The values w1, w2, w3, and w5 are exit probabilities.

The most commonly used regions are traces [1], superblocks [2], and hyperblocks [3]. The compiler community has mostly targeted superblocks for global instruction scheduling because of their simpler implementation as compared to the other regions. Superblock scheduling is harder than basic block scheduling. In basic block scheduling, all resources are considered available for the basic block under consideration.
In superblock scheduling, there are multiple basic blocks with conflicting resource and data requirements, and each basic block competes for the available resources [4]. The most commonly used method for instruction scheduling is list scheduling coupled with a heuristic. A number of heuristics have been developed for superblock scheduling. The heuristic in a production compiler is usually hand-crafted by choosing and testing many different subsets of features and different possible orderings—a potentially time-consuming process. For example, the heuristic developed for the IBM XL family of compilers “evolved from many years of extensive empirical testing at IBM” [5, p. 112, emphasis added]. Further, this process often needs to be repeated as new computer architectures are developed and as programming languages and programming styles evolve.

In this paper, we show that machine learning can be used to semi-automate the construction of heuristics for superblock scheduling in compilers, thus simplifying the development and maintenance of one small but important part of a large, complex software system. Our approach uses supervised learning. In supervised learning, one learns from training examples which are labeled with the correct answers. More precisely, each training
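The greedy list-scheduling method mentioned above can be sketched as follows. This is an illustrative Python sketch, not the schedulers evaluated in the paper: the dependence-DAG representation, the latency table, and the critical-path priority heuristic are all assumptions chosen for illustration (critical-path distance is one classic choice of heuristic among the many this paper studies).

```python
def critical_path(instr, succs, latency, memo):
    """Heuristic priority: longest latency-weighted path from instr to a leaf."""
    if instr not in memo:
        memo[instr] = latency[instr] + max(
            (critical_path(s, succs, latency, memo) for s in succs[instr]),
            default=0)
    return memo[instr]

def list_schedule(instrs, preds, succs, latency, issue_width=1):
    """Greedy list scheduling over a dependence DAG, one cycle at a time.

    Each cycle, collect the 'ready' instructions (all predecessors issued
    and their latencies elapsed) and issue up to issue_width of them,
    highest heuristic priority first.  Returns {instr: issue cycle}.
    """
    memo = {}
    ready_time = {i: 0 for i in instrs}   # earliest cycle each instr may issue
    unscheduled = set(instrs)
    schedule = {}
    cycle = 0
    while unscheduled:
        ready = [i for i in unscheduled
                 if all(p in schedule for p in preds[i])
                 and ready_time[i] <= cycle]
        best_first = sorted(
            ready, key=lambda i: -critical_path(i, succs, latency, memo))
        for i in best_first[:issue_width]:
            schedule[i] = cycle
            unscheduled.remove(i)
            for s in succs[i]:  # successors must wait for this result
                ready_time[s] = max(ready_time[s], cycle + latency[i])
        cycle += 1
    return schedule

# Tiny assumed example: c depends on a (latency 2) and b (latency 1),
# on a machine that can issue two instructions per cycle.
sched = list_schedule(['a', 'b', 'c'],
                      preds={'a': [], 'b': [], 'c': ['a', 'b']},
                      succs={'a': ['c'], 'b': ['c'], 'c': []},
                      latency={'a': 2, 'b': 1, 'c': 1},
                      issue_width=2)  # -> {'a': 0, 'b': 0, 'c': 2}
```

The heuristic enters only in the `sorted` tie-breaking step; this is precisely the component that is usually hand-crafted and that the paper proposes to learn automatically.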