An Empirical Study of Data Speculation Use on the Intel Itanium 2 Processor

Markus Mock
Department of Computer Science, University of Pittsburgh, Pittsburgh, PA 15260
Email: mock@cs.pitt.edu

Ricardo Villamarín
Department of Computer Science, University of Pittsburgh, Pittsburgh, PA 15260
Email: rvillsal@cs.pitt.edu

José Baiocchi
Department of Computer Science, University of Pittsburgh, Pittsburgh, PA 15260
Email: baiocchi@cs.pitt.edu

Abstract— The Intel Itanium architecture uses a dedicated 32-entry hardware table, the Advanced Load Address Table (ALAT), to support data speculation via an instruction set interface. This study presents an empirical evaluation of the use of the ALAT and data speculative instructions for several optimizing compilers. We determined which speculative instructions the compilers generated and how often, and used the Itanium's hardware performance counters to evaluate their run-time behavior. We also performed a limit study by modifying one compiler to always generate data speculation when possible. We found that this aggressive approach significantly increased the amount of data speculation and often resulted in performance improvements, by as much as 10% in one case. Since it worsened performance for only one application, and then only for some inputs, we conclude that more aggressive data speculation heuristics than those employed by current compilers are desirable and may further improve performance gains from data speculation.

I. INTRODUCTION

For the past decade, one of the main sources of increased computer system performance has been instruction-level parallelism (ILP). However, exploiting additional processor resources faces two particular obstacles: control and data dependences. For example, without any form of speculation, execution cannot pass a conditional branch until the branch outcome has been determined, which can make it difficult to keep all the functional units of the processor busy.
To overcome the limitations on ILP imposed by control dependences, various forms of control speculation in the compiler and processor have been conceived. For example, computer architects have added speculative load instructions to the instruction set architecture (ISA) to enable the compiler to schedule a load before a conditional branch without changing program semantics. Moreover, control speculation in the form of speculative execution (i.e., via speculation hardware, such as branch predictors) is performed by nearly all high-performance processors today. To overcome limitations imposed by data dependences, several data speculation approaches have been developed recently [1], [2], [3], [4].

(a) before data speculation

    // other instructions
    st8 [r4] = r12
    ld8 r6 = [r8];;
    add r5 = r6, r7;;
    st8 [r18] = r5
    ...

(b) after data speculation

    ld8.a r6 = [r8];;    // advanced load
    // other instructions
    add r5 = r6, r7;;
    // other instructions
    st8 [r4] = r12
    chk.a r6, recover
    back:
    st8 [r18] = r5
    ...
    // somewhere else in program
    recover:
    ld8 r6 = [r8];;
    add r5 = r6, r7
    br back

Fig. 1. Data speculation example. The load into register r6 and the dependent add instruction are speculatively executed before the possibly aliased store. Before the loaded value is used, a special check instruction branches to recovery code to redo the load and the add in case the store wrote the same memory location.

In data speculation, a computation is performed speculatively based on the value of some data item (e.g., a variable) that is not yet available. In particular, a memory load and dependent computations may be executed before a possibly aliased store. Consider the example in Figure 1. The original code performs a store, then loads a value from memory, uses the value to perform an addition, and finally stores the result back to memory.
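The transformation in Figure 1 can also be modeled in software. The following C sketch is illustrative only: the `Machine` structure, the single-entry tracking record, and all function names are assumptions made for this example, and they stand in for the hardware table and the compiler-generated code rather than reproducing either. It mirrors the sequence of Figure 1(b): load early, perform the possibly aliased store, check, and redo the load and add only if the store wrote the loaded address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MEM_WORDS 16

/* Hypothetical one-entry record of the address read by the advanced load. */
typedef struct {
    bool   valid;
    size_t addr;
} AdvLoadEntry;

/* Hypothetical toy machine: a small word-addressed memory plus the record. */
typedef struct {
    int64_t      mem[MEM_WORDS];
    AdvLoadEntry entry;
    int          recoveries;   /* number of times recovery code ran */
} Machine;

void machine_init(Machine *m) {
    memset(m, 0, sizeof *m);
}

/* Models ld8.a: perform the load early and remember its address. */
int64_t advanced_load(Machine *m, size_t addr) {
    assert(addr < MEM_WORDS);
    m->entry.valid = true;
    m->entry.addr  = addr;
    return m->mem[addr];
}

/* Models st8: write memory; a store to the remembered address
   invalidates the record, which marks the speculation as failed. */
void store(Machine *m, size_t addr, int64_t value) {
    assert(addr < MEM_WORDS);
    m->mem[addr] = value;
    if (m->entry.valid && m->entry.addr == addr) {
        m->entry.valid = false;
    }
}

/* Models the test done by chk.a: true if the speculation still holds. */
bool check(const Machine *m, size_t addr) {
    return m->entry.valid && m->entry.addr == addr;
}

/* Figure 1(a): store, then load, add, and store the result. */
int64_t baseline_sequence(Machine *m, size_t load_addr, size_t store_addr,
                          int64_t store_val, int64_t addend,
                          size_t result_addr) {
    store(m, store_addr, store_val);
    int64_t r6 = m->mem[load_addr];
    int64_t r5 = r6 + addend;
    store(m, result_addr, r5);
    return r5;
}

/* Figure 1(b): load and add are hoisted above the possibly aliased store. */
int64_t speculative_sequence(Machine *m, size_t load_addr, size_t store_addr,
                             int64_t store_val, int64_t addend,
                             size_t result_addr) {
    int64_t r6 = advanced_load(m, load_addr);   /* ld8.a */
    int64_t r5 = r6 + addend;                   /* dependent add */
    store(m, store_addr, store_val);            /* possibly aliased store */
    if (!check(m, load_addr)) {                 /* chk.a fails */
        m->recoveries++;                        /* recovery code: */
        r6 = m->mem[load_addr];                 /*   redo the load */
        r5 = r6 + addend;                       /*   redo the add  */
    }
    store(m, result_addr, r5);
    return r5;
}
```

In this model, calling `speculative_sequence` with distinct load and store addresses never enters the recovery path, while passing the same address for both forces one recovery; in both cases the returned value matches what `baseline_sequence` computes from the same initial memory.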
A data dependence between the first store and the subsequent load must be assumed unless the store and load are known to definitely access different memory locations. Therefore, the load instruction cannot be hoisted above the store to (1) hide the load latency, and (2) make use of possibly available machine resources before the store.

Data speculation works by breaking this data dependence and speculatively performing the load and dependent computations. In the example, which shows Intel Itanium assembly code, a load of one operand of an add instruction is hoisted above a possibly aliased store using a special data-speculative load instruction, called an advanced load. After the store, a special check instruction (chk.a) is used to determine whether misspeculation occurred: if the store modified the same memory location accessed by the load, the check instruction branches to recovery code. The compiler-generated recovery code repeats the load and the add instruction and then branches back to resume normal execution. To detect misspeculation, the Intel Itanium processor [5] uses a dedicated hardware structure, the Advanced Load Address Table (ALAT), a 32-entry fully-associative table that