CVP: An Energy-Efficient Indirect Branch Prediction with Compiler-Guided Value Pattern

Mingxing Tan, Xianhua Liu, Dong Tong, Xu Cheng
Microprocessor Research and Development Center, Peking University, Beijing, China
{tanmingxing, liuxianhua, tongdong, chengxu}@mprc.pku.edu.cn

ABSTRACT

Indirect branch prediction is becoming increasingly important in modern high-performance processors. However, previous indirect branch predictors either require a significant amount of hardware storage and complexity, or rely heavily on expensive manual profiling.

In this paper, we propose Compiler-Guided Value Pattern (CVP) prediction, an energy-efficient and accurate indirect branch prediction scheme based on compiler-microarchitecture cooperation. The key idea of CVP prediction is to use the compiler-guided value pattern as correlated information to hint the dynamic predictor. The value pattern reflects the pattern regularity of the value correlation, and thus significantly improves the prediction accuracy even with a deep pipeline or long memory latency.

CVP prediction relies on the compiler to automatically identify the primary value correlation based on three high-level program substructures: virtual function calls, switch-case statements, and function pointer calls. The compiler-identified information is then fed back to the dynamic predictor and is used to hint the indirect branch prediction at runtime.

We show that CVP prediction can be implemented in modern processors with little extra hardware support. Evaluations show that CVP prediction improves the prediction accuracy by 46% over traditional BTB-based prediction, leading to a performance improvement of 20%. Compared with the state-of-the-art aggressive ITTAGE and VBBI predictors, CVP prediction improves performance by 5.5% and 4.2%, respectively.
Categories and Subject Descriptors

C.1.0 [Processor Architectures]: General; C.5.3 [Computer System Implementation]: Microcomputers - Microprocessors

Keywords

Indirect branch prediction, compiler-guided value pattern.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ICS'12, June 25–29, 2012, San Servolo Island, Venice, Italy.
Copyright 2012 ACM 978-1-4503-1316-2/12/06 ...$10.00.

1. INTRODUCTION

In modern high-performance processors, accurate branch prediction can improve both performance and energy efficiency. As indirect branches are widely used in modern applications [10, 21], an accurate indirect branch predictor is becoming an important component of modern processors [14].

Unfortunately, indirect branches are hard to predict because each indirect branch may correspond to multiple targets [18, 21]. Function return instructions, a special kind of indirect branch, can be predicted using a return address stack [19], but the other indirect branches, which are used for virtual function calls, switch-case statements, and function pointer calls, are hard to predict. As a result, indirect branch mispredictions usually make up a sizable fraction of overall branch mispredictions [18, 21]. We measure the Mispredictions Per Kilo Instructions (MPKI) for conditional branches, function returns, and indirect branches¹ in a 4-issue, 16-stage processor with a 4-way, 4K-entry Branch Target Buffer (BTB). Figure 1 shows the results. On average, indirect branches account for 42% of total mispredictions.
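To make the MPKI metric and the multiple-target problem concrete, the following is a minimal sketch, not the simulator configuration used in this paper. It assumes a toy direct-mapped, tagless, last-target BTB (the BTB evaluated above is 4-way set-associative), and the class name `LastTargetBTB`, the helper names, the branch addresses, and the 10,000-instruction window are all hypothetical choices made for illustration.

```python
def mpki(mispredictions, instructions):
    """Mispredictions Per Kilo Instructions: mispredictions per 1000 instructions."""
    return 1000.0 * mispredictions / instructions


class LastTargetBTB:
    """Toy BTB that stores only the most recent target of each branch.

    Assumption: direct-mapped and tagless for brevity; this is an
    illustrative model, not the 4-way, 4K-entry BTB measured in the text.
    """

    def __init__(self, entries=4096):
        self.entries = entries
        self.table = {}

    def predict(self, pc):
        # Returns None on a cold entry, which counts as a misprediction.
        return self.table.get(pc % self.entries)

    def update(self, pc, target):
        self.table[pc % self.entries] = target


def count_mispredictions(btb, trace):
    """Replay (branch_pc, actual_target) pairs and count wrong predictions."""
    misses = 0
    for pc, target in trace:
        if btb.predict(pc) != target:
            misses += 1
        btb.update(pc, target)
    return misses


# Hypothetical traces: 100 dynamic executions of a single indirect branch.
# Monomorphic: the branch always jumps to the same target.
mono_trace = [(0x400, 0x1000)] * 100
# Polymorphic: the target alternates, e.g. a virtual call on two object types.
poly_trace = [(0x400, 0x1000 if i % 2 == 0 else 0x2000) for i in range(100)]

mono_misses = count_mispredictions(LastTargetBTB(), mono_trace)
poly_misses = count_mispredictions(LastTargetBTB(), poly_trace)

# Assume, for illustration only, that each trace spans 10,000 instructions.
print(mono_misses, mpki(mono_misses, 10_000))
print(poly_misses, mpki(poly_misses, 10_000))
```

In this sketch the monomorphic branch mispredicts only on its first, cold execution, while the alternating branch mispredicts every time, because a last-target entry always holds the previous target. This is the behavior that motivates using extra correlated information for indirect branches.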
[Figure 1: MPKI for different kinds of branches. Y-axis: Mispredictions Per Kilo Instructions (MPKI), 0 to 14. X-axis: perlbmk, perlbench, gap, gcc00, gcc06, eon, sjeng, crafty, ixx, richards, AVG. Series: indirect branches, function returns, conditional branches.]

¹ In the rest of this paper, an "indirect branch" refers to a non-return unconditional indirect branch instruction.

Most previous indirect branch predictors require a significant amount of extra hardware storage and complexity. History-based predictors such as the TTC predictor [5], the Cascade predictor [7], and the ITTAGE predictor [28] generally require large extra storage, while the value-based ARVI predictor [6] and VBBI predictor [11] rely heavily on complex control logic or manual profiling. These predictors mainly focus on hardware-only approaches, because they aim to be readily deployed into existing architectures and maintain bina-