Neural Methods for Dynamic Branch Prediction

DANIEL A. JIMÉNEZ
Rutgers University
and
CALVIN LIN
The University of Texas at Austin

This article presents a new and highly accurate method for branch prediction. The key idea is to use one of the simplest possible neural methods, the perceptron, as an alternative to the commonly used two-bit counters. The source of our predictor’s accuracy is its ability to use long history lengths, because the hardware resources for our method scale linearly, rather than exponentially, with the history length. We describe two versions of perceptron predictors, and we evaluate these predictors with respect to five well-known predictors. We show that for a 4 KB hardware budget, a simple version of our method that uses a global history achieves a misprediction rate of 4.6% on the SPEC 2000 integer benchmarks, an improvement of 26% over gshare. We also introduce a global/local version of our predictor that is 14% more accurate than the McFarling-style hybrid predictor of the Alpha 21264. We show that for hardware budgets of up to 256 KB, this global/local perceptron predictor is more accurate than Evers’ multicomponent predictor, so we conclude that ours is the most accurate dynamic predictor currently available. To explore the feasibility of our ideas, we provide a circuit-level design of the perceptron predictor and describe techniques that allow our complex predictor to operate quickly. Finally, we show how the relatively complex perceptron predictor can be used in modern CPUs by having it override a simpler, quicker Smith predictor, providing IPC improvements of 15.8% over gshare and 5.7% over the McFarling hybrid predictor.

Categories and Subject Descriptors: C.1.1 [Computer Systems Organization]: Processor Architectures—Single data stream architectures

General Terms: Performance

Additional Key Words and Phrases: Branch prediction, neural networks

D. A. Jiménez was supported by a fellowship from Intel Corporation. This research was supported in part by NSF CAREERS grant ACI-9984660, by ONR grant N00014-99-1-0402, and by the Defense Advanced Research Projects Agency under contract F33615-01-C-1892.

Authors’ addresses: D. Jiménez, Rutgers University, Department of Computer Science, 110 Frelinghuysen Rd., Piscataway, NJ, 08854; email: djimenez@cs.rutgers.edu; C. Lin, Department of Computer Sciences, Taylor 2.124, University of Texas, Austin TX, 78712; email: lin@cs.utexas.edu.

Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.

© 2002 ACM 0734-2071/02/1100-0369 $5.00

ACM Transactions on Computer Systems, Vol. 20, No. 4, November 2002, Pages 369–397.

1. INTRODUCTION

Modern computer architectures increasingly rely on speculation to boost instruction-level parallelism. For example, data that are likely to be read in