The Impact of Unresolved Branches on Branch Prediction Scheme Performance

Adam R. Talcott, Wayne Yamamoto, Mauricio J. Serrano, Roger C. Wood, Mario Nemirovsky*

Electrical and Computer Engineering Dept.
University of California, Santa Barbara
Santa Barbara, CA 93106-5130

*National Semiconductor Corporation
2900 Semiconductor Drive, M.S. E-280
Santa Clara, CA 95052-8090

This research was supported by the State of California and Apple Computer, Inc. via MICRO Grant #92-178.

Abstract

In this paper, we examine the benefits of the early resolution of branch instructions and the impact of unresolved branches on history-based branch prediction schemes by using two new metrics that are more revealing than branch prediction accuracy alone. We first briefly review a number of branch prediction schemes and introduce two new branch prediction scheme performance metrics. We then utilize these metrics to gauge the improvement in branch prediction scheme performance when only the outcomes of unresolved branches are predicted. Finally, we examine two approaches for handling multiple unresolved branches in history-based branch prediction schemes, and determine that prediction accuracy remains quite stable when older branch histories are used.

1. Introduction

It is understood that the overlapped execution of instructions provided by pipelined processors may result in data or control hazards. The effect of these hazards is to stall the pipeline and reduce processor performance by lowering the instructions per cycle ratio (IPC). Control hazards tend to be more difficult to deal with because there is no way of knowing if the processor is fetching instructions from the correct address until both the outcome of the branch instruction and the address of the branch target have been determined. As it may take a few cycles to compute these results, the pipeline is generally stalled until they have been determined. To make matters worse, the cost associated with control hazards has become even larger recently as more processors make use of deeper pipelines (so-called superpipelines).

It has become a common technique for the processor to predict the outcome of the branch (whether it is taken or not taken) in time for the instructions at the branch target to be fetched without interrupting the flow of instructions into the pipeline. If the outcome of a branch may be predicted with a probability close to one, then there is no reason to stall the pipeline until the actual outcome is known. In this case, it is best to begin fetching, issuing and possibly executing instructions from the predicted branch target. The probability of correctly predicting the outcome of a branch should be high enough for the chances of flushing the pipeline, or (if speculative execution is used) rolling back the processor's state, to be below an acceptable level.

In many cases, the condition on which a conditional branch is based can be computed far in advance (e.g., a loop counter). Therefore, there is no need to predict the outcomes of such branches since their outcomes are already known. Branches whose outcomes have already been computed when the branch is fetched are referred to as resolved branches.

A number of architectures have already been developed to take advantage of resolved branches. This is most easily done in architectures with special instructions that explicitly set condition codes or flags (such as RISC-1, the Astronautics ZS-1, IBM's RISC System/6000, and the PowerPC 601). These architectures allow branch condition information to be communicated directly to the instruction fetch unit as soon as it is available and branch instructions using these conditions can be executed immediately in the fetch unit, with no need for prediction. On the other hand, architectures with implicitly set condition codes as side-effects of other instructions can seriously impede the ability to resolve branches early.
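The distinction between resolved and unresolved branches can be illustrated with a minimal sketch. It assumes a fetch unit that keeps a small table of explicitly set condition flags. The names FetchUnit, set_flag, invalidate and next_direction are hypothetical and chosen for illustration only; they do not describe any of the architectures named above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class FetchUnit:
    """Hypothetical fetch unit that tracks explicitly set condition flags."""

    # Flag name -> computed value. A missing key means the instruction
    # that sets the flag has not produced its result yet.
    ready_flags: Dict[str, bool] = field(default_factory=dict)

    def set_flag(self, name: str, value: bool) -> None:
        """Record a condition as soon as it has been computed."""
        self.ready_flags[name] = value

    def invalidate(self, name: str) -> None:
        """Mark a flag as pending, e.g. when a new compare is issued."""
        self.ready_flags.pop(name, None)

    def next_direction(
        self, flag: str, predict: Callable[[], bool]
    ) -> Tuple[bool, bool]:
        """Return (taken, resolved) for a branch that tests `flag`.

        A resolved branch uses the known outcome directly; only an
        unresolved branch falls back to the supplied predictor.
        """
        if flag in self.ready_flags:
            return self.ready_flags[flag], True
        return predict(), False


if __name__ == "__main__":
    fu = FetchUnit()
    always_not_taken = lambda: False  # placeholder predictor (assumption)

    # A loop-counter compare completed well before the branch is fetched.
    fu.set_flag("cr0", True)
    print(fu.next_direction("cr0", always_not_taken))  # resolved branch

    # A new compare is in flight, so the same branch must be predicted.
    fu.invalidate("cr0")
    print(fu.next_direction("cr0", always_not_taken))  # unresolved branch
```

In this sketch the predictor is consulted only on the unresolved path, which mirrors the idea of predicting only the outcomes of unresolved branches.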
1063-6897/94 $03.00 © 1994 IEEE

Also, it is possible for architectures that use general purpose registers when branching to obtain the same performance effect by providing the instruction fetch unit