International Journal of Computer Applications (0975-8887), Volume 128, No. 16, October 2015

Performance Analysis of Branch Prediction Unit for Pipelined Processors

Nikhil Panwar
Student, M.Tech VLSI Design, ACS Division, Centre for Development of Advanced Computing (C-DAC), Mohali, 160071, India

Manjit Kaur
Engineer, ACS Division, Centre for Development of Advanced Computing (C-DAC), Mohali, 160071, India

Gurmohan Singh
Senior Engineer, DEC Division, Centre for Development of Advanced Computing (C-DAC), Mohali, 160071, India

ABSTRACT
The branch predictor plays a crucial role in achieving effective performance in microprocessors with pipelined architectures. This paper analyzes the performance of a branch prediction unit for pipelined processors. A 512-byte memory is designed for storing instructions. A 32-byte memory is designed for the branch target buffer (BTB), which stores the history of the branch instructions. A finite state machine (FSM) is designed for the branch predictor unit. It consists of four states: strongly taken, weakly taken, weakly not taken and strongly not taken. Prediction is based on the current state of the FSM: if the state is weakly taken or strongly taken, the predictor guesses that the branch is taken; otherwise it guesses that the branch is not taken. When a branch instruction is executed for the first time, the BTB stores the address of the current instruction and the address to which it jumps. The state of the FSM is then updated accordingly. The program is executed both with and without a branch predictor unit. The latency of the processor with a branch prediction unit and of the processor without one is computed and compared. The simulation results validate that latency decreases when the branch prediction unit is used.

Keywords
BTB, FSM, ILP, FPGA, Latency, Processor.

1. INTRODUCTION
The performance of microprocessor architectures has doubled every two to three years.
Two techniques used for high-performance computing are pipelining and branch prediction. Pipelining is highly preferred in high-performance embedded processors because it can increase instruction-level parallelism. With pipelining, the processor is divided into stages, and the intermediate result of each stage is stored. This allows several instructions to be in execution at the same time. As a result, the throughput of the processor, which is the number of instructions completed per second, is increased [2]. Pipelined instructions need to be examined carefully to understand the effects created by changes in control flow. For instance, a pipeline may consist of four stages, namely instruction fetch (IF), instruction decode (ID), execute (EX), and write back (WB). Each instruction passes through several stages before its result is known. Meanwhile, other instructions occupy the preceding stages, so many instructions are in execution simultaneously [3]. There is a delay between fetching an instruction and obtaining the result of its execution. For a conditional branch, this delay means that the next fetch address is not yet available, which makes it uncertain which instruction should be fetched after the branch. Instructions are normally executed sequentially, but branch instructions change the flow of instructions. The fetch unit of the processor therefore needs to know in advance which instruction to fetch next in order to keep the pipeline stages utilized. After a conditional branch, one of two instructions can follow. The next instruction is fetched either from the next consecutive address, in which case it is known as the fall-through instruction, or from the target address, in which case it is known as the target instruction.
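The choice between the fall-through instruction and the target instruction can be illustrated with a minimal sketch. The function name `next_fetch_address` and the fixed 4-byte instruction width are assumptions made for illustration; the paper does not state an instruction size.

```python
# Assumed fixed instruction width in bytes (hypothetical, not from the paper).
INSTRUCTION_SIZE = 4


def next_fetch_address(pc, branch_taken, target_address):
    """Return the address fetched after a conditional branch located at `pc`."""
    if branch_taken:
        # Taken: fetch the target instruction.
        return target_address
    # Not taken: fetch the fall-through instruction at the next consecutive address.
    return pc + INSTRUCTION_SIZE


print(hex(next_fetch_address(0x40, False, 0x80)))  # fall-through: 0x44
print(hex(next_fetch_address(0x40, True, 0x80)))   # target: 0x80
```

The difficulty described above is that `branch_taken` is not known at the moment the fetch unit needs this address.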
The branch problem arises because a conditional branch must wait for its condition to be resolved and for the address of the next instruction to be calculated before the next instruction can be fetched. This results in a delay that degrades processor performance. Processing must stop, and the processor must wait until the direction of the branch is known. This introduces stalls in the pipeline. The number of stalls is determined by the number of stages between the fetch stage and the stage in which the branch is resolved. This performance problem can be addressed by adopting a technique called branch prediction [4]. A branch predictor predicts the path of a branch instruction before its actual behaviour is known, which improves the flow in the instruction pipeline. In modern microprocessors with pipelined architectures, branch predictors play a crucial role in achieving high performance [4]. A conditional jump instruction is used to implement two-way branching. When the conditional jump is not taken, execution continues along the first branch of the code, which comes immediately after the conditional jump. When the conditional jump is taken, execution jumps to the location in program memory where the code for the second branch is stored. In pipelined structures, clock cycles are shorter because each stage performs less work. Processors are also designed with multiple instruction pipelines, which allow multiple instructions to be issued in each cycle. The processor must be supplied with a large number of instructions in order to use the pipeline stages efficiently.
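The effect of stalls and of prediction can be sketched as follows, using the four FSM states named in the abstract. Several details here are assumptions rather than the authors' design: the "stages between" count is taken as the distance in stage positions, the FSM uses the common saturating-counter transitions, a misprediction is charged the same penalty as a stall, and the helper names and the sample branch history are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "WB"]  # stage order as listed in the introduction


def stall_cycles(resolve_stage, stages=STAGES):
    """Stalls per branch, counted as the stage distance from fetch (an assumption)."""
    return stages.index(resolve_stage) - stages.index("IF")


# Four FSM states encoded as 0..3 (the encoding is an assumption).
STRONGLY_NOT_TAKEN, WEAKLY_NOT_TAKEN, WEAKLY_TAKEN, STRONGLY_TAKEN = range(4)


def predict(state):
    """Guess taken when the FSM is in weakly taken or strongly taken."""
    return state >= WEAKLY_TAKEN


def update(state, taken):
    """Saturating-counter transition (assumed; the paper's diagram may differ)."""
    if taken:
        return min(state + 1, STRONGLY_TAKEN)
    return max(state - 1, STRONGLY_NOT_TAKEN)


def branch_stalls(outcomes, use_predictor, penalty, state=WEAKLY_NOT_TAKEN):
    """Total stall cycles spent on one branch over a sequence of outcomes."""
    total = 0
    for taken in outcomes:
        if not use_predictor:
            total += penalty  # always wait for the branch to resolve
        else:
            if predict(state) != taken:
                total += penalty  # wrong guess: wrong-path fetches are discarded
            state = update(state, taken)
    return total


# Hypothetical history: a loop branch taken nine times, then not taken on exit.
outcomes = [True] * 9 + [False]
penalty = stall_cycles("EX")  # branch assumed to resolve in the execute stage

print(branch_stalls(outcomes, False, penalty))  # without predictor: 20
print(branch_stalls(outcomes, True, penalty))   # with predictor: 4
```

In this sketch the predictor is wrong only on the first iteration and on the loop exit, so most of the stall cycles disappear. The numbers are illustrative and are not results from the paper.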
Whether a conditional jump is taken or not taken cannot be decided until the condition has been evaluated and the conditional jump has passed the execution stage of the instruction pipeline [5]. Without a branch prediction unit, the processor must wait for the conditional jump instruction to pass through the execution stage before the next instruction can enter the fetch stage of the pipeline. The branch predictor guesses whether the conditional jump is more likely to be taken or not taken and thereby prevents