International Journal of Computer Applications (0975 – 8887) Volume 4 – No.4, July 2010

Role of Multiblocks in Control Flow Prediction using Parallel Register Sharing Architecture

Rajendra Kumar, Dept. of Computer Science & Engineering, Vidya College of Engineering, Meerut (UP), India
P K Singh, Dept. of Computer Science & Engineering, MMM Engineering College, Gorakhpur (UP), India

ABSTRACT
In this paper we present control flow prediction (CFP) in a parallel register sharing architecture to achieve a high degree of ILP. The main idea is to go a step beyond ordinary branch prediction and give the architecture information about the control flow graph (CFG) components of the program, so that better branch decisions can be made for ILP. The navigation bandwidth of the prediction mechanism depends on the degree of ILP and can be increased by strengthening control flow prediction at compile time. This enlarges the initiation size, allowing the overlapped execution of multiple independent flows of control; multiple branch instructions can also be handled. These are intermediate steps toward increasing the size of the dynamic window so that a high degree of instruction level parallelism can be exploited.

Keywords: CFP, ISB, ILP, CFG, Basic Block

1. Introduction
Instruction Level Parallelism (ILP) is the execution of multiple instructions per cycle, and it is now essential to modern processors for better performance. It has been observed that ILP is strongly constrained by branch instructions, and that branch prediction is commonly employed together with speculative execution [1]. However, inevitable branch mispredictions compromise this remedy. Predication, on the other hand, exposes a higher degree of ILP to the scheduler by converting control flow into equivalent predicated instructions guarded by Boolean source operands.
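The transformation described above can be illustrated with a small sketch. The function names and operations below are illustrative, not taken from the paper: the point is that the branch in `branchy` becomes a straight-line sequence in `if_converted`, where the control dependence on the branch is replaced by a data dependence on the predicate `p`.

```python
# Hypothetical sketch of if-conversion (predication).
# In the predicated form, operations from both paths appear in one
# straight-line sequence and a Boolean predicate selects the result,
# analogous to a conditional-move instruction.

def branchy(a, b, c):
    # Original control flow: the result depends on a branch.
    if c:
        x = a + b
    else:
        x = a - b
    return x

def if_converted(a, b, c):
    p = bool(c)           # predicate definition
    t1 = a + b            # operation guarded by p
    t2 = a - b            # operation guarded by (not p)
    x = t1 if p else t2   # predicate selects the live value
    return x
```

Because `if_converted` has no branch, a scheduler is free to reorder `t1` and `t2` with surrounding instructions; the only constraint is the data dependence on `p`.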
If-conversion [2] has been shown to be a promising method for exploiting instruction level parallelism in the presence of control flow. If-conversion replaces the control dependence between a branch and the remaining instructions with a data dependence between the predicate definition and the predicated instructions. As a result, the transformation of control flow becomes conventional data flow optimization, and branch scheduling becomes the reordering of straight-line instructions. The degree of instruction level parallelism can be increased by overlapping the execution of multiple program paths, and predicate-specific optimizations can supplement traditional sequential optimizations.

The major questions regarding if-conversion, namely what to if-convert and when to if-convert, suggest that if-conversion should be performed early in compilation. Early if-conversion has the advantage of enabling classical optimizations on the predicated instructions, whereas delaying if-conversion until scheduling allows better selection based on code efficiency and target processor characteristics.

Dynamic branch prediction is fundamentally restricted in establishing a dynamic window because it makes local decisions without any prior knowledge of the global control structure of the code. This lack of knowledge creates two problems: (1) predicting the branch outcome and (2) establishing the branch's identity, meaning the branch must first be encountered by the parallel register sharing architecture [12]. With ordinary branch prediction, a prediction can be made only while the fetch unit fetches the branch instruction for execution.

2. Related Work
The fetch unit plays a major role in the prediction mechanism [2] of the parallel register sharing architecture, but Pan, So and Rahmeh (1992) [14] and Yeh and
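The locality of dynamic branch prediction discussed above can be made concrete with a minimal sketch (not the paper's mechanism): a classic table of two-bit saturating counters indexed by branch address. Every decision uses only the local history stored in one table slot, which is exactly why such a predictor has no knowledge of the program's global control structure. The table size and initial state below are illustrative assumptions.

```python
# Minimal sketch of a dynamic branch predictor using 2-bit
# saturating counters (states 0..3; >= 2 means "predict taken").
# Each branch address maps to one counter, so each prediction is a
# purely local decision with no view of the global CFG.

class TwoBitPredictor:
    def __init__(self, size=1024):
        self.size = size
        self.counters = [1] * size   # start in "weakly not-taken"

    def predict(self, pc):
        return self.counters[pc % self.size] >= 2   # True = taken

    def update(self, pc, taken):
        i = pc % self.size
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

Saturation makes the predictor tolerate a single anomalous outcome: from the "strongly taken" state, one not-taken result does not flip the prediction.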
Patt (1993) [16] proposed prediction mechanisms that do not require branch addresses for prediction; instead, the identity of each branch must be known so that the predicted target address can be obtained either from a BTB [7] or by decoding branch instructions in the parallel register sharing architecture.

Many commercially available embedded processors can extend their base instruction set for a specific application domain, and steady progress has been made in tools and methodologies for automatic instruction set extension of processors that can be configured to exploit ILP. It has been observed, however, that the limited data bandwidth available in the core processors creates a serious performance bottleneck. Cong, Han and Zhang (2005) [5] present a very low cost architectural extension and a compilation technique that address the data bandwidth problem. A novel parallel global register binding based on a hash-function algorithm is also presented in [5]; it achieves nearly optimal performance, within 2% of the ideal speedup. A compilation framework [1] allows a compiler to maximize the benefits of prediction. Steve Carr (1996) [15] showed how the weaknesses of traditional heuristics can be exploited, and explored the optimal use of the loop cache to relieve unnecessary pressure. A technique to enhance the ability of dynamic ILP processors to exploit parallelism is introduced in [3]. A performance metric is presented in [15] to guide nested loop optimization, treating instruction level parallelism and loop transformation as a combined optimization. The impact of ILP processors on the performance of shared memory multiprocessors [17], with and without latency hiding by optimized software prefetching, has been examined by Pai, Ranganathan, Shafi and Adve (1999).
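The BTB mentioned above in the discussion of prediction mechanisms can be sketched as a small direct-mapped cache from branch address to predicted target, letting the fetch unit obtain a target before the branch is even decoded. The entry count and address arithmetic below are illustrative assumptions, not a description of any particular processor's BTB.

```python
# Hedged sketch of a direct-mapped branch target buffer (BTB).
# The low bits of the branch address index the table; the remaining
# bits form a tag that must match for a hit.

class BTB:
    def __init__(self, entries=256):
        self.entries = entries
        self.table = {}   # index -> (tag, predicted target)

    def lookup(self, pc):
        tag, index = pc // self.entries, pc % self.entries
        entry = self.table.get(index)
        if entry is not None and entry[0] == tag:
            return entry[1]   # hit: supply target to the fetch unit
        return None           # miss: must decode the branch instead

    def update(self, pc, target):
        # Record (or replace) the resolved target for this branch.
        self.table[pc % self.entries] = (pc // self.entries, target)
```

Two branches whose addresses share the same low-order index bits conflict in a direct-mapped BTB, so a later `update` silently evicts the earlier entry; real designs trade associativity against lookup latency here.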
One of the critical goals of code optimization for a multiprocessor system-on-a-chip architecture [4] is to minimize the number of off-chip memory accesses. A strategy is presented in [4] to reduce the number of off-chip references due to shared data. Static techniques (for example, trace scheduling [4, 6], predicated execution [9], and superblock and hyperblock scheduling [3, 13]) have been used to mitigate the impact of control dependencies. Lam and Wilson (1992) [8] present a study showing that ILP processors which perform branch prediction and speculative execution, but follow only a single flow of control, can extract a parallelism of only 7.0. The parallelism limit is increased to 13.05 if the ILP processors use the maximal of control