V. V Das, R. Vijaykumar et al. (Eds.): ICT 2010, CCIS 101, pp. 660–665, 2010.
© Springer-Verlag Berlin Heidelberg 2010

Recent Trends in Superscalar Architecture to Exploit More Instruction Level Parallelism

Ritu Baniwal and Kumar Sambhav Pandey
Computer Science and Engineering Department
NIT Hamirpur
baniwalritu@gmail.com, kumar@nitham.ac.in

Abstract. Today's architectures are moving towards exploiting more and more parallelism. Instruction-level parallelism (ILP) is the simultaneous execution of multiple instructions. The superscalar architecture was one such evolution. To exploit ILP, superscalar processors fetch and execute multiple instructions in parallel, thereby reducing the number of clock cycles per instruction (CPI). ILP can be exploited either statically by the compiler or dynamically by the hardware. This paper discusses the basic superscalar approach and the improvements made to superscalar architectures to exploit more parallelism in execution.

Keywords: Superscalar architectures, Instruction level parallelism, CPI.

1 Introduction

Today's architectures need parallel processing, because parallel processing reduces the execution time of a program. The execution time of any program is determined by three factors: first, the number of instructions executed; second, the number of clock cycles needed to execute each instruction; and third, the length of each clock cycle. Instruction-level parallelism (ILP) is the simultaneous execution of multiple instructions from a single instruction stream. ILP can be exploited through pipelined execution (overlapping instructions), superscalar execution (fetching and executing multiple instructions per clock cycle), and out-of-order execution (with in-order commit).

Superscalar architectures exploit instruction-level parallelism: superscalar machines dynamically extract ILP from a scalar instruction stream.
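The three factors that determine execution time multiply together: execution time = instruction count × CPI × clock-cycle length. A superscalar machine attacks the middle factor. The following Python sketch is only an illustration under assumptions that are not from the paper: a hypothetical in-order core that issues up to `width` instructions per cycle, one-cycle result latency, and no structural hazards, cache misses, or branches. The function names `execution_time_ns` and `inorder_superscalar_cycles` and the four-instruction program are hypothetical.

```python
def execution_time_ns(instr_count, cpi, cycle_ns):
    """Execution time = instruction count x CPI x clock-cycle length."""
    return instr_count * cpi * cycle_ns


def inorder_superscalar_cycles(program, width):
    """Cycles needed to issue `program` on a hypothetical in-order superscalar core.

    Assumptions (illustrative only): up to `width` instructions issue per cycle,
    every result can be read one cycle after its producer issues, and there are
    no structural hazards, cache misses, or branches.
    """
    ready_at = {}  # register -> first cycle in which its new value can be read
    cycle, i = 0, 0
    while i < len(program):
        cycle += 1
        issued = 0
        while i < len(program) and issued < width:
            dest, srcs = program[i]
            if any(ready_at.get(r, 0) > cycle for r in srcs):
                break  # RAW stall; in-order issue also blocks younger instructions
            ready_at[dest] = cycle + 1
            issued += 1
            i += 1
    return cycle


# Hypothetical four-instruction stream: (destination, [sources]).
program = [
    ("r1", ["r0"]),
    ("r2", ["r0"]),
    ("r3", ["r1", "r2"]),  # depends on the two instructions above
    ("r4", ["r0"]),
]

for width in (1, 2, 4):
    cycles = inorder_superscalar_cycles(program, width)
    cpi = cycles / len(program)
    print(width, cpi, execution_time_ns(len(program), cpi, cycle_ns=0.5))
# 1 1.0 2.0
# 2 0.5 1.0
# 4 0.5 1.0
```

In this toy stream, going from one to two issue slots halves the CPI (and hence the execution time), but four slots give no further gain: the third instruction must wait for its two producers, so the ILP available in the instruction stream, not only the issue width, bounds the speedup.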
Superscalar architectures fetch and execute multiple instructions in each clock cycle, reducing the number of clock cycles per instruction and thus the execution time. The CDC 6600 [9] used a degree of pipelining, but achieved ILP through parallel functional units. Another remarkable processor of the 1960s was the IBM 360/91 [3], which was deeply pipelined and provided a dynamic instruction-issuing mechanism known as Tomasulo's algorithm [10].
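Tomasulo's algorithm is only named here, so the following Python sketch illustrates its timing behavior under simplifying assumptions that are not from the paper: unlimited reservation stations and functional units, one common data bus (CDB) that broadcasts one result per cycle, assumed latencies (ADD 1, MUL 4, DIV 10 cycles), no loads, stores, or branches, and no data values (only tags and cycle numbers). At issue, each source register with a pending producer is replaced by that producer's tag; this register renaming is what removes WAR and WAW hazards. A station starts executing in the cycle after its last awaited tag is broadcast. The names `tomasulo`, `Station`, `LATENCY`, and the example program are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed execution latencies in cycles (illustrative, not from the paper).
LATENCY = {"ADD": 1, "MUL": 4, "DIV": 10}


@dataclass
class Station:
    tag: int                      # reservation-station tag (here: program-order index)
    op: str
    dest: str
    waits: set                    # tags of producers whose results are still pending
    issue: int
    start: Optional[int] = None
    done: Optional[int] = None    # last cycle of execution
    write: Optional[int] = None   # cycle the result is broadcast on the CDB


def tomasulo(program, issue_width=1, cdb_width=1):
    """Return (op, dest, issue, start, done, write) per instruction, in program order."""
    reg_status = {}   # register -> tag of its newest in-flight producer (renaming)
    stations, finished = [], []
    pc, cycle = 0, 0
    while pc < len(program) or stations:
        cycle += 1
        # Execute: a station starts once none of its source tags are pending.
        for s in stations:
            if s.start is None and not s.waits:
                s.start = cycle
                s.done = cycle + LATENCY[s.op] - 1
        # Write result: finished stations broadcast on the CDB, oldest first.
        ready = sorted((s for s in stations if s.done is not None and s.done < cycle),
                       key=lambda s: s.tag)
        for s in ready[:cdb_width]:
            s.write = cycle
            for other in stations:
                other.waits.discard(s.tag)   # waiting stations capture the result
            if reg_status.get(s.dest) == s.tag:
                del reg_status[s.dest]       # only the newest producer updates the register
            stations.remove(s)
            finished.append(s)
        # Issue: in program order, up to issue_width instructions per cycle.
        for _ in range(issue_width):
            if pc == len(program):
                break
            op, dest, src1, src2 = program[pc]
            # Sources with a pending producer wait on its tag; others are read at issue.
            waits = {reg_status[r] for r in (src1, src2) if r in reg_status}
            stations.append(Station(pc, op, dest, waits, cycle))
            reg_status[dest] = pc            # rename dest to this station's tag
            pc += 1
    finished.sort(key=lambda s: s.tag)
    return [(s.op, s.dest, s.issue, s.start, s.done, s.write) for s in finished]


# Hypothetical program: (op, dest, src1, src2).
program = [
    ("DIV", "F0", "F2", "F4"),
    ("ADD", "F6", "F0", "F8"),     # RAW on F0: waits for the DIV
    ("MUL", "F8", "F10", "F12"),   # WAR on F8 with the ADD: no stall
    ("ADD", "F10", "F14", "F12"),  # WAR on F10 with the MUL: no stall
]

for row in tomasulo(program):
    print(row)
# ('DIV', 'F0', 1, 2, 11, 12)
# ('ADD', 'F6', 2, 13, 13, 14)
# ('MUL', 'F8', 3, 4, 7, 8)
# ('ADD', 'F10', 4, 5, 5, 6)
```

In this run the MUL writes F8 in cycle 8, long before the earlier ADD that reads F8 starts in cycle 13. Because that ADD read F8 at issue, when F8 had no pending producer, the WAR hazard does not stall the MUL, and results complete out of program order.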