V. V Das, R. Vijaykumar et al. (Eds.): ICT 2010, CCIS 101, pp. 660–665, 2010.
© Springer-Verlag Berlin Heidelberg 2010
Recent Trends in Superscalar Architecture to Exploit
More Instruction Level Parallelism
Ritu Baniwal and Kumar Sambhav Pandey
Computer Science and Engineering Department
NIT Hamirpur
baniwalritu@gmail.com, kumar@nitham.ac.in
Abstract. Today’s architectures are moving towards exploiting more and more
parallelism. Instruction-level parallelism (ILP) is the simultaneous execution of
multiple instructions. The superscalar architecture is one such development: to
exploit ILP, superscalar processors fetch and execute multiple instructions in
parallel, thereby reducing the number of clock cycles per instruction (CPI). ILP
can be exploited either statically by the compiler or dynamically by the hardware.
This paper discusses the basic superscalar approach and the improvements made to
superscalar architectures to exploit more parallelism in execution.
Keywords: Superscalar architectures, Instruction level parallelism, CPI.
1 Introduction
Parallel processing is essential to today’s architectures because it reduces the
execution time of a program. The execution time of any program is determined by
three factors: first, the number of instructions executed; second, the number of
clock cycles needed to execute each instruction; and third, the length of each clock
cycle. Instruction-level parallelism (ILP) is the simultaneous execution of multiple
instructions from a single instruction stream. ILP can be exploited by pipelined
execution (overlapping of instructions), superscalar execution (fetching and executing
multiple instructions per clock cycle) and out-of-order execution (with in-order
commit).
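The three factors above multiply to give a program's execution time (execution time = instruction count × CPI × clock cycle time). The following minimal Python sketch makes this arithmetic concrete. The function name `execution_time_ns` and the workload numbers are hypothetical, chosen only for illustration.

```python
def execution_time_ns(instruction_count: int, cpi: float, cycle_time_ns: float) -> float:
    """Execution time = instructions x (cycles / instruction) x (ns / cycle)."""
    if instruction_count < 0 or cpi <= 0 or cycle_time_ns <= 0:
        raise ValueError("instruction_count must be >= 0; cpi and cycle_time_ns > 0")
    return instruction_count * cpi * cycle_time_ns


# Hypothetical workload: one million instructions on a 1 GHz clock (1 ns cycle).
scalar = execution_time_ns(1_000_000, cpi=1.0, cycle_time_ns=1.0)
# Halving CPI (for example, an ideal two-wide superscalar) halves execution time
# when the instruction count and the clock cycle are unchanged.
dual_issue = execution_time_ns(1_000_000, cpi=0.5, cycle_time_ns=1.0)
print(scalar, dual_issue)  # 1000000.0 500000.0
```

Because the three factors multiply, improving any one of them reduces execution time proportionally, provided the other two do not get worse. Superscalar designs target the CPI term.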
Superscalar architectures exploit instruction-level parallelism (ILP) by extracting
it dynamically from a scalar instruction stream. They fetch and execute multiple
instructions in the same clock cycle, reducing the number of clock cycles per
instruction and thus the execution time.
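The following sketch models an idealized in-order superscalar machine. It shows why issuing several instructions per cycle lowers CPI and why dependences limit that gain. The model rests on simplifying assumptions and does not describe any real processor: up to `width` instructions issue per cycle, every instruction has a one-cycle latency, functional units are unlimited, and only true (read-after-write) dependences stall issue. The names `issue_cycles` and `cpi` and the sample programs are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# One instruction = (destination register, source registers).
Instr = Tuple[str, Sequence[str]]


def issue_cycles(program: List[Instr], width: int) -> int:
    """Cycles needed to issue `program` in order, at most `width` per cycle.

    Simplified model: 1-cycle latency, unlimited functional units,
    only read-after-write (RAW) dependences cause stalls.
    """
    if width < 1:
        raise ValueError("width must be >= 1")
    if not program:
        return 0
    ready_at: Dict[str, int] = {}  # register -> first cycle its value is usable
    cycle = 0                      # cycle in which the next instruction may issue
    slots_used = 0                 # instructions already issued in `cycle`
    for dest, srcs in program:
        # Earliest cycle at which all source operands are available.
        earliest = max((ready_at.get(r, 0) for r in srcs), default=0)
        if slots_used == width:    # issue slots full: move to the next cycle
            cycle += 1
            slots_used = 0
        if earliest > cycle:       # RAW stall: in-order issue waits for the producer
            cycle = earliest
            slots_used = 0
        slots_used += 1
        ready_at[dest] = cycle + 1  # result usable one cycle after issue
    return cycle + 1


def cpi(program: List[Instr], width: int) -> float:
    """Cycles per instruction for `program` under this simplified model."""
    return issue_cycles(program, width) / len(program)


# Eight mutually independent instructions.
independent = [(f"r{i}", ("a", "b")) for i in range(8)]
# A chain of four instructions, each consuming the previous result.
chain = [("r1", ("r0",)), ("r2", ("r1",)), ("r3", ("r2",)), ("r4", ("r3",))]

print(cpi(independent, 1), cpi(independent, 4))  # 1.0 0.25
print(cpi(chain, 1), cpi(chain, 4))              # 1.0 1.0
```

With no dependences, a four-wide machine in this model reaches an ideal CPI of 0.25. The dependent chain stays at a CPI of 1.0 regardless of issue width. This is why a superscalar processor needs enough independent instructions to fill its issue slots.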
The CDC 6600 [9] used a degree of pipelining but achieved ILP through parallel
functional units. Another remarkable processor of the 1960s was the IBM 360/91 [3],
which was deeply pipelined and provided a dynamic instruction-issuing mechanism
known as Tomasulo’s algorithm [10].