© British Computer Society 2002

Multithreaded Processors

THEO UNGERER¹, BORUT ROBIČ² AND JURIJ ŠILC³

¹ University of Augsburg, Department of Computer Science, Augsburg, Germany
² University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia
³ Jožef Stefan Institute, Computer Systems Department, Ljubljana, Slovenia

Email: Theo.Ungerer@informatik.uni-augsburg.de

The instruction-level parallelism found in a conventional instruction stream is limited. Studies have shown the limits of processor utilization even for today’s superscalar microprocessors. One solution is the additional utilization of more coarse-grained parallelism. The main approaches are the (single) chip multiprocessor and the multithreaded processor which optimize the throughput of multiprogramming workloads rather than single-thread performance. The chip multiprocessor integrates two or more complete processors on a single chip. Every unit of a processor is duplicated and used independently of its copies on the chip. In contrast, the multithreaded processor is able to pursue two or more threads of control in parallel within the processor pipeline. Unused instruction slots, which arise from pipelined execution of single-threaded programs by a contemporary microprocessor, are filled by instructions of other threads within a multithreaded processor. The execution units are multiplexed between the threads in the register sets. Underutilization of a superscalar processor due to missing instruction-level parallelism can be overcome by simultaneous multithreading, where a processor can issue multiple instructions from multiple threads each cycle. Simultaneous multithreaded processors combine the multithreading technique with a wide-issue superscalar processor such that the full issue bandwidth is utilized by potentially issuing instructions from different threads simultaneously.
This survey paper explains and classifies the various multithreading techniques in research and in commercial microprocessors and compares multithreaded processors with chip multiprocessors.

Received 11 May 2001; revised 20 December 2001

1. INTRODUCTION

VLSI technology will allow future microprocessors to have an issue bandwidth of 8–32 instructions per cycle [1, 2]. As the issue rate of future microprocessors increases, the compiler or the hardware will have to extract more instruction-level parallelism (ILP) from a sequential program. However, ILP found in a conventional instruction stream is limited. ILP studies which allow branch speculation for a single control flow have reported parallelism of around 7 instructions per cycle (IPC) with infinite resources [3, 4] and around 4 IPC with large sets of resources (e.g. 8 to 16 execution units) [5].

Contemporary high-performance microprocessors therefore exploit speculative parallelism by dynamic branch prediction and speculative execution of the predicted branch path to increase single-thread performance. Research into future microarchitectures, exemplified by the proposal of a superspeculative microprocessor [6], has additionally looked at the prediction of data dependences, source operand values, value strides, address aliases and load values with speculative execution applying the predicted values [7, 8, 9, 10]. The superspeculative microarchitecture technique is applied to increase the performance of a single program thread by means of branch and value speculation techniques. Only instructions of a single thread of control are in execution.

Multithreading pursues a different set of solutions by utilizing coarse-grained parallelism [11, 12, 13]. A multithreaded processor is able to concurrently execute instructions of different threads of control within a single pipeline.
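As a rough illustration of how instructions from a second thread can fill issue slots that a single thread leaves empty, the following Python sketch models one issue stage per cycle. It is a deliberately simplified model and not a description of any particular processor: the function names, the issue width of 8 and the per-cycle counts of ready instructions are assumptions invented for this example, and instructions that are not issued in a cycle are simply dropped rather than carried over. Because several threads may issue in the same cycle, the model corresponds to the simultaneous form of multithreading.

```python
from typing import List


def issued_per_cycle(ready: List[List[int]], width: int) -> List[int]:
    """Return the number of instructions issued in each cycle.

    ready[t][c] is the (hypothetical) number of independent instructions
    that thread t could issue in cycle c. In every cycle the threads are
    visited in order and each one fills as many of the remaining issue
    slots as it can, up to the issue width.
    """
    cycles = len(ready[0])
    issued = []
    for c in range(cycles):
        free = width
        for thread in ready:
            take = min(thread[c], free)  # cannot exceed the free slots
            free -= take
        issued.append(width - free)
    return issued


def utilization(issued: List[int], width: int) -> float:
    """Fraction of all issue slots that were actually used."""
    return sum(issued) / (width * len(issued))


WIDTH = 8  # assumed issue bandwidth (instructions per cycle)

# Made-up per-cycle ILP of two threads; zeros model stall cycles.
thread_a = [4, 0, 2, 1, 0, 3, 4, 2]
thread_b = [2, 3, 0, 4, 1, 0, 2, 4]

single = utilization(issued_per_cycle([thread_a], WIDTH), WIDTH)
smt = utilization(issued_per_cycle([thread_a, thread_b], WIDTH), WIDTH)

print(f"single thread: {single:.2f}")
print(f"two threads:   {smt:.2f}")
```

With these made-up numbers, the utilization of the issue slots rises from 0.25 for thread A alone to 0.50 when both threads share the pipeline. The figures only illustrate the mechanism and say nothing about real workloads.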
Depending on the architectural approach, multithreading is applied either to increase performance of a single program thread by implicitly utilizing parallelism which is more coarse-grained than ILP (so-called implicit multithreading) or to increase performance of a multiprogramming or multithreaded workload (so-called explicit multithreading).

1.1. Notion of a thread

The notion of a thread in the context of multithreaded processors differs from the notion of software threads in multithreaded operating systems. In the case of a multithreaded processor a thread is always viewed as a hardware-supported thread which can be, depending on the specific form of multithreaded processor, a full program (single-threaded UNIX process), a light-weight process (e.g. a POSIX thread) or a compiler- or hardware-generated thread (subordinate microthread, microthread, nanothread etc.). Consequences for multithreaded processor design are as follows.

The most common form of coarse-grained thread-level parallelism is the execution of multiple processes in parallel. This implies that different logical address spaces have to be

THE COMPUTER JOURNAL, Vol. 45, No. 3, 2002