Low Power Instruction Fetch using Profiled Variable Length Instructions

Mikael Collin and Mats Brorsson
Dept. of Microelectronics and Information Technology
KTH, Sweden
{mikaelc, Mats.Brorsson}@imit.kth.se

Abstract

Computer system performance depends heavily on a high access rate and a low miss rate in the instruction cache, which also affect the energy consumed by fetching instructions. Simulation experiments on a small scalar processor typical of embedded systems show that up to 20% of the overall processor energy is consumed in the instruction fetch path and that as much as 23% of the execution time is spent on instruction fetch. One way to increase the instruction memory bandwidth is to fetch more instructions on each access without increasing the bus width. We propose an extension of the normal RISC-style ISA. The ISA is augmented with variable-length instructions, yielding a higher information density without compromising programmability. Based on extensive profiling of dynamic instruction usage, in terms of instruction types and arguments, for a set of SPEC CPU2000 applications, we present an extension scheme that uses short 8- and 16-bit instructions accompanied by lookup tables, residing in the processor, for the instruction argument combinations in use. In addition, we discuss the architectural extensions introduced and the implications of fetching four-byte-wide chunks that can contain up to four instructions. Energy savings in instruction fetch and in the rest of the processor are evaluated, along with the performance implications of variable-length instructions, using the SimpleScalar and Wattch simulators. Our extension scheme with short instructions yields a 20-30% reduction in static memory usage, and simulations show that up to 60% of the dynamically executed instructions are short instructions. In all executions, the programs experienced a reduced instruction cache miss rate.
The overall energy savings are up to 15% for the entire data path and memory system, and up to 20% in the instruction fetch path alone.

Key words: Instruction set architecture, low-energy architecture, cache memories

1 Introduction

Energy has become a first-order computer system design parameter, as important as performance, not only for hand-held or portable devices where battery life matters, but also for stationary computer systems, where heat dissipation is a rapidly growing problem. In this paper we describe a variable-length instruction set extension to an ordinary RISC instruction set. The extension has been designed to reduce the energy consumption in the instruction fetch path of modern processors, in particular for embedded applications.

The performance gap between the processor core and the memory has traditionally been bridged with one or more levels of cache memory. Instruction cache performance is often considered less problematic, on the assumption that instruction fetches have enough locality that instruction cache misses do not limit performance. However, given the enormously high instruction memory bandwidth requirement, not only of modern high-end processors with GHz-plus clock frequencies but also of processors intended for embedded use [2], this is no longer true. For instance, Stark et al. conclude that instruction cache misses have a severe performance impact on the processor [15]. Their