The Performance Potential of Data Value Reuse

Antonio González, Jordi Tubella and Carlos Molina
Departament d’Arquitectura de Computadors
Universitat Politècnica de Catalunya, Jordi Girona 1-3, Edifici D6, 08034 Barcelona, Spain
e-mail: {antonio,jordit,cmolina}@ac.upc.es

Abstract

This paper presents a study of the performance limits of data value reuse. Two types of data value reuse are considered: instruction-level reuse and trace-level reuse. The former reuses instances of single instructions, whereas the latter reuses sequences of instructions as an atomic unit. Two scenarios are considered: an infinite-resource machine and a machine with a limited instruction window. The results show that reuse is abundant in the SPEC applications. Instruction-level reuse may provide a significant speedup, but this speedup drops dramatically when the reuse latency is considered. Trace-level reuse generally has less potential in the unlimited-window scenario, but it is much more effective in the limited-window configuration. This is because trace-level reuse, in addition to reducing the execution latency, increases the effective instruction window size by avoiding the fetch and execution of sequences of instructions. Overall, trace-level reuse is shown to be a promising approach, since it can provide speedups of around 3 for a 256-entry instruction window and a realistic reuse latency.

Keywords: Data value reuse, instruction-level reuse, trace-level reuse, instruction-level parallelism.

1. Introduction

Data dependences¹ are one of the most important hurdles that limit the performance of current microprocessors. The amount of instruction-level parallelism (ILP) that processors may exploit is significantly limited by the serialization caused by data dependences. This limitation is more severe for integer codes, in which data dependences are more abundant.

¹ In this paper, data dependences refer to true dependences (output and anti-dependences are not included).

Some studies on the ILP