Prototyping Architectural Support for Program Rollback: An Application to Software Debugging

Radu Teodorescu and Josep Torrellas
University of Illinois at Urbana-Champaign

Several recently proposed architectural techniques require speculation over long program sections. Examples include Thread-Level Speculation [2, 4, 8, 9], speculation on synchronization [7, 5], speculation on the values of invalidated cache lines [3], speculation on conformance to a memory consistency model [1], and speculation on the absence of software bugs [6, 10]. In all these cases, when speculation fails, the architecture must provide a way to quickly and cleanly roll back the side effects of the speculative code. More specifically, as a thread executes speculatively, the architecture buffers the register and memory state that the thread generates. If and when the speculation is shown to be correct, the architecture quickly commits the speculative state. If, instead, the speculation is incorrect, the speculative state is discarded and the program is rolled back to the point before speculative execution began.

This paper reports on an FPGA-based prototype of a processor and memory hierarchy that models hardware support for program rollback. The prototype implements register checkpointing and restoration, buffering of speculative state in the L1 cache for later commit or discard, and instructions for transitioning between speculative and non-speculative execution modes. We use the prototype to demonstrate how application rollback can help debug production code. The compiler inserts hints into the application to mark regions of code that are “at risk”. These suspicious regions are then executed in speculative mode. If an external source detects a bug, the suspicious region is rolled back and re-executed. Upon re-execution, the compiler can choose to enable additional instrumentation that helps characterize the buggy code region in detail.
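The Python sketch below is a software model, not the authors' VHDL design, of the mechanism described above: a register checkpoint taken when speculation begins, stores buffered in L1 lines marked speculative, commit by clearing those marks, and rollback by invalidating the marked lines and restoring the checkpoint. Several details are assumptions made for illustration: the cache organization (direct-mapped, one word per line); the rule that a speculative line may never be evicted, so a store that would need to evict one fails and forces the caller to end speculation; and every name (`Cpu`, `begin_spec`, `commit`, `rollback`, `at_risk_region`, `run_with_rollback`). The assumptions also cover the canary check, which stands in for the external bug detector.

```python
# Hypothetical software model of L1-based speculative buffering and register
# checkpointing; names, sizes and overflow policy are illustrative assumptions.
from dataclasses import dataclass, field
from typing import List, Optional

NUM_REGS = 4    # hypothetical register-file size
L1_LINES = 4    # hypothetical direct-mapped L1 with one word per line
MEM_WORDS = 16  # hypothetical backing store (stands in for L2 and memory)
CANARY = 0xDEAD


@dataclass
class Line:
    valid: bool = False
    dirty: bool = False
    spec: bool = False  # written during speculation and not yet committed
    addr: int = 0
    val: int = 0


@dataclass
class Cpu:
    regs: List[int] = field(default_factory=lambda: [0] * NUM_REGS)
    ckpt: List[int] = field(default_factory=lambda: [0] * NUM_REGS)
    speculative: bool = False
    l1: List[Line] = field(default_factory=lambda: [Line() for _ in range(L1_LINES)])
    mem: List[int] = field(default_factory=lambda: [0] * MEM_WORDS)

    def _fill(self, addr: int) -> Optional[Line]:
        """Bring addr into its L1 slot; None if the victim holds speculative data."""
        line = self.l1[addr % L1_LINES]
        if line.valid and line.addr == addr:
            return line
        if line.valid and line.spec:
            return None  # assumed policy: speculative data may not leave the L1
        if line.valid and line.dirty:
            self.mem[line.addr] = line.val  # ordinary write-back of committed data
        line.valid, line.dirty, line.spec = True, False, False
        line.addr, line.val = addr, self.mem[addr]
        return line

    def load(self, addr: int) -> int:
        line = self._fill(addr)
        # If the slot is pinned by speculative data for another address, addr is
        # not cached, so the backing store holds its current value.
        return line.val if line else self.mem[addr]

    def store(self, addr: int, val: int) -> bool:
        line = self._fill(addr)
        if line is None:
            return False  # speculative buffer overflow: caller must end speculation
        if self.speculative and line.dirty and not line.spec:
            self.mem[addr] = line.val  # keep the safe pre-speculation value below L1
        line.val, line.dirty = val, True
        line.spec = line.spec or self.speculative
        return True

    def begin_spec(self) -> None:
        self.ckpt = list(self.regs)  # register checkpoint
        self.speculative = True

    def commit(self) -> None:
        for line in self.l1:
            line.spec = False  # speculative lines become ordinary dirty lines
        self.speculative = False

    def rollback(self) -> None:
        for line in self.l1:
            if line.spec:  # discard buffered speculative stores
                line.valid = line.dirty = line.spec = False
        self.regs = list(self.ckpt)  # restore the register checkpoint
        self.speculative = False


def at_risk_region(cpu: Cpu, log: Optional[List[int]] = None) -> bool:
    """Hypothetical suspicious region: copies words 0..1 to 8..9, but an
    off-by-one bug also overwrites word 10, which holds a canary."""
    for i in range(3):  # BUG: should be range(2)
        if log is not None:
            log.append(8 + i)  # extra instrumentation, enabled only on re-execution
        if not cpu.store(8 + i, cpu.load(i)):
            return False
        cpu.regs[0] = i
    return True


def run_with_rollback(cpu: Cpu) -> Optional[List[int]]:
    """Run the region speculatively. If the hypothetical canary check flags a
    bug, roll back and re-execute with instrumentation; return its log."""
    cpu.begin_spec()
    if at_risk_region(cpu) and cpu.load(10) == CANARY:
        cpu.commit()
        return None
    cpu.rollback()
    log: List[int] = []
    cpu.begin_spec()
    at_risk_region(cpu, log)  # instrumented re-execution
    cpu.rollback()  # leave the pre-region state intact for inspection
    return log


if __name__ == "__main__":
    cpu = Cpu(mem=[10 * i for i in range(MEM_WORDS)])
    cpu.mem[10] = CANARY
    cpu.regs[0] = 42
    print(run_with_rollback(cpu))  # [8, 9, 10]
    print(cpu.regs[0], hex(cpu.load(10)))  # 42 0xdead
```

In this model, `begin_spec` and `commit` play the role of the compiler-inserted hints that bracket an at-risk region. Before the first speculative write to a line holding dirty committed data, that data is written below the L1. As a result, invalidating the speculative lines is always enough to recover the pre-speculation memory image.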
For our prototype, we modified a synthesizable VHDL implementation of a 32-bit processor compliant with the SPARC V8 architecture and mapped it to a Xilinx Virtex-II FPGA on a dedicated development board. We ran several applications on top of the Linux kernel. We show that, with relatively simple hardware and minimal performance impact, we can enable lightweight, on-the-fly debugging of production code. Our measurements show that the hardware extensions increase the processor's resource requirements on the FPGA by less than 2.5%. We envision this hardware as part of a larger debugging infrastructure that includes compiler and OS support for detecting and characterizing bugs in production runs.