The Design Complexity of Program Undo Support in a General-Purpose Processor

Radu Teodorescu and Josep Torrellas
Department of Computer Science
University of Illinois at Urbana-Champaign
http://iacoma.cs.uiuc.edu

1 Introduction

Several recently proposed architectural techniques require speculation over long program sections. Examples of such techniques are thread-level speculation [4, 6, 11, 12], speculation on collision-free synchronization [7, 10], speculation on the values of invalidated cache lines [5], speculation on conforming to a memory consistency model [3], and even speculation on the absence of software bugs [8, 13].

In these techniques, as a thread executes speculatively, the architecture has to buffer the memory state that the thread generates. This state can be quite large. If the speculation turns out to be correct, the architecture commits the speculative state. If, instead, the speculation turns out to be incorrect, the speculative state is discarded and the program is rolled back to the beginning of the speculative execution.

A common way to support these operations with low overhead is to take a register checkpoint when entering speculative execution and to buffer the speculative state in the cache. If the speculation is correct, the state in the cache is merged with the rest of the program state. If the speculation is incorrect, the speculative state buffered in the cache is invalidated and the register checkpoint is restored.

While the hardware needed for these operations has been discussed in many papers, it has not been implemented before. In fact, there is some concern that its complexity may be too high to be cost-effective. In this paper, we set out to build such architectural support on a simple processor and to prototype it using FPGA (Field Programmable Gate Array) technology.
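To make the checkpoint-and-buffer scheme concrete, the following is a minimal behavioral sketch in Python. It is not the authors' hardware: the class and method names (`SpeculativeCore`, `begin_speculation`, `commit`, `rollback`) are hypothetical, a dictionary stands in for the cache that buffers speculative state, and capacity limits are ignored.

```python
# Hypothetical behavioral model of checkpoint-based speculation (not the
# prototype's design): a dict plays the role of the speculative cache state.


class SpeculativeCore:
    """Toy core: a register file plus flat memory. While speculating, stores
    go to a separate buffer until they are committed or discarded."""

    def __init__(self, num_regs=8):
        self.regs = [0] * num_regs
        self.memory = {}         # committed (architectural) memory: addr -> value
        self.spec_buffer = None  # speculative stores; None when not speculating
        self.checkpoint = None   # register file saved at begin_speculation()

    @property
    def speculating(self):
        return self.spec_buffer is not None

    def begin_speculation(self):
        # Checkpoint the registers and start buffering stores.
        self.checkpoint = list(self.regs)
        self.spec_buffer = {}

    def store(self, addr, value):
        if self.speculating:
            self.spec_buffer[addr] = value  # committed memory is left untouched
        else:
            self.memory[addr] = value

    def load(self, addr):
        # A speculative thread must see its own buffered stores first.
        if self.speculating and addr in self.spec_buffer:
            return self.spec_buffer[addr]
        return self.memory.get(addr, 0)

    def commit(self):
        # Speculation was correct: merge the buffered state into memory.
        if not self.speculating:
            raise RuntimeError("commit outside speculative mode")
        self.memory.update(self.spec_buffer)
        self.spec_buffer = None
        self.checkpoint = None

    def rollback(self):
        # Speculation was wrong: drop the buffer and restore the registers.
        if not self.speculating:
            raise RuntimeError("rollback outside speculative mode")
        self.regs = self.checkpoint
        self.spec_buffer = None
        self.checkpoint = None


# Usage: a speculative section that is later undone.
core = SpeculativeCore()
core.store(0x100, 1)
core.begin_speculation()
core.regs[0] = 42
core.store(0x100, 2)
assert core.load(0x100) == 2   # the speculative value is visible to the thread
core.rollback()
assert core.load(0x100) == 1 and core.regs[0] == 0
```

Keeping committed memory untouched until `commit` is what makes rollback cheap in this model: undoing the section only requires dropping the buffer and reinstating the saved registers.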
The prototype implements register checkpointing and restoration, buffering of speculative state in the L1 cache for later commit or discard, and instructions for transitioning between speculative and non-speculative execution modes. The result is a processor that can cleanly roll back (or “undo”) a long section of a program.

We estimate the design complexity of adding hardware support for speculative execution and rollback using three metrics. The first is the hardware overhead in terms of logic blocks and memory structures. The second is development time, measured as the time spent designing, implementing, and testing the hardware extensions we add. The third is the number of lines of VHDL code used to implement these extensions.

For our prototype, we modified LEON2 [2], a synthesizable VHDL implementation of a 32-bit processor compliant with the SPARC V8 architecture. We mapped the modified processor to a Xilinx Virtex-II FPGA chip on a dedicated development board, which allowed us to run several applications, including a version of Linux.

Our measurements show that the complexity of supporting program rollback over long code sections is very modest. The required hardware amounts, on average, to less than 4.5% of the logic blocks in the simple processor analyzed. Moreover, the time spent designing, implementing, and debugging the hardware support is only about 20% higher than the time needed to add write-back support to a write-through cache. Finally, the VHDL code written to implement our hardware increases the size of the data cache controller by about 14.5% and that of the simple pipeline by about 7.5%.
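Buffering speculative state in the L1 cache can also be sketched at the level of individual lines. The model below is an illustration under stated assumptions, not the LEON2-based design: each line carries a hypothetical `spec` bit, the cache is fully associative with one word per line, and a speculative line that would have to be evicted simply raises a hypothetical `SpeculativeOverflow` exception (a real design might instead stall or end speculation). It also shows one reason write-back support matters here: if a speculative store hits a dirty, non-speculative line, that line's pre-speculation value is written back first, so that invalidating the line on rollback does not lose it.

```python
# Hypothetical L1 data-cache model with per-line speculative bits.
# Assumptions: fully associative, one word per line, overflow raises an error.


class SpeculativeOverflow(Exception):
    """Raised when speculative data would have to leave the cache."""


class Line:
    def __init__(self, data):
        self.data = data
        self.dirty = False  # holds a value newer than memory
        self.spec = False   # written during the current speculative section


class SpecCache:
    def __init__(self, memory, capacity=4):
        self.memory = memory      # backing store: addr -> value
        self.capacity = capacity
        self.lines = {}           # addr -> Line
        self.speculating = False

    def _evict_one(self):
        # Only non-speculative lines may leave; speculative data has no safe home.
        victim = next((a for a, l in self.lines.items() if not l.spec), None)
        if victim is None:
            raise SpeculativeOverflow("every line holds speculative state")
        line = self.lines.pop(victim)
        if line.dirty:
            self.memory[victim] = line.data

    def _fill(self, addr):
        if addr not in self.lines:
            if len(self.lines) >= self.capacity:
                self._evict_one()
            self.lines[addr] = Line(self.memory.get(addr, 0))
        return self.lines[addr]

    def read(self, addr):
        return self._fill(addr).data

    def write(self, addr, value):
        line = self._fill(addr)
        if self.speculating and not line.spec:
            if line.dirty:
                # Save the pre-speculation value so rollback cannot lose it.
                self.memory[addr] = line.data
                line.dirty = False
            line.spec = True
        line.data = value
        line.dirty = True

    def begin(self):
        self.speculating = True

    def commit(self):
        # Speculative lines become ordinary dirty lines.
        for line in self.lines.values():
            line.spec = False
        self.speculating = False

    def rollback(self):
        # Invalidate speculative lines; memory still holds the old values.
        for addr in [a for a, l in self.lines.items() if l.spec]:
            del self.lines[addr]
        self.speculating = False
```

Commit is cheap in this sketch because it only clears `spec` bits; committed lines stay dirty and reach memory later through normal write-back. Rollback is equally cheap, since it only invalidates the marked lines.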