Fast Address-Space Switching on the StrongARM SA-1100 Processor

Adam Wiggins and Gernot Heiser
School of Computer Science & Engineering
University of New South Wales
Sydney 2052, Australia
{awiggins,gernot}@cse.unsw.edu.au, http://www.cse.unsw.edu.au/~disy

Abstract

The StrongARM SA-1100 is a high-speed low-power processor aimed at embedded and portable applications. Its architecture features virtual caches and TLBs which are not tagged by an address-space identifier. Consequently, context switches on that processor are potentially very expensive, as they may require complete flushes of TLBs and caches.

This paper presents the design of an address-space management technique for the StrongARM which minimises TLB and cache flushes and thus context switching costs. The basic idea is to implement the top-level of the (hardware-walked) page-table as a cache for page directory entries for different address spaces. This allows switching address spaces with minimal overhead as long as the working sets do not overlap. For small ( MB) address spaces further improvements are possible by making use of the StrongARM’s re-mapping facility. Our technique is discussed in the context of the L4 microkernel in which it will be implemented.

1. Introduction

The StrongARM SA-1100 [5] is a high speed, low power processor based on the ARM architecture [6]. It is specifically designed for portable and embedded systems. The design is based around a first generation StrongARM core and has peripheral controllers (DRAM controller, serial ports, etc.) integrated into a single package.

To achieve a high clock rate of 200MHz, the core makes use of a Harvard architecture featuring separate translation-lookaside buffers (TLBs) and caches for data and instruction streams. Being targeted at applications which traditionally do not use multitasking operating systems, the design has minimised support for multiple address spaces.
The caches in particular are virtually indexed and virtually tagged, and the TLBs are not tagged with an address-space identifier.

A context switch between threads belonging to different tasks (and thus address spaces) implies a change of virtual address mappings and thus a change of page tables. On an architecture which does not tag the TLBs with an address-space identifier, this normally means that the TLBs must be flushed, as they would otherwise contain incorrect translations for the thread being scheduled. This implies not only some direct overhead for invalidating all TLB entries, but also significant indirect costs, as the thread, once it starts executing, will experience a number of TLB misses until its working set is mapped. Each TLB miss requires a costly page table lookup (potentially 20–100 cycles per TLB miss on the StrongARM).

Similarly, virtual caches may contain stale data after a context switch, which would lead to incorrect execution of the threads. Unless the kernel is certain that no stale data exists in the caches, they must be flushed as well. Again, this incurs the direct cost of the flush operations, as well as an indirect cost, as the thread being scheduled starts with cold caches. The direct cost of cleaning (writing back) the data cache is particularly high, since each line must be cleaned individually.

The purpose of this paper is to present an approach to providing fast address-space switches on the StrongARM, and to discuss its proposed implementation in the L4 microkernel [1, 9, 10].

2. StrongARM Virtual Memory Architecture

In this section we summarise the StrongARM’s virtual memory architecture as far as is relevant to the topic of this paper. We describe general ARM features and note which features are specific to the StrongARM.
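
As a rough illustration of the indirect TLB cost discussed in the introduction, the following sketch estimates the time spent refilling the TLB after a complete flush. The 200MHz clock and the 20–100 cycle miss penalty are the figures quoted above; the number of misses (`ASSUMED_MISSES`) and the function name `tlb_refill_cost` are assumptions made purely for illustration, and the model ignores cache effects entirely.

```python
# Back-of-envelope model of the indirect cost of a full TLB flush.
# Figures taken from the text: 200 MHz clock, 20-100 cycles per TLB miss.
# Everything else (names, the miss count) is a hypothetical illustration.

CLOCK_HZ = 200_000_000  # SA-1100 core clock quoted in the text


def tlb_refill_cost(misses, cycles_per_miss, clock_hz=CLOCK_HZ):
    """Return (cycles, microseconds) spent on page-table lookups for `misses` TLB misses."""
    cycles = misses * cycles_per_miss
    microseconds = cycles / clock_hz * 1e6
    return cycles, microseconds


# Assumption: the scheduled thread touches 32 distinct pages before its
# working set is fully mapped again. This number is not from the paper.
ASSUMED_MISSES = 32

best_case = tlb_refill_cost(ASSUMED_MISSES, 20)    # cheapest quoted miss penalty
worst_case = tlb_refill_cost(ASSUMED_MISSES, 100)  # most expensive quoted miss penalty

if __name__ == "__main__":
    for label, (cycles, usec) in (("best", best_case), ("worst", worst_case)):
        print(f"{label}: {cycles} cycles, about {usec:.1f} microseconds")
```

Under these assumptions the refill cost per context switch lies between a few hundred and a few thousand cycles, before the cost of cache flushes and cold caches is even considered.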