ISOS: Space Overlapping Based on Iteration Access Patterns for Dynamic Scratch-pad Memory Management in Embedded Systems

Yanqin Yang (1, 3), Zili Shao (2), Linfeng Pan (1), Minyi Guo (1)

1. Dept. of Computer Science and Engineering, Shanghai Jiao-Tong Univ., Shanghai, China
2. Dept. of Computing, Hong Kong Polytechnic Univ., Hung Hom, Kowloon, Hong Kong
3. Dept. of Computer Science and Technology, East China Normal Univ., Shanghai, China

Email: yang-yq@sjtu.edu.cn, cszlshao@comp.polyu.edu.hk, guo-my@cs.sjtu.edu.cn

Abstract

Scratch-pad memory (SPM), a small, fast, software-managed on-chip SRAM (Static Random Access Memory), is widely used in embedded systems. With the ever-widening performance gap between processors and main memory, it is very important to reduce the substantial off-chip memory access overhead caused by transferring data between SPM and off-chip memory. In this paper, we propose ISOS, a novel compiler-assisted, iteration-access-pattern-based space overlapping technique for dynamic SPM management with DMA (Direct Memory Access). ISOS combines SPM and DMA for performance optimization by exploiting opportunities to overlap SPM space, so as to make better use of the limited SPM space and reduce the number of DMA operations. We implement our technique on top of IMPACT and conduct experiments using a set of benchmarks from DSPstone and Mediabench on the cycle-accurate VLIW simulator of Trimaran. The experimental results show that our technique achieves significant run-time performance improvement compared with previous work.

1. Introduction

The ever-widening performance gap between CPU and off-chip memory requires effective techniques to reduce memory accesses. To alleviate this gap, scratch-pad memory (SPM), a small, fast, software-managed on-chip SRAM, is widely used in embedded systems [1] because of its advantages in energy and area [2-5].
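The abstract says ISOS overlaps SPM space to save space and DMA operations, but this section does not give the algorithm. As a rough illustration only (a hypothetical simplification, not the authors' method), suppose each array's use in a loop can be summarized by a live iteration range and the number of SPM bytes it needs. Arrays whose live ranges never intersect can then share the same SPM offset. The names `ArrayUse`, `live_together`, `regions_clash` and `assign_offsets` are invented for this sketch, and the first-fit placement policy is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ArrayUse:
    """Hypothetical per-array summary of a loop nest (an assumption, not ISOS's IR)."""
    name: str
    first: int                    # first iteration in which the array must be in SPM
    last: int                     # last such iteration (inclusive)
    size: int                     # bytes of SPM the array occupies while live
    offset: Optional[int] = None  # assigned SPM offset, or None if it did not fit


def live_together(a: ArrayUse, b: ArrayUse) -> bool:
    """True if the two live iteration ranges intersect."""
    return a.first <= b.last and b.first <= a.last


def regions_clash(off_a: int, size_a: int, off_b: int, size_b: int) -> bool:
    """True if [off_a, off_a+size_a) and [off_b, off_b+size_b) intersect."""
    return off_a < off_b + size_b and off_b < off_a + size_a


def assign_offsets(uses: List[ArrayUse], spm_size: int) -> int:
    """First-fit SPM placement in which arrays that are never live in the same
    iteration may overlap in SPM. Returns the peak SPM footprint of placed arrays."""
    placed: List[ArrayUse] = []
    for u in uses:
        u.offset = None
        # The lowest feasible offset is always 0 or the end of some placed region.
        for off in sorted({0} | {p.offset + p.size for p in placed}):
            if off + u.size > spm_size:
                continue
            if not any(live_together(u, p) and regions_clash(off, u.size, p.offset, p.size)
                       for p in placed):
                u.offset = off
                break
        if u.offset is not None:
            placed.append(u)
    return max((p.offset + p.size for p in placed), default=0)


# Made-up example: A and B are used in disjoint halves of a 100-iteration loop.
uses = [ArrayUse("A", 0, 49, 512), ArrayUse("B", 50, 99, 512), ArrayUse("C", 0, 99, 256)]
peak = assign_offsets(uses, spm_size=1024)
print({u.name: u.offset for u in uses}, peak)  # {'A': 0, 'B': 0, 'C': 512} 768
```

With these invented sizes, A and B share offset 0 because they are never live in the same iteration, so all three arrays fit in 768 bytes of a 1024-byte SPM, whereas giving each array its own region would need 1280 bytes and would not fit.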
A recent study [6] shows that SPM has 34% smaller area and 40% lower power consumption than a cache of the same capacity. Since caches typically account for 25%-50% of a processor's total energy and area, SPM can significantly reduce the energy consumption of embedded processors. Embedded software is usually optimized for specific applications, so SPM can be used to improve performance and predictability by avoiding cache misses. With these advantages, SPM has become the most common SRAM in embedded processors. However, because SPM is entirely controlled by software, fully exploiting it poses a significant challenge for the compiler.

To manage SPM effectively, two kinds of compiler-managed approaches have been proposed: static methods [2-5] and dynamic methods [1, 7-9]. With static SPM management, the contents of SPM are fixed and do not change while the application runs. With dynamic SPM management, the contents of SPM change at run time according to the behavior of the application. For dynamic SPM management, it is essential to choose an effective way to transfer data between off-chip memory and SPM, because the latency of an off-chip memory access is about 10-100 times that of an SPM access [1, 2, 7], and many embedded applications in the image and video processing domains have significant data transfer requirements in addition to their computational requirements [8]. To reduce off-chip memory access overhead, dedicated, cost-efficient DMA (Direct Memory Access) hardware is used to transfer data. This paper focuses on how to combine SPM and DMA in dynamic SPM management to optimize loops, which are usually the most critical sections of some embedded applications such as DSP and image processing.

Our work is closely related to the work in [7, 9, 12-13]. In [7], DMA is applied for data transfer between SPM and off-chip memory.
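To see why the 10-100x latency gap favors DMA, and why the number of DMA operations matters, consider a hypothetical back-of-the-envelope cycle model. It is not the cost model of [7], which this section does not spell out. Direct access pays the off-chip latency on every access. The DMA path pays a fixed setup cost per DMA operation, a per-word transfer cost, and then the SPM latency on every access. The function names and every parameter value below are assumptions chosen for illustration. Write-back of modified data and overlap of DMA with computation are ignored.

```python
def direct_cost(n_accesses: int, off_chip_latency: int) -> int:
    """Cycles when every access goes straight to off-chip memory."""
    return n_accesses * off_chip_latency


def dma_cost(n_words: int, n_accesses: int, spm_latency: int,
             dma_setup: int, dma_per_word: int, n_transfers: int = 1) -> int:
    """Cycles when the data is first copied into SPM by DMA and then read from SPM.

    n_transfers: number of DMA operations used to move the n_words; each one
    pays the fixed setup cost (assumed model: read-only data, no overlap of
    DMA with computation)."""
    return n_transfers * dma_setup + n_words * dma_per_word + n_accesses * spm_latency


# Hypothetical numbers: a 1024-word array, each word read 4 times,
# SPM latency 1 cycle, off-chip latency 50 cycles (inside the 10-100x range).
words, accesses = 1024, 4096
print(direct_cost(accesses, off_chip_latency=50))                   # 204800
print(dma_cost(words, accesses, 1, dma_setup=100, dma_per_word=2))  # 6244
print(dma_cost(words, accesses, 1, dma_setup=100, dma_per_word=2,
               n_transfers=16))                                     # 7744
```

Under these assumptions, moving the same data with 16 DMA operations instead of one adds 1500 cycles of setup overhead. Reducing the number of DMA operations targets exactly this kind of overhead.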
The same DMA-based cost model as in [7] has been used in [9, 12-13] to accelerate data transfer between off-chip memory and SPM. However, most of this work focuses on optimizing array allocation in SPM without optimizing the DMA transfers themselves. Because SPM is a small on-chip memory, constrained by the power and size budgets of embedded systems, we cannot place all the required data in it at once. Therefore, multiple DMA transfers are needed for arrays

The 9th International Conference for Young Computer Scientists 978-0-7695-3398-8/08 $25.00 © 2008 IEEE DOI 10.1109/ICYCS.2008.538 1360