ISOS: Space Overlapping Based on Iteration Access Patterns for Dynamic
Scratch-pad Memory Management in Embedded Systems
Yanqin Yang¹,³, Zili Shao², Linfeng Pan¹, Minyi Guo¹
1. Dept. of Computer Science and Engineering, Shanghai Jiao-Tong Univ. Shanghai, China
2. Dept. of Computing, Hong Kong Polytechnic Univ., Hung Hom, Kowloon, Hong Kong
3. Dept. of Computer Science and Technology, East China Normal Univ., Shanghai, China
Email: yang-yq@sjtu.edu.cn, cszlshao@comp.polyu.edu.hk, guo-my@cs.sjtu.edu.cn
Abstract
Scratch-pad memory (SPM), a small fast software-
managed on-chip SRAM (Static Random Access
Memory), is widely used in embedded systems. With the
ever-widening performance gap between processors
and main memory, it is very important to reduce the
serious off-chip memory access overheads caused by
transferring data between SPM and off-chip memory.
In this paper, we propose a novel compiler-assisted
iteration-access-pattern-based space overlapping
technique for dynamic SPM management (ISOS) with
DMA (Direct Memory Access). In ISOS, we combine
SPM and DMA for performance optimization by
exploiting opportunities to overlap SPM space, so as to
better utilize the limited SPM space and reduce the
number of DMA operations. We implement our
technique based on IMPACT and conduct experiments
using a set of benchmarks from DSPstone and
Mediabench on the cycle-accurate VLIW simulator of
Trimaran. The experimental results show that our
technique achieves significant run-time performance
improvement compared with previous work.
1. Introduction
The ever-widening performance gap between CPU
and off-chip memory requires effective techniques to
reduce memory accesses. To alleviate the gap, scratch-
pad memory (SPM), a small fast software-managed on-
chip SRAM (Static Random Access Memory), is
widely used in embedded systems [1] because of its
advantages in energy and area [2-5]. A recent study [6]
shows that SPM has 34% smaller area and 40% lower
power consumption than a cache of the same capacity.
Since the cache typically consumes 25%-50% of the total
energy and area of a processor, SPM can help
significantly reduce the energy consumption of
embedded processors. Embedded software is usually
optimized for specific applications, so we can utilize
SPM to improve performance and predictability by
avoiding cache misses. With these advantages, SPM
has become the most common SRAM in embedded
processors. However, it poses a big challenge for the
compiler to fully exploit SPM, since SPM is completely
controlled by software.
To effectively manage SPM, two kinds of
compiler-managed methods have been proposed: static
methods [2-5] and dynamic methods [1, 7-9]. With static
SPM management, the content of SPM is fixed and does
not change while the application runs. With dynamic SPM
management, the content of SPM changes at run time
based on the behavior of the application. For
dynamic SPM management, it is very important to
select an effective approach to transfer data between
off-chip memory and SPM. This is because the latency
of off-chip memory access is about 10-100 times
that of SPM [1, 2, 7], and many embedded applications
in image and video processing domains have
significant data transfer requirements in addition to
their computational requirements [8]. To reduce
off-chip memory access overheads, dedicated,
cost-efficient DMA (Direct Memory Access) hardware is
used to transfer data. The focus of this paper is on how
to combine SPM and DMA in dynamic SPM
management for optimizing loops, which are usually the
most critical sections of embedded applications
such as DSP and image processing.
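
The difference between static and dynamic SPM management described above can be illustrated with a rough cost model. The following is only a minimal sketch under stated assumptions, not the cost model used in this paper or in [7]: the latency constants, the DMA setup and per-word costs, and the function names static_cost and dynamic_cost are all hypothetical, chosen so that the off-chip latency falls in the 10-100 times range quoted above. Write-back of modified data is ignored, and SPM is assumed to hold exactly one phase's working set.

```python
# Hypothetical latencies in cycles (assumptions, not measured values).
SPM_LATENCY = 1        # one access to SPM
OFFCHIP_LATENCY = 20   # one access to off-chip memory (within the 10-100x range)
DMA_SETUP = 30         # assumed fixed startup cost of one DMA transfer
DMA_PER_WORD = 2       # assumed per-word cost of a DMA burst


def static_cost(accesses, resident):
    """Static management: arrays in `resident` stay in SPM for the whole run;
    every access to any other array goes to off-chip memory."""
    return sum(SPM_LATENCY if name in resident else OFFCHIP_LATENCY
               for name in accesses)


def dynamic_cost(phases, array_words):
    """Dynamic management: before each phase, DMA in the arrays the phase
    uses that are not already in SPM, then serve all accesses from SPM."""
    in_spm, total = set(), 0
    for phase in phases:
        needed = set(phase)
        for name in sorted(needed - in_spm):
            total += DMA_SETUP + DMA_PER_WORD * array_words[name]
        in_spm = needed  # SPM now holds only this phase's working set
        total += SPM_LATENCY * len(phase)
    return total


# Two loops of 1000 accesses each, over arrays "a" and "b" of 100 words.
loop_a, loop_b = ["a"] * 1000, ["b"] * 1000
print(static_cost(loop_a + loop_b, resident={"a"}))          # 21000
print(dynamic_cost([loop_a, loop_b], {"a": 100, "b": 100}))  # 2460
```

Under these assumed numbers, the dynamic scheme pays two DMA transfers but serves every access from SPM, while the static scheme pays the off-chip latency on every access to the array that does not fit.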
Our work is closely related to the work in [7, 9,
12-13]. In [7], DMA is applied for data transfer
between SPM and off-chip memory. The same cost
model using DMA for data transfer has been used in [9,
12-13] to accelerate data transfer between off-chip
memory and SPM. However, most of the above work
focuses on array allocation optimization in SPM
without considering the optimization of DMA transfers.
Because SPM is a small on-chip memory, constrained by
the power and size budgets of the embedded system, we
cannot place all the necessary data in it at one time.
Therefore, multiple DMA transfers are needed for arrays
The 9th International Conference for Young Computer Scientists
978-0-7695-3398-8/08 $25.00 © 2008 IEEE
DOI 10.1109/ICYCS.2008.538