The Case for Globally Irregular Locally Regular Algorithm Architecture Adequation

Pierre Boulet and Ashish Meena
Laboratoire d'Informatique Fondamentale de Lille
Cité scientifique, 59655 Villeneuve d'Ascq Cedex, France
Email: Pierre.Boulet@lifl.fr, Ashish.Meena@lifl.fr

Abstract— In modern embedded systems, parallelism is a good way to reduce power consumption while respecting real-time constraints. To achieve this, one needs to efficiently exploit the potential parallelism of both the application and the architecture. In this paper we propose a hybrid optimization method that improves the handling of repetitions in both the algorithm and the architecture. The approach, called Globally Irregular Locally Regular, combines irregular heuristics with regular ones to take advantage of the strong points of both.

I. INTRODUCTION

When designing Systems-on-Chip (SoCs) for embedded applications, one often has to deal with hard optimization problems. Dimensioning the hardware platform so that the constraints of the system are satisfied is a challenging problem. Indeed, these constraints are functional, but also non-functional, such as those related to real time or power consumption. This problem is called "Algorithm Architecture Adequation" [1].

In many computation-intensive applications, such as signal processing or consumer multimedia, these requirements are particularly strong. Fortunately, these applications exhibit parallelism and can benefit from multiprocessor architectures. The goal of this paper is to propose a better way to handle parallelism in the optimization heuristic. We advocate a Globally Irregular Locally Regular approach to this problem. The whole point is to exploit as well as possible the repetitions in both the algorithm and the hardware architecture.
After some motivation and related work in Section II, we present our proposal in Section III and a small experiment demonstrating the benefits of this approach in Section IV. We finally conclude and propose some future work in Section V.

II. MOTIVATION

A. Parallelism to reduce power consumption

When designing applications for embedded systems, one often wants to reduce power consumption. The usually admitted formula for the power consumption of a SoC is [2]:

P = α C V_dd^2 f + I_off V_dd

where f is the frequency of the chip, V_dd is the supply voltage, and α is a coefficient that takes different values for logic and memory.

From this formula we can deduce that parallelism is a good way to reduce power consumption. Indeed, for a given amount of work W and a given duration (or real-time constraint) τ, one can use a single fast processor of frequency f_seq = W/τ or n slower processors of frequency f_n = W/(nτ). This holds in an ideal world where the work can be split into n parts without overhead (communications, etc.). Since V_dd can be lowered when f decreases and appears squared in the power consumption equation, using parallelism makes it possible to decrease the frequency, and thus the supply voltage and the power consumption of the chip.

One of the main problems that reduces the efficiency of parallel algorithms is the overhead caused by communications. This is particularly true when considering clusters of workstations or massively parallel processors. Thus, to program such computers efficiently, one needs coarse-grain parallelism to be able to overlap communications with computations. This problem is much less crucial in SoCs because on-chip communications are very fast, nearly as fast as computations. Communicating one data element often takes only one or a few cycles. Thus, to overlap communications with computations, small-grain computations may be enough.
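The power argument above can be made concrete with a small numerical sketch. The code below is illustrative only: it assumes a hypothetical linear voltage-scaling model (V_dd proportional to f), considers only the dynamic term α C V_dd^2 f of the formula, and uses made-up parameter values; real DVFS curves and leakage would change the exact numbers, but not the trend.

```python
# Sketch: total dynamic power of 1 fast core vs. n slower cores doing the
# same work W within the same deadline tau (ideal split, no overhead).
# Assumption: V_dd scales linearly with f (hypothetical model, factor v_per_hz).

def dynamic_power(alpha, c, v_dd, f):
    """Dynamic part of the SoC power formula: alpha * C * V_dd^2 * f."""
    return alpha * c * v_dd**2 * f

def compare(work, tau, n, alpha=0.5, c=1e-9, v_per_hz=1e-9):
    f_seq = work / tau        # frequency needed by a single core
    f_n = work / (n * tau)    # frequency of each of the n cores
    p_seq = dynamic_power(alpha, c, v_per_hz * f_seq, f_seq)
    p_par = n * dynamic_power(alpha, c, v_per_hz * f_n, f_n)
    return p_seq, p_par

p_seq, p_par = compare(work=1e9, tau=1.0, n=4)
# Under V_dd ∝ f, per-core power drops as 1/n^3, so the n cores together
# consume n^2 times less dynamic power than the single fast core.
print(p_seq / p_par)  # -> 16.0 for n = 4
```

The n^2 gain is of course an upper bound: any communication overhead forces the cores to run slightly faster than W/(nτ), eating into the savings.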
From these two remarks, we can conclude that parallelism (and even fine-grain parallelism) is a promising solution to reduce power consumption in SoCs. Some regular architectures have already appeared in SoCs [3], [4], [5], [6].

B. Available parallelism in applications

There are two kinds of parallelism available in applications: control parallelism and data parallelism. The former is usually modelled using task graphs, while the latter corresponds to loops in the code or repetitions in tools such as SynDEx [7]. Irregular heuristics such as list heuristics [8], [9], [10], [11], [12], [13] are good tools to exploit control parallelism. On the other hand, these heuristics do not easily handle data parallelism. They are general and can handle a very large class of problems, but they may ignore some high-level information that is available in their input (loop structure, for example).

On the other hand, regular heuristics such as the ones developed in the automatic parallelization domain [14] are much better adapted to deal with data parallelism. These heuristics are well suited to distributing loops onto regular architectures. Their