An ILP Formulation for Task Mapping and Scheduling on Multi-core Architectures

Ying Yi, Wei Han, Xin Zhao, Ahmet T. Erdogan and Tughrul Arslan
University of Edinburgh, The King's Buildings, Mayfield Road, Edinburgh, EH9 3JL, UK

Abstract: Multi-core architectures are increasingly being adopted in the design of emerging complex embedded systems. Key issues in designing such systems are on-chip interconnects, memory architecture, and task mapping and scheduling. This paper presents an integer linear programming formulation for the task mapping and scheduling problem. The technique incorporates profiling-driven loop-level task partitioning, task transformations, functional pipelining, and memory-architecture-aware data mapping to reduce system execution time. Experiments are conducted to evaluate the technique by implementing a series of DSP applications on several multi-core architectures based on dynamically reconfigurable processor cores. The results demonstrate that the proposed technique is able to generate high-quality mappings of realistic applications on the target multi-core architecture, achieving a parallel efficiency of up to 1.3x using only two dynamically reconfigurable processor cores.

I. INTRODUCTION

An important trend in embedded systems is the use of multi-core architectures to meet applications' functional and performance requirements. Multi-core designs offer high performance and flexibility while at the same time promising low-cost and power-efficient implementations. However, the semiconductor industry still faces several technological challenges with multi-core systems. Important issues in multi-core designs are the communication infrastructure, the memory architecture, and task mapping and scheduling. In multi-core architectures, the performance of the entire system is affected by the execution order of tasks and communications. It is well known that task mapping and task scheduling are highly inter-dependent.
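The inter-dependence between mapping and scheduling can be illustrated with a small, hypothetical example. The sketch below is not the paper's ILP formulation: it exhaustively enumerates every task-to-core assignment of an invented four-task graph on two cores and evaluates each one with a simple in-order list schedule, in which a dependency that crosses cores incurs an assumed communication delay `comm_cost`. The task graph, durations, and the helper names `makespan` and `best_mapping` are illustrative assumptions.

```python
from itertools import product

# Hypothetical 4-task graph (illustration only, not from the paper):
# A feeds B and C, which both feed D. Tasks are listed in topological order;
# durations are in arbitrary time units.
TASKS = ["A", "B", "C", "D"]
DURATION = {"A": 2, "B": 4, "C": 4, "D": 2}
PREDS = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}


def makespan(mapping, comm_cost):
    """Schedule tasks in list order on their assigned cores; return the finish time.

    mapping[i] is the core of TASKS[i]. A dependency between tasks on different
    cores adds comm_cost before the consuming task may start.
    """
    core_of = dict(zip(TASKS, mapping))
    core_free = {}  # core index -> time at which that core becomes idle
    finish = {}
    for task in TASKS:
        core = core_of[task]
        # Earliest time all inputs are available on this core.
        ready = max(
            (finish[p] + (comm_cost if core_of[p] != core else 0) for p in PREDS[task]),
            default=0,
        )
        start = max(ready, core_free.get(core, 0))
        finish[task] = start + DURATION[task]
        core_free[core] = finish[task]
    return max(finish.values())


def best_mapping(comm_cost, num_cores=2):
    """Try every task-to-core mapping; return (makespan, mapping) of the best one."""
    return min(
        (makespan(m, comm_cost), m)
        for m in product(range(num_cores), repeat=len(TASKS))
    )


print(best_mapping(comm_cost=1))            # (9, (0, 0, 1, 1)): split A,B | C,D
print(best_mapping(comm_cost=5))            # (12, (0, 0, 0, 0)): keep all on one core
print(makespan((0, 0, 1, 1), comm_cost=5))  # 13: the same split is now slower
```

With cheap transfers the balanced split wins, while with expensive transfers the same split is slower than running everything on one core, so a good mapping cannot be chosen without looking at the resulting schedule. Enumeration grows as `num_cores ** len(TASKS)` and only suits toy instances; the paper instead formulates the problem as an ILP.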
Therefore, the two problems need to be handled together in order to obtain an efficient mapping and schedule.

A dynamically reconfigurable (DR) processor combines the flexibility of FPGAs with the programmability of general-purpose processors (CPUs/DSPs) in a unified, easy-to-use programming environment, which makes it a strong candidate for multi-core systems. In our proposed embedded multi-core platform, which contains several DR processors [1], the shared memory heavily affects execution time and power consumption. The time needed to transfer data between processors must therefore be considered during scheduling, so that the resulting schedule reflects the behaviour of the real system. In addition, to meet the system throughput constraints, the design is pipelined to construct more efficient architectures. Pipelining divides the design into concurrently executing stages, thus increasing throughput.

In multi-core architectures, all parallel tasks in an application can potentially be executed simultaneously. However, the number of such tasks may exceed the number of available processors, so task mapping is required to assign the parallel tasks to the available processors. Task merging and task replication have previously been proposed to re-allocate tasks when performance bottlenecks are encountered. Since task merging requires more local memory and task replication needs more processors to implement the same task [2], a multi-core architecture without sufficient memory and processors severely limits the mapping options available with these existing methods.

Application development on multi-core architectures requires the designer, or an automated tool, to divide tasks among the available processors and to determine data mappings for the required memory elements. A SystemC-based simulation framework for mapping an application to a platform and evaluating its performance has been presented in [3].
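The throughput benefit of pipelining, and why inter-processor transfer time matters, can be seen with a back-of-the-envelope model. This sketch assumes a hypothetical three-stage application with one stage per core, treats each stage's outbound transfer as occupying that stage's core, and uses the standard result that a linear pipeline processing N identical items finishes after its latency plus (N - 1) initiation intervals, where the interval equals the slowest stage time. The stage times, transfer costs, and function names are illustrative assumptions, not values from the paper.

```python
# Hypothetical 3-stage pipeline (illustrative numbers, not from the paper).
# Each stage runs on its own core; TRANSFER[i] is the assumed time to pass
# stage i's output to the next core (0 for the last stage).
COMPUTE = [30, 50, 40]
TRANSFER = [5, 5, 0]


def stage_times(compute, transfer):
    """Per-item occupancy of each stage's core: computation plus outbound transfer."""
    return [c + t for c, t in zip(compute, transfer)]


def unpipelined_time(n_items, compute, transfer):
    """Each item passes through every stage before the next item enters."""
    return n_items * sum(stage_times(compute, transfer))


def pipelined_time(n_items, compute, transfer):
    """Stages overlap: after the first item's latency, one item completes per interval."""
    times = stage_times(compute, transfer)
    latency = sum(times)
    initiation_interval = max(times)  # the slowest stage bounds throughput
    return latency + (n_items - 1) * initiation_interval


n = 100
print(unpipelined_time(n, COMPUTE, TRANSFER))  # 13000
print(pipelined_time(n, COMPUTE, TRANSFER))    # 5575
```

In this model the middle stage, including its transfer, sets the 55-unit interval, so shortening that stage or its communication raises throughput more than speeding up the other stages.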
The authors of [4, 5] have addressed the scheduling and mapping of parallel applications onto an MPSoC platform. Mapping solutions for bus-based and NoC-based MPSoCs have been described in [6] and [7]. Some automated system-level mapping techniques for application development on network processors have also been proposed [8].

This paper addresses the problem of automated application mapping and scheduling on DR processor based multi-core architectures. An integer linear programming (ILP) based approach is proposed for loop-level task partitioning, task mapping and pipelined scheduling of embedded applications that takes communication time into account. The efficacy of the technique is demonstrated on a series of DSP applications.

The paper is organized as follows: Section 2 introduces the target DR processor as well as the target multi-core architecture. Section 3 describes the task mapping methodology. Section 4 gives a more detailed description of the problem addressed in this paper. Section 5 describes the proposed ILP-based approach to solving the problem. The experimental results are given in Section 6, followed by conclusions in Section 7.

978-3-9810801-5-5/DATE09 © 2009 EDAA

II. TARGET MULTI-CORE ARCHITECTURE

Some applications demand a closer interconnection between the participating processors to achieve the required performance. Such communication can be realised using distributed shared register files. The target multi-core platform is designed for DSP applications, which typically involve intensive computation on a stream of input data. The architecture described in previous work [2] consists of a selectable number of DR processors, which communicate with a shared memory through a full crossbar network. This architecture has been extended and modified by incorporating the shared register file into the system memory architecture in order to support the loop-level parallelism proposed in this