Memory-Aware Task Scheduling with Communication Overhead Minimization for Streaming Applications on Bus-Based Multiprocessor System-on-Chips

Yi Wang, Zili Shao, Henry C.B. Chan, Duo Liu, and Yong Guan

Abstract—Inter-core communication introduces overhead in task schedules on Multiprocessor System-on-Chips (MPSoCs). For streaming applications running on MPSoC architectures, this overhead not only hurts timing performance but also significantly degrades memory usage. By minimizing inter-core communication overhead, a shorter period can be applied and system performance (e.g., throughput and memory usage) can be improved. In this paper, we focus on minimizing inter-core communication overhead for streaming applications on bus-based MPSoCs. The objective is to minimize inter-core communication overhead while minimizing the overall memory usage. To solve the problem, we first transform intra-period data dependencies between tasks into inter-period data dependencies so that the execution of computation tasks and inter-core communication tasks can overlap. By doing this, inter-core communication overhead can be effectively removed. To minimize the overall memory usage, we then perform schedulability analysis and obtain bounds on the number of times each task needs to be rescheduled. Based on the schedulability analysis, we formulate the scheduling problem as an integer linear programming (ILP) model and obtain an optimal schedule. In addition, we propose a heuristic approach to efficiently obtain a near-optimal solution. We conduct experiments on a set of benchmarks from both real-life streaming applications and synthetic task graphs. The experimental results show that the proposed approach can significantly reduce the schedule length and improve memory usage compared with previous work.
Index Terms—Real-time, task scheduling, memory-aware, inter-core communication, streaming applications, bus, MPSoC

1 INTRODUCTION

Streaming applications that process streams of data are often modeled as periodic dependent tasks, in which streams of data are communicated from task to task [2], [3]. Streaming applications are data intensive and highly parallelizable; therefore, they are well suited to execution on Multiprocessor System-on-Chips (MPSoCs). To fully utilize the compute capability of MPSoCs, various techniques have been explored to increase the parallelism of streaming applications. However, increased parallelism may cause a large amount of inter-core communication with considerable communication overhead. By minimizing inter-core communication overhead, a shorter period can be applied and system performance (e.g., throughput and memory usage) can be improved. Therefore, effectively reducing inter-core communication overhead for streaming applications on MPSoCs is an important research problem.

Streaming applications often have firm real-time requirements. The communication overhead poses a challenge for bus-based multi-core hard real-time systems, since most of the existing theoretically optimal scheduling techniques on multi-core architectures assume zero cost for inter-core communication. For streaming applications running on MPSoC architectures, fairly large buffers are needed to hold the intermediate processing results between tasks. As a result, the total size of the buffer arrays usually accounts for a significant portion of the application binary memory footprint [4]. Minimizing the inter-core communication overhead can significantly reduce the overall memory usage, which would be of great value in resource-constrained embedded multiprocessor systems.
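As a rough illustration of why taking communication off the critical path allows a shorter period, the following minimal Python sketch compares a schedule in which all data dependencies are intra-period with one in which they have all been turned into inter-period dependencies. The three-stage chain, its timing values, and the names Stage, period_without_overlap, period_with_overlap, and prologue_periods are illustrative assumptions and are not taken from the paper's model or benchmarks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str      # hypothetical task label
    resource: str  # hypothetical resource label: a core or the shared bus
    time: int      # assumed execution or transfer time, in time units


def period_without_overlap(chain):
    # Intra-period dependencies: every stage of one iteration must finish
    # inside the same period, so the whole chain is on the critical path.
    return sum(stage.time for stage in chain)


def period_with_overlap(chain):
    # Inter-period dependencies: within a period each stage works on a
    # different iteration, so only the busiest resource bounds the period.
    load = {}
    for stage in chain:
        load[stage.resource] = load.get(stage.resource, 0) + stage.time
    return max(load.values())


def prologue_periods(chain):
    # Stage k is moved (len(chain) - 1 - k) periods earlier, so the first
    # stage needs len(chain) - 1 extra periods before steady state.
    return len(chain) - 1


# Toy example (assumed numbers): T1 on core0 sends data over the bus to T2 on core1.
chain = [Stage("T1", "core0", 4), Stage("C12", "bus", 2), Stage("T2", "core1", 3)]

print(period_without_overlap(chain))  # 9
print(period_with_overlap(chain))     # 4
print(prologue_periods(chain))        # 2
```

In this toy case the period drops from 9 to 4 time units at the cost of a two-period prologue. Shifting every stage by a full period is the extreme case: data produced in one period must stay buffered until a later period consumes it, which is one reason the number of rescheduled tasks matters for memory usage.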
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 25, NO. 7, JULY 2014

• Y. Wang, Z. Shao, H.C.B. Chan, and D. Liu are with the Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong. E-mail: cszlshao@comp.polyu.edu.hk.
• Y. Guan is with the College of Computer and Information Management, Capital Normal University, Beijing, China.

Manuscript received 31 Jan. 2013; revised 5 June 2013; accepted 13 June 2013. Date of publication 1 July 2013; date of current version 13 June 2014. Recommended for acceptance by J. Flich. Digital Object Identifier no. 10.1109/TPDS.2013.172. 1045-9219 © 2013 IEEE.

Since the assignment of computation and inter-core communication tasks directly influences memory usage, we jointly reschedule both computation and inter-core communication tasks. The objective is to generate an optimal task schedule with the maximum application throughput while minimizing the overall memory usage. In our technique, we reschedule a limited number of tasks into earlier periods (the newly added preprocessing step is called the prologue). After transforming intra-period data dependencies into inter-period data dependencies, the execution of computation tasks and that of inter-core communication tasks in each period can be overlapped, and the inter-core communication overhead can be effectively removed. To the best of our knowledge, this is the first work that aims to minimize inter-core communication