978-1-4799-5944-0/14/$31.00 © 2014 IEEE

Automatic Cache Partitioning and Time-triggered Scheduling for Real-time MPSoCs

Gang Chen 1, Biao Hu 1, Kai Huang 1,2, Alois Knoll 1, Kai Huang 3, Di Liu 4, Todor Stefanov 4
1 Tech. Univ. Muenchen (TUM), 2 Sun Yat-sen University, 3 Zhejiang University, 4 Leiden University
{cheng,hub,huangk,knoll}@in.tum.de, huangk@vlsi.zju.edu.cn, {d.liu,t.p.stefanov}@liacs.leidenuniv.nl

Abstract—The shared cache in modern multi-core systems has been considered one of the major factors that degrade system predictability and performance. How to manage the shared cache for real-time multi-core systems so as to optimize system performance while guaranteeing system predictability is still an open issue. In this paper, we present a framework that exploits cache management for real-time MPSoCs. The framework supports dynamic way-based cache partitioning at the hardware level, building task-level time-triggered reconfigurable-cache MPSoCs. It automatically determines a time-triggered schedule and a cache configuration for each task to improve system performance while guaranteeing the real-time constraints. We evaluate the proposed framework with respect to different numbers of cores and cache modules and prototype the constructed MPSoCs on FPGA. Experimental results based on the FPGA implementation demonstrate the effectiveness of the proposed framework over state-of-the-art cache management strategies when testing 27 benchmark programs on the constructed MPSoCs.

I. INTRODUCTION

Computing systems are increasingly moving towards multi-core platforms. To alleviate the high latency of the off-chip memory, multi-processor system-on-chip (MPSoC) architectures are typically equipped with hierarchical cache subsystems. For instance, the ARM Cortex-A15 series [4] uses small L1 caches for individual cores and a relatively large L2 cache shared among different cores.
Due to this inherently complex cache hierarchy, the analysis of shared cache subsystems has received much attention in recent years [14], [17], [31]. The main problem of the cache hierarchy is that the behavior of the shared cache is hard to predict and analyze statically in MPSoCs [1], [14]. For instance, a task running on one core may evict useful L2 cache content that is used by a task on another core. Such inter-core cache interference causes an increase in the miss rate [34], leading to a corresponding decrease in performance. In addition, inter-core cache interference is extremely difficult to analyze accurately [14], which makes it hard to estimate the worst-case execution time (WCET) of an application program. Therefore, how to tackle the shared cache in the context of real-time systems is still an open issue [1], [34], and this difficulty effectively prohibits an efficient use of MPSoCs for real-time systems. For instance, to resolve the predictability problem for MPSoCs, avionics manufacturers usually turn off all cores but one for their highly safety-critical subsystems [31]. The work in [17] also reports that inter-core cache interference on a state-of-the-art quad-core processor increased task completion time by up to 40% compared to running alone in the system. Aware of this, this paper studies the problem of how to use the shared cache in a predictable and efficient manner under real-time requirements in the presence of cache interference.

This work has been partly funded by the German BMBF projects ECU (grant number: 13N11936) and Car2X (grant number: 13N11933).

To address this problem, most state-of-the-art techniques [17], [27], [31] for multi-core cache management in real-time systems use page coloring, i.e., a software cache partitioning approach at the OS level, to partition the cache by sets.
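As background, the set-based partitioning that page coloring provides can be sketched as follows; the cache geometry and page size below are invented for illustration and are not taken from any particular platform:

```python
# Hypothetical cache geometry, chosen only to illustrate page coloring.
CACHE_SIZE = 2 * 1024 * 1024   # 2 MiB shared L2
LINE_SIZE  = 64                # bytes per cache line
ASSOC      = 16                # 16-way set-associative
PAGE_SIZE  = 4096              # 4 KiB OS pages

NUM_SETS      = CACHE_SIZE // (LINE_SIZE * ASSOC)  # 2048 sets
SETS_PER_PAGE = PAGE_SIZE // LINE_SIZE             # one page spans 64 consecutive sets
NUM_COLORS    = NUM_SETS // SETS_PER_PAGE          # 32 distinct page colors

def page_color(phys_addr: int) -> int:
    """The group of cache sets a physical page maps to; the OS partitions
    the cache by giving each task only pages of its assigned colors."""
    return (phys_addr // PAGE_SIZE) % NUM_COLORS
```

Recoloring a page then means copying it to a physical frame of a different color, which is precisely the per-page overhead that limits software cache partitioning.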
The problem with page-coloring based techniques is the significant timing overhead incurred when recoloring. On the one hand, this overhead prohibits frequent changes of page colors [18]; on the other hand, it makes recoloring unprofitable for tasks whose execution time is shorter than the recoloring overhead itself. To tackle these problems, we consider task-level schedule-aware cache partitioning and implement cache partitioning in our customized reconfigurable cache hardware component with minimal timing overhead.

Combining real-time task scheduling and cache size allocation is, however, more involved. On the one hand, the WCET of a task depends on its allocated cache size. On the other hand, the maximal cache budget that can be assigned to a task depends on the cache sizes occupied by the other tasks currently running on the other cores, i.e., it depends on the scheduler. Furthermore, the performance of different tasks may be differently sensitive to the assigned cache size. In principle, task scheduling and cache size allocation interrelate with respect to system performance metrics such as cache misses and energy consumption [30]. Therefore, a sophisticated framework is needed to find the best trade-off between them in order to improve the system performance.

This paper tackles a schedule-aware cache management scheme for real-time MPSoCs. We present an integrated framework to exploit and verify the interactions between task scheduling and shared L2 cache interference. For a given set of tasks and a mapping of the tasks on an MPSoC, our approach generates a fully deterministic time-triggered non-preemptive schedule and a set of cache configurations at compile time. At runtime, the cache is reconfigured according to the offline-computed configurations. Together, the generated schedule and cache configurations minimize the cache misses of the cache subsystem while preventing deadline misses and cache overflows.
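To make the interplay concrete, the following minimal sketch (not the paper's actual data structures; all task parameters, way masks, and the table layout are hypothetical) shows per-core time-triggered tables with a way-based L2 allocation per task, together with the overflow check any valid schedule must pass: tasks running concurrently on different cores may never share an L2 way.

```python
# Hypothetical offline artifacts: per-core dispatch tables of
# (start, finish, task, L2 way mask); every value here is invented.
schedule = {
    "core0": [(0, 40, "t1", 0b00001111), (40, 90, "t3", 0b00000011)],
    "core1": [(0, 60, "t2", 0b11110000), (60, 90, "t4", 0b00110000)],
}

def no_cache_overflow(schedule: dict) -> bool:
    """True iff no two time-overlapping tasks are assigned the same way."""
    entries = [e for table in schedule.values() for e in table]
    for i, (s1, f1, _, m1) in enumerate(entries):
        for (s2, f2, _, m2) in entries[i + 1:]:
            if s1 < f2 and s2 < f1 and (m1 & m2):
                return False  # concurrent tasks share a way: overflow
    return True
```

At each entry's start time, a dispatcher would write the task's way mask into the reconfigurable cache's partition register before releasing the task; because concurrent masks are disjoint, inter-core eviction of partitioned lines cannot occur.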
With a customized reconfigurable cache component and a shared-clock multi-port timer component, our framework can generate MPSoCs with different numbers of cores and different cache modules (i.e., different cache configurations with respect to cache lines, size, and associativity) and prototype them on Altera FPGAs. The contributions of our work are as follows:
• We propose an integrated cache management framework that improves execution predictability for real-time MPSoCs. The proposed framework automatically generates a fully deterministic time-triggered non-preemptive schedule and cache configurations to optimize system performance under real-time constraints.
• We develop a parameterized reconfigurable cache