PREPRINT: H. Kashif and H. Patel, "Buffer Space Allocation for Real-Time Priority-Aware Networks," in Proceedings of the IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), 2016, pp. 1-12.

Buffer Space Allocation for Real-Time Priority-Aware Networks

Hany Kashif
Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario, Canada
Email: hkashif@uwaterloo.ca

Hiren Patel
Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario, Canada
Email: hiren.patel@uwaterloo.ca

Abstract—In this work, we address the challenge of incorporating buffer space constraints in worst-case latency analysis for priority-aware networks. A priority-aware network is a wormhole-switched network-on-chip with distinct virtual channels per priority. Prior worst-case latency analyses assume that the routers have infinite buffer space allocated to the virtual channels. This assumption renders these analyses impractical for actual deployments, because an implementation of the priority-aware network imposes buffer constraints on the application. These constraints can result in back-pressure on the communication, which the analyses must incorporate. Consequently, we extend a worst-case latency analysis for priority-aware networks to include buffer space constraints. We provide the theory for these extensions and prove their correctness. We experiment on a large set of synthetic benchmarks, and show that we can deploy applications on priority-aware networks with virtual channels as small as two flits. In addition, we propose a polynomial-time buffer space allocation algorithm that minimizes the buffer space required at the virtual channels while scheduling the application sets on the target priority-aware network. Our empirical evaluation shows that the proposed algorithm reduces buffer space requirements in the virtual channels by approximately 85% on average.

I. INTRODUCTION

Chip-multiprocessors (CMPs) provide a solution to the increasing computational requirements of software applications. Network-on-chips (NoCs) provide an efficient and scalable interconnect for the components of the CMP [1]. CMPs are generally designed to optimize the average-case performance of applications. Real-time applications, however, require additional guarantees to meet their timing constraints. Various NoC implementation schemes have been proposed to accommodate real-time applications [2], [3]. Recent research also focuses on worst-case latency (WCL) analysis techniques that provide real-time guarantees in NoCs [3], [4], [5].

An example of an implementation that provides timing guarantees for real-time applications is run-time arbitration. Unlike resource-reservation schemes such as time-division multiplexing (TDM), this scheme allows contention for the network resources; router arbiters deterministically resolve the contention at run-time. WCL analysis techniques consider this contention when computing worst-case bounds for the network communication. Priority-aware networks are an example of run-time arbitration schemes [3], [6], [7].

Recent WCL analysis techniques, including flow-level analysis (FLA) [3] and stage-level analysis (SLA) [4], have been developed for priority-aware networks with flit-level preemption. Priority-aware networks employ wormhole switching [8] and virtual-channel (VC) resource allocation [9]. These techniques reduce the required buffer space by handling packets at the flit level, and by allowing multiple flit buffers (virtual channels) to access the same physical channel. However, priority-aware networks are susceptible to chain-blocking (blocked flits spanning multiple routers). Chain-blocking creates back-pressure in the priority-aware network, which eventually leads to blocking of the computation and communication tasks.
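To build intuition for chain-blocking, consider a toy discrete-time sketch (not from the paper) of a single wormhole path: each router holds a finite VC buffer, and when a downstream buffer is full, flits stall in place and the stall propagates upstream until it blocks the source itself. The router count, buffer size, and sink-stall parameters below are illustrative assumptions, not values from the paper.

```python
from collections import deque

def simulate(num_routers, buf_size, num_flits, sink_stall):
    """Simulate one wormhole path of routers, each with a finite VC
    buffer of `buf_size` flits. The sink refuses flits for the first
    `sink_stall` cycles, so buffers fill and the blockage propagates
    upstream router by router (chain-blocking / back-pressure)."""
    buffers = [deque() for _ in range(num_routers)]
    injected = delivered = cycle = 0
    blocked_at_source = 0
    while delivered < num_flits:
        cycle += 1
        # Sink drains the last buffer once its stall period is over.
        if cycle > sink_stall and buffers[-1]:
            buffers[-1].popleft()
            delivered += 1
        # Advance flits downstream; iterate from the last hop upstream
        # so a slot freed downstream can be refilled in the same cycle.
        for i in range(num_routers - 2, -1, -1):
            if buffers[i] and len(buffers[i + 1]) < buf_size:
                buffers[i + 1].append(buffers[i].popleft())
        # Source injects one flit per cycle if the first buffer has space.
        if injected < num_flits:
            if len(buffers[0]) < buf_size:
                buffers[0].append(injected)
                injected += 1
            else:
                blocked_at_source += 1  # back-pressure reached the source
    return cycle, blocked_at_source

# With small buffers the stalled sink backs up the whole path and
# blocks the source; with ample buffers the source is never blocked.
_, blocked_small = simulate(4, 2, 10, 20)
_, blocked_big = simulate(4, 16, 10, 20)
```

In the small-buffer run the path holds only 4 × 2 = 8 flits, so the 20-cycle sink stall fills every buffer and `blocked_small` is positive; in the large-buffer run `blocked_big` stays zero. Prior analyses effectively assume the second regime; SLA+ must bound latency in the first.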
Although blocking of the communication tasks due to back-pressure affects their WCLs, recent analyses do not account for this blocking because they assume infinite buffer space in the VCs. This assumption makes it difficult to deploy applications on realized implementations. A recent work [10] introduces a buffer space analysis, but it computes the buffer space necessary to avoid chain-blocking altogether. In practice, however, the platform dictates the buffer space available per VC, and deploying an application on such a platform may result in chain-blocking. This chain-blocking is not accounted for in prior analyses.

Consequently, in this work, we address the problem of constructing a WCL analysis that incorporates buffer space restrictions and, with them, the chain-blocking effect. We extend the most recent WCL analysis, SLA [4], with buffer space restrictions; we call the resulting analysis SLA+. We experiment with a large set of synthetic benchmarks spanning 400,000 different configurations, which allows us to stress-test the proposed analysis. We show that SLA+ can schedule task sets on priority-aware networks with buffer sizes as small as two flits, while still improving schedulability over prior analyses [3].

It is important to limit design costs through buffer space reduction [11], [12]. Therefore, given a buffer space constraint for each router in the NoC, we propose a polynomial-time algorithm for allocating the buffer space among the router's VCs. The algorithm reduces buffer space requirements, on average, by 85% and 89% compared to prior buffer analyses on priority-aware networks.

The rest of the paper is organized as follows. Section II presents the related work, and Section III presents the necessary background for the proposed work. In Section IV, we present SLA with buffer constraints, followed by a buffer space allocation algorithm in Section V. We present the experimental evaluation in Section VI, and make concluding remarks in Section VII.