Topology Exploration and Buffer Sizing for Three-Dimensional Networks-on-Chip

Alexandros Bartzas, Kostas Siozios, Dimitrios Soudris
School of Electrical and Computer Engineering – National Technical University of Athens, 15780 Zografou, Greece

1. Introduction

Future integrated systems will contain billions of transistors [9], composing tens to hundreds of IP cores. These IP cores, which implement emerging complex multimedia and network applications, should be able to deliver rich multimedia and networking services. Efficient communication among these IP cores (e.g., efficient data transfers) can be achieved through effective utilization of the available resources. An architecture that can accommodate such a high number of cores, while satisfying their need for communication and data transfers, is the Network-on-Chip (NoC) architecture [2, 5].

Furthermore, emerging three-dimensional (3D) integration and process technologies allow the design of multi-level Integrated Circuits (ICs). As illustrated in [8], this creates new opportunities in NoC design. 3D integration is a way to satisfy the demands of emerging systems for scaling, performance and functionality [3]. For example, three-dimensional integration can considerably reduce the number and length of global interconnects. Regarding the choice between a two-dimensional (2D) and a 3D NoC architecture, it is shown in [1, 4] that 3D NoCs are advantageous, providing better performance.

In this work we present an exploration methodology for designing alternative 3D NoC architectures. We define as 3D NoCs those architectures that are composed of many layers, where each layer is a two-dimensional NoC grid and the grids are the same for all layers, composed of elements of the same type(s).
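The definition of a 3D NoC as a stack of identical 2D grids can be illustrated with a minimal sketch. The function name `build_3d_mesh` and the coordinate convention are hypothetical and not part of the presented methodology; the sketch also assumes a plain mesh in which every router has a vertical link to the router directly above it (full vertical link presence).

```python
from itertools import product


def build_3d_mesh(width, height, layers):
    """Return (nodes, links) for a stack of identical 2D mesh grids.

    Hypothetical helper: each node is an (x, y, z) router coordinate,
    where z is the layer index. Assumes every router is a 3D router.
    """
    nodes = list(product(range(width), range(height), range(layers)))
    links = set()
    for x, y, z in nodes:
        # Horizontal links inside one layer (the 2D NoC grid).
        if x + 1 < width:
            links.add(((x, y, z), (x + 1, y, z)))
        if y + 1 < height:
            links.add(((x, y, z), (x, y + 1, z)))
        # Vertical link to the router at the same position one layer up.
        if z + 1 < layers:
            links.add(((x, y, z), (x, y, z + 1)))
    return nodes, links


nodes, links = build_3d_mesh(4, 4, 4)
print(len(nodes), len(links))
```

Because every layer is generated by the same loop, the grids are identical by construction, which matches the definition given above.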
The main objective of the methodology is to derive heterogeneous 3D NoC topologies, with a mix of 2D and 3D routers and vertical link interconnection patterns, that perform best under the incoming traffic. Furthermore, the combination of priority-based QoS and buffer sizing techniques is proposed for the first time.

The starting point of the proposed methodology is an already optimized mapping [7], which is 32% better than other solutions. In this way, additional improvements in latency and energy consumption can be achieved. Moreover, the proposed methodology is applied to computationally intensive applications (e.g., DSP) mapped onto 2D and 3D NoC mesh architectures. The cost factors we consider are: a) energy consumption; b) average packet latency; and c) total switch block area.

2. Methodology

An overview of the proposed methodology is shown in Figure 1. To explore alternative topologies of 3D NoC architectures, we have used as a basis the Worm Sim NoC Simulator [6], which utilizes wormhole switching. As shown in Figure 1, the simulator now supports 3D NoC architectures (3D Mesh and 3D Torus) and vertical link interconnection patterns [1].

Figure 1. An overview of the exploration methodology of alternative topologies for 3D Networks-on-Chip.

The 3D architectures to be explored may have a mix of two- and three-dimensional routers, ranging from very few 3D routers to only 3D routers (100% vertical interconnection link presence). The output of the simulation is a log file containing the cost factors we evaluate, such as overall latency, average latency per packet and the energy breakdown of the NoC, with figures for link, crossbar and router energy consumption, among others. From these energy figures we calculate the total energy consumption of the 3D NoCs. To steer the exploration, we rely on the different patterns presented in [1].
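As a rough illustration of two steps described above, the sketch below selects which grid positions receive a 3D router for a given vertical link presence, and sums an energy breakdown into a total. Everything here is an assumption made for illustration: the function names, the evenly spaced selection rule (the actual patterns are those of [1] and are not reproduced here), the reuse of the same positions on every layer, and the dictionary keys and values, which do not reflect the simulator's real log format.

```python
def select_3d_routers(width, height, presence):
    """Pick the (x, y) grid positions that receive a 3D router.

    Hypothetical rule: spread the 3D routers evenly over the grid in
    row-major order. `presence` is the vertical link presence in [0, 1];
    1.0 means that every router is a 3D router.
    """
    positions = [(x, y) for y in range(height) for x in range(width)]
    total = len(positions)
    count = round(presence * total)
    # Integer arithmetic keeps the chosen indices distinct and in range.
    return {positions[(i * total) // count] for i in range(count)}


def total_energy(breakdown):
    """Sum the per-component energy figures into one total."""
    return sum(breakdown.values())


# Assumed example values, not measured results.
routers = select_3d_routers(4, 4, 0.5)
energy = total_energy({"link": 1.5, "crossbar": 2.0, "router": 0.5})
print(len(routers), energy)
```

Sweeping `presence` from a small value up to 1.0 would correspond to the range of explored architectures, from very few 3D routers to only 3D routers.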
The proposed 3D NoCs can be constructed by placing a number of identical two-dimensional NoCs on individual layers, providing communication by inter-layer vias among vertically adjacent routers. This means that the position of the silicon vias is exactly the same for each layer. Hence, the router configuration is extended to the third dimension, while the structure of the individual logic blocks (IP cores) remains unchanged. Moreover, when real applications are used instead of synthetic traffic, the priority assignment and buffer sizing algorithms are employed (after the application mapping phase) in order to achieve optimizations regarding the latency and the energy consumption of