On-Chip Dynamic Programming Networks Using 3D-TSV Integration

Ra’ed Al-Dujaily†, Terrence Mak†, Kuan Zhou‡, Kai-Pui Lam★, Yicong Meng★★, Alex Yakovlev†, Chi-Sang Poon★★

† School of Electrical, Electronic & Computer Engineering, Newcastle University, UK {raaed.aldujaily, terrence.mak, alex.yakovlev}@newcastle.ac.uk
‡ Department of Electrical and Electronic Engineering, New Hampshire University, USA
★ Department of Systems Engineering, The Chinese University of Hong Kong, Hong Kong
★★ Harvard-MIT Health Sciences and Technology, Massachusetts Institute of Technology, USA

Abstract: Recent technological advances in three-dimensional (3D) semiconductor fabrication have provided a promising platform for realizing densely interconnected multicore, multiprocessor, and networks-on-chip (NoC) based systems. As on-chip complexity grows significantly with the number of computational, control, and communication units, design considerations and the provision of efficient run-time resource management become critical in large-scale systems. We have developed an on-chip distributed dynamic-programming (DP) network [3] [5] for a range of applications including optimal path planning [6], dynamic routing [5] and deadlock detection [2]. This paper presents a design of a DP-network implemented in a fully stacked 3-layer 3D through-silicon via (TSV) 150 nm CMOS technology through MIT Lincoln Lab [1]. The vertical inter-unit communication is achieved by means of TSVs, and the mesh interconnection naturally keeps the area overhead associated with this communication minimal. The prototype circuit measures 2 mm × 2 mm. Test results demonstrate the effectiveness of such a DP-network for deadlock detection: the computational delay is less than 9 ns for detecting deadlock in a large-scale network. This work provides promising results for future networks-on-chip applications using 3D embedded DP-networks.
Index Terms: Networks-on-chip, 3D IC, dynamic programming, deadlock detection, performance analysis.

I. INTRODUCTION

Due to technology scaling, global interconnects have become a performance hindrance in current and future very-large-scale integration (VLSI) system design. The Network-on-Chip (NoC) [10] [11] architectural paradigm has emerged to tackle the long-interconnect problem and enables a scalable solution for integrating a large number of intellectual property (IP) cores in a single silicon chip. Each IP core can be a processor core, memory module, DSP block or embedded reconfiguration module. NoCs bring remarkable improvements in performance, flexibility, scalability, and power efficiency over conventional bus-based interconnection schemes. The merit of NoCs has been widely studied. For instance, the recent Intel 80-tile chip, fabricated in 65-nm technology and delivering 1.28 TFLOP peak performance, uses a NoC-based on-chip communication architecture [7].

Recently, three-dimensional (3D) integration has provided a new dimension for exploiting novel geometric integration of silicon dies [8]. A variety of vertical cross-die interconnection techniques have been developed. For example, the through-silicon via (TSV) [9] of 3D ICs, which connects multiple die/wafer layers in a single chip, provides opportunities to increase the integration capacity and also reduces global interconnect lengths. A 3D chip is also capable of integrating different technology scales and/or different technology compartments, such as CMOS logic, memory, analogue sensors and even micro-electro-mechanical systems (MEMS), by implementing them on multiple die layers.

In particular, the overall performance of multi-core systems can be significantly enhanced in a 3D architecture over conventional 2D implementations.
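One rough intuition for this gain is that stacking shortens network paths. The sketch below is only an illustration and is not the authors' cycle-accurate simulator: it computes the average minimal hop count over all ordered pairs of distinct nodes in a mesh, assuming uniform traffic and minimal (dimension-ordered) routing, for the two 128-node arrangements compared in this paper (a 16×8 2D mesh and an 8×4×4 3D mesh). The function name `average_hops` is hypothetical.

```python
from itertools import product


def average_hops(dims):
    """Average minimal hop count over all ordered pairs of distinct mesh nodes.

    `dims` gives the mesh size per dimension, e.g. (16, 8) or (8, 4, 4).
    Assumes uniform traffic and minimal routing, so the hop count between
    two nodes is the Manhattan distance between their coordinates.
    """
    nodes = list(product(*(range(d) for d in dims)))
    total = 0
    for src in nodes:
        for dst in nodes:
            total += sum(abs(a - b) for a, b in zip(src, dst))
    n = len(nodes)
    # Pairs with src == dst contribute zero hops, so only the divisor changes.
    return total / (n * (n - 1))


hops_2d = average_hops((16, 8))
hops_3d = average_hops((8, 4, 4))
print(f"2D 16x8 mesh:  {hops_2d:.2f} hops on average")
print(f"3D 8x4x4 mesh: {hops_3d:.2f} hops on average")
```

Under these assumptions the 2D mesh averages 8.00 hops and the 3D mesh about 5.17 hops for the same 128 nodes. Real traffic patterns, router pipelines and TSV characteristics are not modelled here.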
For example, our cycle-accurate simulation results demonstrate the performance improvements of 3D NoCs in terms of throughput and network saturation point when compared to 2D NoCs (see Fig. 1 and footnote 1). Fig. 1-b shows the performance gain of the 3D NoCs over 2D NoCs. The improvement increases rapidly as the number of tiles increases. The improvements in throughput and saturation packet injection rate (PIR) are due to the smaller hop count in 3D NoCs compared to 2D networks [12]. Thus, 3D NoCs can achieve substantial performance improvements over conventional 2D implementations.

Footnote 1: The saturation PIR is calculated at the point where an increase in applied load no longer results in a linear increase in throughput [16]. Fig. 1-a illustrates the calculation of the throughput and saturation PIR improvements for 128 nodes interconnected as a mesh 2D NoC (16×8) and a mesh 3D NoC (8×4×4). The results were obtained for a mesh NoC topology with 4-channel and 6-channel router architectures for the 2D and 3D NoCs, respectively, using the xy routing algorithm for the 2D NoC and xyz routing for the 3D NoC. The vertical channels of the 3D NoC routers are assumed to have the same characteristics as the 2D NoC channels.

Although NoCs demonstrate a noticeable performance improvement in 3D networks, system complexity also grows significantly with the tightly coupled vertical interconnects. Significant design effort and consideration of run-time management, including thermal effects, routing dynamics, power dissipation and hot-spot management, become necessary. We have developed an on-chip distributed dynamic-programming (DP) network [3] [5] for a range of applications including optimal path planning [6], dynamic routing [5] and deadlock detection [2]. In this paper, we review the techniques of