A novel DRAM architecture as a low leakage alternative for SRAM caches in a 3D interconnect context

Anselme Vignon, Stefan Cosemans, Wim Dehaene
K.U. Leuven ESAT - MICAS Laboratory
Kasteelpark Arenberg 10, Leuven, Belgium
anselme.vignon@esat.kuleuven.be

Pol Marchal, Marco Facchini
IMEC
Kapeldreef 75, B-3001 Leuven, Belgium
pol.marchal@imec.be

Abstract—This paper presents a DRAM architecture that improves the DRAM performance/power trade-off, making DRAM more usable in low-power chip designs based on 3D interconnect technology. A finer matrix subdivision, combined with buffering of the bitline signal at the localblock level, reduces both the energy per access and the access time. The resulting performance matches that of a typical low-power SRAM, while achieving a significant reduction in area and static power compared to such memories. The proposed 128 kb memory architecture achieves an access time of 1.3 ns for a dynamic energy of less than 0.2 pJ per bit. A localized refresh mechanism yields a factor of 10 reduction in the static power consumption associated with the cell, and a factor of 2 reduction in area, compared with an equivalent SRAM.

I. CONTEXT

As feature sizes shrink, on-chip memory design is becoming increasingly challenging. Reducing the typical dimensions and the supply voltage of SRAM memories degrades the cell stability [1]. Stability is degraded further by intra-die variations, which also increase the average power consumption. Several solutions have been investigated to mitigate this issue, from changing the cell topology [2] [3] [4] to modifying the peripheral architecture [5]. However, these solutions increase the memory area and thus compromise scaling.

Embedded DRAM (eDRAM) has been proposed for large memory arrays. eDRAM clock speed and access time have been improved to match typical SRAM behavior [6].
However, using eDRAM requires integrating denser capacitors into the logic technology process, and thus needs costly additional process steps.

3D interconnect enables the use of heterogeneous technologies on the same chip. 3D vias are typically smaller and have less parasitic capacitance than off-chip connections [7]. In addition, they can be spread across the chip. This reduces the routing energy and increases the number of available connections between two stacked dies. These advantages provide a better bandwidth-energy trade-off for the routing between two stacked dies than between two packaged dies. A possible application of 3D interconnect is to separate the logic core of a system from the memory it requires. Such systems have already been studied in [8] [9], with stacks of an SRAM matrix on top of a logic layer.

It is also possible to stack DRAM on top of a logic layer. This solution offers numerous other advantages compared to packaged DRAM, including a simpler input/output protocol, and can solve the termination and clock synchronisation issues by using shorter connections. This allows conventional DRAM to be used instead of SRAM or embedded DRAM for the largest memories in an SoC, bringing a higher density than SRAM without the need to integrate dedicated capacitors in the logic process, as eDRAM requires.

However, traditional DRAM is outperformed by SRAM in several domains. The typical access time of a DRAM is still higher than that of an SRAM, and the access energy per bit is higher. This makes conventional DRAM unsuitable for high-activity caches, where dynamic access energy consumption and delay are critical.

Fig. 1. Global architecture - WL/BL subdivision. [Figure labels: Local_Address, Block_address, GWL, LWL receiver, 32x32 cells (x16, x16), Local SA, GBL, Global_SA, Mux, data_out.]

978-3-9810801-5-5/DATE09 © 2009 EDAA