Robust Sub-Powered Asynchronous Logic

Jiaoyan Chen #1, Arnaud Tisserand #2, Emanuel Popovici #3, Sorin Cotofana #1

Department of Computer Engineering, TU Delft, Delft, the Netherlands #1
CNRS, IRISA, INRIA, Univ. Rennes 1, Lannion, France #2
Department of Electrical and Electronic Engineering, University College Cork, Cork, Ireland #3

Abstract: While MOSFET technology scaling provides substantial advantages in terms of Integrated Circuit (IC) speed and energy consumption, these come at the expense of a higher sensitivity to process, voltage, and temperature (PVT) variations. To alleviate this lack of robustness, which has become a critical issue in advanced deep sub-micron technologies, many mechanisms have been proposed at all abstraction levels, from device and circuit up to architecture and application software. Among those, a natural solution is to rely on the asynchronous logic design style, as by its nature it is less sensitive to delay variations, which are the de facto consequence of PVT variations. Two kinds of asynchronous logic families have been introduced: (i) single-rail logic, which is energy efficient but still timing-sensitive because it relies on delay elements, and (ii) dual-rail logic, which is robust but more power hungry. In this paper we introduce a robust asynchronous logic family which does not rely on timing assumptions and/or delay elements and can operate with sub-powered devices. The key element behind our proposal is a simplified completion detection mechanism, which makes it substantially more energy efficient than other dual-rail approaches. A 32-bit Ripple Carry Adder (RCA) is implemented in 65nm and 45nm CMOS processes to evaluate the practicability of our approach. First, the Optimal Energy Point (OEP) of the proposed RCA is investigated by scaling VDD from 0.4V down to 0.2V in 50mV steps; the OEP occurs at 0.25V for both technologies.
Second, compared with the corresponding single-rail benchmark at its OEP in the 65nm process, energy savings of 30% (34 fJ, for 65nm) and 40% (54 fJ, for 45nm after scaling) are achieved, respectively. Relative to dual-rail counterparts, the energy efficiency gain is even larger (10x better), with reasonable performance. Finally, Monte Carlo simulations that account for process variations are performed to demonstrate the robustness of our methodology and to explore the response of the OEP, which remains unchanged at 0.25V.

Keywords: asynchronous logic; low power; robustness; near/sub-threshold; process variation

I. INTRODUCTION

High demand for low power electronic products pushes the development of power-saving technologies and techniques to new boundaries. Among them, voltage scaling has been one of the most effective and straightforward methods to reduce the power/energy consumption of digital Integrated Circuits (ICs) [1]. Within the voltage scaling family of techniques, sub-threshold logic, which lowers the supply voltage to near or below the MOSFET threshold voltage (Vth), has been shown to reduce power/energy consumption in ICs by at least one order of magnitude [2,3]. Several works based on this type of technique have been proposed during the last few years [4-7]. However, the penalty in performance is also significant. This also guides our work, which aims to find the best trade-off between power savings and performance.

Meanwhile, as MOS transistor sizes reach tens of nanometers, Process, Voltage, and Temperature (PVT) variations complicate timing analysis and validation in synchronous designs. Moreover, the clock tree is already considered one of the major energy optimization bottlenecks in synchronous circuits (SYNC). With respect to the above-mentioned concerns, asynchronous logic (ASYNC) provides an interesting alternative solution.
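The trade-off behind voltage scaling and the existence of an optimal energy point can be illustrated with a small numerical sketch. It assumes a commonly used first-order sub-threshold model, which is not taken from this paper: dynamic energy scales with VDD squared, while leakage energy per operation grows as VDD drops because the gate delay increases exponentially. The names energy_per_op and find_oep and all constants are hypothetical placeholders, so the minimum it reports says nothing about the measured OEP of the proposed RCA.

```python
import math

# All numeric values below are hypothetical placeholders for illustration only.
N_SLOPE = 1.5        # sub-threshold slope factor (assumed)
V_THERMAL = 0.026    # thermal voltage kT/q near room temperature, in volts
C_EFF = 1.0          # normalised switched capacitance (assumed)
LEAK_RATIO = 1000.0  # lumped leakage weight relative to C_EFF (assumed)


def energy_per_op(vdd):
    """Normalised energy per operation at supply voltage vdd (volts)."""
    # Dynamic (switching) energy: proportional to C * VDD^2.
    dynamic = C_EFF * vdd ** 2
    # Leakage energy = leakage power x delay. In this first-order model the
    # delay grows exponentially as vdd is lowered, giving the exp() factor.
    leakage = C_EFF * LEAK_RATIO * vdd ** 2 * math.exp(-vdd / (N_SLOPE * V_THERMAL))
    return dynamic + leakage


def find_oep(voltages):
    """Return (vdd, energy) for the sweep point with the lowest energy."""
    best = min(voltages, key=energy_per_op)
    return best, energy_per_op(best)


if __name__ == "__main__":
    # Same sweep shape as in the abstract: 0.4V down to 0.2V in 50mV steps.
    sweep = [0.40, 0.35, 0.30, 0.25, 0.20]
    for v in sweep:
        print(f"VDD = {v:.2f} V  ->  E = {energy_per_op(v):.4f} (normalised)")
    v_opt, _ = find_oep(sweep)
    print(f"Lowest-energy sweep point: {v_opt:.2f} V")
```

In this toy model the lowest-energy point lies inside the sweep rather than at either end: lowering VDD further saves dynamic energy but loses more to leakage accumulated over the longer delay.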
Its self-timed, clock-less nature makes ASYNC more tolerant to PVT variations and potentially lower in power/energy consumption than its synchronous counterparts. These features are exploited to aid the design of large SoCs using mixed logic (Globally Asynchronous Locally Synchronous, or GALS) [8].

Research on ASYNC designs operating at Ultra-Low Voltage (ULV) supply has drawn attention over the last few years. ASYNC is believed to offer further advantages under voltage scaling. As the supply voltage is lowered, the MOSFET drain current becomes increasingly dominated by diffusion current, which is exponentially sensitive to PVT variations, so the fixed rate of a global clock signal becomes less practicable.

In ASYNC, there are mainly two protocols, namely bundled-data (single-rail) and dual-rail, each with its own advantages and disadvantages [8]. For bundled-data, static CMOS gates can be used, so it is easy to implement using Hardware Description Languages (HDLs) and/or CAD tools. Also, due to its single-rail property, the bundled-data protocol has higher power efficiency. On the other hand, the dual-rail family does not require the delay elements that are necessary in bundled-data, which makes the dual-rail protocol more tolerant to PVT variations and hence results in more reliable circuits.

In [9], a bundled-data ASYNC pipeline called MOUSETRAP is proposed, which uses local clock generators, i.e., replica delays, placed close to the logic blocks. The close placement lets the delay replica track the logic circuits. In other words, both the delay elements and the logic circuits suffer similar variations (such as temperature). Therefore MOUSETRAP is robust to systematic variations. However, under random PVT variations the tracking ability of bundled-data pipelines degrades, especially at ULV [10]. An improved version, called soft MOUSETRAP, uses a wider capturing window in the latches, which allows latches to capture