This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. IEEE JOURNAL OF SOLID-STATE CIRCUITS

A 1.02-pJ/b 20.83-Gb/s/Wire USR Transceiver Using CNRZ-5 in 16-nm FinFET

Armin Tajalli, Senior Member, IEEE, Mani Bastani Parizi, Member, IEEE, Dario Albino Carnelli, Chen Cao, Kiarash Gharibdoust, Davide Gorret, Amit Gupta, Christopher Hall, Ahmed Hassanin, Klaas L. Hofstra, Brian Holden, Ali Hormati, John Keay, Yohann Mogentale, Victor Perrin, John Phillips, Sumathi Raparthy, Amin Shokrollahi, Fellow, IEEE, David Stauffer, Richard Simpson, Andrew Stewart, Giuseppe Surace, Omid Talebi Amiri, Emanuele Truffa, Anton Tschank, Roger Ulrich, Christoph Walter, and Anant Singh

Abstract— An energy-efficient (1.02 pJ/b) and high-speed (20.83 Gb/s/wire, 417 Gb/s/mm) link for ultra-short reach (USR) applications (up to 6-dB channel loss at the Nyquist frequency of 12.5 GHz) is presented. Correlated non-return-to-zero (CNRZ) signaling, which has low sensitivity to inter-symbol interference (ISI), has been developed to improve the link budget. In addition to high pin efficiency (5b6w: 5 bits over 6 wires), the proposed signaling method provides strong immunity to common-mode and crosstalk noise sources, allowing dense routing. A very wideband (1.3-GHz) jitter tracking mechanism reduces the sensitivity of the system to random and deterministic jitter and relaxes the design constraints on the transmitter. A slicer with low kick-back noise, whose circuit topology is well matched to the continuous-time linear equalizer (CTLE), provides both high input sensitivity and tolerance to process, supply voltage, and temperature (PVT) variations. The link operates with more than 22-ps (42.5% UI) eye opening at BER = 1E-15. Calibration loops run in the background for quadrature mismatch error correction, clock and data alignment (CDA), and offset removal.
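As a rough illustration of two claims in the abstract, the Python sketch below (a) derives the per-wire symbol rate, Nyquist frequency, and unit interval from 20.83 Gb/s/wire and the 5b6w pin efficiency (about 25 GBd per wire, consistent with the 12.5-GHz Nyquist frequency), and (b) encodes 5 bits onto 6 wires with a balanced orthogonal code to show why such codes reject common-mode noise. The matrix `SUBCHANNELS` and the helpers `encode`, `decode`, and `link_rates` are hypothetical and are not the published CNRZ-5 code or the authors' receiver: the rows are a Helmert-style basis chosen only because they are mutually orthogonal and each sums to zero, and unnormalized integer rows are used so the arithmetic is exact.

```python
from itertools import product

# HYPOTHETICAL 6-wire orthogonal code (Helmert-style basis), NOT the published
# CNRZ-5 matrix. Each row is one sub-channel: the rows are mutually orthogonal
# and each sums to zero, so a common-mode shift on all wires cancels out.
SUBCHANNELS = [
    [1, -1, 0, 0, 0, 0],
    [1, 1, -2, 0, 0, 0],
    [1, 1, 1, -3, 0, 0],
    [1, 1, 1, 1, -4, 0],
    [1, 1, 1, 1, 1, -5],
]
N_WIRES = 6


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def encode(bits):
    """Map 5 bits to 6 wire levels as a sum of +/-1-weighted sub-channel rows."""
    signs = [1 if b else -1 for b in bits]
    return [sum(s * row[j] for s, row in zip(signs, SUBCHANNELS))
            for j in range(N_WIRES)]


def decode(wires):
    """Model each multi-input comparator as the sign of row . wires."""
    return [1 if dot(row, wires) > 0 else 0 for row in SUBCHANNELS]


def link_rates(gbps_per_wire, bits=5, wires=N_WIRES):
    """Return per-wire symbol rate (GBd), Nyquist frequency (GHz), UI (ps)."""
    baud = gbps_per_wire * wires / bits
    return baud, baud / 2, 1e3 / baud


if __name__ == "__main__":
    for word in product([0, 1], repeat=5):
        levels = encode(word)
        assert decode(levels) == list(word)
        # Same offset on every wire (common-mode noise) does not flip any bit.
        assert decode([w + 7 for w in levels]) == list(word)
    baud, nyquist, ui = link_rates(20.83)
    # prints: 25.00 GBd, Nyquist 12.50 GHz, UI 40.0 ps
    print(f"{baud:.2f} GBd, Nyquist {nyquist:.2f} GHz, UI {ui:.1f} ps")
```

Because every sub-channel row sums to zero, adding the same offset to all six wires leaves each comparator input unchanged, and orthogonality means each comparator sees only its own bit. Crosstalk and ISI behavior depend on the actual code matrix and channel and are not modeled in this sketch.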
Index Terms— Clock forwarding, correlated non-return-to-zero (CNRZ), CNRZ-5, correlated NRZ, energy efficiency, inter-symbol interference (ISI) ratio, ISI sensitivity, ISI, multi-chip module (MCM), multi-wire signaling, NRZ, orthogonal multi-wire signaling, pin efficiency, SerDes, transceiver, ultra-short reach (USR), wideband PLL, wireline.

Manuscript received August 22, 2019; revised October 31, 2019 and December 16, 2019; accepted December 16, 2019. This article was approved by Associate Editor Brian Ginsburg. This work was supported by Kandou Bus. (Corresponding author: Armin Tajalli.)
Armin Tajalli was with Kandou Bus, 1015 Lausanne, Switzerland. He is now with the Electrical and Computer Engineering Department, The University of Utah, Salt Lake City, UT 84112 USA (e-mail: armin.tajalli@utah.edu).
Mani Bastani Parizi, Dario Albino Carnelli, Chen Cao, Davide Gorret, Amit Gupta, Christopher Hall, Ahmed Hassanin, Klaas L. Hofstra, Brian Holden, Ali Hormati, John Keay, Yohann Mogentale, Victor Perrin, John Phillips, Sumathi Raparthy, Amin Shokrollahi, David Stauffer, Richard Simpson, Andrew Stewart, Giuseppe Surace, Omid Talebi Amiri, Emanuele Truffa, Anton Tschank, Roger Ulrich, Christoph Walter, and Anant Singh are with Kandou Bus, 1015 Lausanne, Switzerland, and also with Kandou Bus, Northampton, U.K.
Kiarash Gharibdoust is with EM Microelectronics, Marin, Switzerland.
Digital Object Identifier 10.1109/JSSC.2019.2962655

Fig. 1. Advanced MCM structure, in which a USR link is used to move data.

I. INTRODUCTION
High-speed and low-power data movement is one of the most crucial problems in high-performance computing (HPC) systems. The performance of many advanced
applications, such as machine learning (ML), artificial intelligence (AI), and autonomous vehicles, depends on the efficiency and speed of communication among the different units of a heterogeneous computing system [1]. Recently, the multi-chip module (MCM) architecture has been exploited to simultaneously improve yield and reduce overall cost. As depicted in Fig. 1, MCM technology enables the integration of multiple dies, fabricated in various process technologies and with different functionalities, on a common substrate. In addition to energy efficiency and performance, yield and its associated cost are becoming major concerns for large chips, and MCM technology can mitigate them. High-speed data movement inside advanced modular multi-die packages and system-in-package (SiP) applications is a key enabling technology for keeping pace with Moore's law [2] and substantially improving the speed and energy efficiency of next-generation HPC systems.

Currently, significant research is focused on increasing the communication capacity of extremely short reach (XSR) and ultra-short reach (USR) applications. To improve the data transfer bandwidth (BW), TSMC has introduced chip-on-wafer-on-substrate (CoWoS) technology, transferring 8 Gb/s/wire with 0.56-pJ/b energy consumption over 500-μm channels [3]. In addition to low power dissipation, this link achieves a very high BW density (1.6 Tb/s/mm) using a 40-μm bump pitch. As another example, a multi-chip architecture has