A 8x5 Gb/s Source-Synchronous Receiver with Clock Generator Phase Error Correction

Ankur Agrawal, Pavan Kumar Hanumolu* and Gu-Yeon Wei
Harvard University, Cambridge, MA 02138, *Oregon State University, Corvallis, OR 97331

Abstract— This paper describes the design and implementation of an 8x5 Gb/s source-synchronous receiver in a 0.13 μm CMOS technology. The receiver employs a cascaded-DLL architecture that avoids filtering of the jitter on the received clock to enhance jitter tolerance bandwidth. A technique is proposed to correct phase spacing mismatch in DLLs that reduces the error standard deviations by more than 40% and improves receiver timing margins.

I. INTRODUCTION

The need for high I/O bandwidth in multi-chip digital systems has led to the widespread use of parallel links. These links are generally source synchronous, with a clock sent along with the data signals for receiver timing recovery. As data rates increase, successful data recovery in the presence of jitter requires precise positioning of the sampling clock. Receivers need to perform per-pin skew compensation [1] while preserving the correlation in the jitter between the transmitted clock and data.

Source-synchronous receivers often use multi-phase clock generators to drive phase interpolators [2]. Multiple clock phases are also required when interleaved samplers are employed to accommodate high off-chip data rates. Phase-locked loops (PLLs) or delay-locked loops (DLLs) can be used to generate multi-phase clocks. The phase filtering action of a PLL reduces the jitter correlation between the incoming clock and data. DLLs avoid this filtering, but they are susceptible to systematic and random phase offsets and mismatch that can significantly reduce timing margins and degrade achievable data rates. If these phase errors can be corrected, DLLs are a better choice than PLLs for multi-phase clock generation in source-synchronous receivers.
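To make the notion of phase spacing error concrete, the following minimal sketch computes the deviation of each adjacent-phase spacing from its ideal value and the standard deviation of those deviations. It is an illustration only: the four-phase arrangement, the 800 ps period, the edge positions, and the names `phase_spacing_errors`, `PERIOD_PS` and `EDGES_PS` are assumptions made for this example and are not taken from the test chip.

```python
import statistics


def phase_spacing_errors(edges_ps, period_ps):
    """Deviation (ps) of each adjacent-phase spacing from the ideal period / N.

    edges_ps: edge positions of the N clock phases within one period, in
              ascending order. The last spacing wraps around to the first
              edge of the next period.
    """
    n = len(edges_ps)
    ideal = period_ps / n
    errors = []
    for i in range(n):
        if i + 1 < n:
            next_edge = edges_ps[i + 1]
        else:
            next_edge = edges_ps[0] + period_ps  # wrap into the next period
        errors.append(next_edge - edges_ps[i] - ideal)
    return errors


# Assumed values, for illustration only (not measurements from the test chip).
PERIOD_PS = 800.0                      # hypothetical 1.25 GHz multi-phase clock
EDGES_PS = [0.0, 210.0, 395.0, 605.0]  # hypothetical phase edge positions

errors_ps = phase_spacing_errors(EDGES_PS, PERIOD_PS)
sigma_ps = statistics.pstdev(errors_ps)  # population standard deviation
print(errors_ps, round(sigma_ps, 2))
```

With these assumed edge positions the ideal spacing is 200 ps and the spacing errors are +10, -15, +10 and -5 ps. Because the spacings always add up to one period, the errors sum to zero, so any correction scheme can only redistribute them more evenly rather than remove a common offset.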
This paper presents a cascaded-DLL architecture for receivers that avoids any phase filtering in the path of the received clock and incorporates techniques to correct for phase spacing errors in DLLs. It requires neither phase interpolators nor the distribution of multi-phase clocks over long on-chip wires. The next section describes the trade-offs between using DLLs and PLLs in source-synchronous receivers.

II. SOURCE SYNCHRONOUS RECEIVER DESIGN CONSIDERATIONS

Fig. 1 shows the architecture of a general source-synchronous transceiver. The transmitter sends a parallel word of data along with a clock to the receiver. To save clock power and avoid jitter amplification, the frequency of the transmitted clock is often stepped down, and a multiplying PLL or DLL is used in the receiver to step the frequency back up. This clock is then distributed to each of the receiver slices using either clock buffers or passive distribution [2], [3]. The receiver slices need to perform skew compensation to correct for flight time variations over the PCB traces. If multiple data bits are transmitted in each RxClk cycle, the receivers also need to generate multiple clock phases to sample the incoming data.

Fig. 1. General Source Synchronous Transceiver

A single PLL can be used for multi-phase clock generation and clock de-skew [4]. Alternatively, a combination of a DLL and a phase interpolator can be used to perform the two tasks independently of each other. PLLs have a low-pass transfer function from the phase of the reference clock to the output clock and, thus, filter out the middle and higher frequency jitter on the received clock. On the other hand, DLLs have a nearly all-pass phase transfer function and are able to preserve the correlation between the jitter on the incoming data and received clock, resulting in good jitter tolerance over a wide frequency range.
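The difference between the two phase transfer functions can be illustrated with a small numerical sketch. It assumes a first-order low-pass model for the PLL, H(f) = 1 / (1 + j f / f_bw), and idealizes the DLL as exactly all-pass; both models, the 10 MHz loop bandwidth, and the function names are assumptions for illustration, not the measured behavior of the test chip. The quantity |1 - H(f)| is the fraction of received-clock jitter that the output clock fails to follow, and therefore no longer matches the jitter on the data.

```python
import math


def pll_tracked(f_hz, f_bw_hz):
    """|H(f)| for an assumed first-order low-pass PLL, H = 1 / (1 + j f/f_bw)."""
    return 1.0 / math.sqrt(1.0 + (f_hz / f_bw_hz) ** 2)


def pll_untracked(f_hz, f_bw_hz):
    """|1 - H(f)| = (f/f_bw) / sqrt(1 + (f/f_bw)^2): jitter the PLL filters out."""
    r = f_hz / f_bw_hz
    return r / math.sqrt(1.0 + r * r)


def dll_untracked(f_hz):
    """Idealized all-pass DLL: H = 1 at every frequency, so nothing is filtered."""
    return 0.0


F_BW_HZ = 10e6  # hypothetical PLL loop bandwidth

for f in (1e6, 10e6, 100e6):
    print(f"{f / 1e6:6.0f} MHz  "
          f"PLL tracked={pll_tracked(f, F_BW_HZ):.3f}  "
          f"PLL untracked={pll_untracked(f, F_BW_HZ):.3f}  "
          f"DLL untracked={dll_untracked(f):.3f}")
```

Under these assumptions, clock jitter well above the PLL bandwidth is almost entirely removed from the sampling clock while it remains on the data, whereas the idealized DLL passes it through so that clock and data jitter stay correlated. A real DLL is only nearly all-pass, so this sketch overstates its tracking.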
Recently reported designs [2], [5] have avoided the use of PLLs in the path of the received clock to achieve wide jitter tracking bandwidth. Our test chip, the details of which are discussed in the following sections, enables a direct comparison between the jitter tracking bandwidths of PLL-based and DLL-based timing recovery.

Fig. 2 plots the jitter tolerance curves for BER < 10^-9 for two different configurations of the test chip. The “DLL-only” case is the typical configuration of the test chip (as described in Section III), where a quarter-rate clock is fed directly to the local receivers. In the “PLL/DLL” case, a sub-rate clock is first multiplied on-chip to the desired frequency using a PLL. The jitter tolerance bandwidth for the DLL-only case is >100 MHz and is limited by the difference in the on-chip path length between the data and clock signals. In