ISSCC 2003 / SESSION 4 / CLOCK RECOVERY AND BACKPLANE TRANSCEIVERS / PAPER 4.3

4.3 A Second-Order Semi-Digital Clock Recovery Circuit Based on Injection Locking

M.-J. Edward Lee¹, William J. Dally¹,², John Poulton¹, Trey Greer¹, John Edmondson¹, Ramin Farjad-Rad¹, Hiok-Tiaq Ng¹, Rohit Rathi¹, Ramesh Senthinathan¹

¹Velio Communications, Milpitas, CA
²Stanford University, Stanford, CA

Clock recovery circuits are among the most critical components in communication systems. A dual-loop architecture, in which the frequency synthesizer and the clock aligner are separated, has been used extensively because of the conflicting needs to suppress jitter accumulation and to filter noisy input [1]. Among different frequency synthesis architectures, a multiplying delay-locked loop (MDLL) is advantageous when a clean reference clock is available, since the oscillator noise is accumulated only over one reference clock cycle before being reset by the clean source [2]. However, this instantaneous correction produces large cycle-to-cycle jitter and duty-cycle distortion (jointly called clock distortion hereafter) for the downstream clock and data paths.

The clock aligner, in contrast, requires a low bandwidth for input jitter filtering. Its control loop is often implemented as a first-order system using digital circuits that are flexible, easy to implement, and robust against noise. However, a first-order system is limited in its ability to filter input jitter and track frequency offset simultaneously. Furthermore, for an infinite phase range the timing vernier is often implemented using multi-phase interpolation, which is expensive in terms of area and power [1]. In this paper, a clock recovery circuit that overcomes these limitations is described.

Figure 4.3.1 shows the top-level architecture, consisting of a frequency synthesizer, a jitter-filtering timing vernier and a second-order digital phase controller.
To reduce clock distortion, the output of the MDLL is injected into a slave replica oscillator, which acts as a first-order low-pass filter on the phase error. This is shown on the right of Fig. 4.3.1, where I_c is the current bias for the delay element and I_m is the maximum current bias for the injection devices. The injection coefficient (also called the injection strength below) is defined as I_m/(I_c + I_m) and is roughly the amount of corrected phase per injection divided by the phase error between the master and slave oscillators. I_0 is described in the next paragraph and is assumed to be I_m for ease of understanding. Since injection occurs at the multiplied frequency, the error correction bandwidth is made high to suppress jitter accumulation. Yet any high-frequency jitter, such as that caused by reference clock injection, is attenuated and spread out over multiple clock cycles. For example, if the clock frequency is 1GHz and the injection strength is 1/10, the error correction bandwidth is about 20MHz and the clock distortion is attenuated by 90%. The injection must be strong enough to cover the lock range of the slave oscillator over the statistical variation of devices.

Injection locking is also used in the timing vernier to vary the phase of the slave oscillator with respect to the master oscillator. In Fig. 4.3.1, I_0 can be decreased (increased) to advance (delay) the slave oscillator with respect to the master oscillator. When the strength of B_1 (B_0) reaches I_m, B_0 (B_1) can be inverted to further advance (delay) the slave oscillator. A full 360° phase adjustment range is thus achieved. The master and slave oscillators are identical, with slightly different connections, to ensure frequency matching. For example, the master oscillator also contains injection devices for reference clock injection, and both the master and slave oscillators have identical buffers at the outputs for equal loading. For clarity, these devices are not shown in Fig. 4.3.1.
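The 1GHz, 1/10 example above can be checked with a simple discrete-time model. The following is a minimal sketch, not the authors' analysis: it assumes one injection per clock cycle and that each injection corrects a fraction alpha of the phase error, i.e. phi[n+1] = phi[n] + alpha*(phi_master - phi[n]). The function names and the bias values (I_c = 9, I_m = 1, arbitrary units) are hypothetical and chosen only to give alpha = 1/10.

```python
import math

def injection_coefficient(i_c, i_m):
    """Injection coefficient I_m / (I_c + I_m), as defined in the text."""
    return i_m / (i_c + i_m)

def equivalent_pole_hz(alpha, f_inject_hz):
    """Continuous-time pole frequency equivalent to the discrete pole z = 1 - alpha.

    ASSUMPTION: one injection per period of f_inject_hz, first-order update
    phi[n+1] = phi[n] + alpha * (phi_master - phi[n]).
    """
    return -math.log(1.0 - alpha) * f_inject_hz / (2.0 * math.pi)

def step_response(alpha, n_cycles):
    """Slave phase after each injection when the master phase steps from 0 to 1."""
    phase = 0.0
    history = []
    for _ in range(n_cycles):
        phase += alpha * (1.0 - phase)  # correct a fraction alpha of the error
        history.append(phase)
    return history

# Hypothetical bias values (arbitrary units) giving an injection strength of 1/10.
alpha = injection_coefficient(i_c=9.0, i_m=1.0)
bw_hz = equivalent_pole_hz(alpha, f_inject_hz=1e9)
resp = step_response(alpha, n_cycles=30)

print(f"injection coefficient: {alpha:.3f}")
print(f"equivalent pole: {bw_hz / 1e6:.1f} MHz")
print(f"phase passed to slave in first cycle: {resp[0]:.2f}")
print(f"phase after 30 cycles: {resp[-1]:.3f}")
```

Under these assumptions the pole lands in the high-teens of MHz, in line with the "about 20MHz" quoted above, and an abrupt master phase step reaches the slave at only one tenth of its size in the first cycle, which is the 90% attenuation of clock distortion. The remainder of the step is spread over the following cycles.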
A phase control unit accepts binary early and late indications from the phase detector, performs some filtering, and generates the appropriate current bias for the timing vernier. Previous implementations of the phase control unit are first-order in that they simply count the number of early (late) indications and delay (advance) the clock phase when a threshold is reached. This implementation trades off frequency tolerance against input jitter filtering. With a counter size of N, the phase lag between the optimal sample point (where early crosses late) and the clock, assuming uniformly distributed jitter, is (Δf × J × P × N)/(2d), where Δf is the frequency offset, P is the number of phase steps per unit interval (UI), d is the edge density, and J is the p-p input jitter. On the other hand, the counter size also affects the amount of phase wander due to insufficient input filtering. The phase wander probability with a uniformly distributed input jitter of 0.5UI using various phase counter sizes is shown in Fig. 4.3.2, calculated using a Markov chain. A larger counter size leads to a smaller phase wander but a larger phase lag.

To overcome this limitation, a frequency control loop is introduced that advances or delays the clock continuously, as shown in Fig. 4.3.3, where the frequency of bclk is half of the bit rate. The up and down signals of the frequency control loop are added to those from the phase control loop. A frequency generator produces three pulsed signals whose frequencies divided by P are 122ppm, 61ppm and 30.5ppm of the bit rate. A saturating counter selects among these signals to produce the desired frequency. This circuit tolerates a 240ppm frequency offset, but the largest frequency offset the phase counter sees is 30.5ppm. This allows a larger phase counter to be used to reduce the phase wander without compromising the frequency tolerance. The frequency pre-counter size and phase counter size are programmable for different applications.
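To illustrate the trade-off, the phase-lag expression above can be evaluated with and without the frequency loop. This is a minimal sketch under stated assumptions, not the authors' implementation. Δf is taken as a fractional offset (ppm × 1e-6) so that the lag comes out in UI. P = 64 follows from the 128 vernier steps per two bit times reported in the measurements. The counter size N = 16 and edge density d = 0.5 are hypothetical values picked for illustration. The way the three pulsed signals combine (a 3-bit binary weight selected by rounding down) is likewise an assumption, since the text only states that a saturating counter selects among them.

```python
def phase_lag_ui(delta_f, jitter_pp_ui, steps_per_ui, counter_size, edge_density):
    """First-order loop phase lag, (delta_f * J * P * N) / (2 * d), in UI."""
    return (delta_f * jitter_pp_ui * steps_per_ui * counter_size
            / (2.0 * edge_density))

FREQ_STEP_PPM = 30.5  # smallest pulsed signal named in the text
MAX_CODE = 7          # ASSUMPTION: 122 + 61 + 30.5 ppm act as a 3-bit binary weight

def residual_offset_ppm(offset_ppm):
    """Return (code, residual) after the frequency loop absorbs what it can.

    ASSUMPTION: the loop settles on the largest multiple of FREQ_STEP_PPM not
    exceeding the offset magnitude, saturating at MAX_CODE.
    """
    magnitude = abs(offset_ppm)
    code = min(int(magnitude // FREQ_STEP_PPM), MAX_CODE)
    return code, magnitude - code * FREQ_STEP_PPM

# Illustrative operating point; N and d are hypothetical, J matches Fig. 4.3.2.
J_PP_UI, P_STEPS, N_COUNTER, EDGE_DENSITY = 0.5, 64, 16, 0.5
OFFSET_PPM = 200.0

lag_first_order = phase_lag_ui(OFFSET_PPM * 1e-6, J_PP_UI, P_STEPS,
                               N_COUNTER, EDGE_DENSITY)
code, residual = residual_offset_ppm(OFFSET_PPM)
lag_second_order = phase_lag_ui(residual * 1e-6, J_PP_UI, P_STEPS,
                                N_COUNTER, EDGE_DENSITY)

print(f"first-order lag at {OFFSET_PPM:.0f} ppm: {lag_first_order:.4f} UI")
print(f"frequency code {code}, residual offset {residual:.1f} ppm")
print(f"lag seen with frequency loop: {lag_second_order:.4f} UI")
```

With these assumed values the first-order loop lags by roughly 0.1UI at 200ppm, while the phase counter behind the frequency loop sees a residual of at most 30.5ppm and lags by about a tenth of that or less. Doubling N in the sketch doubles both lags, which is why the first-order design cannot enlarge the counter freely.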
This circuit, implemented in a 0.18μm CMOS technology with a 1.8V supply, is used in several high-bandwidth communication devices containing as many as 140 3.125Gb/s serial I/Os with per-lane clock and data recovery (CDR). The CDR and 1:8 deserializer consume 80mW at 3.125Gb/s in the worst case and occupy an area of 1mm by 160μm.

Figure 4.3.4 shows the jitter tolerance of the CDR with and without the second-order frequency loop at 2.5Gb/s. The transceiver is running at a 200ppm frequency offset with a 23b pseudo-random bit sequence. The improvement at low and high frequencies with the second-order loop is 0.1-0.2UI, consistent with the phase lag at 200ppm. The fluctuation between 500kHz and 3MHz is due to the peaking behavior of the second-order loop.

Figure 4.3.5 shows the timing vernier step sizes over four different lanes. The measurement is taken over a full 800ps clock cycle (two bit times) that contains 128 steps. The maximum step size is 22ps and the minimum is -5ps. The large peaks and the negative steps on the plot are due to the binary encoding of the timing vernier current bias. The encoding has subsequently been changed to thermometer encoding to reduce this problem.

Figure 4.3.6 shows the jitter transfer from the reference clock to the output for an MDLL with and without the slave oscillator. The reference clock frequency is 125MHz and the multiplication factor is 8. The addition of the slave oscillator creates a 20MHz pole in the jitter transfer, indicating that the injection strength is about 1/10. This implies that the clock distortion is reduced by 90%.

References
[1] K.-Y. K. Chang et al., “A 2Gb/s/pin Asymmetric Serial Link,” Proc. IEEE Symposium on VLSI Circuits, pp. 216-217, June 1998.
[2] R. Farjad-Rad et al., “A 0.2-2GHz 12mW Multiplying DLL for Low-Jitter Clock Synthesis in Highly-Integrated Data-Communication Chips,” Digest of Technical Papers, IEEE ISSCC, February 2002.
2003 IEEE International Solid-State Circuits Conference 0-7803-7707-9/03/$17.00 ©2003 IEEE