Design and analysis of optimal adaptive de-jitter buffers☆

Gagan L. Choudhury a,*, Robert G. Cole b,1

a AT&T Labs, Room D5-3C21, Middletown, NJ 07748, USA
b AT&T Labs, 330 St Johns, 2nd Floor, Havre de Grace, MD 21078, USA

Received 8 August 2003; accepted 8 August 2003

Abstract

In order to transfer voice or some other application requiring real-time delivery over a packet network, we need a de-jitter buffer to eliminate delay jitter. An important design parameter is the depth of the de-jitter buffer since it influences two important parameters controlling voice quality, namely voice-path delay and packet loss probability. In this paper, we propose and study several schemes for optimally adjusting the depth of the de-jitter buffer. In addition to de-jitter-buffer depth adjustments within a call, the initial value and rates of change of the de-jitter-buffer depth are allowed to depend on the class of the call and are adaptively adjusted (upwards or downwards) for every new call based on voice-path delay and packet loss probability measurements over one or more previous calls. Parameter adjustments are geared towards either (a) minimizing voice-path delay while maintaining a packet loss probability objective, or (b) maximizing R-factor, an objective measure of voice quality that depends on both the voice-path delay and the packet loss probability. Using simulation models and measured packet delay traces, it is shown that adaptive schemes perform better than static ones and that adaptive schemes with learning perform better than ones without learning.
© 2003 Published by Elsevier B.V.

Keywords: Adaptive de-jitter buffer algorithm with learning; Voice-call quality; End-to-end delay; Packet loss probability; Call classification

1. Introduction

A major challenge in transporting voice, video or, more generally, any application requiring real-time delivery over a packet network (using IP, ATM or some other packet-based protocol) is dealing with the delay jitter introduced by the packet network. Since real-time presentations cannot tolerate delay jitter, a de-jitter buffer needs to be used to eliminate it. In this paper, we will mainly consider voice calls, although some of the work would also apply to more general real-time delivery. An important design parameter is the depth of the de-jitter buffer since it influences two important parameters controlling voice quality, namely end-to-end voice-path delay and packet loss probability. The de-jitter-buffer depth is the maximum amount of time a packet spends in the de-jitter buffer before being played out. If it is too small, then many packets would miss the play-out deadline, thereby increasing the packet loss probability. On the other hand, if it is too large, then the end-to-end voice-path delay would increase. The key challenge is to choose a de-jitter-buffer depth that is a happy middle ground between too much packet loss and too much voice-path delay.

A second aspect is static versus adaptive adjustment of the play-out instant. In a static scheme, the play-out instant is set once and for all at the arrival of the first packet of the call. In an adaptive scheme, the play-out instant may be shifted during the call based on the arrival instants of previous packets, which can improve the delay or packet loss behavior. However, each time the play-out instant is shifted, it is necessary to either inject silence or drop packets, which impacts the voice-call quality. For this reason, it may be preferable to use a static scheme in some cases since it truly eliminates the delay jitter.
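The trade-off between packet loss and voice-path delay in a static scheme can be illustrated with a minimal Python sketch. It is not the algorithm studied in this paper. It assumes packets are generated at a fixed period, that the first packet is held in the buffer for exactly the chosen depth, and that any packet arriving after its play-out deadline counts as lost. The function name `static_playout_loss`, the 20 ms period, and the delay values are hypothetical and chosen only for illustration.

```python
from typing import List


def static_playout_loss(arrivals_ms: List[float],
                        period_ms: float,
                        depth_ms: float) -> float:
    """Fraction of packets that miss their play-out deadline.

    Assumption (illustrative only): the play-out instant of packet i is
    fixed when the first packet arrives, as
        arrival of first packet + depth + i * period.
    """
    first_playout = arrivals_ms[0] + depth_ms
    late = 0
    for i, arrival in enumerate(arrivals_ms):
        deadline = first_playout + i * period_ms
        if arrival > deadline:  # packet arrived after it was due to play
            late += 1
    return late / len(arrivals_ms)


# Hypothetical one-way network delays (ms) for eight consecutive packets.
delays_ms = [50.0, 55.0, 80.0, 52.0, 65.0, 120.0, 51.0, 58.0]
period_ms = 20.0  # assumed packetization interval
arrivals_ms = [i * period_ms + d for i, d in enumerate(delays_ms)]

for depth_ms in (0.0, 10.0, 40.0, 80.0):
    loss = static_playout_loss(arrivals_ms, period_ms, depth_ms)
    print(f"depth {depth_ms:5.1f} ms -> loss fraction {loss:.3f}")
```

With these made-up delays, a larger depth lowers the loss fraction, while every packet is played out that much later, which is the tension described above.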
☆ Presented in part at SPIE's ITCOM-2002 Conference in July 2002.
* Corresponding author. Tel.: +1-732-420-3721; fax: +1-732-368-1919.
1 Tel./fax: +1-410-939-8732.
E-mail addresses: gchoudhury@att.com (G.L. Choudhury), rgcole@att.com (R.G. Cole).
0140-3664/$ - see front matter © 2003 Published by Elsevier B.V.
doi:10.1016/j.comcom.2003.08.018
Computer Communications xx (0000) xxx–xxx
www.elsevier.com/locate/comcom

A compromise between a static and an adaptive approach is to adaptively compute an ideal play-out instant with the arrival of every packet but use a static play-out instant (thereby avoiding delay jitter) for most of the call. The static play-out instant is synchronized to the adaptive ideal play-out instant at a few selected points in the call, thereby limiting the impact on voice-call quality. For calls with voice activity detection and silence suppression, the ideal