VirtualQueue: A Technique for Packet Voice Stream Reconstruction

Norival Figueira
nfigueir@nortelnetworks.com
Bay Architecture Lab
Nortel Networks
4401 Great America Parkway
Santa Clara, CA 95054

Joseph Pasquale
pasquale@cs.ucsd.edu
Computer Systems Laboratory
Dept. of Computer Science and Engineering
University of California, San Diego
La Jolla, CA 92093-0114

This work was supported by grants from NASA and the UC MICRO program.

Abstract

Statistical multiplexing in packet-switched networks creates problems for packetized voice streams by introducing variable delays on delivered packets. The resulting jitter needs to be filtered so that received voice packets can be reconstructed as a continuous stream at the receiver. One common approach to reconstruction is to play back the received voice data after a delay offset from the departure time at the source of the packet stream. While the added delay helps filter jitter, one cannot introduce too much delay; otherwise, interactivity suffers. This paper presents a new technique to find the necessary delay offset (or play-back delay) to recreate the original voice data stream. This technique gives the user control over the fraction of packets that should arrive in time to be played back, so that the added play-back delay can be effectively minimized.

1. Introduction

A significant technical problem created by the integration of voice and data in a packet-switched network is the reconstruction of voice data at a receiver as a continuous stream [2, 5, 8, 9]. This is generally done by playing back the voice data after a delay offset from the departure time at the source of the voice data stream. This delay offset is called the play-back delay, and it is an estimate of the upper bound on the delay a packet may experience.

If the network imposes a bounded delay, a fixed play-back delay may suffice. A fixed play-back delay could be set to a value large enough that all packets arrive in time to be played back, but small enough that humans would not be disturbed by the added delay. Some techniques have been proposed to guarantee upper bounds on delay in packet-switched networks [1, 3, 4, 6, 10, 11]. Unfortunately, these guarantees are not yet available as a common service in present networks. Therefore (at least until such services become available), an alternative approach is to define a play-back delay that meets a more modest performance goal. For example, the user may be willing to accept some audio quality degradation in exchange for a smaller play-back delay. Thus, instead of trying to play all the received voice data, the play-back delay would be set to a value that maximizes the likelihood that at least some minimum fraction of all packets arrives in time to be played. Packets that arrive late would simply be discarded.

We present an improved technique to find the necessary play-back delay to effectively reconstruct the original voice data stream. The technique allows the user to select the fraction of packets that should arrive in time to be played. The selected fraction is then used in the calculation of the play-back delay. Our technique is characterized as tolerant and adaptive (regarding determination of the play-back delay), according to the characterization space of real-time networking applications by Clark et al. [1].

2. Model of Delivery

Consider a system where audio data is generated by a source, packetized, sent over a network, received at the destination site, and sent to the audio device queue to be played. In this model, the delay $d_n$ of packet $n$ includes all delays the packet experiences from its generation time to its arrival time at the audio device queue. The audio device queue arrival time $a_n$ of packet $P_n$ is related to the delay $d_n$ and generation time $g_n$ by

$$a_n = g_n + d_n.$$
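To make the delivery model concrete, here is a minimal Python sketch, assuming per-packet generation times $g_n$ and delays $d_n$ are known in advance (as in a simulation). It computes $a_n = g_n + d_n$ and, for a candidate play-back delay `D` (a name introduced here only for illustration), schedules playback at $g_n + D$ and treats a packet as late when $d_n > D$, matching the rule that late packets are discarded. All names (`Packet`, `playback_time`, `is_late`, `on_time_fraction`, `hindsight_playback_delay`) and the delay trace are hypothetical. In particular, `hindsight_playback_delay` is an offline baseline that needs the complete delay trace; it should not be read as the VirtualQueue technique, which must work as packets arrive.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Packet:
    """Hypothetical per-packet record; times in milliseconds."""
    gen_time: float  # g_n: generation time at the source
    delay: float     # d_n: total delay until the audio device queue

    @property
    def arrival_time(self) -> float:
        # a_n = g_n + d_n
        return self.gen_time + self.delay


def playback_time(p: Packet, playback_delay: float) -> float:
    # Playback is scheduled a fixed offset D after generation: g_n + D.
    return p.gen_time + playback_delay


def is_late(p: Packet, playback_delay: float) -> bool:
    # a_n > g_n + D  <=>  d_n > D: the packet missed its playback time.
    return p.delay > playback_delay


def on_time_fraction(packets: Sequence[Packet], playback_delay: float) -> float:
    """Fraction of packets that arrive no later than their playback time."""
    if not packets:
        raise ValueError("need at least one packet")
    on_time = sum(1 for p in packets if not is_late(p, playback_delay))
    return on_time / len(packets)


def hindsight_playback_delay(packets: Sequence[Packet], target_fraction: float) -> float:
    """Offline baseline (not VirtualQueue): smallest D that puts at least
    target_fraction of the packets on time, computed from the full trace."""
    if not packets:
        raise ValueError("need at least one packet")
    if not 0.0 < target_fraction <= 1.0:
        raise ValueError("target_fraction must be in (0, 1]")
    delays = sorted(p.delay for p in packets)
    # k = ceil(f * N) packets must be on time; the small tolerance guards
    # against f * N landing just above an integer from float rounding.
    k = max(1, math.ceil(target_fraction * len(delays) - 1e-9))
    # Any D below the k-th smallest delay leaves at most k - 1 packets on time.
    return delays[k - 1]


if __name__ == "__main__":
    # Made-up trace for illustration: 20 ms packetization, delays in ms.
    delays = [30, 45, 32, 80, 38, 41, 35, 120, 36, 40]
    trace = [Packet(gen_time=20.0 * n, delay=d) for n, d in enumerate(delays)]
    d_target = hindsight_playback_delay(trace, 0.8)
    print(d_target, on_time_fraction(trace, d_target))
```

The sort makes the trade-off explicit: the smallest play-back delay admitting at least $\lceil fN \rceil$ on-time packets out of $N$ is the $\lceil fN \rceil$-th smallest observed delay. A receiver cannot know this value ahead of time, which is why an adaptive estimate of the play-back delay is needed.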
The audio device queue interarrival time $i_n$ of packet $n$ ($n \ge 1$) is then defined by $i_n = a_n - a_{n-1}$, or equivalently $i_n = T_n + d_n - d_{n-1}$, where $T_n = g_n - g_{n-1}$ is the inter-generation time.

In general, each packet arrives with a different delay. The variation in this delay is called delay jitter. As the packets arrive, the receiver process has to reconstruct the received stream of packets into a continuous stream of audio samples and send it to the audio device to be played. To do this, a target play-back delay must be found to compensate for the delay jitter. Each arriving packet is delayed by the difference between the target play-back delay and the delay experienced by the packet. The playback time of a packet is then defined as the time at which the packet will be played. Packets already delayed by more than the target play-back delay are said to be late and are generally discarded. Late packets cause the audio device to starve, creating silent events.

0-7695-0253-9/99 \$10.00 © 1999 IEEE

3. The VirtualQueue Technique

VirtualQueue provides information on the amount of time a packet is late or ahead with respect to the (implicitly) calculated play-back delay. To simplify the presentation, we assume that the packet voice stream is ordered, without packet losses, and continuous (i.e., silent intervals are not eliminated from the packet stream; however, the VirtualQueue technique is usable