Audio Engineering Society
Convention Paper
Presented at the 124th Convention
2008 May 17–20 Amsterdam, The Netherlands

The papers at this Convention have been selected on the basis of a submitted abstract and extended precis that have been peer reviewed by at least two qualified anonymous reviewers. This convention paper has been reproduced from the author’s advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.

Adaptive Playout for VoIP based on the Enhanced Low Delay AAC Audio Codec

Jochen Issing¹, Nikolaus Färber¹, and Manfred Lutzky¹

¹ Fraunhofer IIS, Erlangen, 91058, Germany

Correspondence should be addressed to Jochen Issing (jochen.issing@iis.fraunhofer.de)

ABSTRACT

The MPEG-4 Enhanced Low Delay AAC (AAC-ELD) codec extends the application area of the Advanced Audio Coding (AAC) family towards high quality conversational services. Through the support of the full audio bandwidth at low delay and low bit rate, it offers excellent support for enhanced VoIP applications. In this paper we provide a brief overview of the AAC-ELD codec and describe how its codec structure can be exploited for IP transport. The overlapping frames and excellent error concealment make it possible to use frame insertion/deletion in order to adjust the playout time to varying network delay. A playout algorithm is proposed which estimates the jitter on the network and adapts the size of the de-jitter buffer in order to minimize buffering delay and late loss.
Considering typical network conditions and the same average delay, it is shown that the playout algorithm can reduce the loss rate by more than one order of magnitude compared to fixed playout.

1. INTRODUCTION

Voice over IP (VoIP) has been widely adopted in the past few years and has begun to play a dominant role in today's telephone infrastructure. Besides cost reduction, VoIP has great potential to significantly improve speech quality through advances in compression technology. Current VoIP applications mainly rely on speech codecs with relatively low audio quality, limited to 3.5–7 kHz audio bandwidth. With the upcoming standardization of low delay perceptual audio codecs, like AAC-ELD, a new quality level can be achieved through full 22 kHz audio bandwidth, multi-channel support, and low content dependency. This new class of audio codecs fulfills the delay and bit rate requirements for conversational services and forms the basis for a new application area, termed Audio Communication. Considering the transmission over a best-effort IP