116 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 6, NO. 2, MARCH 1998

Design and Description of CS-ACELP: A Toll Quality 8 kb/s Speech Coder

Redwan Salami, Member, IEEE, Claude Laflamme, Jean-Pierre Adoul, Fellow, IEEE, Akitoshi Kataoka, Member, IEEE, Shinji Hayashi, Takehiro Moriya, Member, IEEE, Claude Lamblin, Dominique Massaloux, Stéphane Proust, Peter Kroon, Fellow, IEEE, and Yair Shoham, Member, IEEE

Abstract—This paper describes the 8 kb/s speech coding algorithm G.729, which has recently been standardized by ITU-T. The algorithm is based on a conjugate-structure algebraic CELP (CS-ACELP) coding technique and uses 10 ms speech frames. The codec delivers toll-quality speech (equivalent to 32 kb/s ADPCM) for most operating conditions. This paper describes the coder structure in detail and discusses the reasons behind certain design choices. A 16-b fixed-point version has been developed as part of Recommendation G.729, and a summary of the subjective test results based on a real-time implementation of this version is presented.

Index Terms—Analysis-by-synthesis, speech coding.

I. INTRODUCTION

SINCE 1990, Study Group 15 (SG15) of the ITU-T has been involved in a standardization process for a speech coding algorithm at 8 kb/s. The main applications for this coder are 1) personal communication systems (PCS), 2) digital satellite systems, and 3) other applications such as packetized speech and circuit multiplexing equipment. The speech quality produced by this coder should be equivalent to that of 32 kb/s ADPCM (G.726) for most operating conditions. These conditions include clean and noisy speech, multiple encodings, level variations, and nonspeech inputs. The intended wireless applications require that the coder be robust against channel errors. These errors could be either random or bursty, and the coder should be able to withstand them without introducing major annoying effects.
Moreover, if the radio channels suffer from long fades, and complete frames are lost, the decoder should be able to conceal these missing frames with a minimal loss in speech quality.

Two candidate algorithms were submitted: one from NTT [1]–[3] and the other from France Telecom CNET/University of Sherbrooke [4]. Both candidates were equivalent to (or better than) 32 kb/s ADPCM in most test conditions; however, they failed some conditions. At the March 1994 meeting of SG15, both proponents agreed to join their efforts to produce a coder that combines the best features of both algorithms, and to undertake further research to meet all performance requirements. At this time, AT&T joined these algorithmic optimization efforts. A floating-point version of the resulting coder was tested in January 1995, and it was accepted at the ITU-T meeting in February 1995. In the final recommendation the algorithm is specified in terms of 16-b fixed-point arithmetic. This version was tested in October 1995, and the recommendation was accepted for ratification in November 1995 [5].

Manuscript received March 21, 1996; revised March 26, 1997. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. W. Bastiaan Kleijn.
R. Salami, C. Laflamme, and J.-P. Adoul are with the Department of Electrical Engineering, University of Sherbrooke, P.Q., Canada J1K 2R1.
A. Kataoka, S. Hayashi, and T. Moriya are with NTT, Tokyo, Japan.
C. Lamblin, D. Massaloux, and S. Proust are with France Telecom CNET, Lannion, France.
P. Kroon and Y. Shoham are with Bell Laboratories, Lucent Technologies, Murray Hill, NJ 07974 USA (e-mail: kroon@research.bell-labs.com).
Publisher Item Identifier S 1063-6676(98)01691-5.

In this paper, we describe the important aspects of the algorithm, which is referred to as conjugate-structure algebraic CELP (CS-ACELP). Additional information can be found in [6]–[10].
The complete algorithm, including ANSI-C source code, can be found in [11].

This paper is organized as follows. In Section II we describe the coding algorithm in detail. In Section III we describe features of this coder that were included to increase the robustness against transmission errors. Section IV reports on the performance, and Section V discusses implementation aspects. Finally, the conclusions are given in Section VI.

II. DESCRIPTION OF THE CS-ACELP SPEECH CODER

The coder is based on a code-excited linear prediction (CELP) coding model [12]. In this model the locally decoded signal is compared against the original signal, and the coder parameters are selected such that the mean-squared weighted error between the original and reconstructed signal is minimized.

The CS-ACELP coder is designed to operate with an appropriately bandlimited signal sampled at 8000 Hz. The input and output samples are represented using 16-b linear PCM. The coder operates on frames of 10 ms, using a 5 ms look-ahead for linear prediction (LP) analysis. This results in an overall algorithmic delay of 15 ms.

The encoding principle is shown in Fig. 1. After processing the 16-b input samples through a 140 Hz highpass filter, tenth-order LP analysis is performed, and the LP parameters are quantized in the line spectral pair (LSP) domain [13] with 18 b [7]. The input frame is divided into two subframes of 5 ms each. The use of subframes allows better tracking of the pitch and gain parameters and reduces the complexity of the codebook searches. The quantized and unquantized LP filter coefficients are used for the second subframe, while in the first subframe interpolated LP filter coefficients are used. For each subframe