998 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 3, MAY 2006

Cascaded Trellis-Based Rate-Distortion Control Algorithm for MPEG-4 Advanced Audio Coding

Cheng-Han Yang and Hsueh-Ming Hang

Abstract—In this paper, several low-complexity and high-performance rate-distortion control algorithms for MPEG-4 Advanced Audio Coding (AAC) are proposed. One key element in producing good-quality compressed audio, particularly at medium and low rates, is a high-performance rate-distortion controller in the audio encoder. Although the previously proposed trellis-based rate-distortion control algorithms achieve excellent performance, their computational complexity is extremely high. Therefore, for practical applications, it is very desirable to achieve similar performance at much lower complexity. Two types of techniques are proposed in this paper to reduce the computational burden of the trellis-based algorithms. One splits a very heavy calculation stage into two sequential steps that require much less computation. The other reduces the number of candidates in the trellis for the parameter search. Together, when applicable, our approach achieves similar coding performance (audio quality) while requiring less than 1/1000 of the computation.

Index Terms—Advanced audio coding (AAC), audio coding, rate-distortion control, trellis-based search.

I. INTRODUCTION

IN THE last decade, analog audio has gradually been replaced by high-fidelity digital audio. Moreover, to meet the demand for efficient transmission and storage of digital audio in diversified multimedia applications, many highly efficient audio coding schemes have been developed, such as the MPEG-1/2/4 audio coding standards and Dolby AC-3 [1]. MPEG-4 Advanced Audio Coding (AAC) is one of the most recent-generation audio coders specified by the ISO/IEC MPEG standards committee [2]. The core of MPEG-4 AAC is based on MPEG-2 AAC technology.
MPEG-4 AAC features a number of additional coding tools and coder configurations compared with MPEG-2 AAC [3], [4]. Consequently, MPEG-4 AAC is a very efficient audio compression algorithm aimed at a wide variety of applications in the Internet, wireless, and digital broadcast arenas.

One critical element of a good AAC encoder is the proper selection of two sets of coding parameters in the rate-distortion (R-D) loop: the scale factors (SF) and the Huffman codebooks (HCB). Because these coding parameters are encoded with interband dependence in AAC, choosing their values to maximize coding performance becomes a difficult problem.

Manuscript received April 11, 2004; revised February 25, 2005. This work was supported by the National Science Council, Taiwan, R.O.C., under Grant NSC-91-2219-E-009-011. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Ravi P. Ramachandran. The authors are with the Department of Electronics Engineering, National Chiao Tung University, Hsinchu 300, Taiwan, R.O.C. (e-mail: u8911831.ee89g@nctu.edu.tw; hmhang@mail.nctu.edu.tw). Digital Object Identifier 10.1109/TSA.2005.857789

Two-loop search (TLS) [5] is a well-known R-D control algorithm that is also used in the MPEG-4 AAC verification model (VM) [6]. The VM is the encoder software developed by the MPEG committee to verify the coding syntax. However, as pointed out in [7] and [8], the poor choice of coding parameters by the TLS algorithm is a shortcoming of the current MPEG-4 AAC VM; as a result, its compression efficiency is lower than expected, particularly at low rates. Two trellis-based high-performance R-D control algorithms for AAC were proposed in [7] and [8].

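To make the trellis formulation concrete, the following is a small sketch under loudly stated assumptions. Each scale-factor band b is coded with a scale factor s and a codebook h; a hypothetical local cost `local(b, s, h)` (distortion plus a Lagrangian multiple of data bits) is given as a random table; the hypothetical `sf_bits` charges more for larger SF differences between adjacent bands (mimicking AAC's differential SF coding), and the hypothetical `hcb_bits` charges a fixed section cost when the codebook changes. All functions and numbers are toy values, not the AAC bitstream syntax. `joint_search` runs a Viterbi search over the joint (SF, HCB) state space in the spirit of [7] and [8]; `cascaded_search` is one plausible way to split it into two sequential trellises (SF first, then HCB given SF). It illustrates the idea behind a cascaded search, not the authors' exact procedure.

```python
import itertools
import random


def sf_bits(delta):
    """Hypothetical SF cost: differential code length grows with |delta|."""
    return 1.0 + abs(delta)


def hcb_bits(prev, cur):
    """Hypothetical section cost: switching codebooks costs 8 bits."""
    return 0.0 if prev == cur else 8.0


def viterbi(n_bands, states, node_cost, edge_cost):
    """Min-cost path through a trellis with one stage per band.

    Returns (total_cost, path, number_of_transitions_evaluated)."""
    cost = {x: node_cost(0, x) for x in states}
    back = []
    edges = 0
    for b in range(1, n_bands):
        new_cost, ptr = {}, {}
        for x in states:
            best_prev, best = None, float("inf")
            for p in states:  # every predecessor state is examined
                edges += 1
                c = cost[p] + edge_cost(p, x)
                if c < best:
                    best_prev, best = p, c
            new_cost[x] = best + node_cost(b, x)
            ptr[x] = best_prev
        cost = new_cost
        back.append(ptr)
    last = min(cost, key=cost.get)
    path = [last]
    for ptr in reversed(back):  # backtrack from the last band to the first
        path.append(ptr[path[-1]])
    path.reverse()
    return cost[last], path, edges


def path_cost(path, local):
    """Total cost of a list of (sf, hcb) pairs, one per band."""
    total = sum(local(b, s, h) for b, (s, h) in enumerate(path))
    for (ps, ph), (s, h) in zip(path, path[1:]):
        total += sf_bits(s - ps) + hcb_bits(ph, h)
    return total


def joint_search(n_bands, sfs, hcbs, local):
    """One trellis whose states are all (sf, hcb) pairs."""
    states = list(itertools.product(sfs, hcbs))
    return viterbi(
        n_bands, states,
        lambda b, x: local(b, x[0], x[1]),
        lambda p, x: sf_bits(x[0] - p[0]) + hcb_bits(p[1], x[1]),
    )


def cascaded_search(n_bands, sfs, hcbs, local):
    """Two smaller trellises run in sequence (an assumed decomposition)."""
    # Step 1: choose SFs, plugging in each band's best codebook and
    # ignoring the codebook switching cost.
    _, sf_path, e1 = viterbi(
        n_bands, list(sfs),
        lambda b, s: min(local(b, s, h) for h in hcbs),
        lambda p, s: sf_bits(s - p),
    )
    # Step 2: with SFs fixed, choose codebooks including section cost.
    _, hcb_path, e2 = viterbi(
        n_bands, list(hcbs),
        lambda b, h: local(b, sf_path[b], h),
        hcb_bits,
    )
    path = list(zip(sf_path, hcb_path))
    return path_cost(path, local), path, e1 + e2


if __name__ == "__main__":
    rng = random.Random(0)
    n_bands, sfs, hcbs = 12, range(0, 16), range(1, 12)  # toy sizes
    table = {(b, s, h): rng.uniform(0, 50)
             for b in range(n_bands) for s in sfs for h in hcbs}
    local = lambda b, s, h: table[(b, s, h)]
    jc, _, je = joint_search(n_bands, sfs, hcbs, local)
    cc, _, ce = cascaded_search(n_bands, sfs, hcbs, local)
    print(f"joint:    cost={jc:.1f}  transitions={je}")
    print(f"cascaded: cost={cc:.1f}  transitions={ce}")
```

For B bands, |S| SF candidates, and |H| codebook candidates, the joint trellis evaluates (B-1)(|S||H|)^2 transitions, whereas the two-step version evaluates (B-1)(|S|^2 + |H|^2). The price is that the cascaded path can cost more than the joint optimum, because step 1 ignores the codebook switching cost.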
One distinct feature of these R-D control algorithms, compared with TLS, is that bit rate and distortion are controlled simultaneously, and the interband relationship of the coding parameters, SF and HCB, is also taken into account when choosing their values. These R-D control algorithms are formulated as a Viterbi search through a trellis diagram [9], [10] to find the optimal coding parameters and are therefore called trellis-based optimization. As discussed in [8], the subjective quality of the trellis-based optimization scheme is significantly better than that of TLS. However, its computational complexity is extremely high, and thus it is not suitable for practical applications such as real-time encoding under a power constraint. Therefore, it is very desirable to achieve similar performance at much lower complexity.

In this paper, two types of techniques are introduced to speed up the trellis-based optimization procedure. In the first type of fast algorithm, we break the combined SF and HCB parameter-selection stage into two sequential steps and therefore call it cascaded trellis-based optimization. In the second type of fast algorithm, by observing audio signal characteristics and statistics, we develop a few rules that significantly reduce the number of candidates in the trellis. These two techniques are fairly independent. Together, they dramatically reduce the overall computational complexity while causing only a small degradation in coding performance.

The organization of this paper is as follows. Section II provides a brief overview of a typical MPEG-4 AAC encoder. Section III describes the proposed cascaded trellis-based R-D control algorithm and its variations. Section IV describes the proposed fast trellis search schemes. Section V summarizes the complexity analysis of the proposed R-D control algorithms and the simulation results with quality evaluation.

II. OVERVIEW ON AAC ENCODER

The block diagram of a typical MPEG-4 AAC encoder is shown in Fig. 1. The time-domain audio signals are first converted to their frequency-domain representation (spectral

1558-7916/$20.00 © 2006 IEEE