998 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 3, MAY 2006
Cascaded Trellis-Based Rate-Distortion Control
Algorithm for MPEG-4 Advanced Audio Coding
Cheng-Han Yang and Hsueh-Ming Hang
Abstract—In this paper, a few low-complexity and high-perfor-
mance rate-distortion control algorithms for MPEG-4 Advanced
Audio Coding (AAC) are proposed. One key element in producing
good quality compressed audio particularly at medium and low
rates is a high performance rate-distortion controller in the
audio encoder. Although the previously proposed trellis-based
rate-distortion control algorithms achieve very good
performance, their computational complexity is extremely high.
Therefore, for practical applications, it is very desirable to achieve
a similar performance at a much lower complexity. Two types of
techniques are proposed in this paper to reduce the computational
burden of the trellis-based algorithms. One is splitting a very
heavy calculation stage into two sequential steps with much less
computation. The other is reducing the candidates in the trellis
for parameter search. Together, when applicable, our approach
achieves a similar coding performance (audio quality) but requires
less than 1/1000 of the computational complexity.
Index Terms—Advanced audio coding (AAC), audio coding,
rate-distortion control, trellis-based search.
I. INTRODUCTION
IN THE last decade, analog audio has been gradually replaced
by high-fidelity digital audio. Moreover, to meet the demand
for efficient transmission and storage of digital audio in
diverse multimedia applications, many highly efficient audio
coding schemes have been developed, such as the MPEG-1/2/4
audio coding standards and Dolby AC-3 [1]. MPEG-4 advanced
audio coding (AAC) is one of the most recent audio coders
specified by the ISO/IEC MPEG standards committee [2]. The core
of MPEG-4 AAC is based on the MPEG-2 AAC technology. MPEG-4 AAC
features a number of additional coding tools and coder
configurations compared to MPEG-2 AAC [3], [4]. Consequently,
MPEG-4 AAC is a very efficient audio compression algorithm
aimed at a wide variety of applications, such as Internet,
wireless, and digital broadcast.
One critical element of a good AAC encoder is the proper
selection of two sets of coding parameters, the scale factor
(SF) and the Huffman codebook (HCB), in the rate-distortion
(R-D) loop. Because the encoding of these parameters is
interband dependent in AAC, choosing their values to
maximize the coding performance is a difficult problem.
Manuscript received April 11, 2004; revised February 25, 2005. This work
was supported by the National Science Council, Taiwan, R.O.C., under Grant
NSC-91-2219-E-009-011. The associate editor coordinating the review of this
manuscript and approving it for publication was Dr. Ravi P. Ramachandran.
The authors are with the Department of Electronics Engineering,
National Chiao Tung University, Hsinchu 300, Taiwan, R.O.C. (e-mail:
u8911831.ee89g@nctu.edu.tw; hmhang@mail.nctu.edu.tw).
Digital Object Identifier 10.1109/TSA.2005.857789
Two-loop search (TLS) [5] is a well-known R-D control
algorithm, which is also used in the MPEG-4 AAC verification
model (VM) [6]. The VM is the encoder software developed
by the MPEG committee to verify the coding syntax. However,
as pointed out in [7] and [8], the poor choice of coding
parameters by the TLS algorithm is one shortcoming of the current
MPEG-4 AAC VM and, therefore, its compression efficiency is
lower than expected, particularly at low rates.
Two trellis-based high-performance R-D control algorithms
for AAC are proposed in [7] and [8]. One distinct feature of
these R-D control algorithms, as compared to TLS, is that both
bit rate and distortion are controlled simultaneously and the
interband relationship of the coding parameters, SF and HCB, is
also taken into account in choosing their values. These R-D
control algorithms are formulated as a Viterbi search through a
trellis diagram [9], [10] to find the optimal coding parameters
and, therefore, are called trellis-based optimization. As
discussed in [8], the subjective quality of the trellis-based
optimization scheme is significantly better than that of TLS.
However, its computational complexity is extremely high, and
thus it is not suitable for practical applications such as
real-time encoding under a power constraint. Therefore, it is
very desirable to achieve a similar performance at a much lower
complexity.
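To make the idea of a Viterbi search over interband-dependent parameters concrete, the following is a minimal sketch and not the algorithm of [7] or [8]. It assumes that each band picks one value from a shared candidate list, that a hypothetical function band_cost(b, c) gives the cost of using candidate c in band b (for instance, a weighted sum of distortion and bits), and that a hypothetical function transition_cost(a, c) gives the extra cost of moving from candidate a in one band to candidate c in the next (for instance, the bits needed to code their difference). All names here are illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple


def viterbi_select(
    num_bands: int,
    candidates: Sequence[int],
    band_cost: Callable[[int, int], float],
    transition_cost: Callable[[int, int], float],
) -> Tuple[float, List[int]]:
    """Return (minimum total cost, chosen candidate for each band).

    band_cost and transition_cost are hypothetical placeholders for
    whatever per-band and interband costs an encoder would use.
    """
    n = len(candidates)
    # best[j]: cheapest cost of any path that ends at candidates[j]
    # in the band processed most recently.
    best = [band_cost(0, c) for c in candidates]
    back: List[List[int]] = []  # back[b-1][j]: best predecessor index

    for b in range(1, num_bands):
        new_best: List[float] = []
        pointers: List[int] = []
        for c in candidates:
            # Choose the cheapest predecessor state for candidate c.
            j = min(
                range(n),
                key=lambda k: best[k] + transition_cost(candidates[k], c),
            )
            cost = best[j] + transition_cost(candidates[j], c) + band_cost(b, c)
            new_best.append(cost)
            pointers.append(j)
        best = new_best
        back.append(pointers)

    # Backtrack from the cheapest final state to recover the path.
    j = min(range(n), key=lambda k: best[k])
    total = best[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return total, [candidates[k] for k in path]
```

Each stage examines every pair of (previous candidate, current candidate), so the work per band grows with the square of the number of states. This is why a trellis with many states per band becomes expensive.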
In this paper, two types of techniques are introduced to speed
up the trellis-based optimization procedure. In the first type
of fast algorithm, we break the combined SF and HCB parameter
selection stage into two sequential steps and thus call it
cascaded trellis-based optimization. In the second type of fast
algorithm, by observing the audio signal characteristics and
statistics, we develop a few rules that can significantly reduce
the number of candidates in the trellis. These two techniques
are fairly independent. Together, they dramatically reduce the
overall computational complexity while the coding performance
degradation is small.
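A rough way to see why splitting the joint search helps is to count trellis transitions. The sketch below is only an illustrative count, not the complexity analysis of Section V. It assumes a fully connected trellis in which a joint search has one state per (SF, HCB) pair in each band, while a cascaded search runs one SF trellis and one HCB trellis in sequence. The function names and parameters are hypothetical.

```python
def transitions_joint(num_bands: int, num_sf: int, num_hcb: int) -> int:
    """Transitions examined when each state is an (SF, HCB) pair."""
    states = num_sf * num_hcb
    return (num_bands - 1) * states * states


def transitions_cascaded(num_bands: int, num_sf: int, num_hcb: int) -> int:
    """Transitions examined when SF and HCB are searched one after the other."""
    return (num_bands - 1) * (num_sf * num_sf + num_hcb * num_hcb)
```

Under this assumption the joint count grows with the square of the product of the two candidate counts, whereas the cascaded count grows only with the sum of their squares. The candidate-reduction rules of the second technique would shrink num_sf and num_hcb themselves, which is consistent with the two techniques being fairly independent.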
The organization of this paper is as follows. In Section II,
a brief overview of the typical MPEG-4 AAC encoder is
provided. The proposed cascaded trellis-based R-D control
algorithm and its variations are described in Section III. The
proposed fast trellis search schemes are described in Section IV.
The complexity analysis of the proposed R-D control algorithms
and the simulation results with quality evaluation are
summarized in Section V.
II. OVERVIEW ON AAC ENCODER
The block diagram of a typical MPEG-4 AAC encoder
is shown in Fig. 1. The time-domain audio signals are first
converted to their frequency-domain representation (spectral
1558-7916/$20.00 © 2006 IEEE