32 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 13, NO. 1, JANUARY 2005
Predicting and Preventing Unmasking Incurred
in Coded Audio Post-Processing
Michael M. Goodwin, Member, IEEE, Aaron J. Hipple, Member, IEEE, and Brian Link, Member, IEEE
Abstract—In modern audio compression algorithms, the
masking properties of the auditory system are exploited to improve
the coding gain; namely, quantization noise is introduced
in the signal in time-frequency regions where it will be masked.
However, since signal modifications change the characteristics of
the masking regions, degradations may result if a decoded audio
signal is modified. In this paper, we explain and demonstrate how
modifying an audio signal can result in the unmasking of signal
components that were imperceptible in the unmodified signal. We
consider both pitch-shifting and linear filtering modifications;
synthetic and natural audio examples are provided to verify the
unmasking phenomenon. We discuss how modification of decoded
audio may lead to unmasking of quantization noise, describe
conditions for which such unmasking may occur, and propose
a method for adjusting the masking threshold employed in the
audio coder to make the decoded signal robust to quantization
noise unmasking for a given set of signal modifications.
Index Terms—Audio post-processing, masking, noise shaping,
perceptual audio coding, pitch-shifting, psychoacoustics, quantiza-
tion noise, unmasking.
I. INTRODUCTION
AUDIO signal modification has been of continuing interest
for numerous applications ranging from speech enhance-
ment to music synthesis. With the growing use of compressed
formats as a medium for storing and distributing audio, it is nec-
essary to consider the effects of compression on the quality of
any subsequent modification of the decoded signal.
Modern audio compression schemes rely heavily on the
masking properties of the auditory system; masking principles
are used to identify imperceptible signal components that need
not be coded and to determine the extent of quantization noise
that can be added to the signal without degrading the perceived
quality. While the decoded signal may be perceptually indis-
tinguishable from the original, a modification of the decoded
signal may be perceptually different from a modification of the
original signal since the masking properties are not necessarily
preserved by the modification. Fig. 1 shows a simple frame-
work for comparing the respective signals to assess the effect
of compression on the modification.
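The two signal paths of this comparison framework can be sketched in code. The following is only a minimal illustration under stated assumptions: `toy_codec` is a hypothetical stand-in for the encoder and decoder (a uniform quantizer, not a perceptual coder), `modify` is an arbitrary first-difference filter chosen for illustration, and the RMS error is a placeholder for the perceptual comparison; none of these are taken from the paper.

```python
import math

def toy_codec(x, step=0.05):
    """Hypothetical stand-in for encode + decode: uniform quantization."""
    return [step * round(v / step) for v in x]

def modify(x):
    """Illustrative modification (an assumption): first-difference filter."""
    return [x[0]] + [x[n] - x[n - 1] for n in range(1, len(x))]

def rms(x):
    """Root-mean-square value of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def compare_paths(original):
    """Return the RMS coding error before and after the modification.

    Path 1: original -> modify
    Path 2: original -> codec -> modify
    """
    decoded = toy_codec(original)
    modified_original = modify(original)
    modified_decoded = modify(decoded)
    err_before = rms([d - o for d, o in zip(decoded, original)])
    err_after = rms([d - o for d, o in zip(modified_decoded, modified_original)])
    return err_before, err_after

# Test signal: one second of a 440 Hz tone sampled at 8 kHz.
fs = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]
err_before, err_after = compare_paths(tone)
print(f"coding error RMS: {err_before:.4f}, after modification: {err_after:.4f}")
```

A waveform error measure such as this does not indicate whether the coding error is masked; it only makes the two paths concrete. The question addressed in this paper is whether the error remains imperceptible after the modification.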
Manuscript received March 14, 2001; revised December 4, 2003. The asso-
ciate editor coordinating the review of this manuscript and approving it for pub-
lication was Dr. Bryan George.
M. M. Goodwin is with the Creative Advanced Technology Center, Scotts
Valley, CA 95066, USA (e-mail: mgoodwin@atc.creative.com).
A. J. Hipple is with McDSP, Mountain View, CA 94043, USA (e-mail:
hipple@ieee.org).
B. Link is with Dolby Laboratories, Inc., San Francisco, CA 94103, USA
(e-mail: link@ieee.org).
Digital Object Identifier 10.1109/TSA.2004.834456
Fig. 1. Comparison framework for evaluating the effect of compression on the
quality of audio modifications.
This paper is organized as follows. In Section II, the basic
principles of auditory masking are reviewed; we discuss simul-
taneous masking and present a simple masking model which de-
scribes how masking properties change as a function of masker
frequency and intensity. These dependencies motivate consid-
eration of how masking varies when the signal is modified;
Sections III and IV deal with pitch-shifting modifications and
linear filtering, respectively. In Section III, we show how pitch-
shifting an audio signal can result in unmasking of signal com-
ponents that were imperceptible in the original signal. We pro-
vide synthetic examples to verify the phenomenon for various
basic masker-probe combinations; we further provide examples
where pitch-shifting of compressed audio unmasks quantization
noise. In Section IV, we consider linear filtering modifications,
again demonstrating the unmasking effect for both synthetic and
natural audio.
The demonstrations of quantization noise unmasking indicate
that these considerations are indeed relevant to the area of audio
coding, especially given the increasing use of post-processing
operations on decoded audio in modern multimedia systems. We
thus continue in Section V with a detailed discussion of quanti-
zation noise unmasking in decoded audio. Given a set of poten-
tial modifications, we present a rule for evaluating the robust-
ness of an audio codec as well as a corresponding approach for
adjusting the masking curve used in the coder so as to make the
decoded audio robust to quantization noise unmasking.
II. AUDITORY MASKING
One of the most basic concepts in psychoacoustics is the
threshold of hearing, which describes the minimum intensity
level at which a signal can be detected. Auditory masking, then,
is the process by which one signal (the masker) elevates the
threshold of detection for another signal (the probe); in the pres-
ence of the masker, the probe must have a greater intensity to be
detected.
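These definitions can be illustrated with a short sketch. It assumes a commonly used closed-form approximation of the threshold of hearing attributed to Terhardt, and it represents the effect of a masker as a simple additive elevation in decibels (`masker_elevation_db`, a hypothetical parameter). Both are assumptions made for illustration and are not the masking model of this paper.

```python
import math

def threshold_in_quiet_db(freq_hz):
    """Approximate threshold of hearing in dB SPL (Terhardt's formula)."""
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def is_detectable(probe_level_db, probe_freq_hz, masker_elevation_db=0.0):
    """Return True if the probe level exceeds the detection threshold.

    masker_elevation_db is a hypothetical amount by which a masker raises
    the threshold at the probe frequency (0 means no masker is present).
    """
    threshold = threshold_in_quiet_db(probe_freq_hz) + masker_elevation_db
    return probe_level_db > threshold

# A 20 dB SPL probe at 1 kHz, alone and with an assumed 30 dB elevation.
print(is_detectable(20.0, 1000.0))
print(is_detectable(20.0, 1000.0, masker_elevation_db=30.0))
```

With these assumed numbers, the probe exceeds the threshold in quiet but falls below the elevated threshold, so it would need a greater intensity to be detected in the presence of the masker.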
Masking phenomena operate across both time and frequency
in a manner dictated by the time-frequency resolution of the au-
ditory system. It is common to divide masking processes into
three categories based on the relative temporal locations of the
1063-6676/$20.00 © 2005 IEEE