IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 13, NO. 1, JANUARY 2005

Predicting and Preventing Unmasking Incurred in Coded Audio Post-Processing

Michael M. Goodwin, Member, IEEE, Aaron J. Hipple, Member, IEEE, and Brian Link, Member, IEEE

Abstract—In modern audio compression algorithms, the masking properties of the auditory system are exploited to improve the coding gain, namely, quantization noise is introduced in the signal in time-frequency regions where it will be masked. However, since signal modifications change the characteristics of the masking regions, degradations may result if a decoded audio signal is modified. In this paper, we explain and demonstrate how modifying an audio signal can result in the unmasking of signal components that were imperceptible in the unmodified signal. We consider both pitch-shifting and linear filtering modifications; synthetic and natural audio examples are provided to verify the unmasking phenomenon. We discuss how modification of decoded audio may lead to unmasking of quantization noise, describe conditions for which such unmasking may occur, and propose a method for adjusting the masking threshold employed in the audio coder to make the decoded signal robust to quantization noise unmasking for a given set of signal modifications.

Index Terms—Audio post-processing, masking, noise shaping, perceptual audio coding, pitch-shifting, psychoacoustics, quantization noise, unmasking.

I. INTRODUCTION

AUDIO signal modification has been of continuing interest for numerous applications ranging from speech enhancement to music synthesis. With the growing use of compressed formats as a medium for storing and distributing audio, it is necessary to consider the effects of compression on the quality of any subsequent modification of the decoded signal.
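As a rough illustration of the comparison this implies (modify the original, modify the decoded signal, and compare the two results), the following minimal sketch may help. Everything in it is an assumption made for illustration and is not taken from the paper: `toy_codec` is a plain uniform quantizer standing in for a real perceptual coder, `first_difference` is a crude high-pass filter standing in for a post-processing modification, and `snr_db` is a simple signal-to-noise ratio used in place of any perceptual measure.

```python
import math


def toy_codec(x, step=0.05):
    """Hypothetical stand-in for encode + decode: uniform quantization.

    Assumption: a real perceptual coder shapes noise with a masking model;
    this toy version only adds quantization error.
    """
    return [step * round(v / step) for v in x]


def first_difference(x):
    """Hypothetical modification: crude high-pass filter y[n] = x[n] - x[n-1]."""
    return [x[0]] + [x[n] - x[n - 1] for n in range(1, len(x))]


def snr_db(reference, test):
    """Signal-to-noise ratio in dB of `test` relative to `reference`.

    Assumption: SNR is only a crude proxy; it is not a perceptual metric.
    """
    signal = sum(r * r for r in reference)
    noise = sum((r - t) ** 2 for r, t in zip(reference, test))
    return 10.0 * math.log10(signal / noise)


# A strong low-frequency tone plays the role of the dominant component.
fs = 8000  # sample rate in Hz (illustrative value)
original = [math.sin(2 * math.pi * 100 * k / fs) for k in range(4000)]
decoded = toy_codec(original)

# Compare the signals directly, then compare them after the same modification.
snr_plain = snr_db(original, decoded)
snr_modified = snr_db(first_difference(original), first_difference(decoded))

print(f"SNR before modification: {snr_plain:.1f} dB")
print(f"SNR after modification:  {snr_modified:.1f} dB")
```

In this toy setting the high-pass filter strongly attenuates the 100-Hz tone while leaving much of the quantization error in place, so the ratio after modification is lower than before. This is only loosely analogous to unmasking; judging audibility would require a masking model rather than an SNR.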
Modern audio compression schemes rely heavily on the masking properties of the auditory system; masking principles are used to identify imperceptible signal components that need not be coded and to determine the extent of quantization noise that can be added to the signal without degrading the perceived quality. While the decoded signal may be perceptually indistinguishable from the original, a modification of the decoded signal may be perceptually different from a modification of the original signal since the masking properties are not necessarily preserved by the modification. Fig. 1 shows a simple framework for comparing the respective signals to assess the effect of compression on the modification.

Manuscript received March 14, 2001; revised December 4, 2003. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Bryan George.
M. M. Goodwin is with the Creative Advanced Technology Center, Scotts Valley, CA 95066, USA (e-mail: mgoodwin@atc.creative.com).
A. J. Hipple is with McDSP, Mountain View, CA 94043, USA (e-mail: hipple@ieee.org).
B. Link is with Dolby Laboratories, Inc., San Francisco, CA 94103, USA (e-mail: link@ieee.org).
Digital Object Identifier 10.1109/TSA.2004.834456

Fig. 1. Comparison framework for evaluating the effect of compression on the quality of audio modifications.

This paper is organized as follows. In Section II, the basic principles of auditory masking are reviewed; we discuss simultaneous masking and present a simple masking model which describes how masking properties change as a function of masker frequency and intensity. These dependencies motivate consideration of how masking varies when the signal is modified; Sections III and IV deal with pitch-shifting modifications and linear filtering, respectively. In Section III, we show how pitch-shifting an audio signal can result in unmasking of signal components that were imperceptible in the original signal.
We provide synthetic examples to verify the phenomenon for various basic masker-probe combinations; we further provide examples where pitch-shifting of compressed audio unmasks quantization noise. In Section IV, we consider linear filtering modifications, again demonstrating the unmasking effect for both synthetic and natural audio.

The demonstrations of quantization noise unmasking indicate that these considerations are indeed relevant to the area of audio coding, especially given the increasing use of post-processing operations on decoded audio in modern multimedia systems. We thus continue in Section V with a detailed discussion of quantization noise unmasking in decoded audio. Given a set of potential modifications, we present a rule for evaluating the robustness of an audio codec as well as a corresponding approach for adjusting the masking curve used in the coder so as to make the decoded audio robust to quantization noise unmasking.

II. AUDITORY MASKING

One of the most basic concepts in psychoacoustics is the threshold of hearing, which describes the minimum intensity level at which a signal can be detected. Auditory masking, then, is the process by which one signal (the masker) elevates the threshold of detection for another signal (the probe); in the presence of the masker, the probe must have a greater intensity to be detected.

Masking phenomena operate across both time and frequency in a manner dictated by the time-frequency resolution of the auditory system. It is common to divide masking processes into three categories based on the relative temporal locations of the

1063-6676/$20.00 © 2005 IEEE