TOP-DOWN STRATEGIES IN PARAMETER SELECTION OF SINUSOIDAL MODELING OF
AUDIO
Toni Hirvonen and Athanasios Mouchtaris
Department of Computer Science, University of Crete, and
Institute of Computer Science
Foundation for Research and Technology - Hellas (FORTH-ICS)
Heraklion, Crete, Greece
ABSTRACT
Sinusoidal modeling of audio requires the model parameters to
be selected by analyzing the original signal spectrum. This paper
proposes two improvements in sinusoidal selection by consider-
ing how psychoacoustic masking curves can be calculated using a
top-down strategy in certain situations. First, a non-iterative com-
ponent selection method to be used in combination with an added
residual signal is presented. Tests indicate computational gain and
quality increase when the method is used with a noise-synthesized
residual. Secondly, the estimation of the masking curve in binaural
listening when signals are panned is considered. Tests show that
knowledge of the degree of panning is beneficial when heavy pan-
ning is applied to simultaneously rendered audio object signals.
Index Terms— audio coding, sinusoidal modeling, psychoa-
coustic masking
1. INTRODUCTION
Sinusoidal modeling [1] of audio is one of the most popular para-
metric audio modeling methods, since it has the capacity to rep-
resent an audio signal with good quality by only modeling a rela-
tively small number of spectral components. Some types of sounds
cannot be accurately represented by the sinusoidal model. For
these cases, an additional component is included (residual part),
which models the sinusoidal error signal, i.e. the difference be-
tween the actual signal and its modeled version [2].
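The decomposition described above can be illustrated with a minimal
sketch. It assumes the common formulation in which the model is a sum
of cosines and the residual is the sample-wise difference between the
signal and that sum. The function names, the sampling rate, and the
test signal are illustrative assumptions, not taken from [1] or [2].

```python
import numpy as np

def synthesize_sinusoids(amps, freqs_hz, phases, n_samples, fs):
    """Sum of cosines: sum_k a_k * cos(2*pi*f_k*n/fs + phi_k)."""
    n = np.arange(n_samples)
    model = np.zeros(n_samples)
    for a, f, p in zip(amps, freqs_hz, phases):
        model += a * np.cos(2.0 * np.pi * f * n / fs + p)
    return model

def residual_part(signal, model):
    """Sinusoidal error signal: actual signal minus its modeled version."""
    return signal - model

# Hypothetical example: two sinusoids plus a small noise component.
fs = 8000
n_samples = 256
rng = np.random.default_rng(0)
clean = synthesize_sinusoids([1.0, 0.5], [440.0, 1000.0], [0.0, 0.3],
                             n_samples, fs)
noisy = clean + 0.01 * rng.standard_normal(n_samples)
res = residual_part(noisy, clean)
```

Here `res` contains only the noise that the sinusoids cannot represent.
In a practical coder this residual would itself be modeled
parametrically rather than stored sample by sample.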
In sinusoidal modeling, energetic masking (due to the human
auditory system) is utilized to determine the frequencies of the
most perceptually important components [3]. This is usually done
in an iterative manner; after selecting one component, the resid-
ual magnitude spectrum and the masking curve are updated. At
each step, the component frequency that minimizes a perceptual
distortion measure is selected. The remaining model parameters
(sinusoidal amplitudes and phases) are estimated from the original
signal after the frequency selection.
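The iterative procedure above can be sketched as a greedy loop. This
is a toy illustration only: the triangular masker in dB, the maximum
rule for combining maskers, the signal-to-mask selection criterion,
and all names and constants are simplifying assumptions. It does not
implement the masking model or the distortion measure of [3].

```python
import numpy as np

def toy_masker_db(n_bins, center, level_db, slope_db=3.0, offset_db=10.0):
    """Hypothetical triangular masking pattern (dB) around bin `center`."""
    bins = np.arange(n_bins)
    return level_db - offset_db - slope_db * np.abs(bins - center)

def select_components(frame, n_components, floor_db=-60.0):
    """Greedy selection: pick a bin, then update residual and masking curve."""
    window = np.hanning(len(frame))
    spectrum = np.fft.rfft(frame * window)
    residual_db = 20.0 * np.log10(np.abs(spectrum) + 1e-12)
    mask_db = np.full(residual_db.shape, floor_db)  # assumed flat start
    picked = []
    for _ in range(n_components):
        # Toy criterion: bin that exceeds the masking curve the most.
        k = int(np.argmax(residual_db - mask_db))
        # Update the masking curve with the new component's masker.
        mask_db = np.maximum(
            mask_db, toy_masker_db(len(mask_db), k, residual_db[k]))
        # Update the residual spectrum: remove the selected main lobe.
        lo, hi = max(0, k - 2), min(len(residual_db), k + 3)
        residual_db[lo:hi] = -200.0
        # Amplitude and phase are estimated from the original spectrum.
        amp = 2.0 * np.abs(spectrum[k]) / np.sum(window)
        picked.append((k, float(amp), float(np.angle(spectrum[k]))))
    return picked

# Hypothetical test frame: two bin-centered sinusoids (bins 32 and 96).
fs, n = 8000, 512
t = np.arange(n) / fs
frame = 1.0 * np.cos(2 * np.pi * 500.0 * t) \
    + 0.5 * np.cos(2 * np.pi * 1500.0 * t)
picked = select_components(frame, 2)
bins = sorted(k for k, _, _ in picked)
```

Each pass recomputes the selection criterion over the whole spectrum,
which is the per-component cost that a non-iterative method avoids.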
State-of-the-art approaches for sinusoidal selection such as [3] can
be thought of as implementing a bottom-up approach, where no
information about the signal reconstruction model or playback
conditions is exploited. At each step, the method maximizes the energy
of the spectrum that is covered by the masker of the new component.
No additional criteria, such as naturalness due to the use of
a residual model, are considered in this process. The purpose of
this paper is to refine the sinusoidal parameter selection process in
a top-down manner so that it better fits certain applications.
The term “top-down” in this paper implies that, instead of using the
processing tools independently, we take a holistic view
of the reproduction process and conditions, which can be used to
alter the methods in a way that is beneficial for these particular
conditions. This paper proposes two contributions regarding the
frequency selection in the sinusoidal model: (a) a non-iterative
process for estimating the perceptually important frequency com-
ponents, and (b) masking curve estimation when multiple signals
are to be panned before reproduction.
This work has been funded in part by the Marie Curie TOK “ASPIRE”
grant, and in part by the PEOPLE-IAPP “AVID-MODE” grant, within the
6th and 7th European Community Framework Programs, respectively.
The former contribution is useful when a residual signal is
used. In this case, the synthesized energy is close to that of the
original signal. Consequently, we show that our non-iterative
method for sinusoidal component selection offers an improved
sound quality compared to current iterative methods, besides the
added computational efficiency. The latter contribution indicates
how the sinusoidal selection must be implemented in cases when
the audio signals are modeled before mixing occurs. The signif-
icance of this result relates to the current efforts in the upcoming
MPEG Spatial Audio Object Coding (SAOC) standard [4] and to
the possibility of applying the sinusoidal model in this context (see
for example [5]). In SAOC, the goal is to encode multiple audio
signals before they are mixed into a stereo or multichannel repro-
duction. This offers the advantage of mixing at the decoder, which
is expected to enable a variety of interactive audio applications.
2. NON-ITERATIVE COMPONENT SELECTION
This section discusses an improved method for component selec-
tion in the sinusoidal model, for the case when using an additive
residual signal. Unlike in Section 3, modeling of single-channel
audio is considered in this section.
2.1. Psychoacoustic Sinusoidal Matching Pursuit
Current state-of-the-art methods employ perceptual matching pur-
suit algorithms to determine the sinusoidal parameters of each
frame. In [3], an improved frequency masking model was com-
bined with Psychoacoustical Matching Pursuit (PAMP). At each
iteration i, PAMP minimizes the perceptual distortion measure D
273 978-1-4244-4296-6/10/$25.00 ©2010 IEEE ICASSP 2010