PERIODIC SIGNAL EXTRACTION
WITH GLOBAL AMPLITUDE AND PHASE MODULATION
FOR MUSIC SIGNAL DECOMPOSITION
Mahdi Triki, Dirk T.M. Slock
Eurecom Institute
2229 route des Crêtes, B.P. 193, 06904 Sophia Antipolis Cedex, FRANCE
Email: {triki,slock}@eurecom.fr
ABSTRACT
A key building block in music transcription and indexing operations is the decomposition of the music signal into notes. We model a note signal as a periodic signal with (slow) global variation of amplitude (reflecting attack, sustain, decay) and frequency (limited time warping). The bandlimited variation of the global amplitude and frequency is expressed through a subsampled representation and parameterization of the corresponding signals. Assuming additive white Gaussian noise, we propose a Maximum Likelihood approach for estimating the model parameters; the optimization is performed in an iterative (cyclic) fashion that leads to a sequence of simple least-squares problems. Particular attention is paid to the estimation of the basic periodic signal, which can have a non-integer period, and to the estimation of the amplitude signal with guaranteed positivity.
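The cyclic least-squares idea can be illustrated with a minimal sketch under simplifying assumptions that are ours, not the paper's: an integer period, no time warping, and an amplitude a(n) represented by its values at knots every `knot_step` samples with linear interpolation in between (one possible subsampled representation). Positivity of a(n) is not enforced here. With the amplitude fixed, each sample of the period is a weighted average; with the period fixed, the knot values solve a linear least-squares problem. Since each step minimizes the squared error over one block of parameters, the error cannot increase. The names `hat_basis` and `cyclic_ls_fit` are hypothetical.

```python
import numpy as np

def hat_basis(n_samples, knot_step):
    """Linear-interpolation basis: column k is a 'hat' centred on knot k.

    a = B @ theta is a piecewise-linear amplitude described by one value
    per knot, i.e. a subsampled representation of a(n).
    """
    n = np.arange(n_samples)
    knots = np.arange(0, n_samples + knot_step, knot_step)
    return np.column_stack([np.interp(n, knots, row) for row in np.eye(len(knots))])

def cyclic_ls_fit(y, period, knot_step, n_iter=50):
    """Alternate two LS problems for the model y(n) ~ a(n) * p(n mod period).

    Simplified sketch: integer period, no time warping, no positivity constraint.
    """
    y = np.asarray(y, dtype=float)
    phase = np.arange(len(y)) % period
    B = hat_basis(len(y), knot_step)
    theta = np.ones(B.shape[1])          # hats sum to one, so a(n) = 1 at start
    costs = []
    for _ in range(n_iter):
        a = B @ theta
        # Step 1: amplitude fixed -> weighted mean of samples sharing a phase
        p = (np.bincount(phase, weights=a * y, minlength=period)
             / np.bincount(phase, weights=a * a, minlength=period))
        # Step 2: periodic signal fixed -> linear LS for the knot values
        design = p[phase][:, None] * B
        theta, *_ = np.linalg.lstsq(design, y, rcond=None)
        costs.append(float(np.sum((y - design @ theta) ** 2)))
    return B @ theta, p, costs

# Noiseless demo: decaying envelope times a 5-sample periodic waveform
P, N = 5, 400
p_true = np.array([1.0, 2.0, -1.0, 0.5, -1.5])
theta_true = np.array([0.2, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3])
y = (hat_basis(N, 50) @ theta_true) * p_true[np.arange(N) % P]

a_hat, p_hat, costs = cyclic_ls_fit(y, P, 50)
rel_err = np.linalg.norm(y - a_hat * p_hat[np.arange(N) % P]) / np.linalg.norm(y)
```

Note that a(n) and p are only determined up to a common scale factor, so `a_hat` and `p_hat` match the true signals up to that factor; only their product is compared.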
1. INTRODUCTION
Sinusoidal-model-based music analysis/synthesis has received considerable interest in the computer music community [4, 5, 6]. The sinusoidal transform, originally developed by Quatieri and McAulay [3], represents a signal as a sum of discrete time-varying sinusoids, or partials:
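s(n) = \sum_{k=1}^{K} A_k(n) \cos\big(\theta_k(n)\big),    (1)

where K is the number of partials, and A_k(n) and \theta_k(n) are the instantaneous amplitude and phase of the k-th partial.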
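As an illustration of the sum-of-partials model, here is a minimal synthesis sketch. It assumes each partial is described by per-sample amplitude and instantaneous frequency tracks, and that the phase is obtained by accumulating the instantaneous frequency; the function name `synthesize_partials` and this phase convention are our assumptions, not the analysis/synthesis procedure of [3].

```python
import numpy as np

def synthesize_partials(amps, freqs, fs, phase0=None):
    """Sum of time-varying sinusoids: s(n) = sum_k A_k(n) cos(theta_k(n)).

    amps, freqs: arrays of shape (K, N) with the amplitude A_k(n) and the
    instantaneous frequency f_k(n) in Hz of each partial.
    theta_k(n) = phase0_k + sum_{m < n} 2*pi*f_k(m)/fs.
    """
    amps = np.asarray(amps, dtype=float)
    omega = 2 * np.pi * np.asarray(freqs, dtype=float) / fs
    if phase0 is None:
        phase0 = np.zeros(amps.shape[0])
    # Running phase: zero at n = 0, then cumulative sum of omega up to n - 1
    theta = np.concatenate(
        [np.zeros((omega.shape[0], 1)), np.cumsum(omega[:, :-1], axis=1)], axis=1
    ) + np.asarray(phase0, dtype=float)[:, None]
    return np.sum(amps * np.cos(theta), axis=0)

fs = 8000.0
N = 4000
glide = np.linspace(220.0, 233.08, N)          # first partial slides up a semitone
freqs = np.vstack([glide, 2 * glide])          # second partial follows the first
amps = np.vstack([np.linspace(1.0, 0.2, N), 0.5 * np.ones(N)])
s = synthesize_partials(amps, freqs, fs)
```

Accumulating the frequency track keeps the phase continuous when the frequency varies, which is what makes glides and vibrato representable in this form.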
The model parameters are typically estimated using a short-time Fourier transform (STFT) with a fixed analysis frame size and a fixed stride between frames. The sinusoids are extracted by peak-picking in the STFT magnitude spectrum, and intermediate values are obtained by interpolation. A fundamental problem faced by traditional sinusoidal-model-based techniques, which arises from the use of the STFT, is smearing of the frequency response [8, 7]. Indeed, the amplitude, frequency, and phase of every sinusoid the algorithm believes to be present are estimated once per analysis frame, i.e., over the whole frame duration. Because pitch perception follows a nearly logarithmic frequency scale, adjacent low-pitched notes are only a few hertz apart, so very long windows are needed to estimate the pitch of low-frequency partials accurately.
On the other hand, the time resolution of these parameters is only as fine as the window length itself. Since the music signal is strongly non-stationary, it is not always possible to find a good tradeoff between time and frequency resolution. Moreover, determining the sinusoid parameters from the STFT peak amplitude and phase works well only with high frequency resolution, high SNR, and no modulation.

Eurécom's research is partially supported by its industrial partners: Hasler Foundation, Swisscom, Thales Communications, ST Microelectronics, CEGETEL, France Télécom, Bouygues Telecom, Hitachi Europe Ltd. and Texas Instruments. The work reported herein was also partially supported by the SIEPIA project of the French RIAM network (Recherche et Innovation en Audiovisuel et Multimédia).
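The window-length tradeoff of STFT peak-picking can be made concrete. Assuming equal-tempered tuning, adjacent semitones differ by a factor 2^(1/12), so near 55 Hz they are only about 3.3 Hz apart. A Hann window's main lobe spans about ±2 bins of width fs/N, so separating the two tones needs roughly N/fs ≳ 2/3.3 Hz, about 0.6 s, regardless of the sampling rate. The sketch below peak-picks a single Hann-windowed frame of two tones a semitone apart, once with a short window and once with a long one. It is a minimal illustration, not the method of [3] or [8, 7]; the function name `frame_peaks`, the 8 kHz sampling rate, and the 10% peak threshold (which discards window sidelobes) are assumptions.

```python
import numpy as np

def frame_peaks(x, fs, n_win, rel_threshold=0.1):
    """Peak-pick the magnitude spectrum of the first Hann-windowed frame of x.

    Returns the frequencies (Hz) of bins that are strict local maxima and
    exceed rel_threshold times the largest magnitude in the frame.
    """
    frame = np.asarray(x[:n_win], dtype=float) * np.hanning(n_win)
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(n_win, d=1.0 / fs)
    # Interior bins strictly larger than both neighbours
    is_peak = (mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:])
    idx = np.nonzero(is_peak)[0] + 1
    idx = idx[mag[idx] > rel_threshold * mag.max()]
    return freqs[idx]

fs = 8000.0
f1 = 55.0                          # A1 in equal temperament
f2 = f1 * 2 ** (1 / 12)            # one semitone higher, about 58.27 Hz
n = np.arange(16384)
x = np.cos(2 * np.pi * f1 * n / fs) + np.cos(2 * np.pi * f2 * n / fs)

short_peaks = frame_peaks(x, fs, 1024)    # 128 ms: both tones fall in one main lobe
long_peaks = frame_peaks(x, fs, 16384)    # about 2 s: two separate peaks
```

The long window resolves the two partials but spans about 2 s of signal, during which a real note's amplitude and pitch may well change; this is the time/frequency tradeoff described above.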
Another drawback of these techniques is that they ignore the harmonic structure of the music signal: they treat the signal as a mixture of a finite number of arbitrary sinusoids rather than as a periodic signal. For periodic signals, the state of the art is limited to the estimation of pure periodic signals whose period is an integer number of samples [1, 2]. In these references, the authors propose a Maximum Likelihood approach to analyze pure periodic signals and show that the resulting procedure can be interpreted as a projection of the signal onto suitable subspaces.
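For an integer period P and additive white Gaussian noise, the projection interpretation has a simple form. The least-squares (ML) estimate of one period is the average of all samples sharing the same phase n mod P, and repeating it gives the orthogonal projection of the data onto the subspace of P-periodic sequences. The following is a minimal sketch of that projection for a known P; it is not the full procedure of [1, 2], and the names `periodic_projection` and `residual_energy` are hypothetical.

```python
import numpy as np

def periodic_projection(y, period):
    """Project y onto the subspace of sequences with integer period `period`.

    Returns (fit, one_period): one_period[j] is the mean of all y[n] with
    n % period == j, and fit repeats it over the length of y.
    """
    y = np.asarray(y, dtype=float)
    phase = np.arange(len(y)) % period
    counts = np.bincount(phase, minlength=period)
    one_period = np.bincount(phase, weights=y, minlength=period) / counts
    return one_period[phase], one_period

def residual_energy(y, fit):
    return float(np.sum((y - fit) ** 2))

rng = np.random.default_rng(0)
true_period = rng.standard_normal(7)            # one period of 7 samples
clean = np.tile(true_period, 50)                # 350 samples
y = clean + 0.1 * rng.standard_normal(clean.size)

fit7, est7 = periodic_projection(y, 7)          # correct period
fit6, _ = periodic_projection(y, 6)             # wrong period: large residual
fit14, _ = periodic_projection(y, 14)           # multiple of the true period
```

A multiple of the true period never fits worse, because the 7-periodic subspace is contained in the 14-periodic one. Selecting the period by residual energy alone therefore needs an additional criterion, which this sketch does not include.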
This paper extends the results of those references and attempts to merge modulated sinusoidal modeling with periodic signal analysis by considering periodic signals with non-integer period, global amplitude variation, and time warping. This model offers a compromise between realism and parsimonious parameterization: global amplitude variation mostly reflects the attack, sustain, and decay of the whole note signal, whereas global time warping captures vibrato and sliding notes. With an eye on future extensions to polyphonic sounds, the method should be able to work at fairly low SNR. Parsimonious parameterizations are therefore important to limit the estimation noise, and the proposed model is meant to provide a good compromise between approximation noise and estimation noise.
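To give a feel for the kind of signal this model describes, here is a hypothetical synthesizer for a note of the form s(n) = a(n) p(τ(n)). It is not the paper's parameterization, which is developed in section (2). Assumptions made only for this sketch: the periodic waveform p is given by a few harmonic amplitudes, so its period fs/f0 need not be an integer number of samples; the global amplitude a(n) is linearly interpolated from a few knot values (attack, sustain, decay); and the time warping τ(n) adds a small sinusoidal deviation (vibrato). The name `global_modulation_note` is hypothetical.

```python
import numpy as np

def global_modulation_note(n_samples, fs, f0, harmonics, env_knots,
                           vib_depth=0.0, vib_rate=5.0):
    """Hypothetical note synthesizer: s(n) = a(n) * p(tau(n)).

    p(t):   periodic waveform with period fs / f0 samples (possibly non-integer),
            sum of harmonics h = 1, 2, ... with the given amplitudes.
    a(n):   global amplitude, linear interpolation of env_knots spread evenly
            over the note.
    tau(n): warped time n + vib_depth * sin(2*pi*vib_rate*n/fs); it stays
            increasing while vib_depth * 2*pi*vib_rate / fs < 1.
    """
    n = np.arange(n_samples)
    knot_pos = np.linspace(0, n_samples - 1, len(env_knots))
    a = np.interp(n, knot_pos, env_knots)
    tau = n + vib_depth * np.sin(2 * np.pi * vib_rate * n / fs)
    t = 2 * np.pi * f0 * tau / fs
    p = np.zeros(n_samples)
    for h, c in enumerate(harmonics, start=1):
        p += c * np.cos(h * t)
    return a * p

fs = 8000.0
# 1 s note at 220 Hz (period of about 36.36 samples); vib_depth = 8 samples at
# 5 Hz gives a frequency deviation of about +/-3 %, roughly half a semitone
note = global_modulation_note(8000, fs, 220.0, [1.0, 0.5, 0.3, 0.2],
                              env_knots=[0.0, 1.0, 0.7, 0.5, 0.0],
                              vib_depth=8.0, vib_rate=5.0)
```

The whole note is described by a handful of numbers (harmonic amplitudes, envelope knots, vibrato depth and rate), which is the kind of parsimony the text argues for.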
In music, the nominal note frequencies are known in advance. The analysis therefore explores the hypothesis that a note is present at each possible nominal note frequency. However, we do not treat the harmonics of a note signal separately, as a simple filter-bank approach would (this is essentially the state of the art in music signal analysis). Rather, the energy in all harmonics is exploited jointly by treating the complete periodic signal, which makes both the detection of the note signal and the estimation of its modulation characteristics more robust. The Global Modulation (GM) assumption also helps to separate note signals that have harmonics in common.
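The joint use of all harmonics can be illustrated with a simple stand-in detector. Assuming equal temperament with A4 = 440 Hz and MIDI note numbers, it tests each candidate nominal frequency f0 by fitting all harmonics h·f0 at once with least squares and scoring the candidate by the energy captured. This is a hypothetical sketch without amplitude or frequency modulation, not the GM estimator of this paper; `harmonic_energy`, `nominal_freq`, and `detect_note` are illustrative names.

```python
import numpy as np

def harmonic_energy(y, f0, fs, n_harm):
    """Energy of y captured by a joint LS fit of harmonics 1..n_harm of f0."""
    n = np.arange(len(y))
    cols = []
    for h in range(1, n_harm + 1):
        if h * f0 >= fs / 2:
            break                      # keep harmonics below Nyquist
        w = 2 * np.pi * h * f0 / fs
        cols += [np.cos(w * n), np.sin(w * n)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta
    return float(y_hat @ y_hat)

def nominal_freq(midi):
    """Equal-tempered nominal frequency, A4 = MIDI 69 = 440 Hz."""
    return 440.0 * 2.0 ** ((midi - 69) / 12)

def detect_note(y, fs, candidates, n_harm=8):
    """Return the candidate whose harmonic subspace captures the most energy."""
    scores = [harmonic_energy(y, nominal_freq(m), fs, n_harm) for m in candidates]
    return candidates[int(np.argmax(scores))], scores

fs = 8000.0
n = np.arange(4000)                              # 0.5 s
f0 = nominal_freq(57)                            # A3 = 220 Hz
rng = np.random.default_rng(1)
y = sum(np.cos(2 * np.pi * h * f0 * n / fs + h) for h in range(1, 6))
y = y + 0.1 * rng.standard_normal(n.size)
best, scores = detect_note(y, fs, list(range(45, 70)))
```

Every candidate here uses the same number of harmonics, so the subspaces have equal dimension and the scores are comparable. The octave below (MIDI 45) shares four of the five partials and scores close to, but below, the true note, which shows why notes with common harmonics are hard to separate.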
This paper is organized as follows. In section (2), the global
modulation model is presented. The extraction procedure will then
be derived in section (3). Performance of the algorithm is evalu-
III - 233 0-7803-8874-7/05/$20.00 ©2005 IEEE ICASSP 2005