Hindawi Publishing Corporation
EURASIP Journal on Audio, Speech, and Music Processing
Volume 2010, Article ID 523791, 15 pages
doi:10.1155/2010/523791
Research Article
Correlation-Based Amplitude Estimation of Coincident Partials
in Monaural Musical Signals
Jayme Garcia Arnal Barbedo (1) and George Tzanetakis (2)
(1) Department of Communications, FEEC, UNICAMP, C.P. 6101, CEP: 13.083-852, Campinas, SP, Brazil
(2) Department of Computer Science, University of Victoria, Victoria, British Columbia, Canada V8W 3P6
Correspondence should be addressed to Jayme Garcia Arnal Barbedo, jbarbedo@gmail.com
Received 12 January 2010; Revised 29 April 2010; Accepted 5 July 2010
Academic Editor: Mark Sandler
Copyright © 2010 J. G. A. Barbedo and G. Tzanetakis. This is an open access article distributed under the Creative Commons
Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.
This paper presents a method for estimating the amplitude of coincident partials generated by harmonic musical sources
(instruments and vocals). It was developed as an alternative to the commonly used interpolation approach, which has several
limitations in terms of performance and applicability. The strategy is based on the following observations: (a) the parameters of
partials vary with time; (b) this variation tends to be correlated when the partials belong to the same source; (c) the presence
of an interfering coincident partial reduces the correlation; and (d) this reduction is proportional to the relative amplitude of
the interfering partial. Besides its improved accuracy, the proposed technique has other advantages over its predecessors: it works
properly even if the sources have the same fundamental frequency; it can estimate the first partial (fundamental), which is not
possible with the conventional interpolation method; it can estimate the amplitude of a given partial even if its neighbors suffer
intense interference from other sources; it works properly under noisy conditions; and it is immune to intraframe permutation
errors. Experimental results show that the strategy clearly outperforms the interpolation approach.
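The four observations above can be illustrated with a minimal numerical sketch. Everything below is hypothetical (the envelopes, modulation rates, and noise levels are invented for illustration and are not taken from the paper): two partials of the same source share a common amplitude envelope and are therefore strongly correlated across frames, while adding a coincident partial from a second source lowers that correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames = 200
t = np.arange(n_frames) / 100.0

# Hypothetical slow modulation shared by all partials of one source
# (observations (a) and (b): parameters vary with time, and the
# variation is common to partials of the same source).
envelope = 1.0 + 0.2 * np.sin(2 * np.pi * 5 * t)

# Two partials of the same source: scaled envelope plus small noise.
partial_1 = 1.0 * envelope + 0.01 * rng.standard_normal(n_frames)
partial_2 = 0.6 * envelope + 0.01 * rng.standard_normal(n_frames)

# A coincident partial from another source follows its own envelope;
# at the analysis stage only the sum is observed (observation (c)).
interferer = 0.5 * (1.0 + 0.2 * np.sin(2 * np.pi * 3 * t + 1.0))
observed = partial_2 + interferer

corr_clean = np.corrcoef(partial_1, partial_2)[0, 1]
corr_mixed = np.corrcoef(partial_1, observed)[0, 1]

print(f"correlation without interference: {corr_clean:.3f}")
print(f"correlation with interference:    {corr_mixed:.3f}")
```

Increasing the amplitude of `interferer` lowers `corr_mixed` further, which is the monotonic relationship that observation (d) exploits to estimate the interfering partial's amplitude.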
1. Introduction
The problem of source separation of audio signals has
received increasing attention in recent decades. Most of the
effort has been devoted to the determined and overdeter-
mined cases, in which there are at least as many sensors as
sources [1–4]. These cases are, in general, mathematically
more tractable than the underdetermined case, in which
there are fewer sensors than sources. However, most real-
world audio signals are underdetermined, many of them
having only a single channel. This has motivated a number
of proposals dealing with this kind of problem. Most such
proposals try to separate speech signals [5–9], speech from
music [10–12], or a singing voice from music [13]. Only
recently have methods been proposed to separate different
instruments in monaural musical signals [14–18].
One of the main challenges faced in music source sepa-
ration is that, in real musical signals, simultaneous sources
(instruments and vocals) normally have a high degree of
correlation and overlap both in time and frequency, as a
result of the underlying rules normally followed by western
music (e.g., notes with integer ratios of pitch intervals). The
high degree of correlation prevents many existing statistical
methods from being used, because such methods normally assume
that the sources are statistically independent [14, 15, 18].
The use of statistical tools is further limited by the equally
common assumption that the sources are highly disjoint in
the time-frequency plane [19, 20], which does not hold when
the notes are harmonically related.
An alternative that has been used by several authors is
sinusoidal modeling [21–23], in which the signals are
assumed to be formed by a sum of sinusoids
whose parameters can be estimated [24].
In many applications, only the frequency and amplitude
of the sinusoids are relevant, because human hearing is
relatively insensitive to phase [25]. However, estimating
the frequency in the context of musical signals is often
challenging, since the frequencies do not remain steady over
time, especially in the presence of vibrato, which manifests