Hindawi Publishing Corporation
EURASIP Journal on Audio, Speech, and Music Processing
Volume 2010, Article ID 523791, 15 pages
doi:10.1155/2010/523791
Research Article
Correlation-Based Amplitude Estimation of Coincident Partials
in Monaural Musical Signals
Jayme Garcia Arnal Barbedo (1) and George Tzanetakis (2)
(1) Department of Communications, FEEC, UNICAMP, C.P. 6101, CEP: 13.083-852, Campinas, SP, Brazil
(2) Department of Computer Science, University of Victoria, Victoria, British Columbia, Canada V8W 3P6
Correspondence should be addressed to Jayme Garcia Arnal Barbedo, jbarbedo@gmail.com
Received 12 January 2010; Revised 29 April 2010; Accepted 5 July 2010
Academic Editor: Mark Sandler
Copyright © 2010 J. G. A. Barbedo and G. Tzanetakis. This is an open access article distributed under the Creative Commons
Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.
This paper presents a method for estimating the amplitude of coincident partials generated by harmonic musical sources
(instruments and vocals). It was developed as an alternative to the commonly used interpolation approach, which has several
limitations in terms of performance and applicability. The strategy is based on the following observations: (a) the parameters of
partials vary with time; (b) this variation tends to be correlated when the partials belong to the same source; (c) the presence
of an interfering coincident partial reduces the correlation; and (d) this reduction is proportional to the relative amplitude of
the interfering partial. Besides its improved accuracy, the proposed technique has other advantages over its predecessors: it works
properly even if the sources have the same fundamental frequency; it can estimate the first partial (fundamental), which is not
possible with the conventional interpolation method; it can estimate the amplitude of a given partial even if its neighbors suffer
intense interference from other sources; it works properly under noisy conditions; and it is immune to intraframe permutation
errors. Experimental results show that the strategy clearly outperforms the interpolation approach.
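The four observations above can be illustrated with a minimal numerical sketch. Everything below is hypothetical (the envelopes, modulation rates, and noise levels are invented for illustration and are not taken from the paper): two partials of the same source share a common amplitude envelope and are therefore strongly correlated across frames, while adding a coincident partial from a second source lowers that correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames = 200
t = np.arange(n_frames) / 100.0

# Hypothetical slow modulation shared by all partials of one source
# (observations (a) and (b): parameters vary with time, and the
# variation is common to partials of the same source).
envelope = 1.0 + 0.2 * np.sin(2 * np.pi * 5 * t)

# Two partials of the same source: scaled envelope plus small noise.
partial_1 = 1.0 * envelope + 0.01 * rng.standard_normal(n_frames)
partial_2 = 0.6 * envelope + 0.01 * rng.standard_normal(n_frames)

# A coincident partial from another source follows its own envelope;
# at the analysis stage only the sum is observed (observation (c)).
interferer = 0.5 * (1.0 + 0.2 * np.sin(2 * np.pi * 3 * t + 1.0))
observed = partial_2 + interferer

corr_clean = np.corrcoef(partial_1, partial_2)[0, 1]
corr_mixed = np.corrcoef(partial_1, observed)[0, 1]

print(f"correlation without interference: {corr_clean:.3f}")
print(f"correlation with interference:    {corr_mixed:.3f}")
```

Increasing the amplitude of `interferer` lowers `corr_mixed` further, which is the monotonic relationship that observation (d) exploits to estimate the interfering partial's amplitude.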
1. Introduction
The problem of source separation of audio signals has
received increasing attention in recent decades. Most of the
effort has been devoted to the determined and overdeter-
mined cases, in which there are at least as many sensors as
sources [1–4]. These cases are, in general, mathematically
more tractable than the underdetermined case, in which
there are fewer sensors than sources. However, most real-
world audio signals are underdetermined, many of them
having only a single channel. This has motivated a number
of proposals dealing with this kind of problem. Most such
proposals try to separate speech signals [5–9], speech from
music [10–12], or a singing voice from music [13]. Only
recently have methods been proposed to separate different
instruments in monaural musical signals [14–18].
One of the main challenges faced in music source sepa-
ration is that, in real musical signals, simultaneous sources
(instruments and vocals) normally have a high degree of
correlation and overlap both in time and frequency, as a
result of the underlying rules normally followed by western
music (e.g., notes with integer ratios of pitch intervals). The
high degree of correlation prevents many existing statistical
methods from being used, because such methods normally assume
that the sources are statistically independent [14, 15, 18].
The use of statistical tools is further limited by the equally
common assumption that the sources are highly disjoint in
the time-frequency plane [19, 20], which does not hold when
the notes are harmonically related.
An alternative that has been used by several authors is
sinusoidal modeling [21–23], in which the signals are
assumed to be formed by a sum of sinusoids
whose parameters can be estimated [24].
In many applications, only the frequency and amplitude
of the sinusoids are relevant, because human hearing is
relatively insensitive to phase [25]. However, estimating
the frequency in the context of musical signals is often
challenging, since the frequencies do not remain steady over
time, especially in the presence of vibrato, which manifests