IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 19, NO. 1, JANUARY 2011 111
Musical Instrument Classification Using
Individual Partials
Jayme Garcia Arnal Barbedo and George Tzanetakis, Member, IEEE
Abstract—In musical signals, the spectral and temporal contents of instruments often overlap. If the number of channels is at least equal to the number of instruments, it is possible to apply
statistical tools to highlight the characteristics of each instrument,
making their identification possible. However, in the underdeter-
mined case, in which there are fewer channels than sources, the
task becomes challenging. One possible way to solve this problem
is to seek regions in the time and/or frequency domains in
which the content of a given instrument appears isolated. The
strategy presented in this paper exploits the spectral disjointness
among instruments by identifying isolated partials, from which a
number of features are extracted. The information contained in
those features, in turn, is used to infer which instrument is more
likely to have generated that partial. Hence, the only condition for
the method to work is that at least one isolated partial exists for
each instrument somewhere in the signal. If several isolated par-
tials are available, the results are summarized into a single, more
accurate classification. Experimental results using 25 instruments
demonstrate the good discrimination capabilities of the method.
Index Terms—Feature extraction, partialwise instrument classi-
fication, spectral disjointness, underdetermined mixtures.
I. INTRODUCTION
The identification of the instruments that compose a musical signal has received increasing attention in recent years. Such an interest is fed by the potential benefits that an
accurate instrument classifier can bring to other digital audio
applications. In particular, musical genre classification can be
greatly improved if the instruments present in a given song are
known, since this information can be used to narrow down the
set of potential musical genres. Sound source separation algorithms can also exploit such information, particularly if they
deal with underdetermined signals. In this case, the knowledge
about the instruments can be used to create instrument-specific
rules to improve the quality of the sound source separation.
Early work in the area was mainly devoted to the identifi-
cation of instruments in monophonic signals. This problem is,
in general, less challenging than the polyphonic case, since the
Manuscript received September 04, 2009; revised December 09, 2009. Date
of publication March 11, 2010; date of current version October 01, 2010. This
work was supported by Foreign Affairs and International Trade Canada under a
Post-Doctoral Research Fellowship Program (PDRF). The associate editor co-
ordinating the review of this manuscript and approving it for publication was
Dr. Dan Ellis.
J. G. A. Barbedo was with the Department of Computer Science, University of
Victoria, Victoria, BC V8W 3P6, Canada. He is now with the Department of
Communications, FEEC, UNICAMP C.P. 6101, CEP: 13.083-852, Campinas,
SP, Brazil (e-mail: jgab@decom.fee.unicamp.br).
G. Tzanetakis is with the Department of Computer Science, University of
Victoria, Victoria, BC V8W 3P6 , Canada (e-mail: gtzan@cs.uvic.ca).
Digital Object Identifier 10.1109/TASL.2010.2045186
instrument to be classified is isolated from the interference of
any other sound source. Most of those proposals deal with gen-
eral instruments [1]–[11], while a few others deal with specific
cases, like classification of woodwinds [12], [13] and discrimi-
nation between piano and guitar [14].
In recent years, a number of strategies capable of dealing
with polyphonic musical signals have been proposed. Most of
them have some important limitations.
— Limited number of instruments: some of the methods pro-
posed in the literature only work and/or were only tested
for a small (six or fewer) set of instruments (e.g., [15]–[22]).
— Low accuracy: in some cases the accuracy is below 50%
even when only a few instruments are considered (e.g., [19], [23]).
— Instrument combinations set a priori: in this case, the
strategies try to classify the signals according to prede-
fined combinations of instruments; hence, they fail if the
mixture has a combination of instruments that was not
considered in the training (e.g., [24], [25]).
— Polyphony limited to duets: some strategies can only deal
with two simultaneous instruments (e.g., [26], [27]).
Thus, despite the clear advances achieved in recent years, there are still many limitations that prevent instrument identification tools from being more widely used. This paper presents
a simple and reliable strategy to identify instruments in poly-
phonic musical signals that overcomes some of the main limi-
tations faced by its predecessors. The identification uses a ma-
jority decision based upon pairwise comparisons of instrument
likelihoods. A related but more complex approach was used by
Essid et al. [5] to classify solo musical phrases. The method
presented here is basically a system in which majority rules are
successively applied, as briefly described in the following.
In real musical signals, simultaneous sources (instruments
and vocals) normally have a high degree of correlation and
overlap both in time and frequency, as a result of the underlying
rules normally followed by western music (e.g., notes whose pitches are related by simple integer ratios). This can make the identification of instruments challenging. However, one can expect
to find at least some unaffected partials throughout the signal,
which can be exploited to provide cues about the corresponding
instrument. As a result of such an observation, the proposed al-
gorithm extracts features individually for each partial that does
not collide with any other partial (isolated partials). Each pair of
instruments is characterized by a particular set of nine features,
selected from a complete set of 34 features. Each partial is
assigned to one of the pair of instruments using a linear classifier: if the value of a feature exceeds a given threshold, it votes for one instrument; otherwise, it votes for the other. A first majority rule is applied by summarizing the votes of the nine features; as a result, each pair of instruments
1558-7916/$26.00 © 2010 IEEE
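To make the per-pair decision concrete, the following is a minimal sketch of threshold-based voting over the features of one isolated partial, followed by the first majority rule. The function name, the feature values, the uniform thresholds, and the polarity convention are all hypothetical illustrations, not taken from the paper, which selects a specific set of nine features per instrument pair.

```python
def classify_partial_pairwise(features, thresholds, polarities):
    """Decide between instruments 'A' and 'B' for one isolated partial.

    features, thresholds, polarities: equal-length sequences (nine
    entries in the paper's setup). polarities[i] is +1 if feature i
    exceeding its threshold indicates instrument A, and -1 if it
    indicates instrument B. Returns the majority vote.
    """
    votes_for_a = 0
    for value, threshold, polarity in zip(features, thresholds, polarities):
        above = value > threshold
        # A feature votes for A when its value falls on A's side
        # of the threshold, as defined by the polarity.
        if (above and polarity > 0) or (not above and polarity < 0):
            votes_for_a += 1
    # With an odd number of features (nine), ties cannot occur.
    return 'A' if votes_for_a > len(features) / 2 else 'B'

# Hypothetical feature values for one partial, uniform 0.5 thresholds,
# and all polarities pointing to A when above the threshold.
feats = [0.2, 0.8, 0.5, 0.9, 0.1, 0.7, 0.3, 0.6, 0.4]
decision = classify_partial_pairwise(feats, [0.5] * 9, [1] * 9)
print(decision)  # 4 of 9 features vote for A, so the majority picks 'B'
```

In the actual system each instrument pair would use its own nine selected features and trained thresholds, and the per-partial decisions would then be summarized across all isolated partials into the final classification.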