IEEE SIGNAL PROCESSING LETTERS, VOL. 14, NO. 8, AUGUST 2007 521
Automatic Classification of Musical
Genres Using Inter-Genre Similarity
Ulaş Bağcı, Student Member, IEEE, and Engin Erzin, Senior Member, IEEE
Abstract—Musical genre classification is an essential tool for
music information retrieval systems, and it has the potential to
become a highly demanded application on various media platforms.
Two important problems in automatic musical genre classification
are feature extraction and classifier design. In this letter, we
propose two novel classifiers using inter-genre similarity (IGS)
modeling and investigate the use of dynamic timbral texture
features to improve automatic musical genre classification
performance. Inter-genre similarity is modeled over the
hard-to-classify samples of the musical genre feature space.
During classification, samples falling within the inter-genre
similarity class are eliminated to reduce inter-genre confusion
and to improve genre classification performance. Experimental
results show that the proposed classifiers provide better
classification rates than existing methods.
Index Terms—Inter-genre similarity (IGS) modeling, Mel-fre-
quency cepstral coefficients (MFCC), musical genre classification.
I. INTRODUCTION
GENRE classification is crucial for the categorization of
musical pieces. Automatic musical genre classification
has important applications in professional media production,
radio stations, audio-visual archive management, entertainment
and, recently, on the Internet. Although musical genre classification
is done manually and it is hard to precisely define the
specific content of a musical genre, it is generally agreed that
audio signals of music belonging to the same genre contain
certain common characteristics since they have similar har-
monic and rhythmical language. These common characteristics
have motivated recent research activities to improve automatic
musical genre classification [1]–[8]. The problem is inherently
challenging, as human identification rates after listening to 3-s
samples are reported to be around 70% [9].
Feature extraction and classifier design are two important
problems of the automatic musical genre classification. Timbral
texture features, which represent short-time spectral infor-
mation, rhythmic content features including beat and tempo,
and pitch content features are thoroughly investigated in [1].
High-level musical features including instrumentation, texture,
rhythm, dynamics, pitch statistics, melody and chords are
Manuscript received October 7, 2006; revised December 10, 2006. The associate
editor coordinating the review of this manuscript and approving it for
publication was Dr. Steve Renals.
U. Bağcı was with the College of Engineering, Koç University, Istanbul
34450, Turkey. He is now with the University of Nottingham, Nottingham NG7
2RD, U.K. (e-mail: uxb@cs.nott.ac.uk).
E. Erzin is with the College of Engineering, Koç University, Istanbul 34450,
Turkey (e-mail: eerzin@ku.edu.tr).
Digital Object Identifier 10.1109/LSP.2006.891320
investigated in [5]. Another novel feature extraction method
is proposed in [3], in which local and global information of
music signals is captured by computing histograms over their
Daubechies wavelet coefficients to characterize genre, emotion,
style, and similarity information. A comparison
of human and automatic musical genre classification
is presented in a recent work [4]. Mel-frequency cepstral co-
efficients (MFCC) are used for modeling and discrimination
of music signals [1]. Linear prediction cepstrum coefficients,
zero-crossing rates, Mel-frequency cepstral coefficients, spec-
tral power, amplitude envelope, spectrum flux, and cepstrum
flux are investigated as features to characterize music content
for automatic classification of pure and vocal music [7]. Various
classifiers such as K-nearest neighbor (KNN) and Gaussian
mixture model (GMM) classifiers [1], [3], construction of
decision trees [10], multi-class AdaBoost [8] and support
vector machines (SVM) [3], [7] are employed for automatic
musical genre classification. In another study, radial basis
function (RBF) networks with a combination of unsupervised
and supervised initialization methods are used for fast training
and classification of musical genre [6].
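As an illustration of the short-time timbral descriptors surveyed above, a zero-crossing rate and a spectral centroid can be computed per analysis frame as sketched below. The frame length, sample rate, and test tones are illustrative choices, not parameters taken from the cited works:

```python
import numpy as np

def zero_crossing_rate(frame):
    # Fraction of adjacent sample pairs whose signs differ.
    return float(np.mean(np.signbit(frame[:-1]) != np.signbit(frame[1:])))

def spectral_centroid(frame, sr):
    # Magnitude-weighted mean frequency of the frame's spectrum.
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return float(freqs @ mag / mag.sum())

sr = 22050                  # sample rate in Hz (illustrative)
t = np.arange(1024) / sr    # one 1024-sample analysis frame
low = np.sin(2 * np.pi * 220 * t)    # mellow 220-Hz tone
high = np.sin(2 * np.pi * 4400 * t)  # brighter 4400-Hz tone
# The brighter tone yields both a higher zero-crossing rate
# and a higher spectral centroid.
```

A full system would compute such descriptors over a sliding window and aggregate their statistics (e.g., means and variances) into the timbral texture feature vector.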
In this study, we extend the boosting idea, which was pro-
posed in [11], to use inter-genre similarity information for the
design of discriminative classifiers. In order to improve discrim-
ination between musical genre classes, hard-to-classify samples
are used to model an inter-genre similarity class. During the
classification, the samples within the inter-genre similarity class
are discarded to reduce inter-genre confusion and to improve
the classification rates. Experimental results over the musical
genre dataset compiled by Tzanetakis [1] show that the pro-
posed method provides better classification rates than the ex-
isting methods.
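A minimal numeric sketch of this two-step scheme follows. The genre names, single-Gaussian class models, and 1-D toy data are illustrative assumptions; the letter itself uses richer features and models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D frame features for two toy genres.
train = {"classical": rng.normal(0.0, 1.0, 500),
         "jazz": rng.normal(2.0, 1.0, 500)}

def fit(x):
    # Single-Gaussian model per class (a stand-in for richer genre models).
    return x.mean(), x.std()

def loglik(x, model):
    m, s = model
    return -0.5 * np.log(2 * np.pi * s ** 2) - (x - m) ** 2 / (2 * s ** 2)

names = list(train)
genre_models = {g: fit(x) for g, x in train.items()}

# Step 1: pool the hard-to-classify (misclassified) training frames
# and fit the inter-genre similarity (IGS) model on them.
hard = []
for g, x in train.items():
    scores = np.stack([loglik(x, genre_models[h]) for h in names])
    winners = np.array(names)[scores.argmax(axis=0)]
    hard.append(x[winners != g])
igs_model = fit(np.concatenate(hard))

def classify(frames):
    # Step 2: discard frames the IGS model explains best, then pick the
    # genre with the highest total log-likelihood over surviving frames.
    scores = np.stack([loglik(frames, genre_models[g]) for g in names]
                      + [loglik(frames, igs_model)])
    keep = scores.argmax(axis=0) < len(names)  # frame not won by IGS
    if not keep.any():  # fall back if every frame is discarded
        keep = np.ones(len(frames), dtype=bool)
    totals = scores[: len(names), keep].sum(axis=1)
    return names[int(totals.argmax())]
```

Here the discarded IGS-class frames correspond to the overlapping region of the two toy distributions, mirroring how the IGS class captures regions of the feature space shared across genres.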
The letter is organized as follows. Section II gives a brief
description of the feature extraction. Section III discusses the
discriminative musical genre classification that uses inter-genre
similarity. Section IV provides experimental results, followed by
discussions and conclusions.
II. FEATURE EXTRACTION
In the literature, features for musical genre classification
are examined under two groups: timbral texture and rhythmic
content features. Timbral features represent short-time proper-
ties, such as spectral information and zero-crossings. Rhythmic
content features represent long-term properties including beat,
tempo, pitch content, etc. The instrumentation of a music
performance has a significant influence on genre recognition.
The timbre of constituent sound sources is reflected in the
spectral distribution of a music signal. Hence, spectral features