IEEE SIGNAL PROCESSING LETTERS, VOL. 14, NO. 8, AUGUST 2007 521
Automatic Classification of Musical
Genres Using Inter-Genre Similarity
Ulaş Bağcı, Student Member, IEEE, and Engin Erzin, Senior Member, IEEE
Abstract—Musical genre classification is an essential tool for
music information retrieval systems, and it has the potential to
become a highly demanded application on various media platforms.
Two important problems in automatic musical genre classification
are feature extraction and classifier design. In this letter, we
propose two novel classifiers using inter-genre similarity (IGS)
modeling and investigate the use of dynamic timbral texture
features to improve automatic musical genre classification
performance. Inter-genre similarity is modeled over the
hard-to-classify samples of the musical genre feature space.
During classification, samples falling within the inter-genre
similarity class are eliminated to reduce inter-genre confusion
and to improve genre classification performance. Experimental
results show that the proposed classifiers provide better
classification rates than existing methods.
Index Terms—Inter-genre similarity (IGS) modeling, Mel-fre-
quency cepstral coefficients (MFCC), musical genre classification.
I. INTRODUCTION
GENRE classification is crucial for the categorization of
musical pieces. Automatic musical genre classification
has important applications in professional media production,
radio stations, audio-visual archive management, entertainment
and, recently, on the Internet. Although musical genre classification
is done manually and it is hard to precisely define the
specific content of a musical genre, it is generally agreed that
audio signals of music belonging to the same genre contain
certain common characteristics since they have similar har-
monic and rhythmical language. These common characteristics
have motivated recent research activities to improve automatic
musical genre classification [1]–[8]. The problem is inherently
challenging, as human identification rates after listening to 3-s
samples are reported to be around 70% [9].
Feature extraction and classifier design are two important
problems of the automatic musical genre classification. Timbral
texture features, which represent short-time spectral infor-
mation, rhythmic content features including beat and tempo,
and pitch content features are thoroughly investigated in [1].
High-level musical features including instrumentation, texture,
rhythm, dynamics, pitch statistics, melody and chords are
Manuscript received October 7, 2006; revised December 10, 2006. The associate
editor coordinating the review of this manuscript and approving it for
publication was Dr. Steve Renals.
U. Bağcı was with the College of Engineering, Koç University, Istanbul
34450, Turkey. He is now with the University of Nottingham, Nottingham NG7
2RD, U.K. (e-mail: uxb@cs.nott.ac.uk).
E. Erzin is with the College of Engineering, Koç University, Istanbul 34450,
Turkey (e-mail: eerzin@ku.edu.tr).
Digital Object Identifier 10.1109/LSP.2006.891320
investigated in [5]. Another novel feature extraction method
is proposed in [3], in which local and global information of
music signals is captured by computing histograms over their
Daubechies wavelet coefficients to characterize genre, emotion,
style, and similarity information. A comparison
of human and automatic musical genre classification
is presented in a recent work [4]. Mel-frequency cepstral co-
efficients (MFCC) are used for modeling and discrimination
of music signals [1]. Linear prediction cepstrum coefficients,
zero-crossing rates, Mel-frequency cepstral coefficients, spec-
tral power, amplitude envelope, spectrum flux, and cepstrum
flux are investigated as features to characterize music content
for automatic classification of pure and vocal music [7]. Various
classifiers such as K-nearest neighbor (KNN) and Gaussian
mixture model (GMM) classifiers [1], [3], construction of
decision trees [10], multi-class AdaBoost [8] and support
vector machines (SVM) [3], [7] are employed for automatic
musical genre classification. In another study, radial basis
function (RBF) networks with a combination of unsupervised
and supervised initialization methods are used for fast training
and classification of musical genre [6].
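As an illustration of the short-time timbral descriptors surveyed above, a zero-crossing rate and a spectral centroid can be computed per analysis frame as sketched below. The frame length, sample rate, and test tones are illustrative choices, not parameters taken from the cited works:

```python
import numpy as np

def zero_crossing_rate(frame):
    # Fraction of adjacent sample pairs whose signs differ.
    return float(np.mean(np.signbit(frame[:-1]) != np.signbit(frame[1:])))

def spectral_centroid(frame, sr):
    # Magnitude-weighted mean frequency of the frame's spectrum.
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return float(freqs @ mag / mag.sum())

sr = 22050                  # sample rate in Hz (illustrative)
t = np.arange(1024) / sr    # one 1024-sample analysis frame
low = np.sin(2 * np.pi * 220 * t)    # mellow 220-Hz tone
high = np.sin(2 * np.pi * 4400 * t)  # brighter 4400-Hz tone
# The brighter tone yields both a higher zero-crossing rate
# and a higher spectral centroid.
```

A full system would compute such descriptors over a sliding window and aggregate their statistics (e.g., means and variances) into the timbral texture feature vector.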
In this study, we extend the boosting idea, which was pro-
posed in [11], to use inter-genre similarity information for the
design of discriminative classifiers. In order to improve discrim-
ination between musical genre classes, hard-to-classify samples
are used to model an inter-genre similarity class. During the
classification, the samples within the inter-genre similarity class
are discarded to reduce inter-genre confusion and to improve
the classification rates. Experimental results over the musical
genre dataset compiled by Tzanetakis [1] show that the pro-
posed method provides better classification rates than the ex-
isting methods.
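A minimal numeric sketch of this two-step scheme follows. The genre names, single-Gaussian class models, and 1-D toy data are illustrative assumptions; the letter itself uses richer features and models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D frame features for two toy genres.
train = {"classical": rng.normal(0.0, 1.0, 500),
         "jazz": rng.normal(2.0, 1.0, 500)}

def fit(x):
    # Single-Gaussian model per class (a stand-in for richer genre models).
    return x.mean(), x.std()

def loglik(x, model):
    m, s = model
    return -0.5 * np.log(2 * np.pi * s ** 2) - (x - m) ** 2 / (2 * s ** 2)

names = list(train)
genre_models = {g: fit(x) for g, x in train.items()}

# Step 1: pool the hard-to-classify (misclassified) training frames
# and fit the inter-genre similarity (IGS) model on them.
hard = []
for g, x in train.items():
    scores = np.stack([loglik(x, genre_models[h]) for h in names])
    winners = np.array(names)[scores.argmax(axis=0)]
    hard.append(x[winners != g])
igs_model = fit(np.concatenate(hard))

def classify(frames):
    # Step 2: discard frames the IGS model explains best, then pick the
    # genre with the highest total log-likelihood over surviving frames.
    scores = np.stack([loglik(frames, genre_models[g]) for g in names]
                      + [loglik(frames, igs_model)])
    keep = scores.argmax(axis=0) < len(names)  # frame not won by IGS
    if not keep.any():  # fall back if every frame is discarded
        keep = np.ones(len(frames), dtype=bool)
    totals = scores[: len(names), keep].sum(axis=1)
    return names[int(totals.argmax())]
```

Here the discarded IGS-class frames correspond to the overlapping region of the two toy distributions, mirroring how the IGS class captures regions of the feature space shared across genres.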
The letter is organized as follows. Section II gives a brief
description of the feature extraction. Section III discusses the
discriminative musical genre classification that uses inter-genre
similarity. Section IV provides experimental results, followed by
discussions and conclusions.
II. FEATURE EXTRACTION
In the literature, features for musical genre classification
are examined under two groups: timbral texture and rhythmic
content features. Timbral features represent short-time proper-
ties, such as spectral information and zero-crossings. Rhythmic
content features represent long-term properties including beat,
tempo, pitch content, etc. The instrumentation of a music
performance has a significant influence on genre recognition.
The timbre of constituent sound sources is reflected in the
spectral distribution of a music signal. Hence, spectral features