IEEE SIGNAL PROCESSING LETTERS, VOL. 14, NO. 8, AUGUST 2007

Automatic Classification of Musical Genres Using Inter-Genre Similarity

Ulaş Bağcı, Student Member, IEEE, and Engin Erzin, Senior Member, IEEE

Abstract—Musical genre classification is an essential tool for music information retrieval systems, and it has the potential to become a highly demanded application on various media platforms. Feature extraction and classifier design are two important problems in automatic musical genre classification. In this letter, we propose two novel classifiers based on inter-genre similarity (IGS) modeling and investigate the use of dynamic timbral texture features to improve automatic musical genre classification performance. Inter-genre similarity is modeled over the hard-to-classify samples of the musical genre feature space. During classification, samples that fall into the inter-genre similarity class are eliminated to reduce inter-genre confusion and to improve genre classification performance. Experimental results show that the proposed classifiers provide better classification rates than existing methods.

Index Terms—Inter-genre similarity (IGS) modeling, Mel-frequency cepstral coefficients (MFCC), musical genre classification.

I. INTRODUCTION

Genre classification is crucial for the categorization of musical pieces. Automatic musical genre classification has important applications in professional media production, radio stations, audio-visual archive management, entertainment, and, more recently, on the Internet. Although musical genre classification is usually done manually, and it is hard to define the specific content of a musical genre precisely, it is generally agreed that audio signals of music belonging to the same genre share certain common characteristics, since they use a similar harmonic and rhythmic language. These common characteristics have motivated recent research activities to improve automatic musical genre classification [1]–[8].
The problem is inherently challenging, as human identification rates after listening to 3-s samples are reported to be around 70% [9].

Feature extraction and classifier design are two important problems in automatic musical genre classification. Timbral texture features, which represent short-time spectral information, rhythmic content features including beat and tempo, and pitch content features are thoroughly investigated in [1]. High-level musical features, including instrumentation, texture, rhythm, dynamics, pitch statistics, melody, and chords, are investigated in [5]. Another novel feature extraction method is proposed in [3], in which local and global information of music signals is captured by computing histograms over their Daubechies wavelet coefficients to characterize genre, emotion, style, and similarity information. A comparison of human and automatic musical genre classification is presented in a recent work [4]. Mel-frequency cepstral coefficients (MFCC) are used for the modeling and discrimination of music signals [1]. Linear prediction cepstrum coefficients, zero-crossing rates, Mel-frequency cepstral coefficients, spectral power, amplitude envelope, spectrum flux, and cepstrum flux are investigated as features to characterize music content for the automatic classification of pure and vocal music [7].

Manuscript received October 7, 2006; revised December 10, 2006. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Steve Renals. U. Bağcı was with the College of Engineering, Koç University, Istanbul 34450, Turkey. He is now with the University of Nottingham, Nottingham NG7 2RD, U.K. (e-mail: uxb@cs.nott.ac.uk). E. Erzin is with the College of Engineering, Koç University, Istanbul 34450, Turkey (e-mail: eerzin@ku.edu.tr). Digital Object Identifier 10.1109/LSP.2006.891320
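Since MFCC features recur throughout the cited work, a minimal NumPy sketch of the standard MFCC pipeline (framing and windowing, power spectrum, mel filterbank, log compression, DCT) may help fix ideas. The parameter values below (512-sample frames, 26 mel filters, 13 coefficients) are common illustrative defaults, not the settings of any cited paper.

```python
import numpy as np

def hz_to_mel(f):
    # O'Shaughnessy mel-scale formula
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters evenly spaced on the mel scale, mapped to FFT bins
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, ctr, hi = bins[i - 1], bins[i], bins[i + 1]
        for k in range(lo, ctr):            # rising slope
            fb[i - 1, k] = (k - lo) / max(ctr - lo, 1)
        for k in range(ctr, hi):            # falling slope
            fb[i - 1, k] = (hi - k) / max(hi - ctr, 1)
    return fb

def mfcc(signal, sr, n_mfcc=13, frame_len=512, hop=256, n_filters=26):
    # Frame the signal, apply a Hamming window, take the power spectrum
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=frame_len)) ** 2
    # Mel filterbank energies -> log -> DCT-II gives the cepstral coefficients
    energies = power @ mel_filterbank(n_filters, frame_len, sr).T
    log_e = np.log(energies + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), (2 * n + 1) / (2 * n_filters)))
    return log_e @ dct.T                    # shape: (n_frames, n_mfcc)
```

With an 8-kHz signal of one second, this yields a (30, 13) matrix of per-frame coefficients; such short-time vectors are what the timbral texture features of Section II summarize.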
Various classifiers, such as K-nearest neighbor (KNN) and Gaussian mixture model (GMM) classifiers [1], [3], decision trees [10], multi-class AdaBoost [8], and support vector machines (SVM) [3], [7], are employed for automatic musical genre classification. In another study, radial basis function (RBF) networks with a combination of unsupervised and supervised initialization methods are used for fast training and classification of musical genre [6].

In this study, we extend the boosting idea proposed in [11] to use inter-genre similarity information for the design of discriminative classifiers. In order to improve discrimination between musical genre classes, hard-to-classify samples are used to model an inter-genre similarity class. During classification, the samples within the inter-genre similarity class are discarded to reduce inter-genre confusion and to improve the classification rates. Experimental results over the musical genre dataset compiled by Tzanetakis [1] show that the proposed method provides better classification rates than the existing methods.

The letter is organized as follows. Section II gives a brief description of the feature extraction. The discriminative musical genre classification that uses the inter-genre similarity is discussed in Section III. Experimental results are provided in Section IV, followed by discussions and conclusions.

II. FEATURE EXTRACTION

In the literature, features for musical genre classification are examined under two groups: timbral texture features and rhythmic content features. Timbral features represent short-time properties, such as spectral information and zero-crossings. Rhythmic content features represent long-term properties, including beat, tempo, and pitch content.

The instrumentation of a music performance has a significant influence on genre recognition. The timbre of the constituent sound sources is reflected in the spectral distribution of a music signal.
Hence, spectral features
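The inter-genre similarity scheme described in Section I can be illustrated schematically. The sketch below is not the authors' exact formulation: it substitutes simple nearest-centroid models for statistical genre models, and the helper names `train_igs`/`classify_igs` are invented for illustration. It only mirrors the two-step idea: build an extra "inter-genre similarity" model from the training frames a plain per-genre classifier confuses, then discard test frames captured by that model before taking a majority vote over the remaining frames.

```python
import numpy as np

def train_igs(features, labels, genres):
    # Step 1: one centroid model per genre (stand-in for richer models)
    centroids = {g: features[labels == g].mean(axis=0) for g in genres}
    # Step 2: training frames the plain classifier gets wrong form the
    # inter-genre similarity (IGS) class; model its confusable region
    names = list(genres)
    stack = np.stack([centroids[g] for g in names])
    pred = np.array([names[np.argmin(np.linalg.norm(stack - x, axis=1))]
                     for x in features])
    confused = features[pred != labels]
    igs = confused.mean(axis=0) if len(confused) else None
    return centroids, igs

def classify_igs(frames, centroids, igs):
    # Score each frame against the genre models plus the IGS model;
    # frames that land in the IGS class are dropped before the vote
    names = list(centroids)
    models = np.stack([centroids[g] for g in names]
                      + ([igs] if igs is not None else []))
    votes = []
    for x in frames:
        j = np.argmin(np.linalg.norm(models - x, axis=1))
        if j < len(names):                  # skip frames captured by IGS
            votes.append(names[j])
    if not votes:
        return None
    vals, counts = np.unique(votes, return_counts=True)
    return vals[np.argmax(counts)]          # majority vote over kept frames
```

On toy two-genre data with one ambiguous training frame, the ambiguous region becomes the IGS model, and test frames falling into it no longer contribute confusing votes.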