Segmentation of Carnatic Music Items using KL2, GMM and CFB Energy Feature
Krishnaraj Sekhar PV, Dept. of Computer Science & Engineering, Indian Institute of Technology, Madras, pvkrajpv@gmail.com
Sridharan Sankaran, Dept. of Computer Science & Engineering, Indian Institute of Technology, Madras, sridharan.sankaran@gmail.com
Hema Murthy, Dept. of Computer Science & Engineering, Indian Institute of Technology, Madras, hema@cse.iitm.ac.in
Abstract: Every Carnatic music concert is made up of many musical items. Every musical item has a lyrical composition (kriti), which can optionally be preceded by an ālāpana segment. The duration of the ālāpana, along with the rāga in which it is rendered, is a strong indication of an artist's creativity and musical knowledge. Hence, automatic segmentation of an item to extract the ālāpana segment is of great value in the qualitative assessment of a concert. Segmenting a musical item into ālāpana and kriti also has applications in music retrieval. To find the boundary between ālāpana and kriti, the KL2 distance computed on the Cent Filterbank (CFB) energy feature is used to locate changes in timbre. A GMM is then used to verify the boundary. To further improve segmentation accuracy, rules based on musical domain knowledge are applied automatically. Using this approach, a frame-level accuracy of 91.34% was obtained.
I. INTRODUCTION
Structural segmentation of musical items directly from audio is a well-researched problem in the literature. With the ever-increasing volume of available digital music, efficient storage, indexing and retrieval have become an issue. Segmentation of a musical item into its structural components has several applications: the segments can be used to index the audio for music summarisation, searching, browsing (especially when an item is very long) and recommendation.
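For reference, the KL2 distance named in the abstract is the symmetrised Kullback-Leibler divergence between the feature distributions of two adjacent analysis windows. Assuming, for illustration only, that the CFB feature vectors in each window are modelled by a single Gaussian (Section V gives the formulation actually used in this work), KL2 has the closed form below. It is zero when both windows share the same mean and covariance and grows as their timbre statistics diverge.

```latex
\mathrm{KL2}(p, q) = D_{\mathrm{KL}}(p \,\|\, q) + D_{\mathrm{KL}}(q \,\|\, p)

% For p = \mathcal{N}(\boldsymbol{\mu}_1, \Sigma_1) and q = \mathcal{N}(\boldsymbol{\mu}_2, \Sigma_2) in d dimensions,
% the log-determinant terms of the two directions cancel:
\mathrm{KL2}(p, q) = \tfrac{1}{2}\operatorname{tr}\!\left(\Sigma_2^{-1}\Sigma_1 + \Sigma_1^{-1}\Sigma_2 - 2I_d\right)
  + \tfrac{1}{2}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)^{\top}\left(\Sigma_1^{-1} + \Sigma_2^{-1}\right)(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)

% Assuming diagonal covariances \Sigma_i = \mathrm{diag}(\sigma_{i,1}^2, \dots, \sigma_{i,d}^2):
\mathrm{KL2}(p, q) = \tfrac{1}{2}\sum_{k=1}^{d}\left[\frac{\sigma_{1,k}^2}{\sigma_{2,k}^2} + \frac{\sigma_{2,k}^2}{\sigma_{1,k}^2} - 2
  + (\mu_{1,k} - \mu_{2,k})^2\left(\frac{1}{\sigma_{1,k}^2} + \frac{1}{\sigma_{2,k}^2}\right)\right]
```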
With content distribution through online portals such as iTunes.com, there is a clear need to let users listen to samples of various parts of a song before purchasing it. Most approaches to segmentation have relied upon statistical methods. In [1], segmentation is based on significant changes in statistical properties. Sheh [2] proposes an HMM trained with EM for chord-based segmentation. Rhodes [3] incorporates the expected segment duration as an explicit prior probability distribution in a Bayesian framework for audio segmentation. Non-machine-learning approaches have primarily used time-frequency features to identify segment boundaries. Goto [4] extracts 12-dimensional chroma vectors at the frame level, measures the similarity between segments and then agglomerates the segments to determine the chorus segments. Serra [5] uses 12-dimensional enhanced chroma features. While these approaches target Western music compositions, segmenting an item into ālāpana and kriti in Carnatic music requires differentiating between the textures of the music during ālāpana and kriti. The kriti segment involves both melody and rhythm and hence includes the participation of percussion instruments, whereas the ālāpana segment involves only melody, contributed by the lead performer and the accompanying violinist. In [6], segmentation of a full-length concert or an individual item is attempted using applause as a boundary. While this approach works when applause is present and audible, alternative algorithms are needed when it is not. The objective of this paper is to develop an approach to identify the ālāpana and kriti segments using Cent filterbank (CFB) based features, the KL2 distance metric and GMMs.
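The boundary-search idea can be sketched as follows: slide a pair of adjacent windows over the frame-level feature vectors, compute KL2 between them at every candidate frame, and take the peak as the ālāpana/kriti boundary. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `gaussian_stats`, `kl2`, `kl2_curve` and `find_boundary`, the window length `w`, the diagonal-Gaussian window model and the variance floor `eps` are all hypothetical choices made here, and the GMM verification and rule-based steps are omitted.

```python
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]


def gaussian_stats(window: Sequence[Frame], eps: float = 1e-6) -> Tuple[List[float], List[float]]:
    """Per-dimension mean and variance of a window (assumed diagonal-covariance Gaussian).

    Variances are floored at `eps` so a constant dimension cannot cause division by zero.
    """
    n = len(window)
    dim = len(window[0])
    means = [sum(f[k] for f in window) / n for k in range(dim)]
    variances = [
        max(sum((f[k] - means[k]) ** 2 for f in window) / n, eps) for k in range(dim)
    ]
    return means, variances


def kl2(left: Sequence[Frame], right: Sequence[Frame]) -> float:
    """Symmetric KL divergence between diagonal Gaussians fitted to two windows."""
    m1, v1 = gaussian_stats(left)
    m2, v2 = gaussian_stats(right)
    total = 0.0
    for a, b, s, t in zip(m1, m2, v1, v2):
        # Per-dimension closed form; the log-variance terms of the two KL directions cancel.
        total += 0.5 * (s / t + t / s - 2.0 + (a - b) ** 2 * (1.0 / s + 1.0 / t))
    return total


def kl2_curve(frames: Sequence[Frame], w: int) -> Dict[int, float]:
    """KL2 between frames[t-w:t] and frames[t:t+w] for every candidate frame t."""
    return {t: kl2(frames[t - w:t], frames[t:t + w]) for t in range(w, len(frames) - w + 1)}


def find_boundary(frames: Sequence[Frame], w: int = 10) -> int:
    """Hypothetical single-boundary case: return the frame where the KL2 curve peaks."""
    curve = kl2_curve(frames, w)
    return max(curve, key=curve.get)


if __name__ == "__main__":
    # Synthetic 1-D "features": one texture for frames 0-49, a different one for frames 50-99.
    frames = [[float(i % 2)] for i in range(50)] + [[5.0 + i % 2] for i in range(50, 100)]
    print(find_boundary(frames, w=10))  # 50
```

The window length trades time resolution against the reliability of the per-window Gaussian estimates. On real recordings the KL2 curve typically has several peaks, which is why a verification stage such as the GMM mentioned in the abstract would be needed on top of this peak picking.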
The rest of this paper is organised as follows. Section II explains the significance of the ālāpana and kriti segments in Carnatic music. Section III briefly outlines why MFCC is not suitable for our purpose and why the CFB-based feature was chosen. Section IV describes the steps involved in extracting the Cent Filterbank (CFB) energy feature. Section V describes the KL2 measure and its use in segmentation. Section VI describes our segmentation approach in detail. Section VII tabulates and discusses the results.
II. CHARACTERISTICS OF SEGMENTS IN CARNATIC MUSIC
Carnatic music is a classical music tradition widely performed in the southern part of India. Rāgas (melodic modes), tālas (repeating rhythmic cycles) and lyrics form the three pillars on which Carnatic music rests. A typical Carnatic music concert lasts from 90 minutes to 3 hours and is made up of a succession of musical items. These items are standard lyrical compositions (kritis) whose melodies are set to specific rāgas and whose rhythmic structure is set to specific tālas. A kriti can optionally be preceded by an ālāpana. Ālāpana is a mode of rendition that explores the features and beauty of a rāga. Ālāpana in Sanskrit means a dialog. Since ālāpana is purely melodic, with no lyrical or rhythmic components, it is best suited to bring out the various facets of a rāga. The performer brings out the beauty of a rāga using creativity and internalised knowledge about the grammar of