NOVEL AUDIO FEATURES FOR CAPTURING TEMPO SALIENCE IN MUSIC RECORDINGS

Balaji Thoshkahna², Meinard Müller¹, Venkatesh Kulkarni¹,², Nanzhu Jiang¹
¹ International Audio Laboratories Erlangen
² Fraunhofer Institute for Integrated Circuits IIS, Erlangen, Germany
balaji.thoshkahna@iis.fraunhofer.de, meinard.mueller@audiolabs-erlangen.de

ABSTRACT

In music compositions, certain parts may be played in an improvisational style with a rather vague notion of tempo, while other parts are characterized by having a clearly perceivable tempo. Based on this observation, we introduce in this paper some novel audio features for capturing tempo-related information. Rather than measuring the specific tempo of a local section of a given recording, our objective is to capture the existence or absence of a notion of tempo, a kind of tempo salience. By a quantitative analysis within an Indian music scenario, we demonstrate that our audio features capture the aspect of tempo salience well, while being independent of continuous fluctuations and local changes in tempo.

Index Terms: Audio, Music, Tempo, Salience, Analysis, Segmentation, Classification

1. INTRODUCTION

Tempo and beat are fundamental aspects of music, and the automated extraction of such information from audio recordings constitutes one of the central and well-studied research areas in music signal processing [1, 2, 3, 4, 5, 6, 7, 8, 9]. When assuming a steady beat and a single global tempo, many automated methods yield accurate tempo estimates and beat tracking results [10, 11]. However, the task becomes much more difficult when one deals with music with weak note onsets and local tempo changes [6]. A discussion of difficult examples for beat tracking can also be found in [12, 11]. Instead of extracting tempo and beat information explicitly, various spectrogram-like representations have been proposed for visualizing tempo-related information over time.
Such mid-level representations include tempograms [13, 14, 15], rhythmograms [16], and beat spectrograms [4, 17]. Cyclic versions of time-tempo representations, which possess a high degree of robustness to pulse level switches, have been introduced in [15, 17].

For certain types of music, however, there is hardly a notion of tempo or beat. For example, this is often the case for music that is played in an improvisational style. Even within a single composition, there may be parts with a rather vague notion of tempo, while other parts are characterized by having a clearly perceivable tempo and rhythm. As an example, let us consider the song “In the year 2525” by Zager and Evans, which has the musical structure I V1 V2 V3 V4 V5 V6 V7 B V8 O; see Figure 1a. The song starts with a slow contemplative intro, which is represented by the I-part. The eight verse sections of the song, which are represented by the V-parts, have a clear rhythm with a well-defined tempo. Between the seventh verse V7 and the eighth verse V8, the improvisational style of the beginning is resumed in the bridge part B, a kind of melancholic retrospect.

Fig. 1. (a) Musical structure of the song “In the year 2525” by Zager and Evans. (b) Tempogram representation of the recording. (c) Tempo salience feature. (d) Manual annotation of parts with a clear rhythm and parts with a vague tempo.

In this paper, we deal with an aspect of tempo which we refer to as tempo salience. Rather than measuring the concrete tempo of a local section of a given recording, our objective is to capture the existence or absence of a notion of tempo. For our computations, we start with a mid-level representation known as the cyclic tempogram [15]. As our main technical contribution, we then derive several novel one-dimensional audio features that locally measure the tempo salience; see Figure 1c for an example. Note that it is not our objective to determine the tempo itself.
Instead, the salience features should only express the degree to which there may be any sense of a perceivable tempo or not, irrespective of possible abrupt tempo changes or continuous tempo fluctuations.

The remainder of this paper is structured as follows. In Section 2, we further motivate our salience concept by considering a scenario from Indian Carnatic music. We review in Section 3 the concept of tempogram representations and then describe in Section 4 the technical details for deriving our novel salience features from these representations. Finally, in Section 5, we report on some quantitative evaluations that indicate the potential of our salience features.

2. MOTIVATING APPLICATION SCENARIO

As a motivating scenario for our features, we consider a music genre that goes beyond Western music. Carnatic music plays an important role in the culture of South India, where huge music festivals with hundreds of concerts are held. Many of the large-scale compositions performed at such occasions consist of several contrasting parts [18]. A typical example for structural parts of a Carnatic music composi-