CONTOUR-BASED MELODY REPRESENTATION: AN ANALYTICAL STUDY

Sumantra Dutta Roy, Preeti Rao and Ameya S. Galinde
Department of Electrical Engineering, IIT Bombay, Powai, Mumbai - 400 076.
{sumantra, prao, ameya}@ee.iitb.ac.in

ABSTRACT

In this paper, we identify parameters crucial to the performance of a Query By Humming (QBH) system, and present an analytical approach to determining optimal values of such parameters. Existing systems use heuristically chosen parameters; our analytical results are in accordance with such values. We present results of experimentation with simulated data, as well as with an actual melody database of a QBH system.

1. INTRODUCTION

Query by Humming (hereafter, QBH) has emerged as an important area of research in audio-based search engines and in building efficient Human-Computer Interfaces (HCI) [1], [2], [3], [4], [5], [6], [7]. Thus far, related work in the area has only considered the use of a 3-, 5- or 7-level pitch contour [8], [6], [7], [9]; this choice is usually based on empirical studies. To the best of our knowledge, there has been no attempt to derive such an estimate using analytical methods. This paper presents an analytical study of QBH systems. We develop a new coefficient to evaluate the performance of a QBH system. Our experiments with analytical models, as well as with an actual melody database, give results consistent with those in the existing literature. While this paper considers the special case of uniform tune lengths, our current research involves extending this to a Dynamic Programming-based framework to handle the most general case.

2. THE REPRESENTATION SCHEME

We assume all notes in a musical piece lie only on a quantized set of absolute pitch values (in Hz). The ratio of any two adjacent pitch values is $2^{1/12}$. The interval between two such adjacent values is a ‘semitone’, and an ensemble of 12 such pitch values or notes is an ‘octave’ [3].
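
The pitch relationship described above can be illustrated with a minimal sketch. The function names and the example frequencies are illustrative assumptions and are not taken from the system described in this paper; the sketch only encodes the stated facts that adjacent pitch values differ by a factor of 2^(1/12) and that 12 such steps make up an octave.

```python
import math

SEMITONES_PER_OCTAVE = 12
# Ratio between two adjacent pitch values on the quantized scale.
SEMITONE_RATIO = 2 ** (1 / SEMITONES_PER_OCTAVE)


def shift_by_semitones(freq_hz: float, n_semitones: int) -> float:
    """Return the pitch (in Hz) lying n_semitones above (or below, if negative) freq_hz."""
    return freq_hz * SEMITONE_RATIO ** n_semitones


def semitone_distance(from_hz: float, to_hz: float) -> float:
    """Return the signed distance from from_hz to to_hz, measured in semitones.

    Taking the logarithm of the frequency ratio makes every pair of
    adjacent pitch values exactly one unit apart.
    """
    return SEMITONES_PER_OCTAVE * math.log2(to_hz / from_hz)


if __name__ == "__main__":
    base_hz = 440.0  # example value only (an assumption for illustration)
    print(shift_by_semitones(base_hz, 12))    # one octave up: about twice the frequency
    print(semitone_distance(base_hz, 220.0))  # one octave down: about -12 semitones
```

Twelve successive semitone steps double the frequency, so an octave corresponds to a frequency ratio of 2 and to a distance of 12 units on the logarithmic scale.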
Without loss of generality, we may assume a logarithmic scale for such notes, with the least count on this scale corresponding to a semitone. (This ensures that any two adjacent semitones are equally far apart along the logarithmic scale.)

Our representation of a user query or any melody consists of a sequence of numbers, where each number signifies the ‘distance’ between the current note and the previous note, or more precisely, the number of semitones lying between these two adjacent notes in the musical piece [8].

As an example, we can consider a (fairly realistic) case where a note is no more than an octave apart from its previous note, i.e., it can either ascend by a maximum of 1 octave (+12 semitones) or descend by a maximum of 1 octave (-12 semitones). This is quite a reasonable assumption, since most musical pieces proceed with a steady rise or fall of notes, and even a pitch change of one octave between adjacent notes is rarely encountered. We may represent this span of relative notes by the set [−12, +12]. When a casual singer's notes are sung slightly off-key, we approximate the corresponding relative note by the closest one on our scale.

As an illustrative example, let us consider the following sequence of 4 notes in Western music notation (all notes lying in the same octave and in the same scale, say C Major): [Mi