N-GRAM PATTERN MATCHING AND DYNAMIC PROGRAMMING FOR SYMBOLIC MELODY SEARCH

Alexandra L. Uitdenbogerd
RMIT University
Melbourne, Victoria, Australia

ABSTRACT

For this submission to MIREX 2007, we again provide a simple baseline for comparison against other algorithms for the task of symbolic melody matching against both monophonic and polyphonic collections of music. This year, we have included an n-gram-based matching technique that builds an n-gram index of the query, which is then used to search through each melody or track within the collection. In addition, we provide the two dynamic programming algorithms submitted to MIREX 2006 for the same tasks. All three algorithms were statistically indistinguishable from the other algorithms submitted for the task, and were two orders of magnitude faster.

1 INTRODUCTION

In the interests of providing continuity so that algorithms can be compared across years, we again submit algorithms to be used as a baseline for the evaluation of symbolic music matching.

In our prior work on symbolic music matching [1, 2, 3] we found that n-grams of length four to seven were about as effective as a dynamic programming-based matching technique (local alignment) for finding relevant melodies in a collection of polyphonic, symbolically stored music. The melodies were matched using an intermediate form consisting of strings that encoded the interval between adjacent notes, with a maximum interval of one octave. Intervals larger than an octave were mapped to the harmonically equivalent interval within an octave.

The n-gram technique can be implemented in a variety of ways. For MIREX 2005, an inverted index was used, which is theoretically the most efficient approach for a large number of queries. The index building cost, however, can be quite large.
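The interval encoding and n-gram extraction described above can be sketched as follows. This is a minimal illustration, not the submitted implementation. It assumes pitches are given as MIDI note numbers and that folding keeps an exact octave as 12 semitones (so 13 maps to 1 and 24 maps to 12), which is one reading of "a maximum interval of one octave". The function names and the scoring rule in `count_query_ngram_hits` are hypothetical.

```python
def directed_mod12(interval):
    """Fold a signed semitone interval into -12..12, keeping its direction.

    Assumption: an exact octave stays 12; larger intervals wrap (17 -> 5).
    """
    if interval == 0:
        return 0
    sign = 1 if interval > 0 else -1
    return sign * (1 + (abs(interval) - 1) % 12)


def standardise(pitches):
    """Convert a pitch sequence into a sequence of folded intervals."""
    return [directed_mod12(b - a) for a, b in zip(pitches, pitches[1:])]


def ngrams(symbols, n):
    """Return all overlapping n-grams of a symbol sequence as tuples."""
    return [tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)]


def count_query_ngram_hits(query, melody, n):
    """Hypothetical score: number of melody n-grams that occur in the query.

    The query n-grams are indexed once (here as a set), then each melody
    is scanned against that index.
    """
    query_index = set(ngrams(query, n))
    return sum(1 for gram in ngrams(melody, n) if gram in query_index)
```

For example, `standardise([62, 79, 77, 65])` folds the raw intervals 17, -2, -12 into `[5, -2, -12]`. Indexing only the query keeps the set-up cost small, at the price of scanning every melody in the collection for each query.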
For this submission, we have chosen to submit an implementation that has low index-building costs but still searches much faster than dynamic programming-based matching.

© 2007 Austrian Computer Society (OCG).

2 TECHNIQUES

All three techniques submitted to MIREX 2007 make use of the three-phase music matching model, which consists of melody extraction, melody standardisation, and melody matching [3, 4].

The melody extraction phase selects the notes that have the highest pitch at each instant. Melody standardisation converts the note sequence to a sequence of intervals that have a maximum size of one octave, with all intervals exceeding an octave being mapped to a harmonically similar interval within an octave. For example, the interval from D up to the G an octave and a fourth above it (a perfect 11th, or 17 semitones) would be mapped to the interval of a perfect fourth (5 semitones). Our experiments on symbolic polyphonic collections of approximately ten thousand pieces showed little difference in precision between matching with an exact interval representation and matching with the simplified representation described above, which we call “directed modulo-12” [3].

The third and final stage of the process consists of matching the standardised query melody to each of the standardised melodies of the collection. The three matching techniques are described below.

2.1 Matching Techniques

The first matching technique, which we have named Start-Match Alignment, initialises and fills the matrix in the manner of global alignment but, in the manner of local alignment, returns the highest score found anywhere within the matrix. The equation used to calculate each cell’s value is the same as for global alignment.
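As a rough illustration of Start-Match Alignment as just described (global-alignment initialisation and fill, with the best score taken over the whole matrix), consider the sketch below. It is not the submitted code. The function name is hypothetical, and the default weights `d`, `e` and `m` (insert/delete cost, exact-match value, mismatch cost) are placeholder values chosen for the example only.

```python
def start_match_score(pattern, text, d=-2, e=1, m=-1):
    """Fill a global-alignment matrix and return its highest cell value.

    pattern, text: sequences of interval symbols (query and candidate).
    d, e, m: placeholder weights, NOT the values used in the submission.
    """
    rows, cols = len(pattern) + 1, len(text) + 1
    a = [[0] * cols for _ in range(rows)]

    # Global-alignment initialisation: borders accumulate indel costs.
    for i in range(1, rows):
        a[i][0] = a[i - 1][0] + d
    for j in range(1, cols):
        a[0][j] = a[0][j - 1] + d

    # Standard global-alignment fill.
    for i in range(1, rows):
        for j in range(1, cols):
            same = pattern[i - 1] == text[j - 1]
            diagonal = a[i - 1][j - 1] + (e if same else m)
            a[i][j] = max(a[i - 1][j] + d, a[i][j - 1] + d, diagonal)

    # Local-alignment style result: best score anywhere in the matrix.
    return max(max(row) for row in a)
```

Because the borders are penalised rather than reset to zero, a high score is only reachable by alignments that begin at the start of both sequences, while taking the matrix maximum lets the alignment end anywhere.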
\[
a[i,j] = \max
\begin{cases}
a[i-1,j] + d & i \geq 1 \\
a[i,j-1] + d & j \geq 1 \\
a[i-1,j-1] + e & p(i) = t(j) \text{ and } i,j \geq 1 \\
a[i-1,j-1] + m & p(i) \neq t(j) \text{ and } i,j \geq 1 \\
0 & i,j = 0
\end{cases}
\tag{1}
\]

where d is the cost of an insert or delete, e is the value of an exact match, m is the cost of a mismatch, i and j are non-negative integers, p(i) represents the i-th symbol in the “pattern” or query, and t(j) represents the j-th symbol in the “text”, or potential answer string. The weights we