Tararira: Query By Singing System

Ernesto López, Martín Rocamora
Instituto de Ingeniería Eléctrica
Facultad de Ingeniería de la Universidad de la República
Julio Herrera y Reissig 565 – (598) (2) 711 09 74, Montevideo, Uruguay
elopez,rocamora@fing.edu.uy

Abstract

This extended abstract details a submission to the Music Information Retrieval Evaluation eXchange in the Query by Singing/Humming task. The problem of query by singing consists of building a machine capable of simulating the cognitive process of identifying a musical piece from a few sung notes of its melody. In this work, the algorithms for pitch tracking, onset detection and melody matching used in the system Tararira [1] are briefly described. Much effort has been put into the automatic transcription of the singing voice, as it is a key factor in overall performance. A novel way of combining note by note matching with the approach based on pitch time series matching is introduced.

Keywords: QBH, MIREX, melody matching.

1. Introduction

Over the last decade, different approaches to the query by singing problem have been considered. In all the proposals, the database consists of music in symbolic notation, generally MIDI, instead of raw or compressed audio, as there is no sufficiently robust automatic way to extract the melody directly from a recording to compare it with the query.

The proposed systems fall broadly into two approaches, according to their representation and matching technique. The traditional approach is based on note by note comparison [2][3], whereas a more recent approach compares fundamental frequency time series [4][5]. The first approach consists of transcribing the voice signal into a sequence of notes and searching for the best occurrences of this pattern in a database of melodies.
Because transcription errors degrade performance, the second approach avoids automatic transcription and compares melodies as fundamental frequency time series. Unfortunately, this implies working with long sequences (very long compared to sequences of notes), so computation time becomes prohibitive. Moreover, it is necessary to require the user to sing a previously defined melody fragment [4][5].

Work partly supported by Comisión Sectorial de Investigación Científica (CSIC).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. © 2006 University of Victoria

In the system Tararira [1], a novel way of combining both approaches is introduced that preserves the advantages of each. First, the system selects a reduced group of candidates from the database using note by note matching. Then, the selection is refined using fundamental frequency time series comparison.

The system architecture is divided into two main stages, as depicted in Figure 1. The first is the transcription of the query into a sequence of notes. In the second, this sequence is matched against the melodies stored in the database, and a list of musical pieces is retrieved, ordered by similarity.

The transcription stage involves the following tasks:
• To estimate the fundamental frequency contour to set the note pitches.
• To segment the audio signal in order to establish the onset time and duration of notes.
• To perform a melodic analysis to adjust the note pitches to the equal tempered scale.

The tasks of the matching stage are:
• To encode the note sequence so that the matching is independent of key transposition and tempo.
• To set flexible similarity rules that take into account query ornaments or mistakes, and automatic transcription errors.
• To refine the candidate selection, avoiding automatic transcription errors, by comparing fundamental frequency time series.

[Figure 1 blocks: Voice Signal; Transcription (Pitch Tracking, Segmentation, Melodic Analysis); Melody Matching (Codification, Matching, Refinement); Database; Search Result]
Figure 1: Block diagram of the system.
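
The melodic analysis and encoding tasks listed above can be illustrated with a minimal sketch. It is not the Tararira implementation: it assumes that pitch adjustment is plain rounding to the nearest semitone with A4 = 440 Hz, and that key and tempo independence are obtained by encoding each note relative to the previous one (pitch interval in semitones, ratio of durations). The function names hz_to_midi, quantize_notes and encode_relative are hypothetical.

```python
import math

A4_HZ = 440.0  # assumed reference tuning, not taken from the paper


def hz_to_midi(f_hz):
    """Continuous MIDI pitch for a frequency in Hz (equal temperament, A4 = 69)."""
    return 69.0 + 12.0 * math.log2(f_hz / A4_HZ)


def quantize_notes(notes):
    """Round each (pitch_hz, onset_s, duration_s) note to the nearest semitone."""
    return [(round(hz_to_midi(f)), onset, dur) for f, onset, dur in notes]


def encode_relative(notes):
    """Encode (midi_pitch, onset_s, duration_s) notes relative to their predecessor.

    Each output pair is (pitch interval in semitones, duration ratio).
    Transposing the melody leaves the intervals unchanged, and a uniform
    tempo change leaves the duration ratios unchanged.
    """
    encoded = []
    for (p0, _, d0), (p1, _, d1) in zip(notes, notes[1:]):
        encoded.append((p1 - p0, d1 / d0))
    return encoded


# A three-note query (roughly C4, D4, E4) as it might come out of segmentation.
melody = quantize_notes([(261.63, 0.0, 0.5), (293.66, 0.5, 0.5), (329.63, 1.0, 1.0)])

# The same melody five semitones higher and at half the tempo.
shifted = [(65, 0.0, 1.0), (67, 1.0, 1.0), (69, 2.0, 2.0)]

print(encode_relative(melody) == encode_relative(shifted))
```

Relative encoding is only one possible choice. It has the known drawback that a single wrong note alters two consecutive symbols, which is one reason flexible similarity rules are needed in the matching.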
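
The two-step search described in the introduction (candidate selection by note by note matching, then refinement by fundamental frequency time series comparison) can be sketched as follows. The concrete techniques are assumptions chosen for illustration, since the text does not specify them: unit-cost edit distance over pitch intervals for the first step, and dynamic time warping over mean-removed pitch contours for the second. Whole melodies are compared rather than searching for occurrences inside longer pieces. All names (edit_distance, dtw_distance, search, top_k) and the toy database are hypothetical.

```python
def intervals(pitches):
    """Successive pitch differences in semitones (key independent)."""
    return [q - p for p, q in zip(pitches, pitches[1:])]


def edit_distance(a, b):
    """Levenshtein distance between two sequences, assuming unit costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute or match
        prev = curr
    return prev[-1]


def dtw_distance(x, y):
    """Dynamic time warping cost between two pitch contours in semitones.

    Each contour has its mean removed first, a crude key normalisation.
    """
    mx, my = sum(x) / len(x), sum(y) / len(y)
    x = [v - mx for v in x]
    y = [v - my for v in y]
    inf = float("inf")
    prev = [0.0] + [inf] * len(y)
    for xi in x:
        curr = [inf]
        for j, yj in enumerate(y, start=1):
            curr.append(abs(xi - yj) + min(prev[j], curr[j - 1], prev[j - 1]))
        prev = curr
    return prev[-1]


def search(query_notes, query_contour, database, top_k=2):
    """Shortlist by note matching, then re-rank the shortlist by contour."""
    q = intervals(query_notes)
    ranked = sorted(database, key=lambda e: edit_distance(q, intervals(e["notes"])))
    shortlist = ranked[:top_k]
    refined = sorted(shortlist, key=lambda e: dtw_distance(query_contour, e["contour"]))
    return [e["title"] for e in refined]


database = [
    {"title": "A", "notes": [60, 62, 64, 65, 67],
     "contour": [60, 60, 62, 62, 64, 64, 65, 65, 67, 67]},
    {"title": "B", "notes": [60, 60, 67, 67, 69, 69, 67],
     "contour": [60, 60, 67, 67, 69, 69, 67]},
    {"title": "C", "notes": [67, 65, 64, 62, 60],
     "contour": [67, 67, 65, 65, 64, 64, 62, 62, 60, 60]},
]

# Melody A sung three semitones higher; the transcription wrongly splits one note.
query_notes = [63, 65, 67, 67, 68, 70]
query_contour = [63, 63, 65, 65, 67, 67, 68, 68, 70, 70]

results = search(query_notes, query_contour, database)
print(results)
```

The split between the two steps reflects the trade-off stated in the text: the note-level comparison is cheap enough to run over the whole database, while the costlier contour comparison runs only on the shortlist and does not depend on the transcribed notes.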