Journal of Intelligent Information Systems, 21:1, 35–52, 2003
© 2003 Kluwer Academic Publishers. Manufactured in The Netherlands.

Analysis of Vowels in Sung Queries for a Music Information Retrieval System

MAUREEN MELLODY  maureen@stievater.com
Applied Physics Program, University of Michigan, Ann Arbor, Michigan, USA

MARK A. BARTSCH∗  mbartsch@eecs.umich.edu
GREGORY H. WAKEFIELD  ghw@eecs.umich.edu
Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan, USA

Received June 30, 2002; Revised July 31, 2002; Accepted August 15, 2002

Abstract. A method for analyzing and categorizing the vowels of a sung query is described and analyzed. The query system uses a combination of spectral analysis and parametric clustering techniques to divide a single query into distinct vowel regions. Because the method is applied separately to each query, no training data or repeated measures are required. The vowel regions are then transformed into strings, and string search methods are used to compare the results across songs. We apply this method in a small pilot study consisting of 40 sung queries from each of 7 songs. Approximately 60% of the queries are correctly matched to their corresponding song using only the vowel stream as the identifier.

Keywords: music information retrieval, vowels, singing, query by humming

1. Introduction

One can easily envision the following scenario: a music listener has heard a pop tune on the radio and would like to download a copy from a computer database of musical pieces. However, he knows neither the name of the song nor its singer, though he can remember some of the music itself. The most convenient way for this listener to query the music database is to sing the remembered portion of the pop tune into his computer. The rapidly growing field of music information retrieval often uses such a "query-by-humming" paradigm for searching musical databases.
In most such systems, a fundamental frequency tracking algorithm is used to parse a sung query for melodic content (McNab et al., 1996; Smith et al., 1997; Mazzoni and Dannenberg, 2001). The resulting melodic information is used to search a musical database with either string matching techniques (Ghias et al., 1995; Smith et al., 1997; McNab et al., 2000) or other models, such as hidden Markov models (Birmingham et al., 2001; Shifrin et al., 2002).

What happens, though, if the person singing into the retrieval system is a terrible singer? One approach under investigation is the development of sophisticated models of pitch error (Meek and Birmingham, 2002). Such models, however, still assume that the user of the system has some moderate singing ability. This is a general flaw with the use of singing

∗ Author to whom all correspondence should be addressed.
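The string matching step referred to above is not specified at this point in the paper; a common choice in such retrieval systems is approximate matching by Levenshtein edit distance. The following is a minimal illustrative sketch, not the authors' implementation: the vowel-label strings and the `query`/`songs` names are invented for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings via dynamic programming."""
    prev = list(range(len(b) + 1))          # distances for the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1]

# Hypothetical vowel streams: one symbol per detected vowel region.
query = "aioau"
songs = {"song1": "aioai", "song2": "uueoa"}
best = min(songs, key=lambda name: edit_distance(query, songs[name]))
```

Here `best` selects the database entry whose vowel string is cheapest to transform into the query's, which tolerates the insertions, deletions, and substitutions that arise from segmentation and labeling errors.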