A PROTOTYPICAL SERVICE FOR REAL-TIME ACCESS TO LOCAL CONTEXT-BASED MUSIC INFORMATION

Frank Kurth, Meinard Müller, Andreas Ribbrock, Tido Röder, David Damm, and Christian Fremerey
University of Bonn, Germany
Department of Computer Science III

ABSTRACT

In this contribution we propose a generic service for real-time access to context-based music information such as lyrics or score data. In our web-based client-server scenario, a client application plays back a particular (waveform) audio recording. During playback, the client connects to a server which in turn identifies the particular piece of audio as well as the current playback position. Subsequently, the server delivers local, i.e., position-specific, context-based information on the audio piece to the client. The client then synchronously displays the received information during acoustic playback. We demonstrate how such a service can be established using recent MIR (Music Information Retrieval) techniques such as audio identification and synchronization, and present two particular application scenarios.

Keywords: Music services, context-based information, fingerprinting, synchronization.

1. INTRODUCTION

The last years have seen the development of several fundamental MIR techniques such as audio fingerprinting [3, 4], audio identification [1], score-based retrieval, or synchronization of music in different formats [2, 6, 7]. Besides the development of tools for basic retrieval tasks, the importance of using feature-based representations for exchanging content-based music information (i.e., any information related to the content of the raw music data) over the internet has been recognized recently [8]. As an important example, compact noise-robust audio fingerprints may be used to precisely specify a playback position within a particular piece of PCM audio [4].
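The runtime interaction sketched above can be illustrated with a minimal Python model of the server side: a lookup table maps excerpt fingerprints to a song ID and a playback position. Note that this is only an illustrative sketch; the class and function names (`FingerprintServer`, `fingerprint`, the window parameters) are our own hypothetical choices, and the hash of quantized samples is merely a stand-in for the noise-robust fingerprints of systems such as [4].

```python
import hashlib

def fingerprint(samples):
    # Toy stand-in for a real audio fingerprint: hash a coarsely
    # quantized excerpt of PCM samples. Real fingerprints are
    # noise-robust features; this only illustrates the protocol.
    quantized = bytes((s >> 8) & 0xFF for s in samples)
    return hashlib.sha1(quantized).hexdigest()

class FingerprintServer:
    """Maps excerpt fingerprints to (song ID, playback position)."""

    def __init__(self):
        self.fpdb = {}  # fingerprint -> (song_id, position_in_seconds)

    def index(self, song_id, samples, window=4096, hop=4096, rate=44100):
        # Preprocessing: fingerprint consecutive excerpts of a piece.
        for start in range(0, len(samples) - window + 1, hop):
            fp = fingerprint(samples[start:start + window])
            self.fpdb[fp] = (song_id, start / rate)

    def identify(self, excerpt):
        # Runtime: the client sends an excerpt; the server answers with
        # the song ID and position, or None if the excerpt is unknown.
        return self.fpdb.get(fingerprint(excerpt))
```

A client would periodically send the fingerprint of its current playback excerpt, then use the returned position to request the matching local context information.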
As the online distribution of audio documents evolves, there is an increasing demand for advanced MIR services which are able to provide content-based information as well as metadata related to particular music documents. While there are already various services and resources on the internet providing global information such as lyrics or scores for a particular piece of music, there is still a lack of services providing local information for small excerpts of a given piece of music. However, such local context-based information (e.g., information related to a local time interval) can be of great value to a user while listening to a piece of music. Examples of applications incorporating local context-based information include score following, lyrics following, karaoke, or the online display of translations or commentaries.

In this contribution, we propose a generic framework which allows users to access and exchange context-based (local) information related to particular pieces of audio. We demonstrate the feasibility of our framework by presenting two services, one providing context-based lyrics, the other providing score information.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. © 2004 Universitat Pompeu Fabra.

2. GENERIC FRAMEWORK

The generic scenario of the proposed service consists of a preprocessing phase and the runtime environment.

In the preprocessing phase, we start with a given data collection of PCM audio pieces. For each of these audio pieces, we assume the existence of a particular type of additional, context-based information such as the lyrics in the case of pop songs or score information in the case of classical music. The preprocessing phase consists of two parts.
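The context-based information assumed here, once linked to the time-line of a recording, can be modeled as a sequence of timed events: each basic component (a word of the lyrics, a note of the score) carries a start position and a duration, and a lookup returns the component active at a given playback position. The following minimal Python sketch uses our own hypothetical names (`TimedEvent`, `ContextTrack`) and is not part of the described system.

```python
from bisect import bisect_right
from dataclasses import dataclass

@dataclass
class TimedEvent:
    start: float     # seconds into the PCM recording
    duration: float  # seconds
    payload: str     # e.g., a word of the lyrics or a note label

class ContextTrack:
    """Context-based information for one piece, linked to its time-line."""

    def __init__(self, events):
        # Keep events sorted by start time for binary search.
        self.events = sorted(events, key=lambda e: e.start)
        self.starts = [e.start for e in self.events]

    def at(self, position):
        # Return the event active at the given playback position, if any.
        i = bisect_right(self.starts, position) - 1
        if i >= 0 and position < self.events[i].start + self.events[i].duration:
            return self.events[i]
        return None
```

With such a track per piece, the server can answer a position query (obtained from fingerprint-based identification) with exactly the word or note to be highlighted during playback.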
In the first part, we create a fingerprint database (FPDB) using the raw audio material. Employing fingerprinting techniques such as [4], the FPDB allows us to precisely identify a particular (short) excerpt taken from any audio piece within the collection. The identification provides us with the respective song ID and the current position of the excerpt within that song. The second part of preprocessing consists of linking the context-based information for each audio piece to the actual time-line of that piece. This amounts to assigning a particular starting position and duration to each basic component, e.g., a single word in the lyrics scenario or a single note in the score scenario. Fig. 1 shows a score-, PCM-, and MIDI-version of the first measures of J.S. Bach's Aria con variazioni (BWV 988). The upper part of the figure illustrates the concept of score-PCM synchronization where a link between a symbolic note event and its corresponding physical realization is indicated by an arrow. Below, a corresponding illustration is given for a MIDI-PCM synchronization. Technically, the linking may be performed by specialized synchroniza-