Audio Engineering Society Convention Paper

Presented at the 112th Convention, 2002 May 10–13, Munich, Germany

This convention paper has been reproduced from the author's advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.

Time-domain Polyphonic Transcription using Self-Generating Databases

Juan Pablo Bello, Laurent Daudet, and Mark Sandler
Department of Electronic Engineering, Queen Mary, University of London, Mile End Road, London E1 4NS, UK
Correspondence should be addressed to Juan Pablo Bello (juan.bello-correa@elec.qmul.ac.uk)

ABSTRACT

We describe a new method for estimating multiple-pitch information in recorded piano music. The method works in the time domain and makes use of a self-generating database of all possible notes. First, we show that accurate polyphonic pitch detection can be achieved given an adequate database. We then propose an algorithm that generates the database from the music itself, using frequency-domain estimation of predominant pitches together with pitch-shifting techniques. Both systems produce a MIDI representation of the original signal. This method, which can be generalized to any solo instrument, overcomes the usual constraints of the traditional frequency-domain approach regarding intervals and the number of simultaneous notes.

INTRODUCTION AND OBJECTIVES

Extracting meaning from music is a process that comes naturally to human perception. Very different levels of information, subjective (e.g. style, mood...) and objective (e.g. tempo, notes...), can be extracted from music.
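As a rough illustration of the database-matching idea summarized in the abstract, the sketch below detects multiple pitches in the time domain by greedily correlating a signal frame against a database of normalized note waveforms. This is a hypothetical toy, not the paper's algorithm: the synthetic decaying-sinusoid templates, the greedy matching scheme, and all parameter values (`SR`, `FRAME`, `threshold`) are illustrative assumptions.

```python
import numpy as np

SR = 8000     # sample rate in Hz (toy value for illustration)
FRAME = 2048  # analysis frame length in samples

def note_template(midi_note, frame=FRAME, sr=SR):
    """Synthetic stand-in for a database entry: a unit-norm
    decaying sinusoid at the note's fundamental frequency."""
    f0 = 440.0 * 2.0 ** ((midi_note - 69) / 12.0)
    t = np.arange(frame) / sr
    w = np.exp(-3.0 * t) * np.sin(2.0 * np.pi * f0 * t)
    return w / np.linalg.norm(w)

def detect_notes(frame_sig, database, max_notes=4, threshold=0.3):
    """Greedy time-domain matching: repeatedly pick the database
    waveform with the largest correlation against the residual,
    subtract its projection, and stop when no template correlates
    strongly relative to the original frame energy."""
    residual = frame_sig.astype(float).copy()
    norm0 = np.linalg.norm(frame_sig)
    found = []
    for _ in range(max_notes):
        scores = {n: abs(np.dot(residual, w)) for n, w in database.items()}
        best = max(scores, key=scores.get)
        if scores[best] < threshold * norm0:
            break
        w = database[best]
        residual -= np.dot(residual, w) * w  # remove matched component
        found.append(best)
    return sorted(found)

# Database of candidate notes (here, one octave starting at middle C)
db = {n: note_template(n) for n in range(60, 73)}

# A two-note "polyphonic" frame: C4 + E4
mix = db[60] + db[64]
print(detect_notes(mix, db))  # → [60, 64]
```

With well-separated fundamentals the templates are nearly orthogonal over the frame, so the greedy subtraction recovers both notes; real piano tones (inharmonicity, shared partials) are what make the full problem, and the database construction, considerably harder.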
Usually, objective feature recognition depends heavily on the listener's level of training and knowledge. The music transcription task (the process of converting audio to score) is an example of this: the more complex the musical input becomes, the more acute the need for prior knowledge. Automatic music transcription attempts to recreate this task with computer algorithms, but it becomes increasingly complicated when dealing with polyphonic music, which presents a multiplicity of pitches and possibly of timbres. Most monophonic transcription techniques are not applicable to this case, forcing researchers to switch views and to propose novel ways of tackling different aspects of this problem.

Previous systems rely, almost as a rule, on the analysis of information in the frequency domain. In time-frequency rep-