MULTIPLE-F0 ESTIMATION FOR MIREX 2008

Chunghsin Yeh and Axel Roebel
Analysis-Synthesis team, IRCAM/CNRS-STMS
Paris, France

Wei-Chen Chang
Dep. of Computer Science and Information Engineering
National Cheng-Kung University, Tainan, Taiwan

ABSTRACT

This extended abstract describes the system proposed for the Multiple Fundamental Frequency Estimation and Tracking contest of MIREX (Music Information Retrieval Evaluation eXchange) 2008. The system combines frame-by-frame analysis with a recently developed tracking mechanism. It is submitted for the first two sub-tasks: (1) frame-by-frame evaluation and (2) note contour evaluation.

1 INTRODUCTION

The proposed multiple-F0 (fundamental frequency) estimation system is composed of two parts: frame-based F0 estimation and source stream tracking. The number of sources, or polyphony, and the related F0s are first estimated on a frame-by-frame basis [1]. The tracking mechanism then refines the estimated number of source streams in a maximum likelihood manner [2]. Compared with the version submitted in 2007, the frame-based F0 estimation part has been improved, especially in its accuracy for higher polyphony. This year, two versions are submitted for the first sub-task, frame-by-frame evaluation: (i) frame-based estimation without tracking and (ii) frame-based estimation with tracking. For the second sub-task, note contour evaluation, the tracking results are reported.

2 FRAME-BASED MULTIPLE-F0 ESTIMATION

The frame-based F0 estimation part is based on a score function which evaluates the plausibility of a set of F0 hypotheses [3]. All possible combinations of F0 hypotheses are evaluated for every number of concurrent sources from 0 up to the maximal polyphony hypothesis. The best set of F0s is then selected progressively by means of two criteria related to the residual and to the spectral smoothness. The estimator is composed of four stages.
First, the adaptive noise level estimation distinguishes the sinusoidal components. F0 candidates are then iteratively extracted until no significant sinusoidal components are left to explain. Next, the score function jointly evaluates all the combinations of F0 candidates, and the best set is finally selected by a polyphony inference algorithm.

2.1 Noise level estimation

Under the assumption that the power spectrum of noise is nearly flat within a narrow frequency band, the magnitude distribution of narrow-band noise is modeled by a Rayleigh distribution. Consequently, the noise level is modeled as a succession of Rayleigh distributions, each depending on frequency. An adaptive noise level estimation algorithm has been developed to iteratively approximate the underlying noise level [4]. According to the estimated noise level, the spectral peaks are classified into sinusoids (above the noise level) and noise (below the noise level).

2.2 F0 candidate selection

This stage aims at selecting the F0 candidates in a precise and concise manner, such that the number of their combinations is reduced to a reasonable amount [1]. The NHRF0s (non-harmonically related F0s) are first extracted in an iterative estimation/suppression process. Each NHRF0 represents a harmonic group of partials which do not overlap completely with the partials of the other groups. Then, HRF0s (harmonically related F0s) are detected within each harmonic group by detecting the partials that disturb the envelope smoothness.

2.3 Joint evaluation of F0 hypotheses

Given a set of F0 hypotheses, the hypothetical sources are constructed by partial selection and overlap treatment. The resulting combination is evaluated by a score function composed of four criteria:

1. Harmonicity: harmonic matching
2. Mean bandwidth: envelope smoothness
3. Spectral centroid: energy concentration in lower partials
4. Synchronicity: synchronous amplitude evolution within a single source

The linear combination of the four criteria forms the score function, which evaluates the plausibility of a given combination of F0 hypotheses.
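The joint evaluation can be pictured as an exhaustive search over combinations of F0 candidates, each scored by a weighted sum of the four criteria. The following is only a minimal sketch under stated assumptions: the equal weights, the `criteria` callback that returns the four criterion values for a hypothesis, and the convention that larger values mean "more plausible" are all hypothetical, since the abstract specifies neither the weights nor the criterion formulas. The sketch keeps the best combination for each polyphony M; the polyphony inference that chooses among these winners, and the zero-source case, are not shown.

```python
from itertools import combinations
from typing import Callable, Dict, Mapping, Sequence, Tuple

Hypothesis = Tuple[float, ...]  # a set of F0 values in Hz
# Hypothetical callback: maps an F0 hypothesis to its four criterion values.
Criteria = Callable[[Hypothesis], Mapping[str, float]]

# Hypothetical equal weights; the abstract does not report the values used.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "harmonicity": 1.0,
    "mean_bandwidth": 1.0,
    "spectral_centroid": 1.0,
    "synchronicity": 1.0,
}


def score(hypothesis: Hypothesis, criteria: Criteria,
          weights: Mapping[str, float] = DEFAULT_WEIGHTS) -> float:
    """Linear combination of the criterion values (assumed: larger = more plausible)."""
    values = criteria(hypothesis)
    return sum(w * values[name] for name, w in weights.items())


def best_per_polyphony(candidates: Sequence[float], max_polyphony: int,
                       criteria: Criteria,
                       weights: Mapping[str, float] = DEFAULT_WEIGHTS
                       ) -> Dict[int, Tuple[float, Hypothesis]]:
    """For each polyphony M = 1..max_polyphony, return (score, hypothesis)
    of the best-scoring combination of M candidates."""
    best: Dict[int, Tuple[float, Hypothesis]] = {}
    for m in range(1, min(max_polyphony, len(candidates)) + 1):
        # Enumerate every M-subset of the candidates and keep the top scorer;
        # max() returns the first maximal item when scores tie.
        best[m] = max(
            ((score(h, criteria, weights), h) for h in combinations(candidates, m)),
            key=lambda pair: pair[0],
        )
    return best


if __name__ == "__main__":
    def toy_criteria(h: Hypothesis) -> Dict[str, float]:
        # Toy stand-in: only "harmonicity" varies, rewarding F0s near 220 and 330 Hz.
        dist = sum(min(abs(f - 220.0), abs(f - 330.0)) for f in h) / len(h)
        return {"harmonicity": -dist, "mean_bandwidth": 0.0,
                "spectral_centroid": 0.0, "synchronicity": 0.0}

    print(best_per_polyphony([110.0, 220.0, 335.0, 440.0], 2, toy_criteria))
    # {1: (0.0, (220.0,)), 2: (-2.5, (220.0, 335.0))}
```

The number of combinations grows combinatorially with the candidate count, which is why the candidate selection stage aims to keep the candidate set small before this search is run.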
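The peak classification of Section 2.1 can be illustrated with a deliberately simplified, non-iterative stand-in for the adaptive algorithm of [4]. Assumptions, none of which come from the abstract: the spectrum is split into fixed-width bands (`band_width` is hypothetical); the Rayleigh scale sigma of each band is estimated from the band median, which is robust to a few sinusoidal peaks because the median of a Rayleigh(sigma) variable is sigma*sqrt(2 ln 2); and a peak counts as sinusoidal when it exceeds the magnitude that noise would exceed only with probability `false_alarm` (p), i.e. sigma*sqrt(-2 ln p), taken here as the "noise level". The function names are likewise hypothetical.

```python
import numpy as np


def rayleigh_threshold(mag, band_width: int = 64,
                       false_alarm: float = 1e-3) -> np.ndarray:
    """Per-bin magnitude threshold from band-wise Rayleigh fits (simplified sketch)."""
    mag = np.asarray(mag, dtype=float)
    thresh = np.empty_like(mag)
    # Rayleigh tail: P(X > t) = exp(-t^2 / (2 sigma^2))  =>  t = sigma * sqrt(-2 ln p)
    tail_factor = np.sqrt(-2.0 * np.log(false_alarm))
    for start in range(0, mag.size, band_width):
        band = mag[start:start + band_width]
        # Median of Rayleigh(sigma) is sigma * sqrt(2 ln 2); invert it to estimate sigma.
        sigma = np.median(band) / np.sqrt(2.0 * np.log(2.0))
        thresh[start:start + band_width] = sigma * tail_factor
    return thresh


def classify_peaks(mag, peak_bins, band_width: int = 64, false_alarm: float = 1e-3):
    """Split peak bins into sinusoids (above threshold) and noise (at or below it)."""
    thresh = rayleigh_threshold(mag, band_width, false_alarm)
    sinusoids = [k for k in peak_bins if mag[k] > thresh[k]]
    noise = [k for k in peak_bins if mag[k] <= thresh[k]]
    return sinusoids, noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mag = rng.rayleigh(scale=1.0, size=1024)  # synthetic noise-only magnitude spectrum
    mag[[100, 600]] = 50.0                    # two strong synthetic sinusoidal peaks
    mag[5] = 0.5                              # a weak peak that should stay "noise"
    print(classify_peaks(mag, [5, 100, 600]))  # ([100, 600], [5])
```

A fixed band split and a median estimate keep the sketch short; the adaptive method referenced in the text instead refines the noise level iteratively, which matters when many sinusoids crowd a band and bias a one-shot estimate.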