Multi-target pitch tracking of vibrato sources in noise using the GM-PHD filter

Dan Stowell and Mark D. Plumbley
dan.stowell@eecs.qmul.ac.uk
Centre for Digital Music, Queen Mary University of London, UK

Abstract

Probabilistic approaches to tracking often use single-source Bayesian models; applying these to multi-source tasks is problematic. We apply a principled multi-object tracking implementation, the Gaussian mixture probability hypothesis density filter, to track multiple sources having fixed pitch plus vibrato. We demonstrate high-quality filtering in a synthetic experiment, and find improved tracking using a richer feature set which captures underlying dynamics. Our implementation is available as open-source Python code.

Probabilistic modelling of audio objects is useful because Bayesian methods can be used to make principled inferences about the content of audio signals. For reasons of simplicity and tractability, inferences based on single-source models are widely used, such as the standard Hidden Markov Model (HMM) approach to speech recognition and music modelling. However, music is very often polyphonic, so there is a need to analyse acoustic scenes in which multiple sources may be simultaneously active. Multi-source tracking can be achieved by repeated application of single-source models, but this does not reflect the true scene and may yield sub-optimal results (Mahler, 2007).

Existing multi-source approaches in music informatics often use non-probabilistic techniques. Probabilistic approaches exist, such as Probabilistic Latent Component Analysis (PLCA), which characterises sources as time-varying activations of spectral bases. However, such models are not always well-matched to audio objects with structured variability over time, and are poorly suited to causal (e.g. real-time) tracking.
Appearing in Proceedings of the 29th International Conference on Machine Learning, Edinburgh, Scotland, UK, 2012. Copyright 2012 by the author(s)/owner(s).

In this paper we investigate an alternative multiple tracking paradigm, which models a set-valued random variable having multiple objects (Mahler, 2007). The probability hypothesis density filter (PHD filter) is one practical realisation of this approach. Given a system with linear Markov state updates and a linear observation model, it propagates a density through time which is an estimate of the underlying system state. The PHD filter was originally formulated as a particle-type filter. Later work introduced the Gaussian mixture PHD filter (GM-PHD filter), which uses a Gaussian mixture (GM) to represent the state and gives improved performance (Vo & Ma, 2006; Mahler, 2007).

The GM-PHD filter has similarities to a HMM- or Kalman-type filter with hidden state represented as a GM, propagated from one time-frame to the next. However, the GM does not represent a probability density but the "intensity": the first moment of the set-valued system state. The intensity does not integrate to 1, but to a total reflecting the expected number of objects present; its value at a location can be thought of as the expected number of objects at that location.

1. Implementation

The GM-PHD filter has been applied to music audio in one published work (Clark et al., 2007), to post-process the output of sinusoidal modelling of piano notes, using a model assuming fixed pitch and decaying amplitude. Here we explore its application to tracking multiple pitched sources in noise, where the sources may exhibit vibrato modulations that obscure the observed pitch. In particular, we wish to study whether the filter can recover stable tracks from observations such as might be found by dictionary-type approaches, whose output is often a variable number of observations.
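The interpretation of the GM as an intensity rather than a probability density can be illustrated with a minimal sketch. This is not the released implementation: the function name gm_intensity, the one-dimensional pitch axis and all component values are hypothetical, chosen only to show that the intensity integrates to the sum of the component weights, i.e. the expected number of objects.

```python
import numpy as np

def gm_intensity(x, weights, means, variances):
    """Evaluate a 1-D Gaussian-mixture intensity at the points x."""
    x = np.asarray(x, dtype=float)[:, None]           # shape (n_points, 1)
    w = np.asarray(weights, dtype=float)[None, :]     # shape (1, n_components)
    m = np.asarray(means, dtype=float)[None, :]
    v = np.asarray(variances, dtype=float)[None, :]
    # Each component is a normalised Gaussian scaled by its weight.
    dens = np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2.0 * np.pi * v)
    return (w * dens).sum(axis=1)

# Hypothetical mixture: two confident components and one weak one (pitch in Hz).
weights = [0.9, 0.8, 0.2]
means = [220.0, 330.0, 450.0]
variances = [4.0, 4.0, 9.0]

# Integrate the intensity numerically over a range that covers all components.
grid = np.linspace(100.0, 600.0, 50001)
step = grid[1] - grid[0]
intensity = gm_intensity(grid, weights, means, variances)
expected_count = float(np.sum(intensity) * step)      # Riemann sum

# The integral matches the total weight, which need not equal 1.
total_weight = float(np.sum(weights))
print(expected_count, total_weight)
```

In this sketch the weights do not sum to 1, so the mixture is not a probability density; integrating over a sub-range instead would give the expected number of objects within that pitch range.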
In practice, a missed detection may be more or less desirable than a false positive, and some reweighting of the tendency to positive or negative errors is desired. In the following, we use a multiplicative bias factor to alter this tendency: we multiply the total weight by the bias factor, before rounding the result to give the