PARTICLE FILTERING ALGORITHMS FOR TRACKING MULTIPLE SOUND SOURCES USING MICROPHONE ARRAYS

Mitsuru Kawamoto, Futoshi Asano, Hideki Asoh, and Kiyoshi Yamamoto

1. National Institute of Advanced Industrial Science and Technology (AIST), Central 2, 1-1-1 Umezono, Tsukuba, Ibaraki 305-8568, Japan
2. CREST, JST

ABSTRACT

A particle filtering algorithm using the parameters of the EM (Expectation-Maximization) algorithm is proposed for tracking multiple sound sources. Unlike conventional EM-based algorithms, the proposed algorithm can track multiple sound sources without knowing their starting points. Moreover, the idea of group tracking is applied to the particle filtering algorithm so that better tracking performance can be obtained. Experimental results show the validity of the proposed algorithm.

Index Terms: Particle filtering algorithms, EM algorithms, Tracking, Multiple sound sources, Microphone arrays

1. INTRODUCTION

Sound source tracking using microphone arrays has been one of the central problems in radar, sonar, navigation, speech interaction, and so on.

In this paper, we propose a method for tracking multiple sound sources using particle filtering algorithms. The particle filter is used to estimate the sound source positions and their on/off audio status. Unlike conventional particle filtering algorithms, e.g., [2, 3], the only information used to drive the particle filter is the audio signals. In [9], a particle filtering algorithm utilizing only audio signals was proposed, but it tracks only a single sound source. Hence, in this paper, we aim to show that good tracking performance for multiple sound sources can be obtained by particle filters using only audio signals.
To this end, we propose, as the function for estimating the importance weights [4] in our particle filter, a pseudo-likelihood function calculated from the parameters used in Expectation-Maximization (EM) algorithms (EMAs). Since a signal separation effect is embedded in the EMA [1], the EMA-based pseudo-likelihood function may be suitable for tracking multiple sound sources.

Some examples in which EMAs are applied to sound localization and tracking problems have been introduced so far [1, 5, 6]. In the EMA, given an initial value for the sound location or the tracked point, localization or tracking is achieved by iterating the E-step and the M-step alternately. This is one of the advantages of the EMA over other conventional localization methods such as MUSIC [7]. However, if the initial value is far from the desired solution, there is no guarantee that the EMA provides the desired solution (see Section 4). In the proposed algorithm, this problem can be avoided by using the particle filter (see Section 4).

Moreover, we consider applying the idea of group tracking [8] to the particle filtering. We expect that better tracking performance can then be obtained by the proposed algorithm. Experimental results show the validity of the proposed algorithm.

2. SOUND LOCALIZATION USING THE EM ALGORITHM (EMA)

In this section, the EMA-based sound localization method is briefly introduced, because the proposed algorithm adopts the idea of the EMA and this explanation may therefore help in understanding it.

2.1. Audio Signal Model

Throughout this paper, audio signals are treated in the frequency domain. The short-time Fourier transform (STFT) of the microphone input is defined as \(\mathbf{x}(\omega, t) = [X_1(\omega, t), \ldots, X_M(\omega, t)]^T\) (input vector), where \(X_m(\omega, t)\) is the STFT of the \(m\)th microphone input at time \(t\) and frequency \(\omega\), and \(M\) is the number of microphones. Hereafter, the frequency index is omitted for simplicity of notation.
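As a rough illustration of the input vector defined above, the following Python sketch stacks the STFTs of the microphone signals so that each time-frequency point holds one vector with one complex entry per microphone. The function name `stft_input_vectors`, the Hann window, and the default frame length and hop size are assumptions made for illustration; the paper does not specify them.

```python
import numpy as np


def stft_input_vectors(signals, frame_len=512, hop=256):
    """Stack per-microphone STFTs into input vectors.

    signals: array of shape (n_mics, n_samples), one row per microphone.
    Returns a complex array X of shape (n_bins, n_frames, n_mics);
    X[k, t] is the input vector at frequency bin k and frame t.
    The window type and frame parameters are illustrative assumptions.
    """
    signals = np.asarray(signals, dtype=float)
    n_mics, n_samples = signals.shape
    if n_samples < frame_len:
        raise ValueError("signal is shorter than one frame")

    window = np.hanning(frame_len)
    n_frames = 1 + (n_samples - frame_len) // hop
    n_bins = frame_len // 2 + 1

    X = np.empty((n_bins, n_frames, n_mics), dtype=complex)
    for t in range(n_frames):
        start = t * hop
        # Windowed frame for every microphone: shape (n_mics, frame_len).
        frame = signals[:, start:start + frame_len] * window
        # One-sided FFT along time, transposed to (n_bins, n_mics).
        X[:, t, :] = np.fft.rfft(frame, axis=1).T
    return X
```

Calling `stft_input_vectors(signals)` on an array with one row per microphone gives `X[k, t]`, the input vector at bin `k` and frame `t`. Processing each bin separately corresponds to dropping the frequency index as in the text.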
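The input vector can be modeled as

\[ \mathbf{x}(t) = \mathbf{A}(\Theta)\,\mathbf{s}(t) + \mathbf{n}(t), \qquad (1) \]

where \(\mathbf{A}(\Theta)\) is a location vector matrix defined as

\[ \mathbf{A}(\Theta) = [\mathbf{a}(\theta_1), \ldots, \mathbf{a}(\theta_L)], \qquad (2) \]

\(\mathbf{s}(t) = [S_1(t), \ldots, S_L(t)]^T\) is a source spectrum vector, and \(\mathbf{n}(t) = [N_1(t), \ldots, N_M(t)]^T\) is a background noise spectrum vector. Here, \(L\) is the number of active sound sources and \(\theta_l\) (\(l = 1, 2, \ldots, L\)) represent the 2D directions of the sound sources. The noise is assumed to be zero mean Gaussian

1-4244-0728-1/07/$20.00 ©2007 IEEE. ICASSP 2007, p. I-129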