PARTICLE FILTERING ALGORITHMS FOR TRACKING MULTIPLE SOUND SOURCES
USING MICROPHONE ARRAYS
Mitsuru Kawamoto, Futoshi Asano, Hideki Asoh, and Kiyoshi Yamamoto
1. National Institute of Advanced Industrial Science and Technology (AIST),
Central 2, 1-1-1 Umezono, Tsukuba, Ibaraki 305-8568, Japan
2. CREST, JST.
ABSTRACT
A particle filtering algorithm that uses the parameters of the
EM (Expectation-Maximization) algorithm is proposed for
tracking multiple sound sources. Unlike conventional EM-based
algorithms, the proposed algorithm can track multiple sound
sources without knowing their starting points. Moreover, the
idea of group tracking is applied to the particle filtering
algorithm to improve tracking performance. Experimental
results show the validity of the proposed algorithm.
Index Terms— Particle filtering algorithms, EM algorithms,
Tracking, Multiple sound sources, Microphone arrays
1. INTRODUCTION
Sound source tracking using microphone arrays has long been a
central problem in fields such as radar, sonar, navigation,
and speech interaction.
In this paper, we propose a method for tracking multiple
sound sources using particle filtering algorithms. The
particle filter is used to estimate the sound positions and
the on/off audio status. Unlike conventional particle
filtering algorithms, e.g., [2, 3], the proposed particle
filter relies only on audio signals. A particle filtering
algorithm that uses only audio signals was proposed in [9],
but it tracks only a single sound source. Hence, in this
paper, we show that good tracking performance for multiple
sound sources can be obtained by particle filters using only
audio signals.
To this end, our particle filter uses a pseudo-likelihood
function, calculated from the parameters used in
Expectation-Maximization (EM) algorithms (EMAs), as the
function for estimating the importance weights [4]. Since an
effect of signal separation is embedded in the EMA [1], the
EMA-based pseudo-likelihood function may be suitable for
tracking multiple sound sources.
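As a generic illustration of how a likelihood-type function drives the importance weights of a particle filter, the following minimal Python sketch performs one weight update followed by systematic resampling over candidate source directions. It is not the proposed algorithm: `toy_pseudo_likelihood` is a hypothetical Gaussian stand-in for the EMA-based pseudo-likelihood, and the observed direction (40 degrees), the spread `sigma`, and the particle count are assumed values chosen only for the example.

```python
import math
import random


def toy_pseudo_likelihood(theta, observed_theta, sigma=5.0):
    """Hypothetical stand-in for the EMA-based pseudo-likelihood (assumption)."""
    # Wrapped angular difference in degrees, mapped into [-180, 180).
    d = (theta - observed_theta + 180.0) % 360.0 - 180.0
    return math.exp(-0.5 * (d / sigma) ** 2)


def update_weights(particles, weights, likelihood):
    """Multiply each weight by the likelihood of its particle, then normalize."""
    raw = [w * likelihood(p) for p, w in zip(particles, weights)]
    total = sum(raw)
    if total == 0.0:
        # Every particle is implausible: fall back to uniform weights.
        return [1.0 / len(particles)] * len(particles)
    return [r / total for r in raw]


def systematic_resample(particles, weights, rng):
    """Draw len(particles) samples with probability proportional to weight."""
    n = len(particles)
    step = 1.0 / n
    u = rng.random() * step  # single random offset shared by all draws
    out, cum, i = [], weights[0], 0
    for _ in range(n):
        while u > cum and i < n - 1:
            i += 1
            cum += weights[i]
        out.append(particles[i])
        u += step
    return out


rng = random.Random(0)
# No starting point is assumed: particles cover all directions uniformly.
particles = [rng.uniform(-180.0, 180.0) for _ in range(2000)]
weights = [1.0 / len(particles)] * len(particles)

weights = update_weights(
    particles, weights, lambda th: toy_pseudo_likelihood(th, 40.0)
)
particles = systematic_resample(particles, weights, rng)

# Plain mean is adequate here because the surviving particles lie far
# from the +/-180 degree wrap-around; a circular mean is safer in general.
estimate = sum(particles) / len(particles)
print(f"estimated direction: {estimate:.1f} degrees")
```

Because the particles initially cover the whole direction range, the sketch needs no starting point. Only the weighting function would change if a different likelihood were substituted.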
Several applications of EMAs to sound localization and
tracking problems have been reported [1, 5, 6]. In the EMA,
given an initial value for the sound location or the tracked
point, localization or tracking is achieved by iterating the
E-step and the M-step alternately. This is one of the
advantages of the EMA over other conventional localization
methods such as MUSIC [7]. However, if the initial value is
far from the desired solution, the EMA is not guaranteed to
converge to it (see Section 4). In the proposed algorithm,
this problem is avoided by using the particle filter (see
Section 4).
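The alternation of the E-step and the M-step, and its dependence on the initial value, can be illustrated with a deliberately simple toy problem. The Python sketch below is not the localization EMA of [1]: it is a generic EM for a two-component one-dimensional Gaussian mixture with a known, shared standard deviation and equal mixing weights. The data, the true means (-30 and 50), and both initial values are assumptions made for the example.

```python
import math
import random


def em_two_means(data, init_means, sigma=5.0, iterations=50):
    """Toy EM: estimate the two means of a 1-D Gaussian mixture with a
    known, shared sigma and equal mixing weights (illustrative only)."""
    means = list(init_means)
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            logs = [-0.5 * ((x - m) / sigma) ** 2 for m in means]
            top = max(logs)  # subtract the maximum for numerical stability
            unnorm = [math.exp(v - top) for v in logs]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: responsibility-weighted average of the data per component.
        for k in range(2):
            weight = sum(r[k] for r in resp)
            if weight > 0.0:
                means[k] = sum(r[k] * x for r, x in zip(resp, data)) / weight
    return means


rng = random.Random(1)
# Synthetic "directions" clustered around two assumed true values.
data = [rng.gauss(-30.0, 5.0) for _ in range(200)]
data += [rng.gauss(50.0, 5.0) for _ in range(200)]

# Initial values reasonably close to the two clusters.
good = em_two_means(data, (-20.0, 40.0))

# Degenerate start: identical initial means give equal responsibilities,
# so both estimates stay merged at the overall data mean.
bad = em_two_means(data, (10.0, 10.0))

print("good start:", [round(m, 1) for m in good])
print("degenerate start:", [round(m, 1) for m in bad])
```

The degenerate start is only one simple way in which an unfavourable initial value prevents EM from reaching the desired solution. It is not claimed to be the failure mode examined in Section 4.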
Moreover, we apply the idea of group tracking [8] to the
particle filtering, which we expect to improve the tracking
performance of the proposed algorithm. Experimental results
show the validity of the proposed algorithm.
2. SOUND LOCALIZATION USING THE EM
ALGORITHM (EMA)
In this section, the EMA-based sound localization method is
briefly introduced, because the proposed algorithm adopts the
idea of the EMA and this explanation may therefore help in
understanding it.
2.1. Audio Signal Model
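Throughout this paper, audio signals are treated in the
frequency domain. The short-time Fourier transform (STFT)
of the microphone input is defined as
$\mathbf{x}(\omega,t) = [X_1(\omega,t), \ldots, X_M(\omega,t)]^T$
(input vector), where $X_m(\omega,t)$ is the STFT of the
$m$th microphone input at time $t$ and frequency $\omega$, and
$M$ is the number of microphones. Hereafter, the frequency
index $\omega$ is omitted for simplicity. The input vector can
be modeled as
$\mathbf{x}(t) = \mathbf{A}(\boldsymbol{\theta})\,\mathbf{s}(t) + \mathbf{n}(t)$ (1)
where $\mathbf{A}(\boldsymbol{\theta})$ is a location vector matrix defined as
$\mathbf{A}(\boldsymbol{\theta}) = [\mathbf{a}(\theta_1), \ldots, \mathbf{a}(\theta_N)]$ (2)
$\mathbf{s}(t) = [S_1(t), \ldots, S_N(t)]^T$ is a source spectrum vector, and
$\mathbf{n}(t) = [N_1(t), \ldots, N_M(t)]^T$ is a background noise spectrum
vector. Here, $N$ is the number of active sound sources and
$\theta_n$ ($n = 1, 2, \ldots, N$) represent the 2D directions of the sound
sources. The noise $\mathbf{n}(t)$ is assumed to be zero mean Gaussian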
I 129 1424407281/07/$20.00 ©2007 IEEE ICASSP 2007