GLIMPSING INDEPENDENT VECTOR ANALYSIS: SEPARATING MORE SOURCES THAN
SENSORS USING ACTIVE AND INACTIVE STATES
Alireza Masnadi-Shirazi, Wenyi Zhang and Bhaskar D. Rao
Department of Electrical and Computer Engineering, University of California, San Diego
{amasnadi, w3zhang, brao}@ucsd.edu
ABSTRACT
In this paper, we explore the problem of separating convolutively
mixed signals in the overcomplete (degenerate) case of
having more sources than sensors. We exploit a common form
of nonstationarity, especially present in speech, wherein the
signals have silence periods intermittently, hence varying the
set of active sources with time. A novel approach is proposed
that takes advantage of different combinations of silence gaps
in the source signals at each time period. This enables the
algorithm to "glimpse" or listen in the gaps, thereby compensating
for the global degeneracy by learning the mixing matrices
during periods where the problem is locally less degenerate.
Experiments using simulated and real room recordings were
carried out yielding good separation results.
Index Terms— Overcomplete systems, independent
component analysis, convolutive mixtures, blind source sep-
aration
1. INTRODUCTION
Frequency-domain blind source separation (BSS) methods
have been extensively studied for separating convolutively mixed
signals. By computing the short-time Fourier transform
(STFT), convolution in the time domain translates approximately
to linear mixing in the frequency domain, enabling the use of
independent component analysis (ICA) on each frequency bin.
However, because ICA is indeterminate up to permutation, the
recovered sources may be ordered differently in each bin, and
further post-processing is needed to resolve this
permutation problem. Independent vector analysis (IVA) is a
method that avoids such permutation issues by utilizing the
inner dependencies between the frequency bins. IVA models
each source as a multivariate symmetric distribution whose
frequency components are mutually dependent, while still
maintaining the fundamental BSS assumption that the sources
are independent of one another. IVA can therefore be viewed
as the multi-dimensional extension of ICA [1].
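As a rough sketch of the frequency-domain model described above, the convolutive mixture and its per-bin approximation can be written as follows. The symbols used here ($h_{lm}$ for the mixing filter from source $m$ to sensor $l$, $k$ for the frequency bin, $n$ for the STFT frame, $K$ for the number of bins, with $M$ sources and $L$ sensors) are illustrative assumptions and not necessarily the notation used elsewhere in this paper.

```latex
\begin{align}
x_l(t) &= \sum_{m=1}^{M} \sum_{\tau} h_{lm}(\tau)\, s_m(t-\tau),
  \qquad l = 1,\dots,L, \\
\mathbf{x}^{(k)}[n] &\approx \mathbf{H}^{(k)}\, \mathbf{s}^{(k)}[n],
  \qquad k = 1,\dots,K,
\end{align}
```

Here $\mathbf{H}^{(k)}$ is the $L \times M$ mixing matrix for bin $k$, and the approximation holds when the STFT window is long relative to the mixing filters. Applying ICA independently to each bin recovers the sources only up to an ordering that may differ from one $k$ to the next, which is the permutation problem mentioned above.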
This research was supported by UC MICRO grants 07-034, 08-65 sponsored
by Qualcomm Inc.
For standard ICA-based methods, when the number of
sources M becomes greater than the number of sensors L
(M > L), estimating the mixing matrix and
the sources is no longer straightforward. Various methods
with different underlying assumptions have been
proposed to deal with overcompleteness (degeneracy) in ICA
linear mixing. One method uses a maximum likelihood
approximation framework for learning the overcomplete mixing
matrix and a Laplacian prior for inferring the sources [2].
Other methods incorporate geometric/probabilistic clustering
approaches while relying heavily on sparsity assumptions (at
each time instant, mainly one source is active) [3, 4, 5]. None
of these methods, however, takes into consideration the
temporal dynamic structure of the signals.
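To make the difficulty of the overcomplete case concrete, the following is a minimal Python sketch, assuming a single instantaneous mixture (one frequency bin) with a hypothetical 2 x 3 mixing matrix. The matrix, the source values and the helper names (`mix`, `active_columns`, `solve_2x2`) are all illustrative assumptions; this is not the separation algorithm proposed in this paper. It shows that with M > L two different source vectors can yield identical sensor observations even when the mixing matrix is known, whereas when one source is silent the active columns form a square system with a unique solution.

```python
from fractions import Fraction

# Hypothetical mixing matrix with L = 2 sensors and M = 3 sources.
# The values are chosen for illustration only.
A = [[1, 0, 1],
     [0, 1, 1]]


def mix(matrix, sources):
    """Return the sensor vector x = A s (matrix given as a list of rows)."""
    return [sum(a * s for a, s in zip(row, sources)) for row in matrix]


def active_columns(matrix, active):
    """Keep only the columns of the matrix whose sources are active."""
    return [[row[j] for j in active] for row in matrix]


def solve_2x2(B, x):
    """Solve B s = x exactly for a 2 x 2 matrix B using Cramer's rule."""
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    if det == 0:
        raise ValueError("active sub-matrix is singular")
    s0 = Fraction(x[0] * B[1][1] - B[0][1] * x[1], det)
    s1 = Fraction(B[0][0] * x[1] - x[0] * B[1][0], det)
    return [s0, s1]


# All three sources active: s_b differs from s_a by (1, 1, -1), which
# lies in the null space of A, so both give the same observation.
s_a = [1, 2, 3]
s_b = [2, 3, 2]
x_a = mix(A, s_a)
x_b = mix(A, s_b)
ambiguous = (x_a == x_b) and (s_a != s_b)

# Source 1 silent: only columns 0 and 2 contribute, and the resulting
# 2 x 2 system can be inverted to recover the two active sources.
s_gap = [1, 0, 3]
x_gap = mix(A, s_gap)
B = active_columns(A, [0, 2])
recovered = solve_2x2(B, x_gap)

print("same observation from different sources:", ambiguous)
print("recovered active sources during the gap:", recovered)
```

Exact rational arithmetic is used only to keep the comparison free of rounding error; a practical system would estimate the mixing matrices from data rather than assume them known.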
Most signals of interest in BSS like speech, music and
EEG are nonstationary. One common type of nonstationarity,
especially present in speech, is that the signals can have inter-
mittent silence periods, hence varying the set of active sources
with time. Such a feature can be used to deal with degenerate
BSS. As the number of active sources in a given time period
decreases, the degree of degeneracy (M − L) decreases locally.
Hence, by exploiting silence gaps, one is actually compen-
sating for the global degeneracy by making use of segments
where it is locally less degenerate. An approach that models
active and inactive intervals for the overcomplete case of
instantaneous linear mixing has been proposed. This method models
the sources as mixtures of two zero-mean Gaussians
with unknown variances, similar to independent factor
analysis (IFA) [6], and incorporates a Markov model on
a hidden variable that controls the state of activity or inactivity
of each source. It relies on a complicated and inefficient
estimation algorithm based on variational Bayes, with three layers
of hidden variables (one for the Markov state of activity and two
as in normal IFA) [7]. Extending this to IVA for convolutive
mixtures proves to be even more complicated. In our
previous work, we proposed a simple and efficient algorithm
that models the states of activity and inactivity in the presence of
noise for the non-degenerate case of convolutive mixing using
a simple mixture model [8]. Unlike the method in [7], where
the on/off states were embedded in the sources themselves,
in [8] they were modeled more naturally as controllers turning
the columns of the mixing matrices on and off. In this paper we
build upon our previous work to handle the degenerate case
in convolutive mixing. Moreover, various studies have confirmed
that human listeners use similar strategies of exploit-
2010 978-1-4244-4296-6/10/$25.00 ©2010 IEEE ICASSP 2010