70 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 1, JANUARY 2007
Blind Source Separation Exploiting Higher-Order
Frequency Dependencies
Taesu Kim, Student Member, IEEE, Hagai T. Attias, Soo-Young Lee, Member, IEEE, and
Te-Won Lee, Member, IEEE
Abstract—Blind source separation (BSS) is a challenging
problem in real-world environments where sources are time de-
layed and convolved. The problem becomes more difficult in very
reverberant conditions, with an increasing number of sources,
and with geometric configurations of the sources such that finding
directionality is not sufficient for source separation. In this paper,
we propose a new algorithm that exploits higher order frequency
dependencies of source signals in order to separate them when
they are mixed. In the frequency domain, this formulation as-
sumes that dependencies exist between frequency bins instead of
defining independence for each frequency bin. In this manner,
we can avoid the well-known frequency permutation problem. To
derive the learning algorithm, we define a cost function, which is
an extension of mutual information between multivariate random
variables. By introducing a source prior that models the inherent
frequency dependencies, we obtain a simple form of a multivariate
score function. In experiments, we generate simulated data with
various kinds of sources in various environments. We evaluate the
performance and compare it with that of other well-known algorithms.
The results show the proposed algorithm outperforms the others
in most cases. The algorithm is also able to accurately recover six
sources with six microphones. In this case, we can obtain about
16-dB signal-to-interference ratio (SIR) improvement. Similar
performance is observed in real conference room recordings with
three human speakers reading sentences and one loudspeaker
playing music.
Index Terms—Blind source separation (BSS), cocktail party
problem, convolutive mixture, frequency domain, higher order
dependency, independent component analysis, permutation
problem.
I. INTRODUCTION
IN RECENT years, recovering the original source signals from
observed signals without knowing the mixing process, so-called
blind source separation (BSS), has attracted the attention of many
researchers. BSS is relevant to many applications, including
speech enhancement for noise-robust speech recognition, crosstalk
separation in telecommunication, high-quality hearing aids, and
the analysis of biological signals such as electroencephalograph
(EEG) and magnetoencephalograph (MEG) recordings. The fundamental
assumption in the BSS problem is that the source signals are
statistically independent.
Manuscript received February 1, 2005; revised December 6, 2005. The work
of T. Kim and S.-Y. Lee was supported in part by the Chung Moon Soul Center
for Bioinformation Bioelectronics and in part by the Brain Neuroinformatics
Research Program, Korean Ministry of Science and Technology. The associate
editor coordinating the review of this manuscript and approving it for
publication was Dr. Bhiksha Raj.
T. Kim is with the Department of Biosystems, Korea Advanced Institute of
Science and Technology (KAIST), Daejeon 305-701, Korea and also with the
Institute for Neural Computation, University of California at San Diego,
La Jolla, CA 92093 USA (e-mail: taesu.kim@kaist.ac.kr; taesu@ucsd.edu).
H. T. Attias is with Golden Metallic, Inc., San Francisco, CA 94147 USA
(e-mail: htattias@goldenmetallic.com).
S.-Y. Lee is with the Department of Biosystems, Korea Advanced Institute
of Science and Technology, Daejeon 305-701, Korea (e-mail: sylee@kaist.ac.kr).
T.-W. Lee is with the Institute for Neural Computation, University of
California at San Diego, La Jolla, CA 92093 USA (e-mail: tewon@ucsd.edu).
Digital Object Identifier 10.1109/TASL.2006.872618
Independent component analysis (ICA) is a method for finding
statistically independent sources from mixtures of sources by
utilizing higher order statistics [1]–[3]. In its simplest form, the
ICA model assumes linear instantaneous mixing without sensor
noise and the number of sources being equal to the number of
sensors. When trying to solve the problem of separating source
signals mixed in a real environment, those assumptions are not
valid and model extensions are required. In those cases, ob-
served signals are not instantaneous mixtures of sources, but
convolutive mixtures, which means that they are mixed with time
delays and convolutions. In order to deal with convolved mix-
tures, the ICA model formulation and the learning algorithm
have been extended to convolutive mixtures in both the time
and the frequency domains [4]–[9]. Those models are known
as solutions to the multichannel blind deconvolution problem.
In the case of the time domain approach, solutions usually re-
quire intensive computations with long de-reverberation filters,
and the resulting unmixed source signals are whitened due to
the independent and identically distributed (i.i.d.) assumption
[5]. Slow convergence, especially for colored signals, has also
been observed. The computational load and slow convergence
can be overcome by the frequency domain approach, in which
multiplication at each frequency bin replaces convolution oper-
ation in the time domain. Thus, one can apply the ICA algorithm
to instantaneous mixtures in each frequency bin. Although this
may be attractive, the main problem then is the permutation of
the ICA solutions over different frequency bins due to the in-
determinacy of permutation inherent in the ICA algorithm. One
should correct the permutations of separating matrices at each
frequency so that the separated signal in the time domain is re-
constructed properly.
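The two points above, that convolutive mixing reduces to an instantaneous mixture in each frequency bin and that bin-wise separation leaves the output order undetermined, can be illustrated with a small NumPy sketch. Everything in it is an assumption made for illustration and is not taken from the experiments in this paper: the sources are random Laplacian sequences, the mixing filters are random 8-tap responses, circular convolution over a single frame stands in for a short-time Fourier transform (for which the per-bin product holds only approximately), and the "ICA solution" is emulated by inverting the known mixing matrix and swapping its output rows at randomly chosen bins.

```python
import numpy as np

rng = np.random.default_rng(0)
n_src, n_mic, n_samp, n_tap = 2, 2, 256, 8

# Hypothetical sources and mixing filters (random, for illustration only).
s = rng.laplace(size=(n_src, n_samp))
h = rng.normal(size=(n_mic, n_src, n_tap))

# Time domain: each microphone is a sum of circularly convolved sources.
x = np.zeros((n_mic, n_samp))
for i in range(n_mic):
    for j in range(n_src):
        for t in range(n_tap):
            x[i] += h[i, j, t] * np.roll(s[j], t)

# Frequency domain: the same mixture is one matrix product per bin.
S = np.fft.rfft(s, axis=-1)                # (n_src, n_bins)
H = np.fft.rfft(h, n=n_samp, axis=-1)      # (n_mic, n_src, n_bins)
X = np.fft.rfft(x, axis=-1)                # (n_mic, n_bins)
mixing_matches = np.allclose(X, np.einsum("ijf,jf->if", H, S))

# Ideal per-bin unmixing matrices: the inverse of H at every bin.
W = np.linalg.inv(np.moveaxis(H, -1, 0))   # (n_bins, n_src, n_mic)
Y = np.einsum("fij,jf->if", W, X)
aligned_ok = np.allclose(Y, S)

# Emulate the permutation indeterminacy: at random bins the separated
# outputs come back in the other order (output rows of W swapped).
swap = rng.random(W.shape[0]) < 0.5
W_perm = W.copy()
W_perm[swap] = W_perm[swap][:, ::-1, :]
Y_perm = np.einsum("fij,jf->if", W_perm, X)

# Every bin is still perfectly separated, up to the order of the outputs.
per_bin_ok = np.allclose(np.where(swap, Y_perm[::-1], Y_perm), S)

# Back in the time domain, the misaligned bins scramble the sources.
y_aligned = np.fft.irfft(Y, n=n_samp, axis=-1)
y_perm = np.fft.irfft(Y_perm, n=n_samp, axis=-1)
time_domain_ok = np.allclose(y_aligned, s)
time_domain_scrambled = not np.allclose(y_perm, s)
```

Each bin of `Y_perm` holds cleanly separated components, yet `y_perm` mixes the two sources across frequency, which is why the permutations must be aligned across bins before the time-domain signals are reconstructed.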
Various approaches have been proposed to solve the permu-
tation problem. A popular approach is to impose a smoothness
constraint on the source that translates into smoothing the
separating filter. This approach has been realized by several
techniques such as averaging separating matrices over adjacent
frequencies [9], limiting the filter length in the time domain [10],
or considering the coherency of separating matrices at adjacent
frequencies [11]. Another related approach is based on direction
of arrival (DOA) estimation, which is widely used in array signal
processing. By analyzing the directivity patterns formed by a
1558-7916/$20.00 © 2006 IEEE