IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 5, JULY 2007 1551
Convolutive Blind Source Separation in the Frequency
Domain Based on Sparse Representation
Zhaoshui He, Shengli Xie, Senior Member, IEEE, Shuxue Ding, Member, IEEE, and
Andrzej Cichocki, Senior Member, IEEE
Abstract—Convolutive blind source separation (CBSS) that exploits the
sparsity of source signals in the frequency domain is addressed in this
paper. We assume that the sources follow a complex Laplacian-like
distribution for complex random variables, in which the real and imaginary
parts of the complex-valued source signals are not necessarily independent.
Based on the maximum a posteriori (MAP) criterion, we propose a novel
natural gradient method for complex sparse representation, and we further
develop a new CBSS method based on it. The developed CBSS algorithm works
in the frequency domain and assumes that the source signals are
sufficiently sparse there. If, in addition, the filter length of the mixing
channels is relatively small and can be estimated, we can even achieve
underdetermined CBSS. We illustrate the validity and performance of the
proposed learning algorithm with several simulation examples.
Index Terms—Complex Laplacian-like distribution, convolutive
blind source separation (CBSS), frequency domain, permutation
problem, probability density function, sparse representation (SR).
Manuscript received July 1, 2006; revised January 30, 2007. This work was
supported in part by the National Natural Science Foundation of China under
Grants 60325310, U0635001, and 60505005, in part by the Natural Science
Fund of Guangdong Province, China, under Grants 04205783 and 05103553,
and in part by the Specialized Prophasic Basic Research Projects, Ministry of
Science and Technology, China, under Grant 2005CCA04100. The associate
editor coordinating the review of this manuscript and approving it for
publication was Dr. Shoji Makino.
Z. He is with the School of Electronics and Information Engineering, South
China University of Technology, Guangzhou 510640, China, and also with the
Laboratory for Advanced Brain Signal Processing, RIKEN Brain Science
Institute, Saitama 351-0198, Japan (e-mail: he_shui@tom.com).
S. Xie is with the School of Electronics and Information Engineering,
South China University of Technology, Guangzhou 510640, China (e-mail:
adshlxie@scut.edu.cn).
S. Ding is with the School of Computer Science and Engineering, University
of Aizu, Fukushima 965-8580, Japan, and also with the Laboratory for
Advanced Brain Signal Processing, RIKEN Brain Science Institute, Saitama
351-0198, Japan (e-mail: sding@u-aizu.ac.jp).
A. Cichocki is with the Laboratory for Advanced Brain Signal Processing,
RIKEN Brain Science Institute, Saitama 351-0198, Japan, and also with the
System Research Institute, Polish Academy of Sciences (PAN), 00-901 Warsaw,
Poland, and also with the Warsaw University of Technology, 00-661 Warsaw,
Poland (e-mail: cia@brain.riken.jp).
Digital Object Identifier 10.1109/TASL.2007.898457
I. INTRODUCTION
Convolutive blind source separation (CBSS) is a popular research topic in
signal processing, machine learning, and neural networks. It has many
applications in wireless telecommunication, image processing, biomedical
signal processing, speech enhancement/recognition, etc., in which the
mixtures are convolutions of the sources [1]–[9]. CBSS is very challenging
because there are many unknown channel parameters that need to be
estimated. Recently, many methods have been proposed to solve this problem,
some of
which process the data directly in the time domain [10], [11]. However,
working in the time domain is computationally expensive because many
convolutions must be calculated. It is also not convenient for fully
exploiting the independence of the sources. Many researchers have therefore
turned to frequency-domain CBSS methods [1]–[3], [12]. In the frequency
domain, the convolution operator becomes a simple multiplication operator
for narrowband signals, so the CBSS problem is transformed into an
instantaneous BSS problem at each frequency bin. We can then employ
conventional independent component analysis (ICA) algorithms to perform BSS
separately at each frequency bin [13]. However, the performance of many
frequency-domain CBSS methods is considerably limited by the inherent
permutation problem [1], [3], [13], which is not significant for
time-domain methods. Many frequency-domain methods therefore need an
additional, precise, and robust procedure to solve the permutation
problem [13].
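The reduction from convolution to per-bin multiplication can be checked numerically. The following is a minimal sketch, not the algorithm of this paper: the filters `h`, the sources `s`, and the helper names `dft` and `circular_convolve` are hypothetical, and circular convolution is assumed so that the identity holds exactly. With short-time frames and linear convolution it holds only approximately.

```python
import cmath

def dft(x):
    """Naive O(N^2) discrete Fourier transform of a finite sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def circular_convolve(h, s):
    """Circular convolution of filter h (zero-padded) with signal s."""
    n = len(s)
    h = list(h) + [0.0] * (n - len(h))
    return [sum(h[tau] * s[(t - tau) % n] for tau in range(n)) for t in range(n)]

# Hypothetical mixing filters h[i][j] from source j to sensor i (3 taps each).
h = [[[1.0, 0.5, 0.2], [0.7, -0.3, 0.1]],
     [[0.6, 0.4, -0.1], [1.0, -0.2, 0.3]]]
# Hypothetical source signals, 8 samples each.
s = [[1.0, 0.0, -2.0, 0.5, 0.0, 3.0, 0.0, -1.0],
     [0.0, 2.0, 0.0, 0.0, -1.5, 0.0, 0.5, 0.0]]
n = len(s[0])

# Time domain: each sensor observes a sum of filtered sources.
x = []
for i in range(2):
    parts = [circular_convolve(h[i][j], s[j]) for j in range(2)]
    x.append([parts[0][t] + parts[1][t] for t in range(n)])

# Frequency domain: X_i(k) = sum_j H_ij(k) * S_j(k) at every bin k.
X = [dft(xi) for xi in x]
S = [dft(sj) for sj in s]
H = [[dft(h[i][j] + [0.0] * (n - len(h[i][j]))) for j in range(2)]
     for i in range(2)]

max_error = max(abs(X[i][k] - sum(H[i][j][k] * S[j][k] for j in range(2)))
                for i in range(2) for k in range(n))
print(f"largest deviation from per-bin multiplication: {max_error:.2e}")
```

Each bin `k` is an independent 2 x 2 instantaneous mixture, so a separator run on one bin has no information about the source ordering chosen in another bin. This is the origin of the permutation problem mentioned above.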
In recent years, sparse representation (SR), or sparse component analysis
(SCA), has proven to be a useful tool in many applications, especially in
blind source separation [14]–[17], [19]–[22]. Unlike ICA, SR estimates the
sources based on the assumption that the source signals are sparse (in the
time domain or in another linear transform domain) [14]–[17], [19]–[22]. SR
leads to powerful techniques capable of estimating more sources than
sensors. In addition, sparse representations have led to significant
improvements in the performance of BSS/ICA techniques even in the standard
case in which the number of sources is equal to or smaller than the number
of sensors [16], [21].
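To illustrate how sparsity allows more sources than sensors to be estimated, the sketch below recovers three real-valued sources from two instantaneous mixtures by minimizing the l1 norm of the source vector. It is a simplified illustration under stated assumptions, not the complex MAP natural gradient method of this paper: the mixing matrix `A` is assumed known, the data are real, and the function names are hypothetical. The search over column pairs relies on the fact that a minimum-l1 solution of a two-equation linear system can be found among the solutions with at most two nonzero entries.

```python
from itertools import combinations

def solve_2x2(a, b, x):
    """Solve u*a + v*b = x for 2-vectors a, b, x by Cramer's rule."""
    det = a[0] * b[1] - a[1] * b[0]
    if abs(det) < 1e-12:
        return None  # columns are (nearly) parallel; skip this pair
    u = (x[0] * b[1] - x[1] * b[0]) / det
    v = (a[0] * x[1] - a[1] * x[0]) / det
    return u, v

def min_l1_sources(A, x):
    """Minimum-l1 solution of A s = x for a real 2 x n matrix A of rank 2."""
    n = len(A[0])
    best = None
    for i, j in combinations(range(n), 2):
        sol = solve_2x2((A[0][i], A[1][i]), (A[0][j], A[1][j]), x)
        if sol is None:
            continue
        cost = abs(sol[0]) + abs(sol[1])
        if best is None or cost < best[0]:
            s = [0.0] * n
            s[i], s[j] = sol
            best = (cost, s)
    if best is None:
        raise ValueError("A must contain two linearly independent columns")
    return best[1]

# Hypothetical mixing matrix: 2 sensors, 3 sources, unit-norm columns.
A = [[1.0, 0.6, 0.0],
     [0.0, 0.8, 1.0]]

def mix(A, s):
    """Instantaneous mixture x = A s."""
    return [sum(A[r][c] * s[c] for c in range(len(s))) for r in range(len(A))]

s_one_active = [0.0, 2.0, 0.0]    # only the second source is active
s_two_active = [1.0, 0.0, -2.0]   # first and third sources are active

est_one = min_l1_sources(A, mix(A, s_one_active))
est_two = min_l1_sources(A, mix(A, s_two_active))
print(est_one, est_two)
```

With two sensors, at most two sources can be nonzero in the estimate at any sample, so recovery of this kind is only possible when the sources are sparse enough.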
In practice, many source signals are not sparse in the time domain but are
sparse in the frequency domain or the time-frequency domain. Examples
include many speech and audio signals, especially music signals. Some
typical examples can be found in [14], [20], [21], [23]. In particular,
Yilmaz and Rickard found that independent sources approximately satisfy the
W-disjoint orthogonality condition [24], [25] when they are sufficiently
sparse in the time-frequency domain. Based on this condition, they demixed
more sources than observations, using only two observations.
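The idea behind W-disjoint orthogonality can be shown with a toy example. The sketch below is a strong simplification and not the procedure of [24], [25]: the time-frequency coefficients `S` and the attenuations `a` are hypothetical, the supports are exactly disjoint, and the mixing is attenuation-only with no delays. Under these assumptions, the ratio of the two observations at each active time-frequency point equals the attenuation of the single active source, so grouping points by that ratio yields one binary mask per source.

```python
# Hypothetical time-frequency coefficients of three sources on 8 TF points.
# At most one source is nonzero at each point (disjoint supports).
S = [
    [1 + 2j, 0, 0, -1j, 0, 0, 0, 0],
    [0, 3 - 1j, 0, 0, 0, 2 + 0.5j, 0, 0],
    [0, 0, -2 + 1j, 0, 0, 0, 1 - 1j, 0],
]
a = [0.5, 1.0, 2.0]  # hypothetical attenuation of each source at sensor 2
num_points = len(S[0])

# W-disjoint orthogonality: the pointwise product of any two sources is zero.
wdo = all(S[i][p] * S[j][p] == 0
          for i in range(3) for j in range(i + 1, 3) for p in range(num_points))

# Two observations of three sources (underdetermined mixing).
X1 = [sum(S[j][p] for j in range(3)) for p in range(num_points)]
X2 = [sum(a[j] * S[j][p] for j in range(3)) for p in range(num_points)]

# Group active TF points by the magnitude of the ratio X2 / X1.
masks = {}
for p in range(num_points):
    if abs(X1[p]) > 1e-12:
        ratio = round(abs(X2[p] / X1[p]), 6)
        masks.setdefault(ratio, []).append(p)

# Apply each binary mask to the first observation to estimate one source.
estimates = {
    ratio: [X1[p] if p in points else 0 for p in range(num_points)]
    for ratio, points in masks.items()
}
print(sorted(masks.items()))
```

Three distinct ratios appear, one per source, even though only two observations are available. With real signals the supports overlap slightly and the ratios form clusters rather than exact values.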
To overcome the difficulties of CBSS, SR has been employed over the past
several years to perform CBSS in the frequency domain [4], [6]. For
example, Bofill and Monte discussed the recovery of the sources in the
underdetermined case when the convolution matrix is given (i.e., the mixing
channels are known) [4]. They compared two different assumptions. 1) One
assumption is that the real and imaginary parts of the complex sources are