IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 5, JULY 2007 1551

Convolutive Blind Source Separation in the Frequency Domain Based on Sparse Representation

Zhaoshui He, Shengli Xie, Senior Member, IEEE, Shuxue Ding, Member, IEEE, and Andrzej Cichocki, Senior Member, IEEE

Abstract: Convolutive blind source separation (CBSS) that exploits the sparsity of source signals in the frequency domain is addressed in this paper. We assume that the sources follow a complex Laplacian-like distribution for complex random variables, in which the real and imaginary parts of the complex-valued source signals are not necessarily independent. Based on the maximum a posteriori (MAP) criterion, we propose a novel natural gradient method for complex sparse representation. Moreover, a new CBSS method is further developed based on complex sparse representation. The developed CBSS algorithm works in the frequency domain. Here, we assume that the source signals are sufficiently sparse in the frequency domain. If the sources are sufficiently sparse in the frequency domain and the filter length of the mixing channels is relatively small and can be estimated, we can even achieve underdetermined CBSS. We illustrate the validity and performance of the proposed learning algorithm with several simulation examples.

Index Terms: Complex Laplacian-like distribution, convolutive blind source separation (CBSS), frequency domain, permutation problem, probability density function, sparse representation (SR).

I. INTRODUCTION

Convolutive blind source separation (CBSS) is a popular research topic in signal processing, machine learning, and neural networks. It has many applications in wireless telecommunication, image processing, biomedical signal processing, speech enhancement/recognition, etc., in which the mixtures are convolutions of the sources [1]–[9]. CBSS is very challenging because there are many unknown channel parameters that need to be estimated.
Manuscript received July 1, 2006; revised January 30, 2007. This work was supported in part by the National Natural Science Foundation of China under Grants 60325310, U0635001, and 60505005, in part by the Natural Science Fund of Guangdong Province, China, under Grants 04205783 and 05103553, and in part by the Specialized Prophasic Basic Research Projects, Ministry of Science and Technology, China, under Grant 2005CCA04100. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Shoji Makino.
Z. He is with the School of Electronics and Information Engineering, South China University of Technology, Guangzhou 510640, China, and also with the Laboratory for Advanced Brain Signal Processing, RIKEN Brain Science Institute, Saitama 351-0198, Japan (e-mail: he_shui@tom.com).
S. Xie is with the School of Electronics and Information Engineering, South China University of Technology, Guangzhou 510640, China (e-mail: adshlxie@scut.edu.cn).
S. Ding is with the School of Computer Science and Engineering, University of Aizu, Fukushima 965-8580, Japan, and also with the Laboratory for Advanced Brain Signal Processing, RIKEN Brain Science Institute, Saitama 351-0198, Japan (e-mail: sding@u-aizu.ac.jp).
A. Cichocki is with the Laboratory for Advanced Brain Signal Processing, RIKEN Brain Science Institute, Saitama 351-0198, Japan, and also with the System Research Institute, Polish Academy of Sciences (PAN), 00-901 Warsaw, Poland, and also with the Warsaw University of Technology, 00-661 Warsaw, Poland (e-mail: cia@brain.riken.jp).
Digital Object Identifier 10.1109/TASL.2007.898457

Recently, many methods have been proposed to solve this problem, some of which suggested directly processing data in the time domain [10], [11]. However, working in the time domain has the disadvantage of being rather computationally expensive due to the calculation of many convolutions.
Also, it is not very convenient to take full advantage of the independence of the sources in the time domain. Many researchers have therefore focused on developing frequency-domain CBSS methods [1]–[3], [12]. In the frequency domain, the convolution operator is transformed into a simple multiplication operator for narrowband signals, so the CBSS problem is transformed into an instantaneous BSS problem at each frequency bin. We can then employ conventional independent component analysis (ICA) algorithms to perform BSS separately at each frequency bin [13]. However, the performance of many frequency-domain CBSS methods is considerably limited by the inherent permutation problem [1], [3], [13], which is not significant for time-domain methods. Many frequency-domain methods therefore need additional precise and robust procedures to solve the permutation problem [13].

In recent years, sparse representation (SR), or sparse component analysis (SCA), has proven to be a useful tool in many applications, especially in blind source separation [14]–[17], [19]–[22]. Unlike ICA, SR estimates the sources based on the assumption that the source signals are sparse (in the time domain or another linear transform domain) [14]–[17], [19]–[22]. SR leads to powerful techniques capable of estimating more sources than sensors. In addition, sparse representations have also led to significant improvements in the performance of BSS/ICA techniques even in the standard case in which the number of sources is equal to or smaller than the number of sensors [16], [21].

In practice, many source signals are not sparse in the time domain but are sparse in the frequency domain or the time-frequency domain, for example, many speech and audio signals, especially music signals. Some typical examples can be found in [14], [20], [21], [23].
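The two points made above, that convolutive mixing reduces to an instantaneous mixture at each frequency bin and that a signal can be dense in time yet sparse in frequency, can be checked numerically. The following is only a minimal sketch under stated assumptions, not the method of this paper: the frame length, the tone-like sources, the random mixing filters, and the helper names `tones` and `active_fraction` are all hypothetical choices made for illustration. Circular convolution is used so that the per-bin identity holds exactly; with linear convolution and a short-time Fourier transform it holds only approximately.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256          # frame length (assumed for illustration)
n_src = 2        # number of sources
n_obs = 2        # number of sensors
filt_len = 8     # length of each hypothetical mixing filter

t = np.arange(n)


def tones(bins, amps):
    """Sum of cosines on exact DFT bins: dense in time, sparse in frequency."""
    return sum(a * np.cos(2 * np.pi * k * t / n) for k, a in zip(bins, amps))


def active_fraction(v, rel_tol=0.01):
    """Fraction of entries whose magnitude exceeds rel_tol times the largest one."""
    mag = np.abs(v)
    return np.mean(mag > rel_tol * mag.max())


# Hypothetical sources (rows) and random mixing filters h[i, j, :].
s = np.stack([tones([5, 40], [1.0, 0.5]), tones([12, 70], [0.8, 0.6])])
h = rng.standard_normal((n_obs, n_src, filt_len))

# Time domain: x_i(t) = sum_j sum_l h_ij(l) * s_j((t - l) mod n),
# i.e. a circular convolutive mixture computed directly from its definition.
x = np.zeros((n_obs, n))
for i in range(n_obs):
    for j in range(n_src):
        for l in range(filt_len):
            x[i] += h[i, j, l] * np.roll(s[j], l)

# Frequency domain: one small matrix-vector product per bin, X(f) = H(f) S(f).
S = np.fft.fft(s, axis=1)
H = np.fft.fft(h, n=n, axis=2)   # zero-pad each filter to the frame length
X = np.fft.fft(x, axis=1)
X_pred = np.einsum("ijf,jf->if", H, S)

print("per-bin instantaneous model holds:", np.allclose(X, X_pred))
print("active fraction of source 1 in time:     ", active_fraction(s[0]))
print("active fraction of source 1 in frequency:", active_fraction(S[0]))
```

In this toy setting each source occupies only a handful of frequency bins while almost every time sample is nonzero, which is the kind of structure that sparsity-based separation relies on. Real speech and music are only approximately sparse, so the contrast is less extreme in practice.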
Notably, Yilmaz and Rickard found that independent sources approximately satisfy the W-disjoint orthogonality condition [24], [25] when they are sufficiently sparse in the time-frequency domain. Based on this condition, they demixed more than two sources from only two observations.

To overcome the difficulties of CBSS, SR has been employed during the past several years to perform CBSS in the frequency domain [4], [6]. For example, Bofill and Monte discussed the recovery of the sources given the convolution matrix (i.e., the mixing channels are known) in the underdetermined case [4]. They compared two different assumptions. 1) One assumption is that the real and imaginary parts of the complex sources are