70 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 1, JANUARY 2007

Blind Source Separation Exploiting Higher-Order Frequency Dependencies

Taesu Kim, Student Member, IEEE, Hagai T. Attias, Soo-Young Lee, Member, IEEE, and Te-Won Lee, Member, IEEE

Abstract—Blind source separation (BSS) is a challenging problem in real-world environments where sources are time delayed and convolved. The problem becomes more difficult in highly reverberant conditions, as the number of sources increases, and when the geometric configuration of the sources is such that directionality alone is not sufficient for separation. In this paper, we propose a new algorithm that exploits higher-order frequency dependencies of the source signals in order to separate them from their mixtures. In the frequency domain, this formulation assumes that dependencies exist between frequency bins instead of defining independence for each frequency bin separately. In this manner, we can avoid the well-known frequency permutation problem. To derive the learning algorithm, we define a cost function that extends mutual information to multivariate random variables. By introducing a source prior that models the inherent frequency dependencies, we obtain a simple form of a multivariate score function. In experiments, we generate simulated data with various kinds of sources in various environments. We evaluate the performance of the proposed algorithm and compare it with other well-known algorithms. The results show that the proposed algorithm outperforms the others in most cases. The algorithm is also able to accurately recover six sources with six microphones; in this case, we obtain an improvement of about 16 dB in signal-to-interference ratio (SIR). Similar performance is observed in real conference-room recordings with three human speakers reading sentences and one loudspeaker playing music.
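To make the idea of a multivariate score function more concrete, the following NumPy sketch contrasts a score that couples all frequency bins of a source with one that treats each bin independently, and plugs the coupled score into a per-bin natural-gradient update. It assumes, for illustration only, a spherically symmetric Laplacian-type prior p(s_i) ∝ exp(-||s_i||_2) over each source's stacked frequency vector and an update of the form ΔW^(k) ∝ (I - E[φ^(k)(y) y^(k)H]) W^(k); the function names `multivariate_score`, `per_bin_score`, and `natural_gradient_step` are hypothetical, and the sketch is not presented as the paper's exact derivation.

```python
import numpy as np


def multivariate_score(Y, eps=1e-12):
    """Score coupling all bins: phi^(k)(y_i) = y_i^(k) / ||y_i||_2 (assumed prior).

    Y has shape (n_bins, n_src, n_frames). The norm is taken across the
    frequency axis (axis 0), separately for each source and frame.
    """
    norm = np.sqrt(np.sum(np.abs(Y) ** 2, axis=0, keepdims=True))
    return Y / (norm + eps)


def per_bin_score(Y, eps=1e-12):
    """Score of an independent per-bin Laplacian prior: y^(k) / |y^(k)|."""
    return Y / (np.abs(Y) + eps)


def natural_gradient_step(W, X, lr=0.1):
    """One update per bin: W^(k) += lr * (I - E[phi^(k)(y) y^(k)^H]) W^(k).

    W: (n_bins, n_src, n_src) separating matrices; X: (n_bins, n_src, n_frames).
    """
    Y = W @ X
    Phi = multivariate_score(Y)
    n_src, n_frames = W.shape[-2], X.shape[-1]
    # Sample estimate of E[phi y^H] for every bin, shape (n_bins, n_src, n_src)
    C = Phi @ np.conj(np.swapaxes(Y, -1, -2)) / n_frames
    return W + lr * (np.eye(n_src) - C) @ W


# Usage: one source, three bins, one frame; then change only bin 1
Y = np.array([1.0, 2.0, 2.0]).reshape(3, 1, 1).astype(complex)
Y_mod = Y.copy()
Y_mod[1] *= 10

print(multivariate_score(Y)[:, 0, 0].real)  # approx. [0.333, 0.667, 0.667], since ||y|| = 3
# Bin 0 of the per-bin score ignores the change in bin 1 ...
print(np.allclose(per_bin_score(Y)[0], per_bin_score(Y_mod)[0]))  # True
# ... whereas bin 0 of the multivariate score reacts to it
print(np.allclose(multivariate_score(Y)[0], multivariate_score(Y_mod)[0]))  # False
```

Because the multivariate score divides every bin by a norm taken over all bins, its value in one bin depends on the magnitudes in the others. Under the assumed prior, this shared dependence is what ties a source's bins to a common output index, whereas the per-bin score leaves the ordering in each bin free.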
Index Terms—Blind source separation (BSS), cocktail party problem, convolutive mixture, frequency domain, higher-order dependency, independent component analysis, permutation problem.

Manuscript received February 1, 2005; revised December 6, 2005. The work of T. Kim and S.-Y. Lee was supported in part by the Chung Moon Soul Center for Bioinformation and Bioelectronics and in part by the Brain Neuroinformatics Research Program, Korean Ministry of Science and Technology. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Bhiksha Raj.
T. Kim is with the Department of Biosystems, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 305-701, Korea, and also with the Institute for Neural Computation, University of California at San Diego, La Jolla, CA 92093 USA (e-mail: taesu.kim@kaist.ac.kr; taesu@ucsd.edu).
H. T. Attias is with Golden Metallic, Inc., San Francisco, CA 94147 USA (e-mail: htattias@goldenmetallic.com).
S.-Y. Lee is with the Department of Biosystems, Korea Advanced Institute of Science and Technology, Daejeon 305-701, Korea (e-mail: sylee@kaist.ac.kr).
T.-W. Lee is with the Institute for Neural Computation, University of California at San Diego, La Jolla, CA 92093 USA (e-mail: tewon@ucsd.edu).
Digital Object Identifier 10.1109/TASL.2006.872618

I. INTRODUCTION

In recent years, recovering the original source signals from observed signals without knowledge of the mixing process, so-called blind source separation (BSS), has attracted the attention of many researchers. BSS is relevant to many applications, including speech enhancement for noise-robust speech recognition, crosstalk separation in telecommunications, high-quality hearing aids, and the analysis of biological signals such as electroencephalographic (EEG) and magnetoencephalographic (MEG) recordings. The fundamental assumption in the BSS problem is that the source signals are statistically independent.
Independent component analysis (ICA) is a method for finding statistically independent sources from their mixtures by exploiting higher-order statistics [1]–[3]. In its simplest form, the ICA model assumes linear instantaneous mixing, no sensor noise, and a number of sources equal to the number of sensors. When separating source signals mixed in a real environment, these assumptions do not hold, and the model must be extended. In such cases, the observed signals are not instantaneous mixtures of the sources but convolutive mixtures; that is, the sources are mixed with time delays and convolutions. To deal with convolved mixtures, the ICA model formulation and learning algorithm have been extended to convolutive mixtures in both the time and the frequency domains [4]–[9]. These models are known as solutions to the multichannel blind deconvolution problem. Time-domain approaches usually require intensive computation with long dereverberation filters, and the resulting unmixed source signals are whitened because of the independent and identically distributed (i.i.d.) assumption [5]. Slow convergence, especially for colored signals, has also been observed. The computational load and slow convergence can be overcome by the frequency-domain approach, in which multiplication in each frequency bin replaces the convolution operation in the time domain. Thus, one can apply an ICA algorithm for instantaneous mixtures in each frequency bin. Although this is attractive, the main problem then is the permutation of the ICA solutions across frequency bins, which arises from the permutation indeterminacy inherent in ICA. The permutations of the separating matrices must be corrected at each frequency so that the separated signals are properly reconstructed in the time domain.

Various approaches have been proposed to solve the permutation problem.
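The two ingredients of the frequency-domain approach described above, convolution becoming per-bin multiplication and the permutation indeterminacy of per-bin ICA, can be illustrated with a short NumPy sketch. It checks the convolution theorem for circular convolution (a short-time Fourier transform with finite frames only approximates this for the linear convolution of a real room), generates hypothetical per-bin mixtures X^(k) = A^(k) S^(k) with random complex matrices, and simulates the outcome of per-bin ICA by permuting the rows of the ideal unmixing matrix independently in each bin. No actual ICA algorithm is run, the impulse response `h` is random rather than measured, and the scaling indeterminacy is ignored.

```python
import numpy as np

rng = np.random.default_rng(0)

# (1) Convolution theorem: circular convolution in time equals
#     element-wise multiplication of the zero-padded DFTs.
N, L = 64, 8
s = rng.normal(size=N)   # source signal
h = rng.normal(size=L)   # hypothetical (random) room impulse response
y_freq = np.real(np.fft.ifft(np.fft.fft(s) * np.fft.fft(h, N)))
y_time = np.array([sum(h[m] * s[(n - m) % N] for m in range(L)) for n in range(N)])
print("convolution theorem holds:", np.allclose(y_freq, y_time))

# (2) Instantaneous complex mixing in every frequency bin: X[k] = A[k] @ S[k]
n_src, n_bins, n_frames = 2, 6, 500
S = rng.laplace(size=(n_bins, n_src, n_frames)) + 1j * rng.laplace(size=(n_bins, n_src, n_frames))
A = rng.normal(size=(n_bins, n_src, n_src)) + 1j * rng.normal(size=(n_bins, n_src, n_src))
X = A @ S  # batched matrix product over the bin axis

# (3) Per-bin ICA is only determined up to row order (and scale); model this by
#     permuting the rows of the ideal unmixing matrix independently in each bin.
W_ideal = np.linalg.inv(A)
P = np.stack([np.eye(n_src)[rng.permutation(n_src)] for _ in range(n_bins)])
Y = (P @ W_ideal) @ X

# Which source ends up at output 0 in each bin; generally not the same everywhere
src_of_output0 = P[:, 0, :].argmax(axis=1)
print("source feeding output 0, per bin:", src_of_output0)

# (4) Undoing each bin's permutation (P^T, since P is orthogonal) re-aligns the outputs
Y_aligned = np.swapaxes(P, -1, -2) @ Y
print("aligned outputs match sources:", np.allclose(Y_aligned, S))
```

Because the output order is chosen independently in each bin, concatenating output 0 across bins generally stitches together pieces of different sources; only after the permutation in every bin is undone do the outputs line up with the sources. In practice these permutations are unknown and must be estimated, which is what the correction methods discussed next attempt to do.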
A popular approach is to impose a smoothness constraint on the source, which translates into smoothing the separating filter. This approach has been realized by several techniques, such as averaging separating matrices over adjacent frequencies [9], limiting the filter length in the time domain [10], or considering the coherency of separating matrices at adjacent frequencies [11]. Another related approach is based on direction-of-arrival (DOA) estimation, which is widely used in array signal processing. By analyzing the directivity patterns formed by a 1558-7916/$20.00 © 2006 IEEE