IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 50, NO. 8, AUGUST 2002 1855
A Frequency Domain Blind Signal Separation
Method Based on Decorrelation
Daniël W. E. Schobben and Piet C. W. Sommen, Member, IEEE
Abstract—This paper addresses the issue of separating multiple
speakers from mixtures of these that are obtained using multiple
microphones in a room. An adaptive blind signal separation algo-
rithm, which is entirely based on second-order statistics, is derived.
One of the advantages of this algorithm is that no parameters need
to be tuned. Moreover, an extension of the algorithm that can si-
multaneously deal with blind signal separation and echo cancella-
tion is derived. Experiments with real recordings have been carried
out, showing the effectiveness of the algorithm for real-world sig-
nals.
Index Terms—Audio applications, blind signal separation, echo
cancellation, second-order statistics.
I. INTRODUCTION
HUMANS can focus their attention on any one sound
source out of a mixture. This was termed the “cocktail
party effect” by Cherry [1]. This ability is due to the relations
between the signals that are picked up by the left and the
right ear, e.g., interaural differences in time and intensity. It is
difficult, however, to understand speech that is recorded at a
cocktail party with only one microphone, even for people with
perfect hearing capabilities. People with hearing impairments
readily experience such problems at a cocktail party.
Current audio systems cannot discern one sound from another
as humans can.
Blind signal separation (BSS) deals with the problem of
recovering independent signals using only observed mixtures
of these. These techniques are termed blind as the acoustic
transfer functions from the sources to the microphones are un-
known, and there are no reference signals against which the
recovered source signals can be compared. For acoustic applications,
a convolutive separation algorithm is required, i.e., the
separation consists of applying multichannel finite impulse
response (MC-FIR) filtering to these signals. BSS algorithms
have been successful in separating nonconvolutive mixtures of
non-real-world signals for over a decade. Successful BSS of
nonconvolutively mixed, delayed and mixed, and synthetically
convolutively mixed audio signals was reported in [2]–[4], respectively.
Manuscript received August 27, 1999; revised April 30, 2002. The associate
editor coordinating the review of this paper and approving it for publication was
Prof. Dr. Ir. Bart L. R. De Moor.
D. W. E. Schobben was with the Technische Universiteit Eindhoven, Eindhoven,
The Netherlands. He is now with the Philips Research Laboratories,
Eindhoven, The Netherlands (e-mail: Daniel.Schobben@Philips.com).
P. C. W. Sommen is with the Technische Universiteit Eindhoven, Eindhoven,
The Netherlands (e-mail: P.C.W.Sommen@tue.nl).
Publisher Item Identifier 10.1109/TSP.2002.800417.
Only after 1995 were successful experiments reported with the
comprehensive problem of separating signals that are recorded
using microphones in a real-world environment [5]–[13]. Most
of these experiments are done in controlled laboratory environ-
ments and require improvements to be successful in commercial
applications. Moreover, these techniques do not facilitate acoustic echo
cancellation, which is also required in audio applications. This
paper presents an algorithm that requires no parameter tuning
and is suitable for joint BSS and acoustic echo cancellation. An
overview of the early work on nonconvolutive and convolutive
BSS can be found in [14] and [15], respectively. An overview
of BSS approaches that are promising for audio applications
is found in [16]–[18].
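The convolutive mixing model and the MC-FIR filtering described above can be illustrated with a minimal sketch. The function name `mc_fir_filter`, the array layouts, and the short random "room responses" are assumptions made for illustration only; they are not taken from this paper, and real room impulse responses are far longer.

```python
import numpy as np

def mc_fir_filter(signals, filters):
    """Apply a multichannel FIR (MC-FIR) filter.

    signals: array of shape (n_in, n_samples)
    filters: array of shape (n_out, n_in, n_taps); filters[i, j] is the
             impulse response from input j to output i
    Returns an array of shape (n_out, n_samples).
    """
    n_out, n_in, _ = filters.shape
    n_samples = signals.shape[1]
    out = np.zeros((n_out, n_samples))
    for i in range(n_out):
        for j in range(n_in):
            # Truncate the full convolution to the input length.
            out[i] += np.convolve(signals[j], filters[i, j])[:n_samples]
    return out

rng = np.random.default_rng(0)
sources = rng.standard_normal((2, 1000))      # two independent sources

# Hypothetical 8-tap mixing responses (illustrative only).
mixing = 0.1 * rng.standard_normal((2, 2, 8))
mixing[0, 0, 0] = 1.0                         # direct path, source 1 -> mic 1
mixing[1, 1, 0] = 1.0                         # direct path, source 2 -> mic 2

mixtures = mc_fir_filter(sources, mixing)     # what the microphones observe
```

A BSS algorithm would estimate a second set of filters, with the same `(n_out, n_in, n_taps)` layout, and pass `mixtures` through `mc_fir_filter` again to recover the sources, without knowing `mixing` or `sources`.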
The BSS algorithm that is presented in this paper is based on
output decorrelation. Molgedey and Schuster minimized cross-
correlations for two different time lags to achieve signal sepa-
ration in the nonconvolutive case [19]. Multiple time lags are
used for convolutive signal separation by other researchers [4],
[8], [9], [11]–[13]. Output decorrelation uses only second-order
statistics, which theoretically limits its separation capabilities
as compared with higher order statistics (HOS). For real-world signals,
however, second-order statistics can be sufficient to achieve
BSS [19]–[21]. Second-order statistics have the advantage that
they can be estimated more reliably and with less computational
power than higher order statistics. In addition, HOS algorithms
contain nonlinear elements that need to be tuned to the data to
obtain good performance.
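To illustrate why several time lags matter in the convolutive case, the following sketch estimates the cross-correlation between two signals over a range of lags. The helper `cross_correlation`, the white-noise sources, and the toy mixture with a three-sample delay are assumptions chosen for illustration; this is not the estimator used in this paper.

```python
import numpy as np

def cross_correlation(a, b, max_lag):
    """Estimate r[l] = mean over n of a[n + l] * b[n] for |l| <= max_lag."""
    n = len(a)
    lags = np.arange(-max_lag, max_lag + 1)
    r = np.empty(len(lags))
    for k, lag in enumerate(lags):
        if lag >= 0:
            r[k] = np.dot(a[lag:], b[:n - lag]) / n
        else:
            r[k] = np.dot(a[:n + lag], b[-lag:]) / n
    return lags, r

rng = np.random.default_rng(0)
n = 20000
s1 = rng.standard_normal(n)
s2 = rng.standard_normal(n)

# Toy mixture: x1 contains s2 delayed by 3 samples and scaled by 0.5.
s2_delayed = np.concatenate([np.zeros(3), s2[:-3]])
x1 = s1 + 0.5 * s2_delayed
x2 = s2

lags, r_src = cross_correlation(s1, s2, max_lag=5)  # independent sources
_, r_mix = cross_correlation(x1, x2, max_lag=5)     # mixed observations

peak_lag = lags[np.argmax(np.abs(r_mix))]           # lag with strongest leakage
```

In this toy case the observations are nearly uncorrelated at lag zero, yet clearly correlated at the lag matching the delay, so a criterion that inspects only lag zero would miss the leakage.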
When applying filters to separate (or unmix) the sources, the
cross-correlations of the outputs can be expressed in terms of
the cross-correlations of the observed signals and the known un-
mixing system [12]. A cost function that is composed of these
cross-correlations of the observations and the unmixing filters
can be minimized using a gradient search. However, this is a
difficult task: in audio signal separation, the unmixing filters
typically have thousands of coefficients, and these must be
estimated from a cost function that is nonlinear in the
coefficients. Therefore, frequency-domain approaches
have been used recently in which the signal separation is done
more or less independently for each frequency [8], [9], [13].
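The relation between output and input correlations takes a simple form per frequency bin, which a short numerical sketch can make concrete. It assumes that filtering is modeled as a per-bin matrix multiplication Y(f) = W(f) X(f), which ignores the circular-convolution effects that a practical block frequency-domain algorithm must handle. All names and array shapes are illustrative assumptions, and the data are random numbers rather than audio.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_bins, n_ch = 200, 16, 2

# Hypothetical frequency-domain observations X[t, f, :] and unmixing
# matrices W[f, :, :], drawn at random for illustration.
X = (rng.standard_normal((n_frames, n_bins, n_ch))
     + 1j * rng.standard_normal((n_frames, n_bins, n_ch)))
W = (rng.standard_normal((n_bins, n_ch, n_ch))
     + 1j * rng.standard_normal((n_bins, n_ch, n_ch)))

# Per-bin filtering: Y[t, f, i] = sum_j W[f, i, j] * X[t, f, j]
Y = np.einsum('fij,tfj->tfi', W, X)

def cross_spectra(Z):
    """Average Z Z^H over frames: one (n_ch x n_ch) matrix per bin."""
    return np.einsum('tfi,tfj->fij', Z, np.conj(Z)) / Z.shape[0]

R_xx = cross_spectra(X)
R_yy = cross_spectra(Y)

# Output cross-spectra predicted from the input cross-spectra and the
# known unmixing matrices: R_yy(f) = W(f) R_xx(f) W(f)^H
R_yy_pred = np.einsum('fij,fjk,flk->fil', W, R_xx, np.conj(W))

max_err = np.max(np.abs(R_yy - R_yy_pred))  # expected at rounding level
```

Under this model, each bin involves only a small matrix of unknowns, and the off-diagonal entries of `R_yy` are the quantities a decorrelation criterion would drive toward zero.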
The remainder of this paper is organized as follows. Section II
describes the notation and the assumptions. Section III
describes the optimization criterion that is
minimized by the BSS algorithm. The optimization is
done by minimizing the cross-correlations among the outputs
of the MC-FIR separating filter. To achieve a computationally
inexpensive algorithm with fast convergence, this criterion is
transformed to the frequency domain in Section IV. First, in
Section IV-A, the filter coefficients are expressed in the fre-
quency domain such that the cross-correlations become zero.
At this point, no restrictions are imposed to ensure that the filter