DOES DEREVERBERATION HELP MULTICHANNEL SOURCE SEPARATION? A CASE STUDY

Nicolás López 1,2, Mounira Maazaoui 1, Yves Grenier 1, Gaël Richard 1 and Ivan Bourmeyster 2

1 Institut Mines-Télécom - Télécom ParisTech - CNRS/LTCI - 37/39 rue Dareau, 75014 Paris, France
2 Arkamys - 31 rue Pouchet, 75017 Paris, France

ABSTRACT

Multichannel blind source separation performance degrades rapidly when the mixtures are highly reverberated. In fact, blind source separation algorithms usually focus on the separation task without dealing with the dereverberation problem. Some recent studies attempted to reduce the reverberation by introducing a dereverberation module before or after the blind source separation, but only limited success was obtained in improving the separation performance in highly reverberant rooms. In this article, we conduct a number of experiments combining state-of-the-art spectral enhancement-based dereverberation and source separation algorithms, showing that, in this particular case, speech enhancement does not improve the performance of blind source separation.

Index Terms— Blind source separation, speech dereverberation, spectral subtraction, microphone array.

1. INTRODUCTION

In a multichannel acoustic scene analysis context, one important task is to separate different audio sources that are active simultaneously. This is the case, for example, in robot audition, where a robot equipped with a microphone array must separate the speech signal from several competing talkers so that it can recognize a given sentence. In this context, blind source separation (BSS) techniques use the multichannel information received at the sensors to recover, in separate channels, the acoustic events related to a given source. A common approach for BSS is to assume an instantaneous mixture of independent and identically distributed sources.
When these conditions are actually met, which is equivalent to considering that the sources propagate in an anechoic environment, methods like independent component analysis give satisfying separation results.

In practice, instantaneous blind source separation techniques are known to fail in reverberant rooms [1], where the mixtures become convolutive. State-of-the-art methods deal with this limitation by working in the frequency domain, so that the convolutive mixture can be approximated by an instantaneous one in each frequency bin. Methods based on independent component analysis, Non-negative Matrix Factorization and sparse optimization have shown satisfying separation results when the reverberation is low or moderate. However, when the room is highly reverberant, the separation performance degrades dramatically. This is because longer Room Impulse Responses (RIR) require longer analysis windows to span all the convolution effects in the frequency domain, but with longer windows the assumption of independence between the sources no longer holds. The separation performance is then bounded by the trade-off between the independence of the sources and the length of the convolutive filter. In a recent work, Maazaoui et al. used beamforming methods as a preprocessing step for BSS [2]. By focusing the directivity of the sensor array towards the sources, the reverberation from the jammer direction is attenuated and, as a consequence, the separation performance is improved.

Speech dereverberation (SD) techniques have been widely studied in recent years, leading to better reverberation reduction than beamforming techniques [3]. One should then expect to improve the separation performance by first applying some SD processing to the mixture. In [4], Yoshioka et al. proposed to use a multichannel SD algorithm based on linear prediction as a preprocessing step for BSS.
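The window-length trade-off discussed above can be made concrete with a small numerical experiment. The following NumPy sketch is entirely synthetic (white-noise stand-ins for the sources and exponentially decaying random impulse responses are illustrative assumptions, not the experimental setup of this paper): it measures the relative error of the per-frequency instantaneous approximation for an RIR much shorter than the analysis window, and then for an RIR longer than the window.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 16000
n = 2 * fs  # 2 s of synthetic audio

def decaying_rir(length):
    """Synthetic exponentially decaying random impulse response."""
    h = rng.standard_normal(length) * np.exp(-4.0 * np.arange(length) / length)
    return h / np.linalg.norm(h)

def stft(sig, win, hop):
    """Hann-windowed short-time Fourier transform (frames x bins)."""
    w = np.hanning(win)
    starts = range(0, len(sig) - win, hop)
    return np.array([np.fft.rfft(w * sig[t:t + win]) for t in starts])

def narrowband_error(rir_len, win):
    """Relative error of the per-bin instantaneous approximation
    X(f,t) ~ H1(f) S1(f,t) + H2(f) S2(f,t) for a 2-source mixture."""
    s = rng.standard_normal((2, n))                # white-noise source stand-ins
    h = [decaying_rir(rir_len) for _ in range(2)]  # one RIR per source
    x = sum(np.convolve(s[j], h[j])[:n] for j in range(2))  # convolutive mixture
    hop = win // 2
    X = stft(x, win, hop)
    # Per-bin "mixing coefficients": the window-length DFT of each RIR
    # (np.fft.rfft truncates h when rir_len > win, i.e. the window cannot
    # span the whole filter).
    X_inst = sum(np.fft.rfft(h[j], win) * stft(s[j], win, hop) for j in range(2))
    return np.linalg.norm(X - X_inst) / np.linalg.norm(X)

# RIR much shorter than the analysis window: the approximation holds well.
err_short = narrowband_error(rir_len=64, win=1024)
# RIR longer than the window (strong reverberation): the approximation breaks down.
err_long = narrowband_error(rir_len=4096, win=1024)
print(f"relative error, short RIR: {err_short:.3f}")
print(f"relative error, long RIR:  {err_long:.3f}")
```

Lengthening the window would shrink the second error, but, as noted above, longer frames erode the statistical independence between the source spectra that frequency-domain separation relies on, which is precisely the trade-off at stake.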
The SD and BSS filters were jointly optimized, leading to significantly better separation results in rooms with reverberation times of 0.3 and 0.5 seconds. Similar results were observed with the multichannel approach proposed in [5].

In this paper we investigate the influence of single-channel spectral enhancement-based dereverberation as a preprocessing step for multichannel BSS over a large range of reverberation times. We consider a simple framework: single-channel SD is applied to every channel, and the dereverberated mixtures are separated by multichannel BSS. By using a single-channel approach for SD, we propose a system that is independent of the geometry of the microphone array. It also allows the SD task to be parallelized, enabling faster processing in real-time applications. We use state-of-the-art methods for this study: SD is performed with the method proposed by Habets et al. in [6] and BSS with the method by Maazaoui et al. in [2]. We show that, in this particular configuration, reducing the reverberation does not lead to an improvement in separation performance.

This paper is organized as follows: in Section 2 we briefly

EUSIPCO 2013