NEW INSIGHTS INTO NON-CAUSAL MULTICHANNEL
LINEAR FILTERING FOR NOISE REDUCTION
Mehrez Souden, Jacob Benesty, and Sofiène Affes
INRS-EMT, 800, de la Gauchetière Ouest, Suite 6900, Montréal, H5A 1K6, Qc, Canada.
{souden,benesty,affes}@emt.inrs.ca
ABSTRACT
We investigate a general framework for noise reduction that consists in
controlling the level of signal distortion while reducing the level of
noise. This framework yields a parameterized non-causal filter that
trades signal distortion against noise reduction; we refer to it herein
as the parameterized multichannel non-causal Wiener filter (PMWF). The
same optimization problem yields the minimum variance distortionless
response (MVDR) filter as a particular case of the PMWF. In contrast to
earlier works, the proposed expressions of the PMWF and MVDR are
simplified and require knowledge of the speech and noise statistics
only. To rigorously quantify the gains and losses incurred by these
filters, we establish simplified closed-form expressions for three
performance measures, namely, the signal distortion index, the noise
reduction factor, and the output signal-to-noise ratio (SNR), and
highlight the tradeoff between noise reduction and speech distortion in
the multichannel case.
Index Terms— Multichannel noise reduction, Wiener filter, min-
imum variance distortionless response, speech distortion.
1. INTRODUCTION
Speech signals perceived by communication devices are generally
corrupted by background noise or interference from other competing
sources. To cope with this issue, several noise reduction approaches
have been developed, including [1]-[12]. In contrast to the
single-channel based techniques, microphone-array based process-
ing is promising since it takes advantage of the spatial aperture in
addition to the classical frequency and time dimensions.
When compared to their time-domain counterparts, frequency-
domain approaches for noise reduction are generally preferred because
each frequency bin can be processed independently of the others. This
simplifies the computations and allows useful analytical relationships
to be derived. The well-known multichannel Wiener filter is optimal
in the mean-square error sense. However, it can introduce undesirable
distortions to the speech [1]. Parameterized multichannel filtering
allows the signal distortion and noise reduction to be tuned [1, 2, 3],
while enforcing a distortionless response when minimizing the noise
power leads to the MVDR filter [2, 4, 5]. In [6, 7], the
parametrization of the adaptive noise canceler in the standard
Griffiths-Jim generalized sidelobe canceler [8] has been shown to
reduce the speech distortion by controlling the signal leakage due
to system model errors (microphone mismatch, spatial aliasing,
reverberation, etc.). In [9], a general cost function combining system
model prior and estimated model terms was considered. However,
system-model-based priors are known to deteriorate the performance
of such filters in the presence of system model errors.
In this paper, we focus on a general framework that does not require
any preprocessing and consists in minimizing the power of the noise
that is captured by the microphones and passed through the filter of
interest, while controlling the distortion of the desired signal,
defined as the dissimilarity between one noise-free reference
microphone signal and the overall filtered noise-free microphone
signals. This approach, albeit essentially equivalent to the
traditional formulation of minimizing the signal distortion subject to
a constraint on the output noise, is more intuitive in the sense that
it reveals the connection with the MVDR. By doing so, we develop a new
simplified expression for the PMWF that depends on the noise and
speech statistics only. The second contribution of this work consists
in analytically investigating the tradeoff between noise reduction and
speech distortion in the multichannel case and studying the effect of
some key parameters, namely, the input SNR and the number of
microphones, on the performance of these filters.
1.1. Data Model
We consider the following frequency-domain representation of the
data model [2]:
Y_n(jω) = G_n(jω) S(jω) + V_n(jω) = X_n(jω) + V_n(jω),   (1)

where Y_n(jω), G_n(jω), S(jω), and V_n(jω) are the discrete-time
Fourier transforms (DTFTs) of the nth microphone output, the channel
impulse response between the source and the nth microphone, the
desired speech signal, and the additive noise, respectively.
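As an illustration, the per-bin data model in (1) can be simulated with the following minimal numpy sketch; the number of microphones, the channel vector, and the source and noise values are placeholder assumptions for illustration, not quantities from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4  # number of microphones (assumed for illustration)

# At a single frequency bin omega, draw hypothetical complex values for
# the channel DTFTs G_n, the source DTFT S, and the noise DTFTs V_n.
g = rng.standard_normal(N) + 1j * rng.standard_normal(N)          # G_n(jw)
S = rng.standard_normal() + 1j * rng.standard_normal()            # S(jw)
v = 0.1 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))  # V_n(jw)

x = g * S   # noise-free microphone signals X_n(jw) = G_n(jw) S(jw)
y = x + v   # observations Y_n(jw) = X_n(jw) + V_n(jw), as in (1)
```

Each frequency bin of an STFT-domain implementation would carry its own copy of these quantities, which is what makes the per-bin processing independent.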
Our aim is to reduce the noise and recover one of the signal
components, say X_{n_0}(jω), n_0 ∈ {1, ..., N}, the best way we can
(along some criteria to be defined later) by applying a linear filter
h_{n_0}(jω) to the overall observation vector
y(jω) = [Y_1(jω) Y_2(jω) ··· Y_N(jω)]^T. The output of this filter is:

Z(jω) = h_{n_0}^H(jω) y(jω)
      = h_{n_0}^H(jω) x(jω) + h_{n_0}^H(jω) v(jω)
      = D_{n_0}(jω) + ν_{n_0}(jω),   (2)

where x(jω) and v(jω) are defined like y(jω), and D_{n_0}(jω) and
ν_{n_0}(jω) are the speech and noise components at the output of
h_{n_0}(jω), respectively. We also define
g(jω) = [G_1(jω) G_2(jω) ··· G_N(jω)]^T.
1.2. Definitions
We use the definitions given in [2]. For completeness, we specify some
of them here. First, we define the power spectral density (PSD) matrix
of a vector a(jω) as Φ_aa(jω) = E[a(jω) a^H(jω)]. Since we are taking
the n_0th noise-free microphone signal as a reference, we define the
local input SNR as

SNR(ω) = φ_{x_{n_0} x_{n_0}}(ω) / φ_{v_{n_0} v_{n_0}}(ω),

where φ_aa(ω) = E[|A(jω)|^2] is the PSD of a(t) [having A(jω) as
DTFT].
Recall that our aim is to have an optimal estimate of X_{n_0}(jω) at
the output of the linear filter h_{n_0}(jω). Hence, we define the
error signals E_{x,n_0}(jω) = X_{n_0}(jω) − D_{n_0}(jω) and
E_{v,n_0}(jω) = ν_{n_0}(jω). We obtain:

E_{x,n_0}(jω) = [u_{n_0} − h_{n_0}(jω)]^H x(jω),   (3)
E_{v,n_0}(jω) = h_{n_0}^H(jω) v(jω),   (4)
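The error signals (3) and (4) can be verified numerically; in the sketch below, u_{n0} is taken to be the selection vector that picks the n_0th entry, which is consistent with E_{x,n_0} = X_{n_0} − D_{n_0} (all vector values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
N, n0 = 4, 0  # number of mics and reference index (assumed; 0-based here)

x = rng.standard_normal(N) + 1j * rng.standard_normal(N)  # speech vector x(jw)
v = rng.standard_normal(N) + 1j * rng.standard_normal(N)  # noise vector v(jw)
h = rng.standard_normal(N) + 1j * rng.standard_normal(N)  # filter h_{n0}(jw)

u = np.zeros(N, dtype=complex)
u[n0] = 1.0  # selection vector u_{n0}: u^H x = X_{n0}(jw)

E_x = np.vdot(u - h, x)  # (u_{n0} - h_{n0})^H x, as in (3)
E_v = np.vdot(h, v)      # h_{n0}^H v, as in (4)
# Consistency check: E_x equals X_{n0} - D_{n0} = x[n0] - h^H x.
```

Squaring and taking expectations of these two errors is what yields the signal distortion index and the residual noise power analyzed later in the paper.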
141 978-1-4244-2354-5/09/$25.00 ©2009 IEEE ICASSP 2009