NEW INSIGHTS INTO NON-CAUSAL MULTICHANNEL LINEAR FILTERING FOR NOISE REDUCTION

Mehrez Souden, Jacob Benesty, and Sofiène Affes

INRS-EMT, 800, de la Gauchetière Ouest, Suite 6900, Montréal, H5A 1K6, Qc, Canada.
{souden,benesty,affes}@emt.inrs.ca

ABSTRACT

We investigate a general framework for noise reduction which consists in controlling the level of signal distortion while reducing the level of noise. A parameterized non-causal filter that allows for tuning the signal distortion and noise reduction inversely is obtained and is referred to herein as the parameterized multichannel non-causal Wiener filter (PMWF). The same optimization problem leads to the minimum variance distortionless response (MVDR) filter as a particular case of the PMWF. In contrast to earlier works, the proposed expressions of the PMWF and MVDR are simplified and require knowledge of the speech and noise statistics only. To rigorously quantify the gains and losses incurred when using these filters, we establish simplified closed-form expressions for three measures, namely, the signal distortion index, the noise reduction factor, and the output signal-to-noise ratio (SNR), and highlight the tradeoff between noise reduction and speech distortion in the multichannel case.

Index Terms— Multichannel noise reduction, Wiener filter, minimum variance distortionless response, speech distortion.

1. INTRODUCTION

Speech signals perceived by communication devices are generally corrupted by background noise or interference from other competing sources. To cope with this issue, several noise reduction approaches have been developed so far, including [1]-[12]. In contrast to single-channel techniques, microphone-array-based processing is promising since it takes advantage of the spatial aperture in addition to the classical frequency and time dimensions.
When compared to their time-domain counterparts, frequency-domain approaches for noise reduction are generally preferred because each frequency bin can be processed apart from the others. This allows for easier calculations, and interesting relationships can be found. The well-known multichannel Wiener filter is optimal in the mean-square error sense. However, it can introduce undesirable distortions to the speech [1]. Parameterized multichannel filtering allows for tuning the signal distortion and noise reduction [1, 2, 3], while forcing a distortionless response when reducing the noise power leads to the MVDR filter [2, 4, 5]. In [6, 7], the parametrization of the adaptive noise canceler in the standard Griffiths and Jim generalized sidelobe canceler [8] has been shown to reduce the speech distortions by controlling the signal leakage due to system model errors (microphone mismatch, spatial aliasing, reverberation, etc.). In [9], a general cost function combining system-model-prior and estimated-model terms was considered. However, a system-model-based prior is known to deteriorate the performance of the filters in the presence of system model errors.

In this paper, we focus on a general framework that does not require any preprocessing and consists in minimizing the power of the noise captured by the microphones and filtered by the filter of interest, while controlling the desired signal distortion, which is defined as the dissimilarity between one noise-free reference microphone signal and the overall filtered noise-free microphone signals. This approach, albeit essentially equivalent to the traditional way of reducing the signal distortion subject to some constraint on the output noise, is more intuitive in the sense that it reveals the connection with the MVDR filter. By doing so, we develop a new simplified expression for the PMWF that depends on the noise and speech statistics only.
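To illustrate the claim that the MVDR filter can be written in terms of the speech and noise statistics only, the following is a minimal numerical sketch (not from the paper): it assumes the single-source model of Section 1.1, so that the speech PSD matrix has rank one, and uses synthetic statistics at one frequency bin. It checks that the classical channel-based MVDR coincides with a statistics-only expression built from Φ_vv and Φ_xx.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n0 = 4, 0  # number of mics and reference index (0-based); hypothetical setup

# Synthetic single-frequency statistics (all values are illustrative)
g = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # channel vector g
phi_s = 2.0                                                # speech PSD
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
Phi_vv = A @ A.conj().T + np.eye(N)                        # noise PSD matrix (positive definite)
Phi_xx = phi_s * np.outer(g, g.conj())                     # rank-1 speech PSD matrix

# Classical channel-based MVDR targeting X_{n0} = G_{n0} S:
# h = Phi_vv^{-1} g G_{n0}^* / (g^H Phi_vv^{-1} g)
w = np.linalg.solve(Phi_vv, g)
h_mvdr = w * np.conj(g[n0]) / (g.conj() @ w)

# Statistics-only form: Phi_vv^{-1} Phi_xx u_{n0} / tr(Phi_vv^{-1} Phi_xx),
# requiring no explicit knowledge of the channel g
M = np.linalg.solve(Phi_vv, Phi_xx)
h_stat = M[:, n0] / np.trace(M)

assert np.allclose(h_mvdr, h_stat)           # the two expressions coincide
assert np.isclose(h_mvdr.conj() @ g, g[n0])  # distortionless: h^H g = G_{n0}
print("statistics-only MVDR matches channel-based MVDR")
```

The identity holds because, with a rank-1 speech PSD matrix, Φ_vv^{-1} Φ_xx u_{n0} is proportional to Φ_vv^{-1} g G_{n0}^* and the trace supplies exactly the normalization g^H Φ_vv^{-1} g.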
The second contribution of this work consists in analytically investigating the tradeoff between noise reduction and speech distortion in the multichannel case, and in studying the effect of some key parameters, namely, the input SNR and the number of microphones, on the performance of these filters.

1.1. Data Model

We consider the following frequency-domain representation of the data model [2]:

Y_n(jω) = G_n(jω)S(jω) + V_n(jω) = X_n(jω) + V_n(jω),   (1)

where Y_n(jω), G_n(jω), S(jω), and V_n(jω) are the discrete-time Fourier transforms (DTFT's) of the nth microphone output, the channel impulse response between the source and the nth microphone, the desired speech signal, and the additive noise, respectively. Our aim is to reduce the noise and recover one of the signal components, say X_{n_0}(jω), n_0 ∈ {1, ..., N}, the best way we can (along some criteria to be defined later) by applying a linear filter h_{n_0}(jω) to the overall observation vector y(jω) = [Y_1(jω) Y_2(jω) ··· Y_N(jω)]^T. The output of this filter is:

Z(jω) = h_{n_0}^H(jω)y(jω) = h_{n_0}^H(jω)x(jω) + h_{n_0}^H(jω)v(jω) = D_{n_0}(jω) + ν_{n_0}(jω),   (2)

where x(jω) and v(jω) are defined like y(jω), and D_{n_0}(jω) = h_{n_0}^H(jω)x(jω) and ν_{n_0}(jω) = h_{n_0}^H(jω)v(jω) are the speech and noise components at the output of h_{n_0}(jω), respectively. We also define g(jω) = [G_1(jω) G_2(jω) ··· G_N(jω)]^T.

1.2. Definitions

We use the definitions given in [2]. For completeness, we specify some of them here. First, we define the power spectral density (PSD) matrix of a vector a(jω) as Φ_aa(jω) = E{a(jω)a^H(jω)}. Since we are taking the n_0th noise-free microphone signal as a reference, we define the local input SNR as

SNR(ω) = φ_{x_{n_0}x_{n_0}}(ω) / φ_{v_{n_0}v_{n_0}}(ω),

where φ_aa(ω) = E{|A(jω)|²} is the PSD of a(t) [having A(jω) as DTFT]. Recall that our aim is to obtain an optimal estimate of X_{n_0}(jω) at the output of the linear filter h_{n_0}(jω). Hence, we define the error signals E_{x,n_0}(jω) = X_{n_0}(jω) − D_{n_0}(jω) and E_{v,n_0}(jω) = ν_{n_0}(jω). Denoting by u_{n_0} the N × 1 vector whose n_0th component is one and all others are zero, we obtain:

E_{x,n_0}(jω) = [u_{n_0} − h_{n_0}(jω)]^H x(jω),   (3)
E_{v,n_0}(jω) = h_{n_0}^H(jω)v(jω),   (4)

978-1-4244-2354-5/09/$25.00 ©2009 IEEE ICASSP 2009
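The model (1)-(2) and the error decomposition (3)-(4) can be verified numerically. Below is a minimal sketch (not from the paper) with synthetic single-bin statistics: N = 4 microphones, an arbitrary filter h, and all signal values hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n0, T = 4, 0, 1000  # mics, reference index (0-based), snapshots; illustrative values

# Model (1) at a single frequency bin: y = g*S + v
g = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # channel vector
S = rng.standard_normal(T) + 1j * rng.standard_normal(T)   # speech DTFT samples
v = 0.5 * (rng.standard_normal((N, T)) + 1j * rng.standard_normal((N, T)))
x = np.outer(g, S)                                         # speech components X_n
y = x + v

# Arbitrary linear filter h_{n0} applied as in (2): Z = h^H y = D_{n0} + nu_{n0}
h = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(N)
Z = h.conj() @ y
D = h.conj() @ x        # filtered speech component D_{n0}
nu = h.conj() @ v       # filtered noise component nu_{n0}
assert np.allclose(Z, D + nu)

# Error signals (3)-(4), with u_{n0} the n0-th standard basis vector
u = np.zeros(N)
u[n0] = 1.0
E_x = (u - h).conj() @ x             # speech distortion error
E_v = h.conj() @ v                   # residual noise error
assert np.allclose(x[n0] - D, E_x)   # matches X_{n0} - D_{n0}
assert np.allclose(nu, E_v)

# Empirical local input SNR at the reference microphone
snr = np.mean(np.abs(x[n0]) ** 2) / np.mean(np.abs(v[n0]) ** 2)
print(f"empirical input SNR at mic {n0}: {snr:.2f}")
```

The assertions confirm that the output splits exactly into the speech and noise components of (2), and that (3) reproduces the distortion error X_{n_0} − D_{n_0} because u_{n_0}^H x = X_{n_0}.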