NEW INSIGHTS INTO NON-CAUSAL MULTICHANNEL
LINEAR FILTERING FOR NOISE REDUCTION
Mehrez Souden, Jacob Benesty, and Sofiène Affes
INRS-EMT, 800, de la Gauchetière Ouest, Suite 6900, Montréal, H5A 1K6, Qc, Canada.
{souden,benesty,affes}@emt.inrs.ca
ABSTRACT
We investigate a general framework for noise reduction that consists in
controlling the level of signal distortion while reducing the level of
noise. This framework yields a parameterized non-causal filter that
trades signal distortion against noise reduction; we refer to it herein
as the parameterized multichannel non-causal Wiener filter (PMWF). The
same optimization problem yields the minimum variance distortionless
response (MVDR) filter as a particular case of the PMWF. In contrast to
earlier works, the proposed expressions of the PMWF and MVDR are
simplified and require knowledge of the speech and noise statistics
only. To rigorously quantify the gains and losses incurred by these
filters, we establish simplified closed-form expressions for three
performance measures, namely, the signal distortion index, the noise
reduction factor, and the output signal-to-noise ratio (SNR), and
highlight the tradeoff between noise reduction and speech distortion in
the multichannel case.
Index Terms— Multichannel noise reduction, Wiener filter, min-
imum variance distortionless response, speech distortion.
1. INTRODUCTION
Speech signals perceived by communication devices are generally
corrupted by background noise or interference from other competing
sources. To cope with this issue, several noise reduction approaches
have been developed, including [1]-[12]. In contrast to the
single-channel based techniques, microphone-array based process-
ing is promising since it takes advantage of the spatial aperture in
addition to the classical frequency and time dimensions.
When compared to their time-domain counterparts, frequency-
domain approaches for noise reduction are generally preferred because
each frequency bin can be processed independently of the others. This
simplifies the computations and allows useful analytical relationships
to be derived. The well-known multichannel Wiener filter is optimal
in the mean-square error sense. However, it can introduce undesirable
distortions to the speech [1]. Parameterized multichannel filtering
allows the signal distortion and noise reduction to be tuned [1, 2, 3],
while enforcing a distortionless response when minimizing the noise
power leads to the MVDR filter [2, 4, 5]. In [6, 7], the
parametrization of the adaptive noise canceler in the standard
Griffiths-Jim generalized sidelobe canceler [8] has been shown to
reduce the speech distortion by controlling the signal leakage due
to system model errors (microphone mismatch, spatial aliasing,
reverberation, etc.). In [9], a general cost function combining system
model prior and estimated model terms was considered. However,
system-model-based priors are known to deteriorate the performance
of such filters in the presence of system model errors.
In this paper, we focus on a general framework that does not require
any preprocessing and consists in minimizing the power of the noise
that is captured by the microphones and passed through the filter of
interest, while controlling the distortion of the desired signal,
defined as the dissimilarity between one noise-free reference
microphone signal and the overall filtered noise-free microphone
signals. This approach, albeit essentially equivalent to the
traditional formulation of minimizing the signal distortion subject to
a constraint on the output noise, is more intuitive in the sense that
it reveals the connection with the MVDR. By doing so, we develop a new
simplified expression for the PMWF that depends on the noise and
speech statistics only. The second contribution of this work consists
in analytically investigating the tradeoff between noise reduction and
speech distortion in the multichannel case and studying the effect of
some key parameters, namely, the input SNR and the number of
microphones, on the performance of these filters.
1.1. Data Model
We consider the following frequency-domain representation of the
data model [2]:
Y_n(jω) = G_n(jω) S(jω) + V_n(jω) = X_n(jω) + V_n(jω),   (1)

where Y_n(jω), G_n(jω), S(jω), and V_n(jω) are the discrete-time
Fourier transforms (DTFTs) of the nth microphone output, the channel
impulse response between the source and the nth microphone, the
desired speech signal, and the additive noise, respectively.
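As an illustration, the per-bin data model in (1) can be simulated with the following minimal numpy sketch; the number of microphones, the channel vector, and the source and noise values are placeholder assumptions for illustration, not quantities from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4  # number of microphones (assumed for illustration)

# At a single frequency bin omega, draw hypothetical complex values for
# the channel DTFTs G_n, the source DTFT S, and the noise DTFTs V_n.
g = rng.standard_normal(N) + 1j * rng.standard_normal(N)          # G_n(jw)
S = rng.standard_normal() + 1j * rng.standard_normal()            # S(jw)
v = 0.1 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))  # V_n(jw)

x = g * S   # noise-free microphone signals X_n(jw) = G_n(jw) S(jw)
y = x + v   # observations Y_n(jw) = X_n(jw) + V_n(jw), as in (1)
```

Each frequency bin of an STFT-domain implementation would carry its own copy of these quantities, which is what makes the per-bin processing independent.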
Our aim is to reduce the noise and recover one of the signal
components, say X_{n_0}(jω), n_0 ∈ {1, ..., N}, the best way we can
(along some criteria to be defined later) by applying a linear filter
h_{n_0}(jω) to the overall observation vector
y(jω) = [Y_1(jω) Y_2(jω) ··· Y_N(jω)]^T. The output of this filter is:

Z(jω) = h_{n_0}^H(jω) y(jω)
      = h_{n_0}^H(jω) x(jω) + h_{n_0}^H(jω) v(jω)
      = D_{n_0}(jω) + ν_{n_0}(jω),   (2)

where x(jω) and v(jω) are defined like y(jω), and D_{n_0}(jω) and
ν_{n_0}(jω) are the speech and noise components at the output of
h_{n_0}(jω), respectively. We also define
g(jω) = [G_1(jω) G_2(jω) ··· G_N(jω)]^T.
1.2. Definitions
We use the definitions given in [2]. For completeness, we specify some
of them here. First, we define the power spectral density (PSD) matrix
of a vector a(jω) as Φ_aa(jω) = E[a(jω) a^H(jω)]. Since we are taking
the n_0th noise-free microphone signal as a reference, we define the
local input SNR as

SNR(ω) = φ_{x_{n_0} x_{n_0}}(ω) / φ_{v_{n_0} v_{n_0}}(ω),

where φ_aa(ω) = E[|A(jω)|^2] is the PSD of a(t) [having A(jω) as
DTFT].
Recall that our aim is to have an optimal estimate of X_{n_0}(jω) at
the output of the linear filter h_{n_0}(jω). Hence, we define the
error signals E_{x,n_0}(jω) = X_{n_0}(jω) − D_{n_0}(jω) and
E_{v,n_0}(jω) = ν_{n_0}(jω). We obtain:

E_{x,n_0}(jω) = [u_{n_0} − h_{n_0}(jω)]^H x(jω),   (3)
E_{v,n_0}(jω) = h_{n_0}^H(jω) v(jω),   (4)
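The error signals (3) and (4) can be verified numerically; in the sketch below, u_{n0} is taken to be the selection vector that picks the n_0th entry, which is consistent with E_{x,n_0} = X_{n_0} − D_{n_0} (all vector values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
N, n0 = 4, 0  # number of mics and reference index (assumed; 0-based here)

x = rng.standard_normal(N) + 1j * rng.standard_normal(N)  # speech vector x(jw)
v = rng.standard_normal(N) + 1j * rng.standard_normal(N)  # noise vector v(jw)
h = rng.standard_normal(N) + 1j * rng.standard_normal(N)  # filter h_{n0}(jw)

u = np.zeros(N, dtype=complex)
u[n0] = 1.0  # selection vector u_{n0}: u^H x = X_{n0}(jw)

E_x = np.vdot(u - h, x)  # (u_{n0} - h_{n0})^H x, as in (3)
E_v = np.vdot(h, v)      # h_{n0}^H v, as in (4)
# Consistency check: E_x equals X_{n0} - D_{n0} = x[n0] - h^H x.
```

Squaring and taking expectations of these two errors is what yields the signal distortion index and the residual noise power analyzed later in the paper.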
141 978-1-4244-2354-5/09/$25.00 ©2009 IEEE ICASSP 2009