STEREO ACOUSTICAL ECHO CANCELLATION BASED ON COMMON POLES
Gabriele Bunkheila, Michele Scarpiniti, Raffaele Parisi and Aurelio Uncini
Infocom Dpt. - Dipartimento di Scienza e Tecnica dell’Informazione e della Comunicazione
Università di Roma “La Sapienza” - via Eudossiana, 18 - 00184 - Rome, Italy.
ABSTRACT
Stereo acoustical echo cancellation is a highly challenging
application in the field of acoustical signal processing. Un-
like the single-channel case, conflicting requirements on the
adaptive filters make this problem ill-posed in its original for-
mulation. In this contribution, it is shown how introducing
an estimate of the Common Acoustical Poles of the receiving
room can lead the adaptive system to remarkable performance
improvements with respect to the classical implementation.
Finally, a detailed comparison between the two schemes is
presented, based on a simulated acoustical environment and
the use of the classical affine projection algorithm.
Index Terms— Acoustic signal processing, Echo sup-
pression, Common poles, ARMA models, Adaptive filtering.
1. INTRODUCTION
Stereo Acoustical Echo Cancellation (SAEC) is a core issue
for hands-free teleconferencing systems using a pair of audio
channels [1] and a representative application in the more general
area of supervised multichannel adaptive estimation of
Room Transfer Functions (RTFs). In the single-channel case,
the length of the adaptive filter and the peakiness of the
spectrum of the input signal are the only critical factors
undermining the performance of the system [2]. When SAEC
is considered, further effects degrade the adaptive behavior
of the filter by worsening the conditioning of the stereo
autocorrelation matrix; these include the mutual statistical
dependence of the two input channels, which directly affects
the autocorrelation matrix and also affects the so-called
“misalignment effect” [1]. As will be reviewed, unlike the
single-channel case these issues are due to the fact that
SAEC is intrinsically ill-posed, and they are hardly
overcome by traditional methods. A novel SAEC architecture
based on Common Acoustical Poles (CAP) [3] is proposed,
which implicitly models the RTFs with Auto-Regressive and
Moving-Average (ARMA) filters sharing the same z-domain
denominator; the latter is estimated beforehand, and its roots
(i.e. the CAP) are supposed to account for the general resonant
properties of the room itself. While potentially employing the
same adaptive algorithms, this approach can introduce important
simplifications with respect to the traditional methods,
since fewer MA coefficients need to be estimated in real time to
attain a given degree of accuracy. Order selection is believed
to be a key problem for CAP estimation, hence a recently
proposed method [4] was used to identify the correct order
of the pre-estimated common denominator. The proposed
architecture is compared to the classical SAEC formulation,
with the use of the Affine Projection Algorithm (APA) [5].
The proposed approach yields an important reduction of the
length of the adaptive filters, which in turn lowers the
condition number of the stereo autocorrelation matrix, thus
improving the convergence rate.
This work has been partially supported by the Italian National Project:
Wireless multiplatfOrm mimo active access netwoRks for QoS-demanding
muLtimedia Delivery (WORLD), under grant number 2007R989S.
The paper is organized as follows. Section 2.1 briefly recalls
the main issues related to the classical formulation of
SAEC; Section 2.2 gives an overview of RTF modeling based
on CAP and their estimation. In Section 3, the proposed
architecture is presented, while Section 4 describes the simulation
set-up. The results are discussed in Section 5, followed by a
short conclusion.
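As a rough illustration of the common-pole idea outlined in this introduction, the sketch below builds two transfer functions that share one denominator and differ only in their short numerators. It is not the CAP estimation method of [3] nor the order selection of [4]; the pole radius and angle, the numerator values, and the helper names `arma_impulse_response` and `tail_energy_fraction` are all assumptions chosen for the example.

```python
import math


def arma_impulse_response(b, a, n_samples):
    """Impulse response of H(z) = B(z)/A(z), assuming a[0] == 1."""
    h = []
    for n in range(n_samples):
        # MA part: the numerator contributes only to the first len(b) samples.
        acc = b[n] if n < len(b) else 0.0
        # AR part: feedback from previous output samples.
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * h[n - k]
        h.append(acc)
    return h


def tail_energy_fraction(h, n_taps):
    """Fraction of the energy of h lying beyond its first n_taps samples."""
    total = sum(v * v for v in h)
    tail = sum(v * v for v in h[n_taps:])
    return tail / total


# Hypothetical common denominator: a single resonance (complex pole pair).
r, theta = 0.95, math.pi / 4
common_a = [1.0, -2.0 * r * math.cos(theta), r * r]

# Hypothetical channel-specific numerators (three MA coefficients each).
b1 = [1.0, 0.5, -0.2]
b2 = [0.3, -0.7, 0.1]

h1 = arma_impulse_response(b1, common_a, 200)
h2 = arma_impulse_response(b2, common_a, 200)

# Energy that a 3-tap FIR truncation of each response would miss.
print(tail_energy_fraction(h1, 3), tail_energy_fraction(h2, 3))
```

In this toy case each response is described exactly by three MA coefficients once the shared denominator is known, whereas a three-tap FIR truncation discards a large share of the response energy. This is the sense in which fewer coefficients may need to be adapted in real time.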
2. BACKGROUND
2.1. The classical Stereo Acoustical Echo Canceler
A typical stereo echo canceler is depicted in Figure 1. Without
loss of generality, just one microphone is considered in
the following for the receiving room. Let $g_i$ ($L_G \times 1$) be the
vector of the significant coefficients of the impulse response
linking the speaker to the i-th microphone in the transmission
room, while $h_i$ ($L_H \times 1$) similarly relates the i-th loudspeaker
to the considered microphone in the receiving room. Finally,
call $\hat{h}_i$ ($L_{FIR} \times 1$) the vector relative to the corresponding
i-th adaptive FIR filter$^1$. The goal is that $\hat{h}_1$ and $\hat{h}_2$ tend
adaptively to $h_1$ and $h_2$, respectively, based on some statistical
minimization of the norm of the error signal, e.g. $\|e\|_2^2$,
so that the speaker in the transmission room is not reached by
the echo of his voice through the backwards channel.
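The set-up described above can be sketched in a few lines. This is a minimal, noiseless simulation under stated assumptions, not the authors' implementation: it uses NLMS (which coincides with the affine projection algorithm for projection order one) instead of a general APA, white noise in place of speech, and made-up impulse responses. The function name `simulate_stereo_nlms` and all numeric values are hypothetical.

```python
import random


def convolve_at(coeffs, signal, n):
    """Return sum_k coeffs[k] * signal[n - k], with samples before n = 0 taken as zero."""
    return sum(c * signal[n - k] for k, c in enumerate(coeffs) if n - k >= 0)


def simulate_stereo_nlms(n_samples=5000, l_fir=8, mu=0.5, seed=0):
    rng = random.Random(seed)
    # Transmission-room responses g_1, g_2 (illustrative values, L_G = 4).
    g = [[1.0, 0.6, -0.3, 0.1], [0.8, -0.5, 0.4, 0.2]]
    # Receiving-room responses h_1, h_2 (random, decaying, L_H = l_fir).
    h = [[rng.uniform(-1.0, 1.0) * 0.7 ** k for k in range(l_fir)] for _ in range(2)]
    # Source signal: white Gaussian noise as a stand-in for speech.
    s = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    # Loudspeaker signals x_i = g_i * s, both driven by the same source.
    x = [[convolve_at(g[i], s, n) for n in range(n_samples)] for i in range(2)]

    h_hat = [[0.0] * l_fir for _ in range(2)]  # adaptive FIR filters
    echo, residual = [], []
    for n in range(n_samples):
        # Most recent l_fir samples of each channel (zero-padded at the start).
        u = [[x[i][n - k] if n - k >= 0 else 0.0 for k in range(l_fir)] for i in range(2)]
        d = sum(h[i][k] * u[i][k] for i in range(2) for k in range(l_fir))      # echo
        y = sum(h_hat[i][k] * u[i][k] for i in range(2) for k in range(l_fir))  # estimate
        e = d - y                                                               # error
        norm = sum(v * v for ch in u for v in ch) + 1e-8  # regularized input energy
        for i in range(2):
            for k in range(l_fir):
                h_hat[i][k] += mu * e * u[i][k] / norm    # NLMS update
        echo.append(d)
        residual.append(e)

    tail = n_samples // 10  # average powers over the last 10% of the run
    echo_power = sum(v * v for v in echo[-tail:]) / tail
    residual_power = sum(v * v for v in residual[-tail:]) / tail
    # Normalized misalignment between true and estimated responses.
    num = sum((h[i][k] - h_hat[i][k]) ** 2 for i in range(2) for k in range(l_fir))
    den = sum(h[i][k] ** 2 for i in range(2) for k in range(l_fir))
    return echo_power, residual_power, num / den


echo_power, residual_power, misalignment = simulate_stereo_nlms()
print(echo_power, residual_power, misalignment)
```

In this configuration both input channels are filtered versions of one source, so the stacked input vector of length 2 L_FIR depends on only L_FIR + L_G - 1 source samples. A small residual echo therefore does not by itself guarantee that the estimates are close to $h_1$ and $h_2$, which is why the misalignment is reported separately from the residual power.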
$^1$ Here and in the following, the hat symbol denotes an estimate of the
physical counterpart.
978-1-4244-3298-1/09/$25.00 ©2009 IEEE DSP 2009