STEREO ACOUSTICAL ECHO CANCELLATION BASED ON COMMON POLES

Gabriele Bunkheila, Michele Scarpiniti, Raffaele Parisi and Aurelio Uncini

Infocom Dpt. - Dipartimento di Scienza e Tecnica dell'Informazione e della Comunicazione
Università di Roma "La Sapienza" - via Eudossiana, 18 - 00184 - Rome, Italy.

ABSTRACT

Stereo acoustical echo cancellation is a highly challenging application in the field of acoustical signal processing. Unlike the single-channel case, conflicting requirements on the adaptive filters make this problem ill-posed in its original formulation. In this contribution, it is shown how introducing an estimate of the Common Acoustical Poles of the receiving room can lead the adaptive system to remarkable performance improvements with respect to the classical implementation. Finally, a detailed comparison between the two schemes is presented, based on a simulated acoustical environment and the use of the classical affine projection algorithm.

Index Terms: Acoustic signal processing, Echo suppression, Common poles, ARMA models, Adaptive filtering.

1. INTRODUCTION

Stereo Acoustical Echo Cancellation (SAEC) is a core issue for hands-free teleconferencing systems using a pair of audio channels [1], and a representative application in the more general area of supervised multichannel adaptive estimation of Room Transfer Functions (RTFs). In the single-channel case, the length of the adaptive filter and the peakiness of the input signal spectrum are the only critical factors undermining the performance of the system [2]. When SAEC is considered, further effects degrade the adaptive behavior of the filter by worsening the conditioning of the stereo autocorrelation matrix; these include the mutual statistical dependence of the two input channels, which directly affects the autocorrelation matrix and also contributes to the so-called "misalignment effect" [1].
As will be reviewed, unlike the single-channel case, these issues are due to the fact that SAEC is intrinsically ill-posed, and they are hardly overcome by traditional methods. A novel SAEC architecture based on Common Acoustical Poles (CAP) [3] is proposed, which implicitly models the RTFs with Auto-Regressive Moving-Average (ARMA) filters sharing the same z-domain denominator; the latter is estimated in advance, and its roots (i.e. the CAP) are assumed to account for the general resonant properties of the room itself. While potentially employing the same adaptive algorithms, this approach can introduce important simplifications with respect to the traditional methods, since fewer MA coefficients need to be estimated in real time to attain a given degree of accuracy. Order selection is believed to be a key problem for CAP estimation; hence a recently proposed method [4] was used to identify the correct order of the pre-estimated common denominator. The proposed architecture is compared to the classical SAEC formulation, using the Affine Projection Algorithm (APA) [5]. The CAP-based scheme yields a significant reduction in the length of the adaptive filters, which in turn lowers the condition number of the stereo autocorrelation matrix, thus improving the convergence rate.

The paper is organized as follows. Section 2.1 briefly recalls the main issues related to the classical formulation of SAEC; section 2.2 gives an overview of RTF modeling based on CAP and their estimation. In section 3, the proposed architecture is presented, while section 4 describes the simulation set-up. The results are discussed in section 5, followed by a short conclusion.

This work has been partially supported by the Italian National Project: Wireless multiplatfOrm mimo active access netwoRks for QoS-demanding muLtimedia Delivery (WORLD), under grant number 2007R989S.

2. BACKGROUND

2.1.
The classical Stereo Acoustical Echo Canceler

A typical stereo echo canceler is depicted in Figure 1. Without loss of generality, just one microphone is considered in the following for the receiving room. Let $\mathbf{g}_i$ ($L_G \times 1$) be the vector of the significant coefficients of the impulse response linking the speaker to the $i$-th microphone in the transmission room, while $\mathbf{h}_i$ ($L_H \times 1$) similarly relates the $i$-th loudspeaker to the considered microphone in the receiving room. Finally, call $\hat{\mathbf{h}}_i$ ($L_{FIR} \times 1$) the coefficient vector of the corresponding $i$-th adaptive FIR filter$^1$. The goal is for $\hat{\mathbf{h}}_1$ and $\hat{\mathbf{h}}_2$ to converge adaptively to $\mathbf{h}_1$ and $\mathbf{h}_2$, respectively, based on some statistical minimization of the norm of the error signal, e.g. $\|e\|_2^2$, so that the speaker in the transmission room is not reached by the echo of his or her own voice through the backward channel.

$^1$ Here and in the following, the hat symbol denotes an estimate of the physical counterpart.

978-1-4244-3298-1/09/$25.00 ©2009 IEEE DSP 2009
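The classical set-up described above can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the implementation used in the paper: it replaces the APA with two-channel NLMS (which coincides with an affine projection of order one), uses short random impulse responses in place of real room responses, and takes $L_{FIR} = L_H$ with no near-end noise. All function names and parameter values (`convolve`, `stereo_nlms`, `misalignment`, `run_demo`, the filter lengths, `mu`, `delta`) are hypothetical and chosen for illustration only.

```python
import random


def convolve(signal, taps):
    """Causal FIR filtering: out[n] = sum_k taps[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, t in enumerate(taps):
            if n - k >= 0:
                acc += t * signal[n - k]
        out.append(acc)
    return out


def stereo_nlms(x1, x2, y, l_fir, mu=0.5, delta=1e-6):
    """Two-channel NLMS (assumption: stands in for the APA of the paper).

    Returns the two estimated filters and the error signal e[n].
    """
    h1_hat = [0.0] * l_fir
    h2_hat = [0.0] * l_fir
    errors = []
    for n in range(len(y)):
        # Most recent l_fir samples of each channel, newest first, zero-padded.
        u1 = [x1[n - k] if n - k >= 0 else 0.0 for k in range(l_fir)]
        u2 = [x2[n - k] if n - k >= 0 else 0.0 for k in range(l_fir)]
        # Echo estimate and error: e = y - h1_hat^T u1 - h2_hat^T u2.
        y_hat = sum(w * v for w, v in zip(h1_hat, u1))
        y_hat += sum(w * v for w, v in zip(h2_hat, u2))
        e = y[n] - y_hat
        # Normalised step on the stacked (stereo) input vector.
        energy = sum(v * v for v in u1) + sum(v * v for v in u2) + delta
        step = mu * e / energy
        h1_hat = [w + step * v for w, v in zip(h1_hat, u1)]
        h2_hat = [w + step * v for w, v in zip(h2_hat, u2)]
        errors.append(e)
    return h1_hat, h2_hat, errors


def misalignment(h1, h2, h1_hat, h2_hat):
    """Normalised coefficient error ||h - h_hat|| / ||h|| on the stacked filters."""
    num = sum((a - b) ** 2 for a, b in zip(h1 + h2, h1_hat + h2_hat))
    den = sum(a * a for a in h1 + h2)
    return (num / den) ** 0.5


def run_demo(seed=0, n_samples=2000, l_g=4, l_h=8, window=200):
    rng = random.Random(seed)
    s = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]   # far-end talker
    g1 = [rng.uniform(-1.0, 1.0) for _ in range(l_g)]     # transmission room
    g2 = [rng.uniform(-1.0, 1.0) for _ in range(l_g)]
    h1 = [rng.uniform(-1.0, 1.0) for _ in range(l_h)]     # receiving room
    h2 = [rng.uniform(-1.0, 1.0) for _ in range(l_h)]
    x1, x2 = convolve(s, g1), convolve(s, g2)             # stereo channels
    y = [a + b for a, b in zip(convolve(x1, h1), convolve(x2, h2))]
    h1_hat, h2_hat, e = stereo_nlms(x1, x2, y, l_fir=l_h)
    return {
        "early_error_energy": sum(v * v for v in e[:window]),
        "late_error_energy": sum(v * v for v in e[-window:]),
        "final_misalignment": misalignment(h1, h2, h1_hat, h2_hat),
    }
```

Calling `run_demo()` returns the error energy over the first and last `window` samples together with the final normalized misalignment. Because both loudspeaker signals are filtered versions of the same source, the echo can be cancelled without $\hat{\mathbf{h}}_i$ matching $\mathbf{h}_i$, so in such a toy run the error energy may become small while the misalignment stays well above zero.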