2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, October 16-19, 2011, New Paltz, NY

DECORRELATION BY RESAMPLING IN FREQUENCY DOMAIN FOR MULTI-CHANNEL ACOUSTIC ECHO CANCELLATION BASED ON RESIDUAL ECHO ENHANCEMENT

Ted S. Wada, Jason Wung, Biing-Hwang (Fred) Juang
Center for Signal and Image Processing, Georgia Institute of Technology, Atlanta, GA 30332
(twada,jason.wung,juang)@ece.gatech.edu

ABSTRACT

An inter-channel decorrelation procedure via resampling in the frequency domain is proposed for multi-channel acoustic echo cancellation (MCAEC) based on residual echo enhancement. The objective is to efficiently alleviate the non-uniqueness problem while introducing minimal distortion to the audio quality and the signal statistics. The effectiveness is illustrated with respect to the standard approach of using a memoryless nonlinearity or additive noise. A combination of the new decorrelation procedure and the residual echo enhancement technique points towards a computationally feasible yet very robust frequency-domain MCAEC system.

Index Terms— multi-channel acoustic echo cancellation, inter-channel decorrelation, frequency-domain resampling, residual echo enhancement

1. INTRODUCTION

The non-uniqueness problem arises in multi-channel acoustic echo cancellation (MCAEC) because the reference signals (i.e., the far-end microphone signals) are highly correlated, which degrades the convergence rate of the least mean square (LMS) algorithm [1]. The MCAEC solution also depends on the far-end room impulse response and must reconverge, for example, when the far-end speech activity changes. Applying a decorrelation procedure before near-end playback should improve echo path tracking with, hopefully, minimal side effects on both the audio quality and the original signal statistics, since altering them can actually reduce the initial and the steady-state cancellation performance [2].
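To make the non-uniqueness problem concrete, the following sketch (not taken from [1] or [2]) builds two reference signals by filtering one white-noise source through two short, randomly drawn far-end room responses, which is a simplifying assumption, and compares the condition number of the stacked two-channel input correlation matrix with and without independent additive noise. Because x1 * g2 = x2 * g1, the clean matrix is numerically singular whenever the adaptive filter is at least as long as the far-end responses; a large eigenvalue spread of this kind is what slows LMS-type convergence. The helpers `tap_matrix` and `stereo_correlation_cond` are hypothetical names used only for illustration.

```python
import numpy as np

def tap_matrix(x, taps):
    """Stack regressor rows [x(n), x(n-1), ..., x(n-taps+1)], with x(n) = 0 for n < 0."""
    padded = np.concatenate([np.zeros(taps - 1), x])
    n = len(x)
    return np.stack([padded[taps - 1 - k: taps - 1 - k + n] for k in range(taps)], axis=1)

def stereo_correlation_cond(x1, x2, taps):
    """Condition number of the (2*taps x 2*taps) input correlation matrix of a two-channel filter."""
    u = np.hstack([tap_matrix(x1, taps), tap_matrix(x2, taps)])
    r = u.T @ u / len(x1)
    return np.linalg.cond(r)

rng = np.random.default_rng(0)
s = rng.standard_normal(20000)      # white-noise stand-in for the far-end talker (assumption)
g1 = rng.standard_normal(8)         # hypothetical far-end room responses, 8 taps each
g2 = rng.standard_normal(8)
x1 = np.convolve(s, g1)[: len(s)]   # far-end microphone (reference) signals
x2 = np.convolve(s, g2)[: len(s)]

taps = 16                           # adaptive filter length per channel, >= 8 taps
cond_clean = stereo_correlation_cond(x1, x2, taps)
cond_noisy = stereo_correlation_cond(
    x1 + 0.1 * rng.standard_normal(len(s)),  # independent noise on each channel
    x2 + 0.1 * rng.standard_normal(len(s)),
    taps,
)
print(f"clean: {cond_clean:.1e}, with additive noise: {cond_noisy:.1e}")
```

In this toy setup, the vector [g2; -g1] (zero-padded to 16 taps per channel) lies in the null space of the clean correlation matrix, so the two-channel solution is not unique. The independent noise removes that null space, but only by adding a component the listener can hear, which is the trade-off a low-distortion decorrelator tries to avoid.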
We proposed in [2] a new decorrelation procedure based on resampling that introduces a time-varying signal delay block-wise across channels with negligible audible distortion, and that works well for stereophonic AEC (SAEC) when combined with the residual echo, or error, enhancement (REE) technique. With so-called "block-iterative" adaptation [3], or batch-wise adaptation in general, REE permits continuous noise-robust adaptation even during double-talk [3, 4], as well as the recovery of cancellation performance lost to the non-uniqueness problem [2]. The decorrelation procedure follows the stereo projection technique [5], under which projection-type adaptive algorithms achieve better echo path tracking when the inherently time-varying correlation of the reference signals is emphasized. Furthermore, resampling by interpolation was used in [2] because conventional upsampling followed by downsampling may be computationally infeasible for real-time applications when the desired sampling rate change is very small (on the order of 0.01%).

We will further examine the resampling technique in this paper. First, a resampling procedure in the frequency domain that takes advantage of the efficiency of the Fast Fourier Transform (FFT) will be proposed. Second, the coherence before and after decorrelation is measured and compared against the standard applications of the "half-wave rectifying" nonlinearity [6] and additive white noise.

[Figure 1: Channel 1 and Channel 2 signal delay (s, axis 0 to $1 \times 10^{-4}$) versus time (s, 0 to 1). (a) Previous approach [2]. (b) Modified approach. Signal delay is linearly varied after resampling block-wise (a) alternately across channels and (b) simultaneously across channels, where every other block is resampled after time reversal and reversed back afterward (resampling ratio $R = 1.0004$, frame size $N = 2048$, and sampling rate $f_s = 16$ kHz).]
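Why a very small rate change is costly for a conventional converter can be seen with a little arithmetic. A rational converter realizes R = P/Q by upsampling by P, lowpass filtering, and downsampling by Q. The sketch below assumes the ratio is given as an exact decimal string and uses Python's standard `fractions` module to find P and Q at the 16 kHz rate of Figure 1; `rational_factors` is a hypothetical helper, not part of the proposed method.

```python
from fractions import Fraction

def rational_factors(ratio, fs_hz):
    """Hypothetical helper: smallest integers (P, Q) with P / Q equal to the
    decimal ratio string, plus the intermediate rate P * fs of an
    upsample-by-P, lowpass, downsample-by-Q converter."""
    r = Fraction(ratio)                # e.g. "1.0004" -> Fraction(2501, 2500)
    p, q = r.numerator, r.denominator  # Fraction is already in lowest terms
    return p, q, p * fs_hz

for ratio in ("1.0004", "1.0001"):
    p, q, rate = rational_factors(ratio, 16000)
    print(f"R = {ratio}: P = {p}, Q = {q}, intermediate rate = {rate / 1e6:.3f} MHz")
```

For R = 1.0004 this gives P = 2501 and Q = 2500, so the interpolation filter nominally operates at 40.016 MHz; for a 0.01% change (R = 1.0001) the factors become 10001/10000 and the intermediate rate 160.016 MHz. A polyphase implementation avoids computing most of those intermediate samples, but it still needs P filter phases.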
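The coherence comparison against the two standard decorrelators can be sketched as follows. This is an illustration under stated assumptions, not the measurement setup of this paper: it uses a NumPy-only Welch estimate of the magnitude-squared coherence (Hann window, 50% overlap), white noise as a stand-in for far-end speech, hypothetical two-tap far-end paths, and the half-wave rectifier in the positive/negative form x1 + α(x1 + |x1|)/2, x2 + α(x2 − |x2|)/2 that is widely used in stereophonic AEC and is assumed here to correspond to [6]. The values α = 0.5 and noise level 0.3 are arbitrary, and `msc` and `half_wave` are hypothetical names.

```python
import numpy as np

def msc(x, y, nperseg=256):
    """Welch-averaged magnitude-squared coherence |Sxy|^2 / (Sxx * Syy)
    per frequency bin (Hann window, 50% overlap)."""
    win = np.hanning(nperseg)
    step = nperseg // 2
    sxx = syy = sxy = 0.0
    for start in range(0, len(x) - nperseg + 1, step):
        fx = np.fft.rfft(win * x[start:start + nperseg])
        fy = np.fft.rfft(win * y[start:start + nperseg])
        sxx = sxx + np.abs(fx) ** 2
        syy = syy + np.abs(fy) ** 2
        sxy = sxy + fx * np.conj(fy)
    return np.abs(sxy) ** 2 / (sxx * syy)

def half_wave(x1, x2, alpha=0.5):
    """Half-wave rectifier decorrelation: positive half-wave added to channel 1,
    negative half-wave added to channel 2 (alpha is an illustrative value)."""
    return x1 + alpha * (x1 + np.abs(x1)) / 2, x2 + alpha * (x2 - np.abs(x2)) / 2

rng = np.random.default_rng(1)
s = rng.standard_normal(32000)               # 2 s at 16 kHz, white stand-in source
x1 = np.convolve(s, [1.0, 0.5])[: len(s)]    # hypothetical two-tap far-end paths
x2 = np.convolve(s, [0.8, -0.3])[: len(s)]

results = {
    "original": msc(x1, x2).mean(),
    "half-wave": msc(*half_wave(x1, x2)).mean(),
    "white noise": msc(x1 + 0.3 * rng.standard_normal(len(s)),
                       x2 + 0.3 * rng.standard_normal(len(s))).mean(),
}
for label, value in results.items():
    print(f"{label:>11}: mean coherence {value:.3f}")
```

Both standard methods lower the coherence relative to the original, linearly related pair. What this sketch does not capture is the audible distortion each method pays for that reduction, which is the other half of the comparison made in this paper.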
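The linearly growing delay plotted in Figure 1 follows from reading a block at a slightly mismatched rate. The sketch below is a time-domain stand-in based on plain linear interpolation (`np.interp`); it is neither the frequency-domain procedure proposed here nor necessarily the interpolator used in [2]. Output sample n takes the input value at fractional position n/R, so its delay relative to the original time scale is n(1 − 1/R)/f_s. The helper names are hypothetical.

```python
import numpy as np

def resample_block(block, ratio):
    """Hypothetical helper: linear-interpolation resampling of one block.
    Output sample n is the input evaluated at fractional position n / ratio,
    i.e. the block re-sampled at fs' = ratio * fs and played back at fs.
    For ratio > 1 every position lies inside the block; for ratio < 1,
    np.interp clamps positions past the end to the last sample."""
    n = np.arange(len(block))
    return np.interp(n / ratio, n, block)

def delay_profile(num_samples, ratio, fs_hz):
    """Delay in seconds of each output sample relative to the original time
    scale: (n - n / ratio) / fs, which grows linearly within the block."""
    n = np.arange(num_samples)
    return (n - n / ratio) / fs_hz

# Parameters taken from the Figure 1 caption: R = 1.0004, N = 2048, fs = 16 kHz.
delay = delay_profile(2048, 1.0004, 16000)
print(f"delay at the end of a block: {delay[-1]:.2e} s")
```

With these parameters the delay at the last sample of a block is 2047(1 − 1/1.0004)/16000 ≈ 5.1 × 10⁻⁵ s, which falls within the 0 to 1 × 10⁻⁴ s range of the delay axes in Figure 1.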
Finally, the new decorrelation procedure is applied to REE-based frequency-domain MCAEC to analyze the underlying advantages of the combined AEC system through simulated data.

2. RESAMPLING IN FREQUENCY DOMAIN

As illustrated in Figure 1(a), decorrelation is achieved by resampling block-wise to induce a sampling-rate mismatch, and hence a linearly time-varying delay with respect to the original time scale, across channels. Let $f_s$ and $f_s'$ be the original and the new sampling rates, respectively. The real-valued resampling ratio $R$ is defined here as

$$R = \frac{f_s'}{f_s}. \tag{1}$$

The identity relationship

$$x\left(tR^{-1}\right) \longleftrightarrow X\left(e^{j2\pi fR}\right) \tag{2}$$

holds for a continuous-time signal $x(t)$ and its Fourier transform $X(e^{j2\pi f})$; i.e., scaling in time and scaling in frequency are inversely related. The goal is to resample, or interpolate, across frequency rather than across time, with the appropriate expansion or contraction dictated by $R$, so as to minimize the computation time via the FFT. Let $X(e^{j\omega})$ and $X_N(k)$ be the discrete-time Fourier transform (DTFT) and the $N$-point discrete Fourier transform (DFT) of $x(n)$, respectively, where $x(n)$ is obtained by sampling $x(t)$ with proper bandlimiting to avoid frequency aliasing. Extending an $N$-point sequence taken from $x(n)$ into an $L$-point sequence by appending zeros at its end gives the relationship

$$y(n) = \begin{cases} x(n), & n = 0, 1, \ldots, N-1, \\ 0, & n = N, N+1, \ldots, L-1, \end{cases} \quad\longleftrightarrow\quad Y_L(k) = \left. U(e^{j\omega}) * X(e^{j\omega}) \right|_{\omega = 2\pi k/L}, \tag{3}$$

978-1-4577-0693-6/11/$26.00 © 2011 IEEE 289