5th Biennial Workshop on DSP for In-Vehicle Systems, Kiel, Germany, 2011
A Shadow Filter Approach to a Wideband FDAF-Based Automotive Handsfree System
Marc-André Jung, Tim Fingscheidt
Institute for Communications Technology, Technische Universität Braunschweig, Germany
E-mail: {jung, fingscheidt}@ifn.ing.tu-bs.de
Abstract
We present a wideband handsfree system for automotive telephony applications with a synchronously adapted echo canceler and postfilter. It is based on a frequency domain adaptive filter approach and Kalman filter theory, and makes use of a generalized Wiener postfilter for residual echo suppression and noise reduction in a consistent way. To provide a high convergence rate in case of time-variant echo paths, the echo canceler, which offers very robust double-talk performance, is supported by a fast-converging shadow filter that allows for good tracking performance. A decimation approach is used to decrease algorithmic delay and computational complexity without loss of quality. Experimental results with car cabin impulse responses show good echo cancellation capabilities with fast convergence times along with extraordinary full-duplex performance, while keeping the speech component almost untouched in the converged state.
Keywords
shadow filter, wideband, handsfree system, FDAF, AEC
1. INTRODUCTION
High-quality handsfree capabilities are a highly demanded feature of narrowband and wideband (tele-)communication systems in office, home, or car environments and, in the case of the latter, are even mandatory in many countries. Several state-of-the-art algorithms have been developed to fulfill technical requirements such as full-duplex speech transmission capability, sufficient acoustic echo cancellation even for highly time-variant echo paths, and minimal speech distortion (e.g., [1–6]). Nevertheless, those requirements often conflict with practical constraints such as low complexity and low algorithmic delay [7–9].
Handsfree systems are usually designed to cope with signal degradations stemming from the acoustic environment. These degradations are typically caused by acoustic echo and additive noise, leading to reduced intelligibility and speech quality. This is specifically the case for long round-trip delays or high noise immissions, as is often found in automotive mobile phone usage. As a countermeasure, acoustic echo cancelers (AECs) [1, 6, 10, 11] and postfilters (PFs) for residual echo suppression (RES) [12, 13] and noise reduction (NR) approaches [14–16] have been proposed, typically working at a sampling rate of f_s = 8 kHz (narrowband speech). With upcoming mobile wideband speech transmission at a sampling rate of f_s = 16 kHz, several obstacles must be overcome when designing a handsfree system. The doubled sampling rate causes a non-negligible increase in algorithmic complexity and can also lead to other unwanted effects when porting an algorithm from narrowband to wideband [9].
Typical handsfree system representatives in the time domain are based on the normalized least mean square (NLMS) [17], affine projection (AP) [17–19], recursive least squares (RLS) [20], or Kalman algorithm [6, 21]. These approaches usually feature a simple algorithmic structure with the ability to work on a per-sample basis. On the one hand, this usually leads to zero or low delay; on the other hand, modeling of longer impulse responses (IRs) can lead to exceedingly high computational complexity if the filter is adapted at every sample. This problem can be addressed by block processing, where the filter is only adapted once per block of samples. Albeit computationally efficient, block processing leads to algorithmic delay and a slower convergence rate.
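The trade-off between per-sample and block-wise adaptation can be illustrated with a minimal time-domain NLMS sketch. This is not the system proposed in this paper: the function names (nlms, misalignment, demo), the 8-tap synthetic echo path, the white-noise excitation, and the choice to average the normalized updates over each block are all assumptions made purely for illustration.

```python
import random


def nlms(x, d, num_taps, mu=0.5, eps=1e-8, block_len=1):
    """Hypothetical NLMS echo path identifier.

    x: far-end (loudspeaker) samples, d: microphone samples.
    The filter is adapted once every `block_len` samples, using the
    average of the normalized per-sample updates collected in the block.
    With block_len=1 this is the standard per-sample NLMS update.
    """
    w = [0.0] * num_taps      # echo path estimate
    buf = [0.0] * num_taps    # most recent far-end samples, newest first
    acc = [0.0] * num_taps    # accumulated normalized update of the block
    errors = []
    for n, (x_n, d_n) in enumerate(zip(x, d)):
        buf = [x_n] + buf[:-1]
        echo_estimate = sum(w_i * b_i for w_i, b_i in zip(w, buf))
        err = d_n - echo_estimate
        errors.append(err)
        power = sum(b * b for b in buf) + eps   # regularized input power
        for i in range(num_taps):
            acc[i] += err * buf[i] / power
        if (n + 1) % block_len == 0:            # adapt only at block ends
            for i in range(num_taps):
                w[i] += mu * acc[i] / block_len
                acc[i] = 0.0
    return w, errors


def misalignment(w, h):
    """Normalized squared system distance between estimate w and path h."""
    return sum((a - b) ** 2 for a, b in zip(w, h)) / sum(b * b for b in h)


def demo(num_samples=4000, seed=0):
    """Compare per-sample and block-wise adaptation on synthetic data."""
    rng = random.Random(seed)
    h = [0.8, -0.5, 0.3, -0.2, 0.1, 0.05, -0.03, 0.01]  # assumed echo path
    x = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    d = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
         for n in range(num_samples)]
    w_sample, _ = nlms(x, d, len(h), block_len=1)
    w_block, _ = nlms(x, d, len(h), block_len=16)
    return misalignment(w_sample, h), misalignment(w_block, h)


if __name__ == "__main__":
    per_sample, per_block = demo()
    print("misalignment, per-sample adaptation:", per_sample)
    print("misalignment, block adaptation:     ", per_block)
```

In this sketch the block variant performs one coefficient update per 16 samples instead of 16, which is where the savings come from, and after the same number of input samples it is left with a larger residual misalignment than the per-sample variant.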
Most of these algorithms assume a spectrally white echo signal, whereas speech signals usually still have some inherent correlation, so adaptation can only take place in the limited direction of the error signal vector. The resulting decrease in convergence rate can partly be avoided by applying a decorrelation technique to the excitation signal [6]. Whereas convergence speed can be increased especially with the RLS and Kalman algorithms, tracking performance often still suffers, since adaptation of a well-converged system model to IR changes only takes place in small steps [7]. Another well-known problem of time domain AEC approaches is their poor double-talk performance. The presence of near-end speech or noise leads to undesired adaptation and therefore to misestimation of the true impulse response. To avoid this, a more or less robust double-talk detection (DTD) scheme is often applied [7], which triggers a reduction of the adaptation speed during double-talk.
Adaptation in a transform domain such as the subband or frequency domain may circumvent some of the above-mentioned deficiencies. However, it should be mentioned that transform domain processing may introduce other, pos-