IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 28, 2020 77

Online Estimation of Reverberation Parameters For Late Residual Echo Suppression

Naveen Kumar Desiraju, Simon Doclo, Senior Member, IEEE, Markus Buck, Member, IEEE, and Tobias Wolff

Abstract—In hands-free telephony and other distant-talk applications, a short acoustic echo cancellation (AEC) filter is often used to achieve fast convergence at low computational cost. As a result, a significant amount of late residual echo (LRE) may remain, especially in highly reverberant environments. This LRE can be suppressed using a postfilter in the subband domain, which requires an estimate of the power spectral density (PSD) of the LRE. To estimate the LRE PSD, an exponentially decaying model with frequency-dependent reverberation scaling and decay parameters has frequently been assumed. State-of-the-art methods estimate both reverberation parameters independently of each other, either in offline or in online mode. In this article, we propose two signal-based methods (i.e., output error and equation error) to jointly estimate both reverberation parameters in online mode. The estimated parameters are then used to generate an estimate of the LRE PSD, which is fed into a postfilter for the purpose of late residual echo suppression. We derive several gradient-descent-based algorithms to simultaneously update both reverberation parameters, minimizing either the mean squared error or the mean squared log error cost function. The proposed methods are compared with state-of-the-art methods in terms of the accuracy of the estimated reverberation parameters and the corresponding LRE PSD estimate. Extensive simulation results using both artificial and measured room impulse responses show that the proposed output error method with mean squared log error minimization outperforms state-of-the-art methods in all considered scenarios.
Index Terms—Acoustic echo cancellation, adaptive filters, late residual echo estimation, residual echo suppression.

Manuscript received February 19, 2019; revised August 22, 2019; accepted October 13, 2019. Date of publication October 21, 2019; date of current version December 24, 2019. This work was supported by the European Union’s Seventh Framework Programme (FP7/2007-2013) project DREAMS under Grant ITN-GA-2012-316969. The associate editor coordinating the review of this manuscript and approving it for publication was M. de Diego. (Corresponding author: Naveen Kumar Desiraju.)

N. K. Desiraju was with Acoustic Speech Enhancement Research, Nuance Communications Deutschland GmbH, 89077 Ulm, Germany. He is currently with Harman Connected Services GmbH, 85748 Garching, Germany (e-mail: naveen.desiraju@harman.com).

S. Doclo is with the Department of Medical Physics and Acoustics and the Cluster of Excellence Hearing4All, University of Oldenburg, 26111 Oldenburg, Germany (e-mail: simon.doclo@uni-oldenburg.de).

M. Buck and T. Wolff were with Acoustic Speech Enhancement Research, Nuance Communications Deutschland GmbH, 89077 Ulm, Germany. They are currently with Cerence Inc., 89077 Ulm, Germany (e-mail: markus.buck@cerence.com; tobias.wolff@cerence.com).

Digital Object Identifier 10.1109/TASLP.2019.2948765

I. INTRODUCTION

Hands-free telephony and other distant-talk applications, such as voice-controlled multimedia devices, are often used in large reverberant rooms, where the distance between the desired (near-end) speaker and the microphone may be quite large. Due to the acoustic coupling between the loudspeaker and the microphone, the microphone signal is typically degraded by the acoustic echo of the far-end signal, which may significantly reduce the quality and/or the intelligibility of the near-end speaker. Acoustic echo cancellation (AEC) [1] is a key technology used in such scenarios, aimed at canceling the echo from the microphone signal.
An AEC system typically consists of an adaptive filter [2], [3] which estimates the acoustic echo path, i.e., the room impulse response (RIR) between the loudspeaker and the microphone. The adaptive filter is used to generate an estimate of the acoustic echo signal, which is subsequently subtracted from the microphone signal. The resulting signal is referred to as the AEC error signal and is composed of near-end speech, background noise and usually some residual echo, since in practice the AEC filter cannot estimate the RIR with complete accuracy (filter misalignment). When deploying an AEC system in a room with a large reverberation time (T60), a long filter is needed to achieve good echo cancellation performance. However, a long filter incurs a large computational cost for updating the filter and may also converge slowly [2], [3]. Hence, to achieve fast filter convergence at low computational cost, a short AEC filter is often used in practice, which in turn leaves a large amount of late residual echo (LRE).

In practice, a postfilter is often used in addition to the AEC filter, aimed at suppressing the residual echo and background noise without distorting the near-end speech signal. Although multi-frame postfilters have been proposed [4], most postfilters are single-tap real-valued gains [5]–[12]. To design the postfilter in the subband domain, accurate estimates of the power spectral densities (PSDs) of the residual echo and background noise signals are required. A simple but frequently used method to estimate the PSD of the residual echo signal is to apply a coupling factor to the far-end signal PSD, where the coupling factor is estimated during periods of near-end speech absence [1]. However, since this method does not take into account any temporal context and is unable to model the LRE PSD accurately, its performance is quite poor, especially when using a short AEC filter.
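As a rough illustration of the coupling-factor approach described above, the following Python sketch estimates one coupling factor for a single subband from frames without near-end speech, scales the far-end PSD with it, and derives a single-tap real-valued gain. The function names, the ratio-of-sums estimator, the Wiener-like gain rule and the gain floor are assumptions made for illustration; they are not taken from [1] or from the postfilters in [5]–[12].

```python
def estimate_coupling_factor(err_psd, far_psd, near_end_active, eps=1e-12):
    """Ratio of summed AEC-error PSD to summed far-end PSD in one subband.

    Only frames flagged as free of near-end speech contribute
    (assumed estimator, chosen for simplicity).
    """
    num = 0.0
    den = 0.0
    for e, x, active in zip(err_psd, far_psd, near_end_active):
        if not active:
            num += e
            den += x
    return num / max(den, eps)


def postfilter_gain(err_psd, far_psd, coupling, noise_psd=0.0, gain_floor=0.1):
    """Single-tap real-valued gain for one frame and one subband.

    The residual echo PSD is modeled as coupling * far_psd; the gain is a
    Wiener-like rule limited by a floor (both are assumptions).
    """
    residual_psd = coupling * far_psd
    gain = 1.0 - (residual_psd + noise_psd) / max(err_psd, 1e-12)
    return max(gain, gain_floor)


if __name__ == "__main__":
    # Hypothetical PSD values for three frames of one subband.
    err = [0.2, 0.4, 5.0]
    far = [2.0, 4.0, 1.0]
    active = [False, False, True]  # near-end speech only in the last frame
    c = estimate_coupling_factor(err, far, active)
    print(c, postfilter_gain(err[2], far[2], c))
```

Because the residual echo estimate for a frame depends only on the far-end PSD of that same frame, this sketch has no memory of earlier far-end frames, which is the limitation noted above.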
Hence, several other LRE PSD estimators have been proposed which are based on the statistical reverberation model proposed in [13], [14], which assumes that the late reverberant part of an RIR decays exponentially at a rate determined by the T60. These PSD estimators require estimates of two parameters: the reverberation decay parameter (corresponding to the T60) and the reverberation scaling parameter (also known as the initial power of the LRE).

2329-9290 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
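To make the role of the two reverberation parameters in the exponentially decaying model concrete, the sketch below maps a T60 value to a per-frame power decay factor and runs a first-order recursion that produces an exponentially decaying PSD in one subband. The recursive form, the frame delay and all names (`decay_per_frame`, `lre_psd_recursive`, `scaling`, `decay`, `delay_frames`) are assumptions chosen for illustration; they may differ from the estimators in the cited literature and from the model developed later in this article.

```python
import math


def decay_per_frame(t60_s, frame_shift, fs_hz):
    """Power decay factor per frame implied by a 60 dB drop over T60.

    A 60 dB power drop over t60_s * fs_hz samples corresponds to a factor
    of 10 ** (-6 * frame_shift / (t60_s * fs_hz)) per frame shift.
    """
    return 10.0 ** (-6.0 * frame_shift / (t60_s * fs_hz))


def lre_psd_recursive(far_end_psd, scaling, decay, delay_frames):
    """First-order recursion for one subband (assumed form):

        lre[k] = decay * lre[k - 1] + scaling * far_end_psd[k - delay_frames]

    `scaling` plays the role of the initial LRE power, `decay` that of the
    per-frame decay, and `delay_frames` stands in for the frames already
    covered by the AEC filter.
    """
    lre = []
    prev = 0.0
    for k in range(len(far_end_psd)):
        delayed = far_end_psd[k - delay_frames] if k >= delay_frames else 0.0
        prev = decay * prev + scaling * delayed
        lre.append(prev)
    return lre


if __name__ == "__main__":
    b = decay_per_frame(t60_s=0.5, frame_shift=128, fs_hz=16000)
    # A single far-end power impulse yields a delayed, exponentially decaying PSD.
    print(b, lre_psd_recursive([1.0, 0.0, 0.0, 0.0, 0.0], 0.5, b, 1))
```

With T60 = 0.5 s, a frame shift of 128 samples and a sampling rate of 16 kHz, the decay factor is about 0.80 per frame, i.e., the modeled power drops by 60 dB after 62.5 frames.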