IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 28, 2020, p. 77

Online Estimation of Reverberation Parameters For Late Residual Echo Suppression

Naveen Kumar Desiraju, Simon Doclo, Senior Member, IEEE, Markus Buck, Member, IEEE, and Tobias Wolff

Abstract: In hands-free telephony and other distant-talk applications, often a short AEC filter is used to achieve fast convergence at low computational cost. As a result, a significant amount of late residual echo (LRE) may remain, especially in highly reverberant environments. This LRE can be suppressed using a postfilter in the subband domain, which requires an estimate of the power spectral density (PSD) of the LRE. To estimate the LRE PSD, an exponentially decaying model with frequency-dependent reverberation scaling and decay parameters has frequently been assumed. State-of-the-art methods estimate both reverberation parameters independently of each other, either in offline or in online mode. In this article, we propose two signal-based methods (i.e., output error and equation error) to jointly estimate both reverberation parameters in online mode. The estimated parameters are then used to generate an estimate for the LRE PSD, which is fed into a postfilter for the purpose of late residual echo suppression. We derive several gradient-descent-based algorithms to simultaneously update both reverberation parameters, minimizing either the mean squared error or the mean squared log error cost function. The proposed methods are compared with state-of-the-art methods in terms of the accuracy of the estimated reverberation parameters and the corresponding LRE PSD estimate. Extensive simulation results using both artificial as well as measured room impulse responses show that the proposed output error method with mean squared log error minimization outperforms state-of-the-art methods in all considered scenarios.
Index Terms: Acoustic echo cancellation, adaptive filters, late residual echo estimation, residual echo suppression.

Manuscript received February 19, 2019; revised August 22, 2019; accepted October 13, 2019. Date of publication October 21, 2019; date of current version December 24, 2019. This work was supported by the European Union’s Seventh Framework Programme (FP7/2007-2013) project DREAMS under Grant ITN-GA-2012-316969. The associate editor coordinating the review of this manuscript and approving it for publication was M. de Diego. (Corresponding author: Naveen Kumar Desiraju.)

N. K. Desiraju was with Acoustic Speech Enhancement Research, Nuance Communications Deutschland GmbH, 89077 Ulm, Germany. He is currently with Harman Connected Services GmbH, 85748 Garching, Germany (e-mail: naveen.desiraju@harman.com).

S. Doclo is with the Department of Medical Physics and Acoustics and the Cluster of Excellence Hearing4All, University of Oldenburg, 26111 Oldenburg, Germany (e-mail: simon.doclo@uni-oldenburg.de).

M. Buck and T. Wolff were with Acoustic Speech Enhancement Research, Nuance Communications Deutschland GmbH, 89077 Ulm, Germany. They are currently with Cerence Inc., 89077 Ulm, Germany (e-mail: markus.buck@cerence.com; tobias.wolff@cerence.com).

Digital Object Identifier 10.1109/TASLP.2019.2948765

I. INTRODUCTION

HANDS-FREE telephony and other distant-talk applications, such as voice-controlled multimedia devices, are often used in large reverberant rooms, where the distance between the desired (near-end) speaker and the microphone may be quite large. Due to the acoustic coupling between the loudspeaker and the microphone, the microphone signal is typically degraded by the acoustic echo of the far-end signal, which may significantly reduce the quality and/or the intelligibility of the near-end speaker. Acoustic echo cancellation (AEC) [1] is a key technology used in such scenarios, aimed at canceling the echo from the microphone signal.
An AEC system typically consists of an adaptive filter [2], [3] which estimates the acoustic echo path, i.e., the room impulse response (RIR) between the loudspeaker and the microphone. The adaptive filter is used to generate an estimate of the acoustic echo signal, which is subsequently subtracted from the microphone signal. The resulting signal is referred to as the AEC error signal and is composed of near-end speech, background noise and usually some residual echo, since in practice the AEC filter cannot estimate the RIR with complete accuracy (filter misalignment).

When deploying an AEC system in a room with a large reverberation time (T60), a long filter is needed to achieve good echo cancellation performance. However, a long filter incurs a large computational cost for updating the filter and may also lead to slow filter convergence [2], [3]. Hence, to achieve fast filter convergence at low computational cost, a short AEC filter is often used in practice, which however leaves a large amount of late residual echo (LRE).

In practice, a postfilter is often used in addition to the AEC filter, aimed at suppressing the residual echo and background noise while not distorting the near-end speech signal. Although multi-frame postfilters have been proposed [4], most postfilters are single-tap real-valued gains [5]–[12]. To design the postfilter in the subband domain, an accurate estimate of the power spectral density (PSD) of the residual echo and background noise signals is required. A simple but frequently used method to estimate the PSD of the residual echo signal is to apply a coupling factor to the far-end signal PSD, where the coupling factor is estimated during periods of near-end speech absence [1]. However, since this method does not take into account any temporal context, it is unable to model the LRE PSD accurately, and its performance is quite poor, especially when using a short AEC filter.
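The coupling-factor method described above can be illustrated with a minimal sketch. It is not taken from [1]: the function names, the ratio-of-averages estimator, and the boolean near-end-absence mask are assumptions made here for illustration only. The sketch shows why the estimate is memoryless: each frame of the residual echo PSD depends only on the far-end PSD of that same frame.

```python
import numpy as np

def estimate_coupling_factor(err_psd, far_psd, near_end_absent, eps=1e-12):
    """Per-subband coupling factor (hypothetical helper, assumed estimator).

    err_psd, far_psd: arrays of shape (frames, subbands) holding the PSDs
    of the AEC error signal and the far-end signal.
    near_end_absent: boolean array of shape (frames,), True where no
    near-end speech is present (assumed to be supplied by a detector).
    """
    # Average both PSDs over frames without near-end speech, then take
    # their ratio in each subband.
    err_mean = err_psd[near_end_absent].mean(axis=0)
    far_mean = far_psd[near_end_absent].mean(axis=0)
    return err_mean / (far_mean + eps)

def residual_echo_psd(far_psd, coupling):
    """Memoryless residual echo PSD estimate: coupling factor times far-end PSD."""
    return far_psd * coupling
```

Because the estimate for a frame uses no earlier frames, it drops to zero as soon as the far-end signal stops, whereas the late residual echo keeps decaying in the room.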
Several other LRE PSD estimators have therefore been proposed which are based on the statistical reverberation model proposed in [13], [14], which assumes that the late reverberant part of an RIR decays exponentially at a rate determined by the T60. These PSD estimators require estimates of two parameters: the reverberation decay parameter (corresponding to the T60) and the reverberation scaling parameter (a.k.a. initial power of the LRE).
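To make the roles of the two parameters concrete, the following is a minimal sketch of one way an exponentially decaying LRE PSD model can be realized. The first-order recursion, the function names, and the delay_frames argument (standing in for the part of the echo path already covered by the AEC filter) are assumptions chosen for illustration, not the estimators of [13], [14]. The conversion from T60 to a per-frame decay factor relies only on the definition of T60 as the time over which power decays by 60 dB.

```python
import numpy as np

def decay_from_t60(t60_s, frame_shift, fs):
    """Per-frame power decay factor for a given T60 (hypothetical helper).

    Power drops by 60 dB (a factor of 1e-6) over t60_s seconds, so over one
    frame shift of frame_shift / fs seconds it drops by
    10 ** (-6 * frame_shift / (fs * t60_s)).
    """
    return 10.0 ** (-6.0 * frame_shift / (fs * t60_s))

def lre_psd_recursive(far_psd, scaling, decay, delay_frames):
    """Assumed first-order recursive LRE PSD model.

    far_psd: array of shape (frames, subbands) with the far-end PSD.
    scaling: reverberation scaling parameter (initial power of the LRE).
    decay: reverberation decay parameter, between 0 and 1.
    delay_frames: number of frames before the late echo starts (assumption).
    """
    far_psd = np.asarray(far_psd, dtype=float)
    out = np.zeros_like(far_psd)
    prev = np.zeros(far_psd.shape[1:])
    for n in range(far_psd.shape[0]):
        # The delayed far-end PSD excites the model; earlier output decays.
        excitation = far_psd[n - delay_frames] if n >= delay_frames else 0.0
        prev = scaling * excitation + decay * prev
        out[n] = prev
    return out
```

In this sketch, scaling and decay may be scalars or per-subband arrays, which matches the frequency-dependent parameters assumed by the model. A far-end impulse in one frame produces an output that starts at the scaling value after the delay and then shrinks by the decay factor every frame.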