IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 28, 2020 77
Online Estimation of Reverberation Parameters For
Late Residual Echo Suppression
Naveen Kumar Desiraju, Simon Doclo, Senior Member, IEEE, Markus Buck, Member, IEEE, and Tobias Wolff
Abstract—In hands-free telephony and other distant-talk applications,
a short acoustic echo cancellation (AEC) filter is often used to achieve fast convergence
at low computational cost. As a result, a significant amount of late
residual echo (LRE) may remain, especially in highly reverberant
environments. This LRE can be suppressed using a postfilter in
the subband domain, which requires an estimate of the power
spectral density (PSD) of the LRE. To estimate the LRE PSD, an
exponentially decaying model with frequency-dependent reverber-
ation scaling and decay parameters has frequently been assumed.
State-of-the-art methods estimate both reverberation parameters
independently of each other, either in offline or in online mode. In
this article, we propose two signal-based methods (i.e. output error
and equation error) to jointly estimate both reverberation param-
eters in online mode. The estimated parameters are then used to
generate an estimate for the LRE PSD, which is fed into a postfilter
for the purpose of late residual echo suppression. We derive several
gradient-descent-based algorithms to simultaneously update both
reverberation parameters, minimizing either the mean squared
error or the mean squared log error cost function. The proposed
methods are compared with state-of-the-art methods in terms of
the accuracy of the estimated reverberation parameters and the
corresponding LRE PSD estimate. Extensive simulation results
using both artificial and measured room impulse responses
show that the proposed output error method with mean squared
log error minimization outperforms state-of-the-art methods in all
considered scenarios.
Index Terms—Acoustic echo cancellation, adaptive filters, late
residual echo estimation, residual echo suppression.
Manuscript received February 19, 2019; revised August 22, 2019; accepted
October 13, 2019. Date of publication October 21, 2019; date of current version
December 24, 2019. This work was supported by the European Union’s Seventh
Framework Programme (FP7/2007-2013) project DREAMS under Grant
ITN-GA-2012-316969. The associate editor coordinating the review of this
manuscript and approving it for publication was M. de Diego. (Corresponding
author: Naveen Kumar Desiraju.)
N. K. Desiraju was with Acoustic Speech Enhancement Research, Nuance
Communications Deutschland GmbH, 89077 Ulm, Germany. He is currently
with Harman Connected Services GmbH, 85748 Garching, Germany (e-mail:
naveen.desiraju@harman.com).
S. Doclo is with the Department of Medical Physics and Acoustics and the
Cluster of Excellence Hearing4All, University of Oldenburg, 26111 Oldenburg,
Germany (e-mail: simon.doclo@uni-oldenburg.de).
M. Buck and T. Wolff were with Acoustic Speech Enhancement Research,
Nuance Communications Deutschland GmbH, 89077 Ulm, Germany. They are
currently with Cerence Inc., 89077 Ulm, Germany (e-mail:
markus.buck@cerence.com; tobias.wolff@cerence.com).
Digital Object Identifier 10.1109/TASLP.2019.2948765
I. INTRODUCTION
HANDS-FREE telephony and other distant-talk applications,
such as voice-controlled multimedia devices, are often
used in large reverberant rooms, where the distance between
the desired (near-end) speaker and the microphone may be quite
large. Due to the acoustic coupling between the loudspeaker and
the microphone, the microphone signal is typically degraded by
the acoustic echo of the far-end signal, which may significantly
reduce the quality and/or the intelligibility of the near-end
speaker. Acoustic echo cancellation (AEC) [1] is a key tech-
nology used in such scenarios, aimed at canceling the echo from
the microphone signal. An AEC system typically consists of an
adaptive filter [2], [3] which estimates the acoustic echo path,
i.e. the room impulse response (RIR) between the loudspeaker
and the microphone. The adaptive filter is used to generate
an estimate of the acoustic echo signal, which is subsequently
subtracted from the microphone signal. The resulting signal is
referred to as the AEC error signal and is composed of near-end
speech, background noise and usually some residual echo, as
the AEC filter cannot estimate the RIR with complete accuracy
in practice (filter misalignment). When deploying an AEC
system in a room with a large reverberation time (T60), a large
filter length needs to be used in order to achieve good echo
cancellation performance. However, using a long filter results in
large computational cost for updating the filter and may also lead
to slow filter convergence [2], [3]. Hence, to achieve fast
filter convergence at low computational cost, a short AEC filter
is often used in practice, which however leaves a large
amount of late residual echo (LRE).
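The trade-off described above can be illustrated with a small simulation. The following is a minimal sketch, not the system used in this article: it assumes a white-noise far-end signal, an echo-only microphone signal, a synthetic RIR with an exponentially decaying envelope, and a time-domain NLMS adaptive filter. All function names and parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def make_rir(num_taps, t60_samples, seed=0):
    """Synthetic RIR: white noise under an envelope that falls 60 dB over t60_samples."""
    rng = random.Random(seed)
    decay = 3.0 * math.log(10.0) / t60_samples  # amplitude decay rate per sample
    return [rng.gauss(0.0, 1.0) * math.exp(-decay * n) for n in range(num_taps)]


def convolve(signal, rir):
    """Echo at the microphone: far-end signal filtered by the RIR."""
    out = []
    for n in range(len(signal)):
        taps = min(n + 1, len(rir))
        out.append(sum(rir[k] * signal[n - k] for k in range(taps)))
    return out


def nlms_error(far_end, mic, filter_len, step=0.5, eps=1e-8):
    """Run an NLMS echo canceller and return the AEC error signal."""
    weights = [0.0] * filter_len
    history = [0.0] * filter_len  # most recent far-end samples, newest first
    errors = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate
        gain = step * e / (sum(h * h for h in history) + eps)
        weights = [w + gain * h for w, h in zip(weights, history)]
        errors.append(e)
    return errors


def mean_power(signal):
    return sum(s * s for s in signal) / len(signal)


rng = random.Random(1)
far_end = [rng.gauss(0.0, 1.0) for _ in range(4000)]  # assumed white far-end signal
rir = make_rir(num_taps=200, t60_samples=200)
mic = convolve(far_end, rir)  # echo only: no near-end speech or background noise

# Residual echo power after convergence, measured over the last 1000 samples.
residual_short = mean_power(nlms_error(far_end, mic, filter_len=20)[-1000:])
residual_long = mean_power(nlms_error(far_end, mic, filter_len=200)[-1000:])
print(residual_short, residual_long)
```

In this toy setting the 20-tap filter cannot model the RIR taps beyond its own length, so the energy of that tail remains in the error signal as late residual echo, whereas the 200-tap filter covers the whole RIR at roughly ten times the cost per update.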
In practice, a postfilter is often used in addition to the AEC
filter, aimed at suppressing the residual echo and background
noise while not distorting the near-end speech signal. Although
multi-frame postfilters have been proposed [4], most postfilters
are single-tap real-valued gains [5]–[12]. To design the postfilter
in the subband domain, an accurate estimate of the power
spectral density (PSD) of the residual echo and background
noise signals is required. A simple but frequently used method to
estimate the PSD of the residual echo signal is to apply a coupling
factor to the far-end signal PSD, where the coupling factor
is estimated during periods of near-end speech absence [1].
However, since this method does not take into account any
temporal context and is unable to model the LRE PSD accurately,
its performance is quite poor, especially when using a short
AEC filter. Hence, several other LRE PSD estimators have been
proposed which are based on the statistical reverberation model
proposed in [13], [14], which assumes that the late reverberant
part of a RIR decays exponentially at a rate inversely proportional to the
T60. These PSD estimators require estimates of two parameters:
the reverberation decay parameter (corresponding to the T60)
and the reverberation scaling parameter (a.k.a. initial power of
the LRE).
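To make the two approaches concrete, the sketch below contrasts a coupling-factor estimate with an estimate based on an exponentially decaying model in a single subband. It is an illustrative assumption rather than the estimator derived in this article: the LRE PSD is modeled as a sum of delayed far-end PSD values weighted by scaling * decay^m, which can be computed recursively. The function names, the per-frame decay definition, and the parameter values are all hypothetical.

```python
def decay_from_t60(t60_s, frame_shift, fs):
    """Per-frame power decay factor for a tail that drops 60 dB in t60_s seconds."""
    return 10.0 ** (-6.0 * frame_shift / (t60_s * fs))


def lre_psd_coupling(far_psd, coupling):
    """Coupling-factor baseline: scales the current far-end PSD, no temporal context."""
    return [coupling * p for p in far_psd]


def lre_psd_recursive(far_psd, scaling, decay, delay_frames):
    """Exponential-decay model (assumed form), computed recursively per frame.

    est[l] = scaling * far_psd[l - delay_frames] + decay * est[l - 1]
    where delay_frames is the number of frames covered by the AEC filter.
    """
    estimate = []
    previous = 0.0
    for frame in range(len(far_psd)):
        delayed = far_psd[frame - delay_frames] if frame >= delay_frames else 0.0
        previous = scaling * delayed + decay * previous
        estimate.append(previous)
    return estimate


def lre_psd_direct(far_psd, scaling, decay, delay_frames):
    """Same model written as an explicit sum, used to check the recursion."""
    estimate = []
    for frame in range(len(far_psd)):
        total = 0.0
        for m in range(frame - delay_frames + 1):
            total += scaling * (decay ** m) * far_psd[frame - delay_frames - m]
        estimate.append(total)
    return estimate


# A single burst of far-end power in frame 0, in one subband.
far_psd = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
coupling_estimate = lre_psd_coupling(far_psd, coupling=0.5)
model_estimate = lre_psd_recursive(far_psd, scaling=0.5, decay=0.5, delay_frames=2)
print(coupling_estimate)
print(model_estimate)
```

With a single far-end burst, the coupling-factor estimate is nonzero only in the frame of the burst, while the model-based estimate starts after the frames covered by the AEC filter and then decays geometrically. This is the temporal context that the coupling-factor method lacks.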
2329-9290 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.