LOW VARIANCE BLIND ESTIMATION OF THE REVERBERATION TIME

Nicolás López 1,2, Yves Grenier 2, Gaël Richard 2 and Ivan Bourmeyster 1

1 Arkamys - 31 rue Pouchet, 75017 Paris, France
2 Institut Mines-Télécom - Telecom ParisTech - CNRS/LTCI - 37/39 rue Dareau, 75014 Paris, France

ABSTRACT

The reverberation time is a key feature for describing the acoustic properties of a reverberant room. It can be computed from a measured Room Impulse Response but in many applications it has to be estimated blindly. Existing blind methods give accurate estimates but they often exhibit high variance across different speakers. In this paper, a low variance blind estimator of the reverberation time is derived from the decay rate distribution of the signal. The influence of the reverberation time on the statistical moments of the distribution is analyzed and one relevant moment is taken as an estimator. The variance of the estimator is reduced thanks to a prewhitening filter and a modification of the decay rate distribution. Experimental results confirm the accuracy of the method when the observed signal is sufficiently long.

Index Terms— Reverberation time, blind estimation, decay rate distribution, low variance

1. INTRODUCTION

The reverberation time (RT) is one of the main features for describing the acoustics of a room. It is defined as the amount of time required to measure an energy decay of 60 dB after the excitation source is turned off. The RT gives valuable information on the degradation affecting the speech signal [1] and it is needed to calibrate dereverberation algorithms that are based on a statistical model of the Room Impulse Response (RIR) of the enclosure [2, 3]. It is usually computed from a measured RIR using the well-known Schroeder's backwards integration method [4]. However, in a real speech communication context we do not have access to this information and must proceed blindly.
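As an illustration of the non-blind reference computation mentioned above, the following minimal sketch applies backward integration to a RIR and extrapolates a line fit of the resulting energy decay curve to a 60 dB decay. The function names, the -5 to -25 dB fitting range, the 16 kHz sampling rate and the synthetic RIR (Gaussian white noise with an exponential envelope) are assumptions made for this example; they are not taken from [4] or from the experiments of this paper.

```python
import numpy as np

def schroeder_edc_db(rir):
    """Energy decay curve in dB, obtained by backward integration of the squared RIR."""
    energy = np.asarray(rir, dtype=float) ** 2
    # edc[n] = sum of energy[k] for k >= n (reverse cumulative sum)
    edc = np.cumsum(energy[::-1])[::-1]
    return 10.0 * np.log10(edc / edc[0])

def rt_from_rir(rir, fs, fit_range_db=(-5.0, -25.0)):
    """RT in seconds: fit a line to the EDC inside fit_range_db, extrapolate to -60 dB.

    fit_range_db is an assumed fitting range chosen for this sketch.
    """
    edc_db = schroeder_edc_db(rir)
    upper, lower = fit_range_db
    idx = np.where((edc_db <= upper) & (edc_db >= lower))[0]
    t = idx / fs
    slope, _ = np.polyfit(t, edc_db[idx], 1)  # slope in dB per second (negative)
    return -60.0 / slope

# Synthetic test RIR (assumption): white Gaussian noise with an exponential envelope
# whose energy drops by 60 dB after rt_true seconds.
fs = 16000                                # sampling rate in Hz (assumed)
rt_true = 0.5                             # target reverberation time in seconds
rng = np.random.default_rng(0)
t = np.arange(int(2 * rt_true * fs)) / fs
delta = 3.0 * np.log(10.0) / rt_true      # amplitude decay rate per second
rir = rng.standard_normal(t.size) * np.exp(-delta * t)

rt_est = rt_from_rir(rir, fs)
print(f"true RT = {rt_true:.3f} s, estimated RT = {rt_est:.3f} s")
```

Fitting only the upper part of the decay curve and extrapolating keeps the estimate away from the tail of the curve, where truncation of the RIR and measurement noise distort the slope.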
The problem of blind estimation of the RT has been extensively addressed in the last decade. Some blind techniques exploit a model of the deformation of the speech signal introduced by reverberation. The RT is then mapped to a measure of the deformation of the temporal [5] or the spectral [6] envelope of the signal. In [7], an Artificial Neural Network is trained to learn reverberation models. Other methods segment the decaying regions of the log-energy envelope of the signal and use linear regression on these regions to track the decay rate [2]. Recently, Maximum Likelihood (ML) approaches have been developed [8, 9]. RT estimates are continuously computed and an order filter is used to choose the most likely value. Wen et al. develop a blind method linking the second moment of the decay rate distribution to the RT [10]. The method operates in the Fourier domain. For each analysis frame, a linear regression is made on the subband log-energy envelope to compute the decay rates. The method is fast and reliable, but exhibits high variance across speakers.

In this paper, we introduce a low variance RT estimator based on the decay rate distribution of speech signals. The estimation is performed in the time domain by studying the distribution of the energy ratios between adjacent frames of the energy envelope. The analysis of the relationship between the statistical moments of this distribution and the RT shows that the variance of the negative side of the distribution is a reliable estimator of the RT. We show in our experiments that a prewhitening stage significantly reduces the variance of the estimator while keeping a small estimation bias. We also show that using a truncated distribution instead of the symmetric one used in [10] improves the accuracy of the estimator.

The paper is organized as follows: in Section 2 we introduce the sound decay model that will be used to derive the RT estimator described in Section 3.
The estimator is compared to a state-of-the-art method in Section 4 and conclusions are drawn in Section 5.

2. MODEL OF SOUND DECAY TAIL

The decay tail of a RIR is often modeled as an exponentially damped Gaussian white noise [11]. Since the RT is defined from a measured RIR, we expect to reliably estimate it in speech segments where the RIR model holds. Thus, we model the decay tail $d(n)$ of speech signals as:

$$d(n) = b(n)\, e^{-\delta n} \qquad (1)$$

where $b(n) \sim \mathcal{N}(0, \sigma_b^2)$, $n$ is the sample index and $\delta$ is the decay rate, which is related to the reverberation time by:

$$\delta = \frac{3 \ln(10)}{RT} \qquad (2)$$

Using equation (1), we compute the energy envelope of the decay tail, denoted $e(n)$: