ADAPTIVE EIGENVALUE DECOMPOSITION ALGORITHM FOR REALTIME ACOUSTIC SOURCE LOCALIZATION SYSTEM

Yiteng Huang
329506 Georgia Tech Station
Atlanta, Georgia 30332
gt9506b@prism.gatech.edu

Jacob Benesty, Gary W. Elko
Bell Laboratories, Lucent Technologies
700 Mountain Avenue
Murray Hill, New Jersey 07974
jbenesty@bell-labs.com, gwe@bell-labs.com

ABSTRACT

To locate an acoustic source in a room, the relative delay between microphone pairs must be determined efficiently and accurately. However, most traditional time delay estimation (TDE) algorithms fail in reverberant environments. In this paper, a new approach is proposed that takes into account the reverberation of the room. A realtime PC-based TDE system running under Microsoft Windows was developed with three TDE techniques: classical cross-correlation, phase transform, and a new algorithm that is proposed in this paper. The system provides an interactive platform that allows users to compare the performance of these algorithms.

1. INTRODUCTION

A realtime acoustic source localization system can be used in such applications as camera pointing for teleconferencing and microphone array beamformer steering for audio communication and speech processing systems. The problem is difficult because of the nonstationarity of speech and because of room acoustic reverberation. Over the last two decades, several approaches have been proposed. Time delay estimation (TDE) between two microphones is becoming the technique of choice, especially in recent digital systems.

Generalized cross-correlation (GCC) [1] is the most commonly used method for TDE. In this technique, the delay estimate is obtained as the time lag that maximizes the cross-correlation between filtered versions of the received signals. Techniques have been proposed to improve the GCC in the presence of noise [2, 3].
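The lag-maximization idea behind cross-correlation TDE can be illustrated with a minimal sketch. It covers only the classical (unfiltered) cross-correlation, not the GCC prefilters of [1], and is not the realtime system described in this paper. The function name `cross_correlation_delay`, the helper `delayed`, the white Gaussian source, and the delays and noise levels are all assumptions made for illustration.

```python
import random


def cross_correlation_delay(x1, x2, max_lag):
    """Return the lag m in [-max_lag, max_lag] that maximizes
    r(m) = sum_n x1[n] * x2[n - m] (classical cross-correlation)."""
    best_lag, best_value = 0, float("-inf")
    for m in range(-max_lag, max_lag + 1):
        # Restrict n so that both x1[n] and x2[n - m] are valid samples.
        lo = max(0, m)
        hi = min(len(x1), len(x2) + m)
        value = sum(x1[n] * x2[n - m] for n in range(lo, hi))
        if value > best_value:
            best_lag, best_value = m, value
    return best_lag


rng = random.Random(0)

# Hypothetical test signal: white Gaussian noise stands in for the source.
source = [rng.gauss(0.0, 1.0) for _ in range(600)]


def delayed(sig, tau, gain, noise_std):
    """Delayed, attenuated copy of sig plus independent white noise."""
    return [
        gain * (sig[n - tau] if n >= tau else 0.0) + rng.gauss(0.0, noise_std)
        for n in range(len(sig))
    ]


tau1, tau2 = 12, 5  # assumed propagation delays, in samples
x1 = delayed(source, tau1, 1.0, 0.1)
x2 = delayed(source, tau2, 0.8, 0.1)

# The peak of r(m) is expected at m = tau1 - tau2 when there is no reverberation.
estimated_delay = cross_correlation_delay(x1, x2, max_lag=20)
print(estimated_delay)
```

The sketch uses a direct time-domain sum for clarity; a practical implementation would typically compute the correlation in the frequency domain, where GCC weightings such as the phase transform are applied.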
Because GCC is based on an ideal signal propagation model, it is believed to have a fundamental weakness: an inability to cope well with reverberant environments, as shown clearly in [4]. Some improvement may be gained by cepstral prefiltering [5]; however, shortcomings remain. Even though more sophisticated techniques [6] exist, they tend to be computationally intensive and are thus not well suited for real-time applications.

In this paper, a new approach is proposed that is based on a real signal propagation model (with reverberation) using eigenvalue decomposition. Indeed, it will be shown that the eigenvector corresponding to the minimum eigenvalue of the covariance matrix of the microphone signals contains the impulse responses between the source and the microphones (and therefore all the information we need for TDE).

In order to evaluate the consistent and dynamic performance of the proposed algorithm over a range of representative acoustic conditions, a real-time acoustic source localization system was developed running on the Windows 95/NT operating systems. Three methods were implemented, namely classical cross-correlation, phase transform, and the proposed adaptive eigenvalue decomposition algorithm.

2. MODELS FOR THE TDE PROBLEM

2.1. Ideal Free-Field Model

For the given source signal $s(n)$ propagating through a generic noisy free space, the signal acquired by the $i$-th ($i = 1, 2$) microphone can be expressed as follows:

$$x_i(n) = \alpha_i s(n - \tau_i) + b_i(n), \tag{1}$$

where $\alpha_i$ is an attenuation factor due to propagation loss, $\tau_i$ is the propagation time, and $b_i(n)$ is the additive noise. It is further assumed that $s(n)$, $b_1(n)$, and $b_2(n)$ are zero-mean, uncorrelated, stationary Gaussian random processes. The relative delay between the two microphone signals 1 and 2 is defined as

$$\tau_{12} = \tau_1 - \tau_2. \tag{2}$$

This model yields a mathematically clear solution for $\tau_{12}$ and is widely used for the classical TDE problem.

2.2. Real Reverberant Model

Unfortunately, in a real acoustic environment we must take into account the reverberation of the room, and the ideal model no longer holds. A more complicated but more complete model for the microphone signals can then be expressed as follows:

$$x_i(n) = g_i * s(n) + b_i(n), \tag{3}$$

where $*$ denotes convolution and $g_i$ is the acoustic impulse response of the channel between the source and the $i$-th microphone. Moreover, $b_1(n)$ and $b_2(n)$ might be correlated, which is the case when the noise is directional, e.g., from a ceiling fan or an overhead projector. In this case, we do not have an "ideal" solution to the problem, as is the case for the previous model, unless we can accurately determine the two impulse responses, which is a very challenging problem.
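To make the two signal models concrete, the following minimal sketch generates microphone signals under the reverberant model, in which each signal is the source convolved with a channel impulse response plus additive noise. It also checks that the free-field model is the special case of a single-tap impulse response. The impulse responses, tap positions, noise level, and all function names are hypothetical choices for illustration. The sketch further assumes that the direct path is the largest-magnitude tap, which need not hold in a real room.

```python
import random


def convolve(g, s):
    """Causal convolution (g * s)(n) = sum_k g[k] * s[n - k], truncated to len(s)."""
    out = [0.0] * len(s)
    for k, gk in enumerate(g):
        if gk == 0.0:
            continue  # skip empty taps of a sparse impulse response
        for n in range(k, len(s)):
            out[n] += gk * s[n - k]
    return out


def microphone_signal(g, s, noise_std, rng):
    """Reverberant model: source convolved with g, plus white Gaussian noise."""
    return [c + rng.gauss(0.0, noise_std) for c in convolve(g, s)]


def direct_path_delay(g):
    """Index of the largest-magnitude tap (assumed here to be the direct path)."""
    return max(range(len(g)), key=lambda k: abs(g[k]))


rng = random.Random(1)
s = [rng.gauss(0.0, 1.0) for _ in range(400)]  # stand-in source signal

# Hypothetical impulse responses: a direct path followed by weaker reflections.
g1 = [0.0] * 64
g2 = [0.0] * 64
g1[12], g1[25], g1[40] = 1.0, 0.6, -0.4
g2[5], g2[19], g2[33] = 0.8, -0.5, 0.3

x1 = microphone_signal(g1, s, 0.05, rng)
x2 = microphone_signal(g2, s, 0.05, rng)

# If both impulse responses were known, the relative delay would follow
# directly from their direct-path taps.
tau_12 = direct_path_delay(g1) - direct_path_delay(g2)

# Free-field special case: a single tap of gain alpha at delay tau
# reduces the convolution to alpha * s(n - tau).
g_free = [0.0] * 64
g_free[12] = 1.0
x_free = convolve(g_free, s)
```

In practice the impulse responses are unknown and must be estimated from `x1` and `x2` alone, which is the difficulty the text points to.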