Abstract—A combined three-microphone voice activity detector (VAD) and noise-canceling system is studied to enhance speech recognition in an automobile environment. A previous experiment showed that the composite system can cancel a single noise source outside of a defined zone. This paper investigates the performance of the composite system when there are frequently moving noise sources, that is, noise sources that come from different locations but are not always present at the same time, e.g. speech from another passenger or from a radio while the desired speech is present. To work in such an environment, the three-microphone VAD detects voice from a “VAD valid zone”, while the three-microphone noise canceller uses a “noise canceller valid zone” defined in free space around the user's head. The desired voice should therefore lie in the intersection of the noise canceller valid zone and the VAD valid zone, and all noise outside this intersection is suppressed. Experiments are reported for a real environment: all results were recorded in a car using omni-directional electret condenser microphones.

Keywords—signal processing, voice activity detection, noise canceller, microphone array beamforming

I. INTRODUCTION

The most challenging in-car speech recognition problem is picking up a speech signal from a desired source, e.g. the driver's voice, rather than mechanical noise and other passengers' speech. The mechanical noise emanates from a number of sources including the engine, road, wind and air-conditioner. Other passengers' speech, as well as speech from the radio, is also a challenge to speech recognition [1]. Microphone array beamforming is a well-known solution to this issue and has been studied for some thirty years. It has applications in areas such as communications [2], hearing aids [3], speech recognition [4], robotics [5] and hands-free telephony [6].
A real-time beamformer can be used to reduce the effects of noise on a speech signal. A two-microphone approach can be used, with one microphone near the desired speech and a second microphone near the noise source [7]. The resulting adaptive filter is updated using the least-mean-squares (LMS) algorithm [8]. This approach is successful when the speech source is far enough away from the noise that elements of the speech are not picked up by the noise microphone. It is well known that noise cancellation (the Widrow noise canceller) works well when the disturbing noise emanates from a point source. It does not work well when the noise is diffuse [9, 10]. When all mechanical noise and undesired speech come from unknown directions, a microphone array beamformer is used to enhance speech from a geometrical zone and reduce any other speech or noise outside of this zone [11].

Manuscript received July 17, 2006. Z. Qi is with the Institute of Information and Mathematic Science, Massey University at Albany, Auckland, New Zealand (e-mail: tqi@unitec.ac.nz). T. J. Moir is with the Institute of Information and Mathematic Science, Massey University at Albany, Auckland, New Zealand (e-mail: t.j.moir@massey.ac.nz).

In order to improve hands-free speech recognition performance in car environments, a microphone beamforming array has been implemented with a Voice Activity Detector (VAD) which uses time-delay estimation together with magnitude-squared coherence (MSC) [12]. This microphone array has been used to form a beamformer with normalized least-mean-squares (NLMS) adaptation to improve the signal-to-noise ratio (SNR). The experiment clearly shows the ability of the composite system to reduce noise outside of a defined zone. Experiments have been conducted in real time on a combined three-microphone VAD and noise-canceling system.
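The NLMS adaptation referred to above can be illustrated with a minimal sketch. It is not the three-microphone system studied in this paper: it is the simpler two-input Widrow canceller, with one primary input (speech plus noise) and one noise reference, updated with the NLMS rule. The function names, the tap count, the step size `mu`, the synthetic sinusoidal "speech" and the three-tap noise path are all assumptions chosen for illustration only.

```python
import math
import random


def nlms_cancel(primary, reference, num_taps=8, mu=0.1, eps=1e-8):
    """Two-input adaptive noise canceller with an NLMS weight update.

    primary   : samples of speech plus filtered noise
    reference : samples of the noise alone
    Returns the error signal (the canceller output) and the final weights.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # recent reference samples, newest first
    cleaned = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate  # noise estimate removed from the primary input
        power = sum(h * h for h in history) + eps
        step = mu * error / power  # step normalized by reference power
        weights = [w + step * h for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned, weights


def mean_power(signal):
    return sum(v * v for v in signal) / len(signal)


def demo(num_samples=4000, seed=0):
    """Synthetic check: residual noise power before and after cancellation."""
    rng = random.Random(seed)
    # Assumed stand-in for speech: a low-frequency sinusoid.
    speech = [0.5 * math.sin(2 * math.pi * 0.01 * n) for n in range(num_samples)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    # Assumed acoustic path from the noise source to the primary microphone.
    path = [0.6, -0.3, 0.1]
    coloured = [
        sum(path[k] * noise[n - k] for k in range(len(path)) if n - k >= 0)
        for n in range(num_samples)
    ]
    primary = [s + v for s, v in zip(speech, coloured)]
    cleaned, weights = nlms_cancel(primary, noise)
    tail = num_samples // 2  # measure only after the filter has adapted
    before = mean_power([p - s for p, s in zip(primary[tail:], speech[tail:])])
    after = mean_power([c - s for c, s in zip(cleaned[tail:], speech[tail:])])
    return before, after, weights


if __name__ == "__main__":
    before, after, _ = demo()
    print("residual noise power before:", before)
    print("residual noise power after: ", after)
```

In this sketch the reference contains no speech, which corresponds to the favourable case described above, where the speech source is far from the noise microphone. If speech leaked into the reference, the filter would also cancel part of the desired signal.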
The VAD assumes that the desired speech falls within a desired geometric zone in free space, which is most appropriate for an automobile environment as the zone can be defined around the driver's head. Noise canceling is only required when noise is present during desired speech, as the VAD will mute any solo noise source outside of the zone. The experiment used only pre-recorded phrases. This work clearly demonstrates the ability of the algorithm to cancel speech outside of the zone. However, in an environment with frequently moving noise sources, the noise canceller needs to suppress the unwanted noise when desired speech is also present. This paper investigates this problem in some detail, with real-time experiments clearly showing the performance of the canceller.

Automotive 3-Microphone Noise Canceller in a Frequently Moving Noise Source Environment
Z. Qi and T. J. Moir
International Journal of Information and Communication Engineering 3:4 2007

II. ALGORITHM

A. Three-microphone VAD switch

Carter et al. [13] describe a method for estimating the magnitude-squared coherence (MSC) function for two zero-mean wide-sense-stationary random processes. The estimation technique utilizes the weighted overlapped segmentation fast Fourier transform (FFT). Analytical and empirical results for statistics of the estimator are presented. The analytical expressions are limited to the non-overlapped case. Empirical results show a decrease in bias and variance of the estimator with increasing overlap and suggest a 50-percent overlap as