Efficient voice activity detection algorithms using long-term speech information

Javier Ramírez *, José C. Segura 1, Carmen Benítez, Angel de la Torre, Antonio Rubio 2

Dpto. Electrónica y Tecnología de Computadores, Universidad de Granada, Campus Universitario Fuentenueva, 18071 Granada, Spain

Received 5 May 2003; received in revised form 8 October 2003; accepted 8 October 2003

Abstract

Currently, technology barriers prevent speech processing systems from working under extremely noisy conditions. The emerging applications of speech technology, especially in the fields of wireless communications, digital hearing aids and speech recognition, are examples of such systems and often require a noise reduction technique operating in combination with a precise voice activity detector (VAD). This paper presents a new VAD algorithm for improving speech detection robustness in noisy environments and the performance of speech recognition systems. The algorithm measures the long-term spectral divergence (LTSD) between speech and noise and formulates the speech/non-speech decision rule by comparing the long-term spectral envelope to the average noise spectrum, thus yielding a highly discriminating decision rule and minimizing the average number of decision errors. The decision threshold is adapted to the measured noise energy, while a controlled hang-over is activated only when the observed signal-to-noise ratio is low. An analysis of the speech/non-speech LTSD distributions shows that using long-term information about speech signals is beneficial for VAD. The proposed algorithm is compared to the most commonly used VADs in the field, in terms of both speech/non-speech discrimination and recognition performance when the VAD is used in an automatic speech recognition system.
Experimental results demonstrate a sustained advantage over standard VADs such as G.729 and adaptive multi-rate (AMR), which were used as a reference, and over the VADs of the advanced front-end for distributed speech recognition.

© 2003 Elsevier B.V. All rights reserved.

Keywords: Speech/non-speech detection; Speech enhancement; Speech recognition; Long-term spectral envelope; Long-term spectral divergence

* Corresponding author. Tel.: +34-958243271; fax: +34-958243230. E-mail addresses: javierrp@ugr.es (J. Ramírez), segura@ugr.es (J.C. Segura), carmen@ugr.es (C. Benítez), atv@ugr.es (A. de la Torre), rubio@ugr.es (A. Rubio).
1 Tel.: +34-958243283; fax: +34-958243230.
2 Tel.: +34-958243193; fax: +34-958243230.

0167-6393/$ - see front matter. doi:10.1016/j.specom.2003.10.002
Speech Communication 42 (2004) 271–287, www.elsevier.com/locate/specom

1. Introduction

An important problem in many areas of speech processing is determining whether speech is present in a given signal. This task can be posed as a statistical hypothesis testing problem whose purpose is to determine the category or class to which a given signal belongs. The decision is made based on an observation vector, frequently