Objective Comparison of Speech Enhancement Algorithms Under Real World Conditions

Stavros Ntalampiras
Department of Electrical and Computer Engineering, University of Patras
26500, Rion, Patras, Greece
+30 2610 969806
sntalampiras@upatras.gr

Todor Ganchev
Department of Electrical and Computer Engineering, University of Patras
26500, Rion, Patras, Greece
+30 2610 96 9808
tganchev@ieee.org

Ilyas Potamitis
Department of Music Technology & Acoustics, Technological Educational Institute of Crete
74100, Rethymno, Crete, Greece
+30 28310 21911
potamitis@stef.teicrete.gr

Nikos Fakotakis
Department of Electrical and Computer Engineering, University of Patras
26500, Rion, Patras, Greece
+30 2610 996 216
fakotaki@wcl.ee.upatras.gr

ABSTRACT
Over the past decades, the problem of single-channel speech enhancement has been addressed by a great many researchers. In this work, selected methods belonging to a variety of categories are applied to denoise speech signals corrupted by non-stationary urban noise. The performance of spectral subtraction, signal subspace, model-based and Kalman filtering approaches is evaluated. Several objective measures designed to predict the outcome of human listening tests are employed in order to reach accurate conclusions. Two series of experiments were carried out. Multiband spectral subtraction and a short-time spectral amplitude (STSA) estimator based on the minimization of the mean square error (MSE) of the log-spectra are shown to outperform the rest of the algorithms.

Categories and Subject Descriptors
I.2.7 [Natural Language Processing]: Speech recognition and synthesis

General Terms
Algorithms, Performance

Keywords
Speech Enhancement, Spectral Subtraction, Signal Subspace, Model-based Enhancement, Kalman Filtering

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. PETRA’08, July 15-19, 2008, Athens, Greece. Copyright 2008 ACM 978-1-60558-067-8-15/07/08…$5.00.

1. INTRODUCTION
The primary objective of noise compensation methods, as applied in the context of speech processing, is to reduce the effect of any signal that is alien to and disruptive of the message conveyed among participants in a communicative event (whether humans or ASR machines). Depending on the application, speech enhancement methods aim at speech quality improvement and/or signal preprocessing for speech or speaker recognition. The key difference is that in the latter case the problem to be solved by the recognizer is simplified by a pre-processing transformation from the time domain to a domain whose properties are more desirable for the recognition process. When speech quality and intelligibility are the issue, it is essential that we respect the specific idiosyncrasies of human hearing and, therefore, reconstruct the time-domain signal.

In brief, a speech enhancement algorithm aims at one or more of the following goals:
a) The improvement of speech quality by reduction of listening effort, fatigue and ambiguity of the original message.
b) The reduction of noise-induced stress, which can affect the articulation of speech at low SNRs, a phenomenon known as the Lombard effect.
c) The elimination of speech-coding inconsistencies.
d) Robust automatic speech/speaker recognition.
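Goal (b) above refers to low SNRs. As a reminder of how this quantity is usually defined, the following is a minimal sketch, not taken from the paper, that computes the signal-to-noise ratio in dB and scales a noise sequence so that the mixture reaches a chosen SNR. The function names `power`, `snr_db` and `mix_at_snr`, as well as the toy signals, are hypothetical and serve illustration only; the recordings evaluated in this work contain real noise rather than artificially mixed noise.

```python
import math


def power(x):
    """Mean power (average of squared samples) of a sequence."""
    return sum(s * s for s in x) / len(x)


def snr_db(clean, noise):
    """Signal-to-noise ratio in dB: 10 * log10(P_clean / P_noise)."""
    return 10.0 * math.log10(power(clean) / power(noise))


def mix_at_snr(clean, noise, target_snr_db):
    """Scale `noise` so that clean-to-noise power ratio equals
    `target_snr_db`, then add it to `clean` sample by sample.

    Returns the noisy mixture and the scaled noise.
    """
    # Required noise power is P_clean / 10^(SNR/10); the amplitude gain
    # is the square root of the ratio of required to actual noise power.
    required_noise_power = power(clean) / (10.0 ** (target_snr_db / 10.0))
    gain = math.sqrt(required_noise_power / power(noise))
    scaled = [gain * n for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled)]
    return noisy, scaled


# Toy signals (assumptions for illustration): a sinusoid as "speech"
# and an alternating sequence as "noise".
clean = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
noise = [0.5 if t % 2 == 0 else -0.5 for t in range(100)]

noisy, scaled = mix_at_snr(clean, noise, 0.0)
print(snr_db(clean, scaled))  # close to the requested 0 dB
```

A global SNR of this kind is only the simplest objective measure; it treats all frames and frequencies alike and therefore does not by itself reflect perceived quality.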
Although ASR has matured to the point of enabling the launch of commercial products, operational systems still struggle to maintain high recognition performance in adverse environments because of the mismatch between the acoustical characteristics of training and operation. Due to the polymorphic manifestations and detrimental effect of noise, speech enhancement remains an open challenge. Comprehensive assessments of noise compensation methods that belong to different speech processing strategies can be found in [1]. After more than three decades of advances on the single-channel speech enhancement problem, in our opinion, four distinct families of algorithms have come to predominate in the literature, namely: a) the spectral subtractive algorithms [2], b) the statistical model-based approaches [3, 4, 5], c) the signal subspace approaches [6, 7] and d) the enhancement approaches based on a special type of filtering [8].

In this work, eight speech enhancement methods are evaluated on a real-world database recorded for the needs of speech recognition in a motorcycle environment. This database consists of speech recordings that follow the communication protocol of the UK police force, and especially of its motorcycle policing units. Thus, the performance of the speech enhancement algorithms under consideration is evaluated in conditions characterized by highly non-stationary noise. Specifically, we perform an objective