An improved approach to robust speech recognition using minimum error classification

Min-Tau Lin a, Andreas Spanias a,*, Philipos Loizou b

a Department of Electrical Engineering, Arizona State University, Tempe, AZ 85287-7206, USA
b Department of Applied Science, University of Arkansas at Little Rock, Little Rock, AR 72204-1099, USA

Received 3 October 1997; received in revised form 28 December 1998; accepted 10 May 1999

Abstract

An effective way of applying minimum error classification (MEC) to improve robustness in speech recognition is presented in this paper. In contrast to the traditional maximum likelihood (ML) training procedure that attempts to maximize the a priori probability of generating the training data set, MEC training attempts to minimize a function of the recognition error on the given training data set. In the MEC training procedure, the N-best algorithm is used to maximize the separation between the correct and competing models over confusable training tokens. The main focus of this paper is to investigate the effectiveness of MEC training when combined with four existing speech recognition algorithms under noisy and telephone mismatched environments. These algorithms are the weighted projection measure (WPM), the minimax approach (MA), the cepstral mean subtraction (CMS) method and the stochastic matching algorithms (SMAs). Experiments were performed using the Texas Instruments isolated digits database and the E-set words from the OGI Spelled and Spoken Telephone Corpus. The average word error rate reduction due to MEC training was 22.5% for isolated digit recognition and 8% for E-set word recognition. © 2000 Published by Elsevier Science B.V. All rights reserved.

Keywords: Minimum error classification (MEC); Hidden Markov Model (HMM)

1. Introduction

Speech recognition algorithms have advanced to a level where reliable recognition is within reach (Spanias and Wu, 1992; Cole et al., 1995).
Most of the reported improvements, however, have been achieved with speech recorded in a noise-free environment with high quality recording equipment.

Speech Communication 30 (2000) 27–36
www.elsevier.nl/locate/specom

* Corresponding author. Tel.: +1-480-965-3424; fax: +1-480-965-8325. E-mail address: spanias@asu.edu (A. Spanias).
0167-6393/00/$ - see front matter © 2000 Published by Elsevier Science B.V. All rights reserved. PII: S0167-6393(99)00027-8

Nomenclature

K      classifier parameter set
k_i    parameter vector for the ith HMM
X      training set
V      the number of HMMs
O_i    the observation sequences of training tokens from the ith training class
O_i^n  nth observation sequence of ith token class
N_i    the number of training tokens per class
H_i    the maximum likelihood state sequence of the ith HMM

In the real world, channel interference, ambient noise, as well as variability in sound recording equipment