Robust speech recognition using sequential spike code

Phillip B. Schafer and Dezhe Z. Jin

The past three decades have seen intensive research in automatic speech recognition, but automated systems still fall short of human performance in realistic listening conditions. This performance disparity is particularly great in adverse environments with noise, reverberations, or multiple speakers. Here, we present a novel, biologically inspired system for robust isolated word recognition that works by identifying sequences of spikes in a spike timing code. The system has two stages. First, a simulation of the auditory nerve response to speech is passed to a population of feature-detecting neurons, which spike selectively in response to 50-ms spectro-temporal patterns. Second, the spike trains are converted to sequences of symbols, and words are identified by comparing the sequences to a set of templates. The sequences are compared using a distance measure based on the longest common subsequence. We test our system on isolated digits from the TI46 database mixed with additive noise and find that it far outperforms a state-of-the-art hidden Markov model system.

1 Introduction

Automatic speech recognition (ASR) in noisy conditions remains a challenge for computer algorithms even after decades of research. Most approaches to noise-robust ASR have focused on hidden Markov models (HMMs), which model speech as a sequence of discrete acoustic features generated by a Markov chain. Because these systems explicitly model the acoustic variability of speech, they do not generalize well to noise types not present during training. On the other hand, human speech perception can remain invariant over a wide range of listening conditions. This disparity has led to a search for new algorithms inspired by neurobiology that can match the robustness of human performance [Scharenborg, 2007, Moore, 2007].
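
To make the template-matching stage summarized in the abstract concrete, the following is a minimal sketch of classifying a symbol sequence by a longest-common-subsequence (LCS) distance. It is an illustration under stated assumptions, not the system evaluated in this paper: the normalization `1 - LCS / max(len)`, the function names, and the nearest-template decision rule are all hypothetical choices made for this example.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    # Standard dynamic program, keeping only the previous row of the table.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_distance(a, b):
    """Distance in [0, 1]; 0 for identical sequences.

    Assumed normalization (not taken from the paper): 1 - LCS / max length.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return 1.0 - lcs_length(a, b) / longest


def classify(sequence, templates):
    """Return the label of the nearest template (hypothetical decision rule).

    templates: dict mapping a word label to a list of template sequences.
    """
    best_label, best_dist = None, float("inf")
    for label, seqs in templates.items():
        for template in seqs:
            d = lcs_distance(sequence, template)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label
```

Here each symbol would stand for the identity of a feature-detecting neuron that spiked, in temporal order. Because the LCS ignores symbols that are inserted or missing, a measure of this kind tolerates spurious or dropped spikes, which is one plausible reason to prefer it in noise.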
Recent developments in auditory neuroscience have revealed novel features of early auditory encoding that may open new possibilities for computing. Experimental studies of the inferior colliculus [Escabí et al., 2003] and primary auditory cortex [DeWeese et al., 2003, Hromádka et al., 2008, Kayser et al., 2010] have found that neurons encode information with sparsely occurring, reproducible spikes timed with