Tracing the emergence of categorical speech perception in the human auditory system

Gavin M. Bidelman a, b, ⁎, Sylvain Moreno c, Claude Alain c, d

a Institute for Intelligent Systems, University of Memphis, Memphis, TN 38105, USA
b School of Communication Sciences & Disorders, University of Memphis, Memphis, TN 38105, USA
c Rotman Research Institute, Baycrest Centre for Geriatric Care, Toronto, ON M6A 2E1, Canada
d Department of Psychology, University of Toronto, Toronto, ON M6A 2E1, Canada

Article info
Article history: Accepted 21 April 2013; Available online 3 May 2013
Keywords: Categorical perception; Speech perception; Brainstem response; Auditory event-related potentials (ERP); Neural computation

Abstract
Speech perception requires the effortless mapping from smooth, seemingly continuous changes in sound features into discrete perceptual units, a conversion exemplified in the phenomenon of categorical perception. Explaining how/when the human brain performs this acoustic–phonetic transformation remains an elusive problem in current models and theories of speech perception. In previous attempts to decipher the neural basis of speech perception, it is often unclear whether the alleged brain correlates reflect an underlying percept or merely changes in neural activity that covary with parameters of the stimulus. Here, we recorded neuroelectric activity generated at both cortical and subcortical levels of the auditory pathway elicited by a speech vowel continuum whose percept varied categorically from /u/ to /a/. This integrative approach allows us to characterize how various auditory structures code, transform, and ultimately render the perception of speech material as well as dissociate brain responses reflecting changes in stimulus acoustics from those that index true internalized percepts.
We find that activity from the brainstem mirrors properties of the speech waveform with remarkable fidelity, reflecting progressive changes in speech acoustics but not the discrete phonetic classes reported behaviorally. In comparison, patterns of late cortical evoked activity contain information reflecting distinct perceptual categories and predict the abstract phonetic speech boundaries heard by listeners. Our findings demonstrate a critical transformation in neural speech representations between brainstem and early auditory cortex analogous to an acoustic–phonetic mapping necessary to generate categorical speech percepts. Analytic modeling demonstrates that a simple nonlinearity accounts for the transformation between early (subcortical) brain activity and subsequent cortical/behavioral responses to speech (>150–200 ms) thereby describing a plausible mechanism by which the brain achieves its acoustic-to-phonetic mapping. Results provide evidence that the neurophysiological underpinnings of categorical speech are present cortically by ~175 ms after sound enters the ear.
© 2013 Elsevier Inc. All rights reserved.

Introduction

Sensory phenomena are typically subject to percept invariance in which a continuum of similar features is mapped onto a common identity. This many-to-one mapping is the hallmark of categorical perception (CP) which manifests in many aspects of human cognition including the perception of faces (Beale and Keil, 1995), colors (Franklin et al., 2008), and music (Klein and Zatorre, 2011). CP is particularly important in the context of speech perception whereby gradually morphed sounds along a large acoustic continuum are heard as belonging to one of only a few discrete phonetic classes (Harnad, 1987; Liberman et al., 1967; Pisoni, 1973; Pisoni and Luce, 1987). That is, listeners treat sounds within a given category as perceptually similar despite their otherwise dissimilar acoustic characteristics.
Given that categorical percepts do not faithfully map from exact sensory input, they provide useful divisions of information not contained in the external world (Miller et al., 2003). Presumably, this type of “downsampling” mechanism would promote speech comprehension by generating perceptual constancy in the face of individual variation along multiple acoustic dimensions, e.g., talker variability in tempo, pitch, or timbre (Prather et al., 2009). Categorical speech boundaries emerge early in life (Eimas et al., 1971) and are further modified based on one's native tongue (Kuhl et al., 1992) suggesting that the neural mechanisms underlying CP, while potentially innate, are also malleable to the experiential effects of learning and language experience. Indeed, the fundamental importance of this “phonetic mode” of listening (Liberman and Mattingly, 1989) to speech and language processing is evident by its integral role in speech acquisition (Eimas et al., 1971; Vihman, 1996) and the grapheme-to-phoneme mapping essential for reading and writing skills (Mody et al., 1997; Werker and Tees, 1987). Despite its importance to everyday communication,

NeuroImage 79 (2013) 201–212
⁎ Corresponding author at: School of Communication Sciences & Disorders, University of Memphis, 807 Jefferson Ave., Memphis, TN 38105, USA. Fax: +1 901 525 1282. E-mail address: g.bidelman@memphis.edu (G.M. Bidelman).
http://dx.doi.org/10.1016/j.neuroimage.2013.04.093