Tracing the emergence of categorical speech perception in the human
auditory system
Gavin M. Bidelman a,b,⁎, Sylvain Moreno c, Claude Alain c,d
a Institute for Intelligent Systems, University of Memphis, Memphis, TN 38105, USA
b School of Communication Sciences & Disorders, University of Memphis, Memphis, TN 38105, USA
c Rotman Research Institute, Baycrest Centre for Geriatric Care, Toronto, ON M6A 2E1, Canada
d Department of Psychology, University of Toronto, Toronto, ON M6A 2E1, Canada
Article info
Article history:
Accepted 21 April 2013
Available online 3 May 2013
Keywords:
Categorical perception
Speech perception
Brainstem response
Auditory event-related potentials (ERP)
Neural computation
Abstract
Speech perception requires the effortless mapping from smooth, seemingly continuous changes in sound
features into discrete perceptual units, a conversion exemplified in the phenomenon of categorical
perception. Explaining how/when the human brain performs this acoustic–phonetic transformation remains
an elusive problem in current models and theories of speech perception. In previous attempts to decipher
the neural basis of speech perception, it is often unclear whether the alleged brain correlates reflect an
underlying percept or merely changes in neural activity that covary with parameters of the stimulus. Here,
we recorded neuroelectric activity generated at both cortical and subcortical levels of the auditory pathway
elicited by a speech vowel continuum whose percept varied categorically from /u/ to /a/. This integrative
approach allows us to characterize how various auditory structures code, transform, and ultimately render
the perception of speech material as well as dissociate brain responses reflecting changes in stimulus
acoustics from those that index true internalized percepts. We find that activity from the brainstem mirrors
properties of the speech waveform with remarkable fidelity, reflecting progressive changes in speech
acoustics but not the discrete phonetic classes reported behaviorally. In comparison, patterns of late cortical
evoked activity contain information reflecting distinct perceptual categories and predict the abstract phonetic
speech boundaries heard by listeners. Our findings demonstrate a critical transformation in neural speech
representations between brainstem and early auditory cortex analogous to an acoustic–phonetic mapping
necessary to generate categorical speech percepts. Analytic modeling demonstrates that a simple nonlinearity
accounts for the transformation between early (subcortical) brain activity and subsequent cortical/behavioral
responses to speech (>150–200 ms) thereby describing a plausible mechanism by which the brain achieves its
acoustic-to-phonetic mapping. Results provide evidence that the neurophysiological underpinnings of
categorical speech are present cortically by ~175 ms after sound enters the ear.
© 2013 Elsevier Inc. All rights reserved.
Introduction
Sensory phenomena are typically subject to percept invariance in which a continuum of similar features
is mapped onto a common identity. This many-to-one mapping is the hallmark of categorical perception
(CP), which manifests in many aspects of human cognition including the perception of faces (Beale and
Keil, 1995), colors (Franklin et al., 2008), and music (Klein and Zatorre, 2011). CP is particularly
important in the context of speech perception, whereby gradually morphed sounds along a large acoustic
continuum are heard as belonging to one of only a few discrete phonetic classes (Harnad, 1987; Liberman
et al., 1967; Pisoni, 1973; Pisoni and Luce, 1987). That is, listeners treat sounds within a given category
as perceptually similar despite their otherwise dissimilar acoustic characteristics. Given that categorical
percepts do not faithfully map from exact sensory input, they provide useful divisions of information not
contained in the external world (Miller et al., 2003). Presumably, this type of “downsampling” mechanism
would promote speech comprehension by generating perceptual constancy in the face of individual
variation along multiple acoustic dimensions, e.g., talker variability in tempo, pitch, or timbre
(Prather et al., 2009).
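The many-to-one mapping described above can be illustrated with a minimal sketch. It assumes a logistic identification function over a hypothetical five-step /u/-to-/a/ continuum; the function names, the `boundary` and `slope` values, and the number of steps are illustrative assumptions, not the stimuli, parameters, or model used in this study.

```python
import math


def identification_probability(step, boundary=2.5, slope=4.0):
    """Probability of labeling a continuum step as /a/.

    Hypothetical logistic psychometric function; `boundary` and `slope`
    are illustrative values, not estimates from the study.
    """
    return 1.0 / (1.0 + math.exp(-slope * (step - boundary)))


def categorize(step, boundary=2.5, slope=4.0):
    """Map a continuous stimulus step onto one of two discrete labels."""
    p = identification_probability(step, boundary, slope)
    return "/a/" if p >= 0.5 else "/u/"


# Equally spaced steps along the hypothetical continuum.
steps = [1, 2, 3, 4, 5]
probabilities = [identification_probability(s) for s in steps]
labels = [categorize(s) for s in steps]

for s, p, label in zip(steps, probabilities, labels):
    print(f"step {s}: P(/a/) = {p:.3f} -> {label}")
```

In this sketch, equal acoustic steps produce unequal perceptual changes: the identification probability barely moves between steps within a category but jumps sharply between the two steps that straddle the assumed boundary.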
Categorical speech boundaries emerge early in life (Eimas et al., 1971) and are further modified based on
one's native tongue (Kuhl et al., 1992), suggesting that the neural mechanisms underlying CP, while
potentially innate, are also malleable to the experiential effects of learning and language experience.
Indeed, the fundamental importance of this “phonetic mode” of listening (Liberman and Mattingly, 1989) to
speech and language processing is evident in its integral role in speech acquisition (Eimas et al., 1971;
Vihman, 1996) and the grapheme-to-phoneme mapping essential for reading and writing skills (Mody et al.,
1997; Werker and Tees, 1987). Despite its importance to everyday communication,
NeuroImage 79 (2013) 201–212
⁎ Corresponding author at: School of Communication Sciences & Disorders, University
of Memphis, 807 Jefferson Ave., Memphis, TN 38105, USA. Fax: +1 901 525 1282.
E-mail address: g.bidelman@memphis.edu (G.M. Bidelman).
1053-8119/$ – see front matter © 2013 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.neuroimage.2013.04.093