Brain and Language
Auditory event-related potentials index faster processing of natural speech
but not synthetic speech over nonspeech analogs in children
Allison Whitten a,⁎, Alexandra P. Key a,b,c, Antje S. Mefferd a,c, James W. Bodfish a,b,c,d

a Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, 1215 21st Ave S., Nashville, TN, USA
b Department of Psychiatry and Behavioral Sciences, Vanderbilt Psychiatric Hospital, 1601 23rd Ave. S, Nashville, TN, USA
c Vanderbilt Kennedy Center, 110 Magnolia Cir, Nashville, TN, USA
d Vanderbilt Brain Institute, 6133 Medical Research Building III, 465 21st Avenue S., Nashville, TN, USA
ARTICLE INFO
Keywords:
Event-related potential (ERP)
Natural stimuli
Speech
Nonspeech
Children
Auditory processing
ABSTRACT
Given the crucial role of speech sounds in human language, it may be beneficial for speech to be supported by more efficient auditory and attentional neural processing mechanisms compared to nonspeech sounds. However, previous event-related potential (ERP) studies have found either no differences or slower auditory processing of speech than nonspeech, as well as inconsistent attentional processing. We hypothesized that this may be due to the use of synthetic stimuli in past experiments. The present study measured ERP responses during passive listening to both synthetic and natural speech and complexity-matched nonspeech analog sounds in 22 8–11-year-old children. We found that although children were more likely to show immature auditory ERP responses to the more complex natural stimuli, ERP latencies were significantly faster to natural speech compared to cow vocalizations, but significantly slower to synthetic speech compared to tones. The attentional results indicated a P3a orienting response only to the cow sound, and we discuss potential methodological reasons for this. We conclude that our results support more efficient auditory processing of natural speech sounds in children, though more research with a wider array of stimuli will be necessary to confirm these findings. Our results also highlight the importance of using natural stimuli in research investigating the neurobiology of language.
1. Introduction
The neural processing of language depends on an initial filtering of the complex acoustic environment in order to extract only the speech signal. This process may be enhanced by a biological preference to orient and attend to human speech over other types of sounds, similar to the way other animals display an early bias toward the vocalizations of their own species (Barrow Heaton, Miller, & Goodwin, 1978; Braaten & Reynolds, 1999; Marler, 1990; Penna & Meier, 2011). In humans, behavioral findings in infants from sequential looking preference and high-amplitude sucking tasks demonstrate a preference for speech over human nonspeech vocalizations and animal vocalizations that emerges within the first three months of life (Shultz & Vouloumanos, 2010; Vouloumanos, Hauser, Werker, & Martin, 2010). This bias to attend to speech in infants has been shown to predict later vocabulary development (Vouloumanos & Curtin, 2014) and is presumed to contribute to the development of language learning networks in the brain (Kuhl, 2007). However, neural evidence that speech signals are prioritized over other nonspeech sounds remains unclear.
Previous fMRI studies suggest that the bilateral superior temporal sulcus (STS) exhibits speech-specific specialization (Belin, Zatorre, & Ahad, 2002; Binder, 2000; Fecteau, Armony, Joanette, & Belin, 2004; Overath, McDermott, Zarate, & Poeppel, 2015). However, finding that speech is processed in a different region of the brain is insufficient on its own to conclude that the brain prioritizes the processing of speech stimuli over other sounds. The temporal resolution of electroencephalography (EEG) offers an alternative approach to investigate this question by focusing on how processing differences may unfold over time, rather than where in the brain the processing occurs. Several previous EEG studies have identified a stronger right-lateralized fronto-temporal positivity to voices (FTPV) that can be discriminated from responses to nonspeech sounds; however, the onset of this response varies widely across studies, from 60 to 164 ms (Bruneau et al., 2013; Charest et al., 2009; Rogier, Roux, Belin, Bonnet-Brilhault, & Bruneau, 2010; Stavropoulos & Carver, 2016). Thus, it remains unclear at what stage of processing speech is differentiated or prioritized relative to nonspeech sounds. This question can be addressed by comparing the timing of event-related potentials (ERPs) related to sensory and
https://doi.org/10.1016/j.bandl.2020.104825
Received 6 May 2019; Received in revised form 29 May 2020; Accepted 30 May 2020
⁎ Corresponding author at: Vanderbilt University Institute of Imaging Science, Vanderbilt University Medical Center, 1161 21st Ave S., Nashville, TN, USA.
E-mail address: allison.p.whitten@vumc.org (A. Whitten).
Brain and Language 207 (2020) 104825
0093-934X/ © 2020 Elsevier Inc. All rights reserved.