The Role of Auditory Feedback in Speech and Song
Tim A. Pruitt and Peter Q. Pfordresher
University at Buffalo, State University of New York
When singing a melody or producing sentences, we take for granted the fact that the sounds we create
(auditory feedback) match the intended consequences of our actions. The importance of these perception/action
matches to production is illustrated by the detrimental effects of altered auditory feedback (AAF).
Previous research in the domain of music has shown that when AAF leads to asynchronies between
perception and action, timing of production is disrupted but accuracy of sequencing is not. On the other
hand, AAF manipulations of pitch disrupt sequencing but not timing. Such dissociative effects, as well
as other findings, suggest that sensitivity to AAF may be based on hierarchical organization of sequences.
In the current research we examined whether similar effects are found for the production of speech, for
which syllables rather than pitches may constitute content units. In the first experiment, participants
either sang melodies or spoke sequences of nonsense syllables. In the second experiment, the tasks were
combined such that participants sang syllable sequences. Production in both experiments was accompanied
by either normal, asynchronous, or content-altered auditory feedback. Across experiments, effects
of AAF on the accuracy of sequencing were similar in speaking and singing tasks, and in all cases
reflected the dissociative effects described earlier. For timing of production, however, previous results
were replicated only when participants sang sequences that did not have varying syllabic content. These
results suggest that sensitivity to timing exists at multiple hierarchical levels, particularly at the syllable
and phonetic levels.
Keywords: auditory feedback, sequencing, timing, music and language
Anecdotes from drive-thru workers, cellular phone users, and
online video game players tell of communication difficulties due to
signal delays between speaking and hearing their own voice. This
postponement in hearing self-produced auditory information results
in speech disfluencies such as stuttering and repeating words.
The novelty mobile phone application Speech Jammer (Hou, 2014)
operates similarly by allowing users to implement a delay between
the input to the device’s microphone and its audio output. Speech
Jammer has gained popularity on the Internet as users have posted
YouTube videos documenting production disturbances during
their attempts to read selections, give consumer reviews, or perform
songs. Likewise, the speech jammer gun (Kurihara & Tsukada,
2012) technologically elaborates on this principle to create
practical applications for crowd control or maintaining silent environments
by disrupting speech without physically distressing its
targets.
All of the earlier cases illustrate that even the slightest desynchronization
between producing and subsequently hearing auditory
information can have profound effects on speech. But given
the lack of control in such real-world examples it is difficult to
pinpoint the origin of this disruption. One possibility involves
feedback synchrony, which refers to whether the onsets and offsets
of speech sounds line up in time with each other. Asynchronies
between actions (spoken syllables) and auditory feedback have
been the focus of accounts for such disruptive effects. However,
another possibility emerges from cases in which the resulting
content of auditory feedback has been altered such that the categorical
event (a syllable) of feedback does not match the intended
event. If, for instance, a feedback delay is as long as a spoken
syllable, then the speaker would hear the previous syllable when
generating the current syllable and any resulting disruption would
reflect a deviation in content rather than asynchronous timing.
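The syllable-length delay case can be made concrete with a small sketch. It is only an illustration of the reasoning above, not the procedure used in the experiments: it assumes perfectly isochronous syllables of equal duration, and the function names, the syllable list, and the 500-ms duration are hypothetical.

```python
# Illustrative sketch only: assumes every syllable lasts exactly
# `syllable_ms` and that feedback is a pure delay of `delay_ms`.
# All names and values here are hypothetical, not from the study.

def heard_syllable_index(produced_index, delay_ms, syllable_ms):
    """Index of the syllable heard at the onset of the syllable being
    produced, or None if no delayed feedback has arrived yet."""
    onset_ms = produced_index * syllable_ms
    source_ms = onset_ms - delay_ms  # when the sound now heard was produced
    if source_ms < 0:
        return None
    return source_ms // syllable_ms


def classify_feedback(delay_ms, syllable_ms):
    """Label a delay by the kind of mismatch it creates."""
    if delay_ms == 0:
        return "normal"
    if delay_ms % syllable_ms == 0:
        # Onsets still line up, but the syllable heard is an earlier one.
        return "content-altered"
    # Feedback onsets fall between produced onsets.
    return "asynchronous"


syllables = ["ba", "da", "ga", "pa"]  # hypothetical nonsense syllables
syllable_ms = 500                     # hypothetical syllable duration

for delay_ms in (0, 250, 500):
    label = classify_feedback(delay_ms, syllable_ms)
    heard = heard_syllable_index(1, delay_ms, syllable_ms)
    heard_name = None if heard is None else syllables[heard]
    print(delay_ms, label, "producing", syllables[1], "hearing", heard_name)
```

Under these assumptions, a delay equal to one syllable duration keeps feedback onsets aligned with produced onsets while shifting the heard syllable back by one, whereas a shorter delay misaligns the onsets themselves.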
The distinction between feedback content and timing is critical
here because it bears on the nature of mapping between perception
and action. Previous research in the domain of music, reviewed in
the following, has suggested that these alterations have distinct
effects on production, which in turn implies that perception and action
associations are constrained by the temporal hierarchy used to
represent the structure of a sequence. However, no research to date
has addressed whether comparable effects may occur for speech,
thus leaving open the question of whether perception/action associations
in speech relate to those of music.
In light of this, we report on two experiments that address
critical questions involving how sensory information relates to
motor information. First, do people use feedback to guide speech
in the same way that they use feedback to produce melodies? This
question reflects a critical debate in the current literature regarding
representations used to process music versus language (e.g., Patel,
2008). Second (and related), to what degree is the use of feedback
This article was published Online First November 10, 2014.
Tim A. Pruitt and Peter Q. Pfordresher, University at Buffalo, State
University of New York.
This research was supported in part by NSF Grants BCS–0642592 and
BCS–1256964. We are grateful to Anastasiya Kobrina and Esther Song for
assistance with data collection, and to Pauline Larrouy-Maestri, James
Mantell, Kathleen Jocoy, Ken Steele, James Sawusch, and Eduardo Mercado
III for helpful comments on an earlier version of this paper.
Correspondence concerning this article should be addressed to Tim A.
Pruitt or Peter Q. Pfordresher, Department of Psychology, 204 Park Hall,
University at Buffalo, State University of New York, Buffalo, NY 14260.
E-mail: tapruitt@buffalo.edu or pqp@buffalo.edu
Journal of Experimental Psychology:
Human Perception and Performance
© 2014 American Psychological Association
2015, Vol. 41, No. 1, 152–166
0096-1523/15/$12.00 http://dx.doi.org/10.1037/a0038285