Cognition 211 (2021) 104619
0010-0277/© 2021 Elsevier B.V. All rights reserved.

Encoding and decoding of meaning through structured variability in intonational speech prosody

Xin Xie a,1,*, Andrés Buxó-Lugo b,1,*, Chigusa Kurumada a

a Department of Brain and Cognitive Sciences, University of Rochester, Rochester, NY 14627, USA
b Department of Psychology, University of Maryland, College Park, MD 20742, USA

ARTICLE INFO

Keywords: Prosody; Meaning; Intonation; Adaptation; Language production; Language comprehension; Variability

ABSTRACT

Speech prosody plays an important role in communication of meaning. The cognitive and computational mechanisms supporting this communication remain to be understood, however. Prosodic cues vary across talkers and speaking conditions, creating ambiguity in the sound-to-meaning mapping. We hypothesize that listeners ameliorate this ambiguity in part by learning talker-specific statistics of prosodic cues. To test this hypothesis, we investigate the production and recognition of question vs. statement prosody in American English. Experiment 1 elicits productions of questions and statements from 65 talkers to examine the distributional statistics characterizing within- and cross-talker variability in these productions. We use Bayesian ideal observer models to assess the predicted consequences of cross-talker variability on listeners’ recognition of prosody. We find that learning of talker-specific distributional statistics is predicted to facilitate recognition, above and beyond what can be achieved via commonly assumed normalizations of prosodic cues. Experiment 2 tests this prediction in a comprehension experiment. We expose different groups of listeners to different prosodic input statistics and assess listeners’ recognition of questions and statements both prior to, and following, exposure.
Prior to exposure, ideal observer-derived predictions based on Experiment 1 provide a good qualitative fit against listeners’ recognition of prosodic contours in Experiment 2. Following exposure, listeners shift the categorization boundary between questions and statements in ways consistent with learning of talker-specific statistics.

1. Introduction

Prosody, the rhythm and cadence of speech, plays a critical role in the communication of meaning. Subtle differences in utterance-final intonation contours, for instance, change an utterance’s meaning from a statement (e.g., It’s raining. [falling intonation]) to a question (e.g., It’s raining? [rising intonation]). There is a rich evidence base indicating that listeners recognize such meaning-distinguishing prosodic categories (Bolinger, 1989; Gussenhoven, 2002; Ladd, 2008; Pierrehumbert and Hirschberg, 1990) and integrate the meaning as an utterance unfolds (Cutler, 2015; Dahan, 2015; Ito and Speer, 2008; Weber et al., 2006). However, the cognitive and perceptual mechanisms supporting this recognition remain poorly understood.

One major source of difficulty stems from variability in the prosodic signal across talkers and contexts (Arvaniti, 2019; Brugos et al., 2006; Cangemi et al., 2015; Cangemi and Grice, 2016; Cole, 2015). Continuing with the case of statements vs. questions in American English, the exact form and level of the rise produced to signal a question meaning can vary across talkers as well as talker groups (e.g., age, gender, dialect) (Arvaniti and Garding, 2007; Clopper and Smiljanic, 2011). For example, due to difficulties in controlling their pitch, young children tend to produce a smaller degree of a rise than older children (Patel and Grigos, 2006). Also, rising intonation can be used to signal other, including social, meanings (e.g., ‘uptalk’, Warren, 2016).
As a result of this talker variability, one person’s production of a statement and another person’s production of a question can be phonetically identical.

The present study explores how listeners may navigate this “lack of invariance” in the realization of prosody. Although talker variability in speech acoustics has been an issue central to speech perception research (e.g., Hillenbrand et al., 1995; Newman et al., 2001; Theodore et al., 2009), relevant accounts for how listeners may cope with the variability focus almost exclusively on segmental (as opposed to prosodic) speech

* Corresponding authors at: Department of Brain and Cognitive Sciences, Meliora Hall, University of Rochester, Rochester, NY 14627, and Department of Psychology, University of Maryland, Biology/Psychology Building, 4094 Campus Dr., College Park, MD 20742, United States.
E-mail addresses: xxie13@ur.rochester.edu (X. Xie), buxolugo@umd.edu (A. Buxó-Lugo).
1 The first two authors contributed equally.

Cognition journal homepage: www.elsevier.com/locate/cognit
https://doi.org/10.1016/j.cognition.2021.104619
Received 1 August 2020; Received in revised form 25 November 2020; Accepted 27 January 2021