Intelligibility of speaking styles elicited by various instructions
Rachael Gilbert 1, Nicholas Victor 1, Bharath Chandrasekaran 1,2, & Rajka Smiljanic 1
Department of Linguistics 1 & Department of Communication Sciences & Disorders 2, The University of Texas at Austin

I. Introduction

INTELLIGIBILITY-ENHANCING MODIFICATIONS
Talkers make acoustic-phonetic modifications that are specifically attuned to their communicative situation [1, 2]
  E.g., talkers may automatically increase the intensity of their speech to compensate for background noise, or slow their speaking rate to accommodate a listener with low proficiency
Talkers can also produce these changes under instruction
  Elicitation instructions often differ, e.g., “hyperarticulate” [3], “pretend to speak to someone from a different language background” [4], “pretend to speak to someone with hearing loss” [5], etc.
  Adaptations result in increased intelligibility compared to conversational speech
Previous research has compared speakers producing modifications under different elicitation methods [1, 2, 8]
  Lam and Tjaden, 2013: Different instructions (overenunciated, hearing impaired, clear) resulted in similar acoustic modifications (spectral change, vowel space expansion, lengthened segment durations). However, the magnitude of acoustic change and perceptual benefit depended on instruction [1]
  Hazan and Baker, 2011: Different instructions (read clear, spontaneous speech in response to babble and to vocoder) resulted in different acoustic modifications. The magnitude of acoustic change depended on instruction as well [2]
It is not well understood which of the articulatory modifications contribute to enhanced intelligibility
The extent to which such modifications are under explicit talker control remains largely unknown

RESEARCH AIMS
1. To examine the acoustic-articulatory enhancements that native and non-native talkers implement following 6 different speech modification instructions
2. To define the extent to which these 6 different methods improve intelligibility

IMPLICATIONS
Improvement or maintenance of speech intelligibility is a central aim for a whole range of clinical interventions and training programs: audiologic rehabilitation, hearing aid technologies, ESL instruction, etc.
A deeper understanding of which instructions yield the largest intelligibility gain, and of the extent to which acoustic enhancements are under explicit talker control, will have clinical and educational significance.

REFERENCES
1. Lam, J., & Tjaden, K. (2013). Intelligibility of clear speech: Effect of instruction. Journal of Speech, Language, and Hearing Research, 56(5), 1429.
2. Hazan, V., & Baker, R. (2011). Acoustic-phonetic characteristics of speech produced with communicative intent to counter adverse listening conditions. The Journal of the Acoustical Society of America, 130, 2139.
3. Dromey, C. (2000). Articulatory kinematics in patients with Parkinson disease using different speech treatment approaches. Journal of Medical Speech-Language Pathology, 8, 155–161.
4. Bradlow, A. R., Kraus, N., & Hayes, E. (2003). Speaking clearly for children with learning disabilities: Sentence perception in noise. Journal of Speech, Language, and Hearing Research, 46, 80–97.
5. Ferguson, S. H., & Kewley-Port, D. (2007). Talker differences in clear and conversational speech: Acoustic characteristics of vowels. Journal of Speech, Language, and Hearing Research, 50, 1241–1255.
6. Krause, J., & Braida, L. (2004). Acoustic properties of naturally produced clear speech at normal speaking rates. The Journal of the Acoustical Society of America, 115, 362–378.
7. Maniwa, K., Jongman, A., & Wade, T. (2008). Perception of clear fricatives by normal-hearing and simulated hearing-impaired listeners. The Journal of the Acoustical Society of America, 123, 1114–1125.
8. Lam, J., Tjaden, K., & Wilding, G. (2012). Acoustics of clear speech: Effect of instruction.
Journal of Speech, Language, and Hearing Research, 55, 1807–1821. doi: 10.1044/1092-4388(2012/11-0154).
9. Van Engen, K. J., Chandrasekaran, B., & Smiljanic, R. (2012). Effects of speech clarity on recognition memory for spoken sentences. PLoS ONE, 7(9), e43753.

ACKNOWLEDGEMENTS
The authors are grateful to Jasmine Beitz, Lauren Franklin, Maddie Oakley, Kiki Adams, Gaby Cook, Emily Tagtow, and Corina Treviño for their contributions to this project. This research was funded by the UT Longhorn Innovation Fund for Technology, awarded to Bharath Chandrasekaran and Rajka Smiljanic.

CONTACT
rachaelgilbert@utexas.edu, bchandra@austin.utexas.edu, rajka@mail.utexas.edu

II. Methods

TALKERS
6 native speakers of American English (N)
  Age mean: 19.0 years, range: 18-21
5 Korean speakers, non-native speakers of English (NN)
  Age mean: 29.2 years, range: 25-34
  Age of English acquisition mean: 13.25 years, range: 13-14
  Age moved to the US mean: 27.2 years, range: 24-30
  Recruited through the UT ESL program and word of mouth
All normal-hearing (<25 dB thresholds at 0.5, 1, 2, 4 kHz)

STIMULI
160 meaningful sentences (e.g., "The hot sun warmed the ground") [9]
All 160 were produced in conversational speech. In addition, a 20-sentence subset was produced in response to each of 6 different elicitation instructions, to compare modifications across instructions. The resulting speaking styles (the conversational baseline plus the 6 instructed styles):
  1. Conversational (CO)
  2. Loud (LD)
  3. Slow (SL)
  4. Exaggerated (EX)
  5. Imitated conversational (IO)
  6. Imitated clear (IL)
  7. Clear (CL)

ACOUSTIC ANALYSES
Pitch (F0): mean and range
Speaking rate: speech rate, pause rate, pause duration, vowel duration
Vowels: triangular vowel space area (VSA)
Measurements were run on all stimuli (280 sentences per speaker), except that vowel measures were run on a select subset of 36 /i ɑ u/ tokens from each speaker

PERCEPTUAL TEST
Word recognition in noise
  Speech mixed with speech-shaped noise (SSN) at -5 dB SNR
  80 sentences presented to each listener (each speaker’s productions transcribed by ~6 listeners)
64 young adult listeners
  Normal-hearing, native speakers of American English
  Age mean: 20.9 years, range: 18-33

STATISTICAL MODELS
Repeated-measures ANOVA for each dataset (8 total)
  Between-subject factor: Talker L1 (Native, Nonnative)
  Within-subject factor: Style (IO, IL, EX, SL, LD, CL)
  Dependent variable: Gain (net difference from baseline CO speech)

III. Results

INTELLIGIBILITY
Significant main effect of Style: all instructions except IO resulted in an increase in intelligibility compared to conversational speech
Significant main effect of L1: N talkers were better able than NN talkers to enhance intelligibility through speech modifications
No exact 1:1 correspondence between any one acoustic change and perceptual benefit
Pitch and rate factors appear to be more strongly correlated with intelligibility than vowel factors

ACOUSTIC-PHONETIC MODIFICATIONS
Significant main effect of Style on all acoustic measurements except VSA
Different instructions result in different cue enhancements, e.g., F0 changes greatest for LD, speaking rate changes greatest for EX, vowel duration changes greatest for CL
Some modifications co-occur, e.g., speaking rate decrease present in IL, EX, and CL; VSA increase present in EX, SL, LD, and CL (for N)
Significant main effects of L1: N show larger adaptations in speaking rate, pause rate, and pause duration than do NN
Significant interactions of L1 and Style:
  N modify their speaking rate significantly more for EX than LD; NN do not
  NN expand their VSA more for SL than LD; N do not

[Figure: eight panels (F0 Mean, F0 Range, Speaking Rate, Pause Rate, Pause Duration, Vowel Duration, Vowel Space Area, Intelligibility), each plotting Adaptation from CO by Speaking Style (IO, IL, EX, SL, LD, CL) for the two Speaker Groups (Native, Nonnative)]
5pSC2

V. Conclusion
The acoustic modifications in response to the different instructions varied widely
  Talkers were able to produce different modifications in response to different instructions (e.g., slowing down in SL but not LD)
  However, some acoustic-articulatory modifications co-occurred (e.g., when instructed to exaggerate their vowels, talkers slowed down; when instructed to speak slowly, talkers expanded their vowel space)
  Overall, NN talkers were capable of producing similar speaking style adaptations, but to a lesser degree than N talkers (except VSA in LD)
Instructing talkers to enhance their speech resulted in a significant perceptual benefit
  All elicitation instructions aimed at enhancing intelligibility were successful
  Both N and NN talkers were able to produce speaking style adaptations in response to the instructions that resulted in an intelligibility increase; N talkers’ modifications resulted in larger gains

THE NEXT STEPS
Examine in detail how each acoustic measure affects intelligibility
Examine the extent to which NN talkers can be trained to further exaggerate intelligibility-enhancing features of their speech
Examine additional NN talkers and other talker groups that have compromised intelligibility
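
The triangular vowel space area (VSA) named under ACOUSTIC ANALYSES and the Gain measure named under STATISTICAL MODELS can be illustrated with a minimal Python sketch. It assumes that VSA is the area of the triangle whose corners are the mean (F1, F2) values of /i ɑ u/ in Hz, and that gain is the instructed-style value minus the CO baseline. The function names, the data layout, and all formant values below are hypothetical; they are not the poster's data or the authors' analysis scripts.

```python
from statistics import mean


def mean_formants(tokens):
    """Average a list of (F1, F2) token measurements (Hz) for one vowel."""
    f1_values, f2_values = zip(*tokens)
    return (mean(f1_values), mean(f2_values))


def triangular_vsa(corners):
    """Area (Hz^2) of the triangle spanned by the three corner vowels.

    `corners` maps a vowel label to its mean (F1, F2) in Hz.
    The key "a" stands in for the vowel /ɑ/ (an ASCII-only assumption).
    The area is computed with the shoelace formula for a triangle.
    """
    (x1, y1), (x2, y2), (x3, y3) = (corners[v] for v in ("i", "a", "u"))
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0


def gain(style_value, baseline_value):
    """Net difference of an instructed style from the CO baseline."""
    return style_value - baseline_value


# Hypothetical token measurements (F1, F2) in Hz; two tokens per vowel.
co_tokens = {
    "i": [(290.0, 2320.0), (310.0, 2280.0)],
    "a": [(690.0, 1210.0), (710.0, 1190.0)],
    "u": [(340.0, 1010.0), (360.0, 990.0)],
}
cl_tokens = {
    "i": [(280.0, 2410.0), (300.0, 2390.0)],
    "a": [(730.0, 1190.0), (750.0, 1170.0)],
    "u": [(330.0, 960.0), (350.0, 940.0)],
}

# Collapse tokens to one mean (F1, F2) point per vowel, then take the area.
co_corners = {v: mean_formants(t) for v, t in co_tokens.items()}
cl_corners = {v: mean_formants(t) for v, t in cl_tokens.items()}

vsa_co = triangular_vsa(co_corners)
vsa_cl = triangular_vsa(cl_corners)
vsa_gain = gain(vsa_cl, vsa_co)

print(f"VSA (CO): {vsa_co:.0f} Hz^2")
print(f"VSA (CL): {vsa_cl:.0f} Hz^2")
print(f"VSA gain, CL relative to CO: {vsa_gain:.0f} Hz^2")
```

A positive gain indicates vowel space expansion relative to conversational speech. Under the same assumption, the `gain` subtraction would apply unchanged to the other measures (F0, speaking rate, pause measures, vowel duration, intelligibility), giving one gain value per talker and style.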