4pSC6: Articulatory Influences on the Categorization of Speech Sounds

H. Henny Yeung, Department of Psychology, University of British Columbia (hhyeung@psych.ubc.ca)
Bryan Gick, Department of Linguistics, University of British Columbia
Janet F. Werker, Department of Psychology, University of British Columbia

Abstract

Goal: This study investigates cross-modal interactions between the perception and production of speech. Do articulatory gestures influence the perception of speech?

Method: Participants categorized naturally produced syllables while (a) simply listening, (b) concurrently articulating one of those syllables without vocal fold vibration, or (c) reading text that conflicted with the auditory information.

Results: In certain pairings of mismatched auditory stimuli and articulatory gestures, articulation resulted in “articulatory capture”: participants sometimes misperceived the auditory syllable as the one they were articulating.

Conclusion: Perceptual and motor systems for speech may be linked, but this link may depend on acoustic and other perceptual factors that allow articulatory capture only in particular situations.

Introduction

• Cross-modal interactions are common in the processing of all kinds of perceptual input (Shimojo & Shams, 2001), including speech (Fig. 1).
• While not providing direct evidence for motor and direct-realist theories of speech perception (Liberman & Mattingly, 1985; Liberman & Whalen, 2000; Fowler, 1986), a link between speech perception and production is reported here.
• Integration, or “capture,” of articulatory information may depend on perceptual factors such as acoustic salience: visual and haptic information affect speech perception only under particular conditions.
• This experiment explored whether, and which, articulatory movements affect speech perception. Similar results are reported by Sams (cited as personal communication in Fowler et al., 2003) and Rosenblum (personal communication, March 17th, 2005).

Fig. 1: The red arrow indicates how this project fits with selected papers in cross-modal speech perception.

Method & Experiment

1. Stimuli
• 8 naturally produced tokens of /aba/, /ada/, /ava/, and /aθa/ were recorded; average length was 850 ms. Files can be downloaded from www.psych.ubc.ca/~hhyeung.

2. Procedure
• 10 native speakers of English (3 male) participated.
• Testing was performed in a sound-attenuated booth. Sound levels at the speaker were between -53 and -58 dB; participants sat 60-70 cm from the speakers.
• Participants pushed response buttons to categorize each auditory syllable as ABA, AVA, ADA, or ATHA.
• Three types of trials were used:
  o 64 CONTROL trials – baseline measure of categorization accuracy
  o 256 NAME trials – concurrent articulation of one of the 4 syllables along with the sound
  o 256 ORTHO trials – presentation of one of the 4 syllables as text along with the sound

Results

Proportion of responses matching the auditory syllable, analyzed by condition:
- NAME < CONTROL: t(9) = 8.973, p < 0.01
- NAME < ORTHO: t(9) = 8.373, p < 0.01
- ORTHO < CONTROL: t(9) = 2.847, p < 0.05
Within the NAME condition, not all sounds were equivalent (F(3, 27) = 5.956, p < 0.05); /aba/ and /ava/ were harder to identify (LSD: p-values all < 0.056).
(Figure panels: response proportions for Name trials, Control trials, and Ortho trials.)
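For readers who want to see how this style of comparison is computed, the sketch below runs paired (within-subjects) t-tests on per-participant proportion-correct scores across the three conditions, mirroring the t(9) statistics above. This is an illustrative sketch only: the data values are invented placeholders, and Python with SciPy is assumed rather than the software actually used for the poster's analysis.

```python
# Illustrative sketch only (not the original analysis code).
# Hypothetical proportion-correct scores, one value per participant (n = 10).
import numpy as np
from scipy import stats

control = np.array([0.96, 0.94, 0.97, 0.95, 0.93, 0.96, 0.98, 0.95, 0.94, 0.97])
name    = np.array([0.78, 0.74, 0.81, 0.76, 0.72, 0.79, 0.83, 0.75, 0.73, 0.80])
ortho   = np.array([0.93, 0.91, 0.95, 0.92, 0.90, 0.94, 0.96, 0.92, 0.91, 0.95])

# Paired t-tests across conditions; with 10 participants, df = n - 1 = 9,
# matching the t(9) values reported in the Results section.
comparisons = {
    "NAME vs. CONTROL":  (name, control),
    "NAME vs. ORTHO":    (name, ortho),
    "ORTHO vs. CONTROL": (ortho, control),
}
for label, (a, b) in comparisons.items():
    t, p = stats.ttest_rel(a, b)
    print(f"{label}: t(9) = {t:.3f}, p = {p:.4f}")
```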
Analysis

Analysis involved:
1. The proportion of ABA, ADA, AVA, and ATHA responses.
2. Whether the conflicting information (i.e., text in ORTHO trials; articulatory movements in NAME trials) was aba, ada, ava, or atha.

A 3-way ANOVA was run separately for each auditory syllable: Response (4 levels) x Conflict (4 levels) x Condition (ORTHO vs. NAME; 2 levels). A 2-way ANOVA (Response x Conflict) was also run within NAME, to test whether response patterns differed as a function of the conflicting information. Lower-bound corrections were used in all cases; sphericity was not assumed.

Auditory syllable | 3-way ANOVA                    | 2-way ANOVA (within NAME)
ABA               | *F(9, 81) = 8.335, p < 0.05    | **F(9, 81) = 13.676, p < 0.01
ADA               | F(9, 81) = 3.252, p = 0.105    | F(9, 81) = 3.332, p = 0.101
AVA               | F(9, 81) = 3.411, p = 0.098    | *F(9, 81) = 6.489, p < 0.05
ATHA              | ^F(9, 81) = 5.144, p = 0.05    | *F(9, 81) = 7.456, p < 0.05

Conclusions

1) Articulation interferes with the categorization of speech sounds, particularly “aba” and “ava.”
2) Errors depend on the particular combinations of auditory and articulatory information:
   - “aba” is commonly misperceived as “ava” when articulating “ava”
   - “ava” is occasionally misperceived as “aba” when articulating “aba”
   - “atha” is occasionally misperceived as “ada” or “ava” when articulating “ada”
3) Articulatory movements can influence the perception of speech, supporting the idea that motor and perceptual routines for speech are linked.

Further research will need to:
a) develop better nonspeech controls that also require articulatory movements
b) explore the parameters that allow articulatory capture of only particular combinations
c) identify the processing levels of cross-modal integration.

Acknowledgements

We thank Lynne E. Bernstein for helpful discussion of these and related issues, as well as Eric Bateson, Laurel Fais, Chandan Narayan, Laura Sabourin, & Ferran Pons for comments on an earlier version. This work was supported by an NSERC grant to JFW, an NSERC Discovery grant to BG, and an NSF GRF to HHY.

Presented at the 149th Meeting of the Acoustical Society of America (paper 4pSC6), May 19th, 2005, 1:30 pm.

References

Fadiga, L., Fogassi, L., Pavesi, G., & Rizzolatti, G. (1995). Motor facilitation during action observation: A magnetic stimulation study. Journal of Neurophysiology, 73, 2608-2611.
Fowler, C. A. (1986). An event approach to the study of speech perception from a direct-realist perspective. Journal of Phonetics, 14, 3-28.
Fowler, C. A., & Dekle, D. J. (1991). Listening with eye and hand: Cross-modal contributions to speech perception. Journal of Experimental Psychology: Human Perception and Performance, 17(3), 816-828.
Fowler, C. A., Brown, J. M., Sabadini, L., & Weihing, J. (2003). Rapid access to speech gestures in perception: Evidence from choice and simple response time tasks. Journal of Memory and Language, 49, 396-413.
Houde, J. F., & Jordan, M. I. (1998). Sensorimotor adaptation in speech production. Science, 279(5354), 1213-1216.
Kerzel, D., & Bekkering, H. (2000). Motor activation from visible speech: Evidence from stimulus response compatibility. Journal of Experimental Psychology: Human Perception and Performance, 26(2), 634-647.
Liberman, A. M., & Mattingly, I. (1985). The motor theory of speech perception revised. Cognition, 21, 1-36.
Liberman, A. M., & Whalen, D. H. (2000). On the relation of speech to language. Trends in Cognitive Sciences, 4, 187-196.
McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746-748.
Shimojo, S., & Shams, L. (2001). Sensory modalities are not separate modalities: Plasticity and interactions. Current Opinion in Neurobiology, 11, 505-509.