Processing of audiovisual speech in Broca's area

Ville Ojanen,a,b,* Riikka Möttönen,a,b Johanna Pekkola,a,b,c Iiro P. Jääskeläinen,a,b,d Raimo Joensuu,b Taina Autti,b,c and Mikko Sams a,b

a Laboratory of Computational Engineering, Helsinki University of Technology, P.O. Box 3000, FIN-02015 HUT, Helsinki, Finland
b Advanced Magnetic Imaging Center, Helsinki University of Technology, Helsinki, Finland
c Helsinki University Central Hospital, Department of Radiology, Helsinki, Finland
d Massachusetts General Hospital-Massachusetts Institute of Technology-Harvard Medical School A. Martinos Center for Biomedical Imaging, Charlestown, MA, USA

Received 3 December 2003; revised 1 December 2004; accepted 1 December 2004
Available online 29 January 2005

We investigated cerebral processing of audiovisual speech stimuli in humans using functional magnetic resonance imaging (fMRI). Ten healthy volunteers were scanned with a "clustered volume acquisition" paradigm at 3 T during observation of phonetically matching (e.g., visual and acoustic /y/) and conflicting (e.g., visual /a/ and acoustic /y/) audiovisual vowels. Both stimuli activated the sensory-specific auditory and visual cortices, along with the superior temporal, inferior frontal (Broca's area), premotor, and visual–parietal regions bilaterally. Phonetically conflicting vowels, contrasted with matching ones, specifically increased activity in Broca's area. Activity during phonetically matching stimuli, contrasted with conflicting ones, was not enhanced in any brain region. We suggest that the increased activity in Broca's area reflects processing of conflicting visual and acoustic phonetic inputs in partly disparate neuron populations. On the other hand, matching acoustic and visual inputs would converge on the same neurons. © 2004 Elsevier Inc. All rights reserved.
Keywords: Broca's area; Audiovisual vowel; Brain

Introduction

Interaction of the auditory and visual modalities is beneficial in everyday speech perception. Seeing the speaker's articulatory gestures improves identification of acoustic speech stimuli, especially in noisy conditions (Sumby and Pollack, 1954). On the other hand, viewing articulatory gestures that conflict with the acoustic speech can impair acoustic speech perception. For example, identification of acoustic speech slows down when it is combined with a conflicting visual articulation (e.g., acoustic /a/ and visual /y/) and the conflict between the sensory modalities is perceptually evident (Klucharev et al., 2003). Sometimes, conflicting acoustic and visual speech inputs fuse into a unified percept, as occurs in the "McGurk effect". For example, simultaneously presented conflicting acoustic /ba/ and visual /ga/ are usually perceived as /da/ (McGurk and MacDonald, 1976).

Recent neuroimaging studies have identified brain regions in which activity differs during processing of audiovisual speech stimuli and their separately presented acoustic and visual components. These "audiovisual speech processing sites" include the superior temporal sulcus (STS) (Calvert et al., 2000; Sekiyama et al., 2003; Wright et al., 2003), the sensory-specific cortices (Calvert et al., 1999), and the claustrum (Olson et al., 2002). Furthermore, since Broca's area and the motor speech regions seem to be common processing sites for both acoustic (Burton et al., 2000; Fadiga, 2002; Watkins et al., 2003; Zatorre et al., 1992, 1996) and visual speech (Campbell et al., 2001; Nishitani and Hari, 2002; Paulesu et al., 2003; Watkins et al., 2003), they are also good candidates for areas where acoustic and visual speech signals could interact. Indeed, there is evidence that these regions contribute to audiovisual speech perception in noisy conditions (Callan et al., 2003).
It is not known in detail which features (e.g., temporal, spatial, phonetic, or semantic) acoustic and visual speech inputs need to share in order to be integrated at a given cerebral site. To focus on the processing of the phonetic features of audiovisual speech, we used two types of audiovisual stimuli that differed only with respect to phonetic congruency. We presented temporally and spatially matching, but either phonetically matching (e.g., visual and acoustic /y/) or conflicting (e.g., visual /a/ and acoustic /y/), audiovisual vowels to the subjects. The matching vowels produced a unified audiovisual percept, whereas the conflicting ones were perceptually clearly incongruous. This enabled us to compare the BOLD (blood oxygenation level dependent) signals to phonetically matching and conflicting audiovisual stimuli and specifically map the brain areas

* Corresponding author. Laboratory of Computational Engineering, Helsinki University of Technology, P.O. Box 3000, FIN-02015 HUT, Helsinki, Finland. E-mail address: viloja@lce.hut.fi (V. Ojanen).
doi:10.1016/j.neuroimage.2004.12.001
NeuroImage 25 (2005) 333–338
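The condition comparison described above (conflicting versus matching BOLD responses) can be sketched as a two-condition general linear model with a contrast on the condition regressors. The following Python/NumPy example is purely illustrative: it uses simulated data for a single voxel, and all variable names, block lengths, and effect sizes are invented assumptions, not the study's actual design or analysis (which would typically use a dedicated fMRI package such as SPM or FSL).

```python
import numpy as np

# Simulated single-voxel block design (invented parameters, for illustration).
rng = np.random.default_rng(0)
n_scans = 120

# Boxcar regressors: 10 scans of stimulation, 20 scans of rest, repeated;
# the "conflicting" blocks are shifted so the two conditions never overlap.
block = np.r_[np.ones(10), np.zeros(20)]
matching = np.tile(block, 4)
conflicting = np.roll(matching, 10)

# Design matrix: [matching, conflicting, constant baseline].
X = np.column_stack([matching, conflicting, np.ones(n_scans)])

# Simulated voxel time series: a stronger response to conflicting
# stimuli (as reported for Broca's area), plus Gaussian noise.
y = 1.0 * matching + 2.5 * conflicting + 100 + rng.normal(0, 0.5, n_scans)

# Ordinary least-squares fit of the GLM.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Contrast c = [-1, +1, 0] tests conflicting > matching.
c = np.array([-1.0, 1.0, 0.0])
effect = c @ beta
print(f"contrast estimate (conflicting - matching): {effect:.2f}")
```

In a real analysis the regressors would also be convolved with a hemodynamic response function and the contrast tested voxel-wise with an appropriate statistic; this sketch only shows the core logic of comparing the two stimulus conditions.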