Cognitive Brain Research 20 (2004) 473–479
www.elsevier.com/locate/cogbrainres

Research report

Pre-attentive categorization of vowel formant structure in complex tones

Thomas Jacobsen a,*, Erich Schröger a, Elyse Sussman b

a BioCog-Cognitive and Biological Psychology, Institut für Allgemeine Psychologie, Universität Leipzig, Seeburgstraße 14-20, 04103 Leipzig, Germany
b Department of Neuroscience, Albert Einstein College of Medicine, 1410 Pelham Parkway S., Bronx, NY 10461, USA

* Corresponding author. Tel.: +49-341-9735907; fax: +49-341-9735969. E-mail address: jacobsen@uni-leipzig.de (T. Jacobsen).

Accepted 30 March 2004
Available online 18 May 2004

0926-6410/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.cogbrainres.2004.03.021

Abstract

It has been demonstrated that vowel information can be extracted from speech sounds without attention focused on them, despite widely varying non-speech acoustic information in the input. The present study tested whether even complex tones that were constructed based on F0, F1 and F2 vowel frequencies to resemble the defining features of speech sounds, but were not speech, are categorized pre-attentively according to vowel space information. The Mismatch Negativity brain response was elicited by infrequent tokens of the complex tones, showing that the auditory system can pre-attentively categorize speech information on the basis of the minimal, defining auditory features. The human mind extracts the language-relevant information from complex tones despite the non-relevant variation in the sound input.
© 2004 Elsevier B.V. All rights reserved.

Theme: Neural basis of behavior
Topic: Cognition
Keywords: Formant structure; Speech perception; Phonemes; Vowels; Mismatch negativity; Auditory sensory memory; Event-related potentials

1. Introduction

Speakers of different age, gender, speech styles and voice parameters produce speech of widely varying fundamental frequency (F0), spectral and harmonic structure, intensity and other physical sound characteristics (e.g., Refs. [6,18,25]). Despite this variability, listeners readily extract the relevant information and effortlessly comprehend spoken language. Speech perception requires highly organized, fast, adaptive processes that extract critical sound features from a complex, dynamic stimulus space and map them onto categorical phonological information, which is part of language-related long-term memory (e.g., Ref. [10]). This occurs while the specific sound features not relevant for speech perception are simultaneously disregarded. Recent event-related brain potential (ERP) studies have shown that phoneme information is extracted pre-attentively, despite variation in sound features not relevant for speech (see below).

The purpose of the present study was to investigate the extraction of the characteristic auditory information that determines a vowel in formant space, an important step in speech perception, by looking at the acoustic processing of complex tones. More specifically, the aim was to determine whether phoneme information is extracted even from tones that do not sound like speech but that include the acoustic elements of the phoneme formant information.

To address this issue, ERPs were recorded, a technique that allows one to assess speech processing with millisecond accuracy and without the interference of task-related processes and participant strategies. The Mismatch Negativity (MMN) ERP component, which reflects pre-attentive auditory deviance detection based upon auditory sensory memory representations, has been used frequently to investigate the time course of speech processing [12,14,15,20]. Regularities from repetitive auditory stimulation are extracted and temporarily stored in auditory sensory memory. New stimuli detected as violating the stored regularities elicit MMN. Violations (or deviations) can be based on simple, complex, or even abstracted auditory regularities, all of which elicit