Learning and Discrimination of Audiovisual Events in Human Infants: The Hierarchical Relation Between Intersensory Temporal Synchrony and Rhythmic Pattern Cues

David J. Lewkowicz
New York State Institute for Basic Research in Developmental Disabilities

This study examined 4- to 10-month-old infants’ perception of audio–visual (A-V) temporal synchrony cues in the presence or absence of rhythmic pattern cues. Experiment 1 established that infants of all ages could successfully discriminate between two different audiovisual rhythmic events. Experiment 2 showed that only 10-month-old infants detected a desynchronization of the auditory and visual components of a rhythmical event. Experiment 3 showed that 4- to 8-month-old infants could detect A-V desynchronization but only when the audiovisual event was nonrhythmic. These results show that initially in development infants attend to the overall temporal structure of rhythmic audiovisual events but that later in development they become capable of perceiving the embedded intersensory temporal synchrony relations as well.

A ticking metronome, a tap dancer, and a talking person all illustrate the fact that many events in our everyday world are specified concurrently in multiple sensory modalities. In addition, the multimodal sensory information that specifies them is distributed over time. The specific way the information is distributed (i.e., its temporal structure) determines the perceptual and cognitive meaning of temporally defined events (Baldwin & Baird, 2001; Zacks & Tversky, 2001). The two best examples of the fundamental importance of temporal structure for perception and cognition are, of course, music and language. In each case, the particular temporal organization of a series of elements, be they notes or phonemes, can give rise to very different meanings (Bregman, 1990; Fraisse, 1982a, b; Krumhansl, 2000; Lashley, 1951; Martin, 1972; Pomerantz & Lockhead, 1991).
In general, empirical evidence indicates that infants are sensitive to temporal structure in both the auditory and visual modalities (Lewkowicz, 1989, 2000a). For example, it has been reported that infants can perceive the temporal organization of a sequence of identical (Demany, McKenzie, & Vurpillot, 1977) or distinct (Chang & Trehub, 1977) sounds, that they can detect changes in the duration of the silent intervals that separate sounds (Thorpe & Trehub, 1989; Thorpe, Trehub, Morrongiello, & Bull, 1988), and that they can discriminate between different visual rhythms (Mendelson, 1986). Evidence also suggests that whereas sensitivity to some forms of temporally distributed sensory input remains unchanged, sensitivity to other forms of this type of input improves with development. Thus, sensitivity to audio–visual (A-V) synchrony relations and the ability to discriminate audiovisual rate variations emerge early and remain unchanged throughout infancy (Lewkowicz, 1992b, 1996). In contrast, thresholds for the detection of auditory gaps decrease with age (Trehub, Schneider, & Henderson, 1995; Werner, Marean, Halpin, Spetner, & Gillenwater, 1992), and the ability to discriminate more complex acoustic rhythms improves with age (Morrongiello, 1984).

Usually, when we think of temporal structure, we think of rhythm. Fraisse (1982b) defined rhythm as an ordered succession of elements that can be temporally distributed in either a regular or an irregular fashion. The sound of a ticking metronome exemplifies a regularly distributed sequence and thus constitutes what Fraisse referred to as an isochronous sequence. A Mozart minuet, on the other hand, exemplifies what we usually think of as a rhythmic pattern. Its primary characteristic is that its constituent elements are separated by unequal intervals of time.
This unequal temporal distribution leads to perceptual chunking of various groups of sounds into distinct, Gestalt-like groupings that have definite beginnings and ends. Martin (1972) agreed with Fraisse’s distinction between isochronous and rhythmic sequences and made the additional critical point that only patterned rhythmical sequences can be characterized in terms of relative timing differences. By relative timing, Martin (1972) meant that “the locus of each (sound) element along the time dimension is determined relative to the locus of all other elements in the sequence, adjacent and nonadjacent” (p. 488). Applying the relative timing criterion to the study of rhythm perception means that the different rhythmic patterns used in a given study of discrimination must differ in terms of the relative arrangement of the intervals separating each element of a pattern. For example, a 2–2 rhythmic pattern of hammer taps would consist of four taps separated, in turn, by short, long, and short intervals,

David J. Lewkowicz, New York State Institute for Basic Research in Developmental Disabilities.

This work was supported in part by funds from the New York State Office of Mental Retardation and Developmental Disabilities and in part by National Institute of Child Health and Human Development Grants R03 HD36731 and R01 HD35849. I thank Marcia Dabbene and Linnea Dickson for their assistance. I also express my gratitude to Lorraine Bahrick, Robert Lickliter, and Stuart Marcovitch for helpful discussions regarding this work and for useful comments and suggestions on an earlier version of the manuscript.

Correspondence concerning this article should be addressed to David J. Lewkowicz, who is now at the Department of Psychology, Charles E. Schmidt College of Science, Florida Atlantic University, E&S Building, 2912 College Avenue, Davie, Florida 33314. E-mail: lewkowic@fau.edu

Developmental Psychology, 2003, Vol. 39, No. 5, 795–804. Copyright 2003 by the American Psychological Association, Inc. 0012-1649/03/$12.00 DOI: 10.1037/0012-1649.39.5.795