On the syllable-timing of Cantonese and Beijing Mandarin

Peggy Pik Ki Mok
Department of Linguistics and Modern Languages, The Chinese University of Hong Kong
peggymok@cuhk.edu.hk

Abstract

This study investigates the speech rhythm of Cantonese and Beijing Mandarin using recently developed acoustic rhythm measures. The two languages were compared with four languages in the BonnTempo corpus: German and English (stress-timed) and French and Italian (syllable-timed). Six Cantonese and six Beijing Mandarin native speakers were recorded reading the North Wind and the Sun story at a normal speech rate and telling the story semi-spontaneously. Both raw and normalised rhythmic measures were calculated using vocalic, consonantal and syllabic durations (∆C, ∆V, ∆S, %V, VarcoC, VarcoV, VarcoS, rPVI_C, rPVI_S, nPVI_V, nPVI_S). Results confirm the syllable-timing impression of Cantonese and Beijing Mandarin, and suggest that Cantonese may have the most typical syllable-timed rhythm among the languages in this study, probably due to its lack of lexical stress. This study also shows that, in addition to consonantal and vocalic durations, syllable durations can potentially be useful in distinguishing speech rhythm.

1. Introduction

Speech researchers have traditionally classified languages into different rhythmic groups: syllable-timed, stress-timed and mora-timed [1, 17]. English and German are typical stress-timed languages, French and Italian are typical syllable-timed languages, and Japanese is a typical mora-timed language. This rhythm class hypothesis was based on the notion of isochrony, i.e. that the speech signal contains units of equal or near-equal duration on which the classification rests: syllables for syllable-timed languages, inter-stress intervals (feet) for stress-timed languages and morae for mora-timed languages.
However, many experimental studies could not find concrete evidence for such isochronous units in the speech signal to support the rhythm class hypothesis (see [8, 13, 17] for a review). For example, syllable durations in syllable-timed languages are as variable as those in stress-timed languages [16], while inter-stress intervals in stress-timed languages are no more regular than those in syllable-timed languages [8]. Beckman [3] and Laver [14] summed up the early attempts to find acoustic correlates of speech rhythm by suggesting that speech rhythm is merely perceptual, since no reliable evidence could be found for isochrony.

Nevertheless, despite the lack of isochronous units, Dauer [8] and Roach [16] pointed out that stress-timed languages and syllable-timed languages differ in several important phonological aspects: syllable structure, vowel reduction and stress. Stress-timed languages have more variation in syllable length and structure, more reduced unstressed syllables, more variation in the phonetic realisation of stress and more stress-related rules than syllable-timed languages. These features, rather than any isochronous unit, combine with one another to give the impression of stress-timing versus syllable-timing. In addition, contrary to the early assumption of a categorical distinction in speech rhythm, they suggested that languages can be more or less stress-timed or syllable-timed, with a continuum between the two.

The above insights are captured by several recently developed acoustic measures of speech rhythm which can reflect the auditory impression of different rhythmic classes: %V (the percentage of vocalic durations in speech) and ∆C and ∆V (the standard deviations of consonantal and vocalic durations respectively) by Ramus et al. [15], and the Pairwise Variability Index (PVI) of vocalic and consonantal durations by Grabe & Low [13].
These measures abandon the search for isochronous phonological units; instead, they consider the variability in speech. They take only the durations of vowels and consonants as the basis for rhythmic classification. Because of the phonological differences mentioned above, stress-timed languages are expected to show higher variability in consonant and vowel durations than syllable-timed languages. The results of Ramus et al. and Grabe & Low show that %V and ∆C (Figure 1), and the normalised vocalic PVI and the raw consonantal PVI (Figure 2), can categorise different languages into distinct rhythmic clusters, while languages with less typical or unknown rhythm may fall between these clusters. Subsequent studies also confirm that these acoustic measures can be used to distinguish languages with different speech rhythm, e.g. [18].

Figure 1: Results from Ramus et al. [13].

Figure 2: Results from Grabe & Low [11].

This study investigates the speech rhythm of Cantonese and Beijing Mandarin using the above acoustic measures.
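To make the measures named above concrete, the following is a minimal Python sketch of how %V, ∆, Varco and the raw and normalised PVI are commonly computed from a list of interval durations. It is an illustration under stated assumptions, not the procedure used in this study: the function names are hypothetical, the durations are invented for the example, and the population standard deviation is assumed where the standard deviation is required.

```python
from statistics import mean, pstdev


def percent_v(vocalic, consonantal):
    """%V: share of total duration taken up by vocalic intervals, in percent."""
    total = sum(vocalic) + sum(consonantal)
    return 100.0 * sum(vocalic) / total


def delta(durations):
    """Delta (e.g. delta-C, delta-V): standard deviation of interval durations.

    Assumption: population standard deviation; a sample standard
    deviation would give slightly larger values.
    """
    return pstdev(durations)


def varco(durations):
    """Varco: standard deviation normalised by the mean duration, times 100."""
    return 100.0 * pstdev(durations) / mean(durations)


def rpvi(durations):
    """Raw PVI: mean absolute difference between successive intervals."""
    pairs = list(zip(durations, durations[1:]))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def npvi(durations):
    """Normalised PVI: each successive difference is divided by the mean
    of the two intervals it compares, then averaged and scaled by 100."""
    pairs = list(zip(durations, durations[1:]))
    return 100.0 * sum(abs(a - b) / ((a + b) / 2) for a, b in pairs) / len(pairs)


# Invented durations in milliseconds, for illustration only.
even_vowels = [100, 100, 100, 100]     # perfectly regular intervals
uneven_vowels = [50, 150, 50, 150]     # alternating short and long intervals
consonants = [80, 60, 90, 70]

if __name__ == "__main__":
    print("%V (uneven):", percent_v(uneven_vowels, consonants))
    print("delta-V even vs uneven:", delta(even_vowels), delta(uneven_vowels))
    print("VarcoV uneven:", varco(uneven_vowels))
    print("rPVI uneven:", rpvi(uneven_vowels))
    print("nPVI even vs uneven:", npvi(even_vowels), npvi(uneven_vowels))
```

In this toy data the regular sequence yields zero for every variability measure, while the alternating sequence yields high values, which mirrors the expectation that stress-timed speech produces more durational variability than syllable-timed speech. The normalised variants (Varco, nPVI) divide by local or global mean duration, so they are less sensitive to overall speech rate than the raw ones.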