Vowel Fundamental and Formant Frequency Contributions to English and Mandarin Sentence Intelligibility

Daniel Fogerty¹, Fei Chen²
¹Department of Communication Sciences and Disorders, University of South Carolina, USA
²Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
fogerty@sc.edu, fchen@sustc.edu.cn

Abstract
The current study investigated the spectral components of vowels that contribute to Mandarin and English sentence intelligibility. Sentences were processed to preserve various amounts of vowel information, and processing parameters ensured that similar proportions of speech were preserved in the two languages. In the first experiment, speech segments containing primarily vocalic cues were processed to flatten fundamental frequency (F0) cues. In the second experiment, sine-wave speech synthesis was used to code speech coarsely, retaining only the amplitude and frequency variation associated with the first three formants. Results demonstrated remarkable similarity between Mandarin and English sentence intelligibility for F0-flattened sentences. In contrast, the intelligibility of English sentences surpassed that of Mandarin sentences for sine-wave speech. Combined with earlier reports of superior intelligibility of Mandarin sentences with full-spectrum vowels, these results highlight significant contributions of Mandarin F0 information, likely related to lexical tone. English listeners, by comparison, may rely more on frequency and/or amplitude variation of the formants.

Index Terms: speech recognition, vowels, lexical tone, interruption.

1. Introduction
Previous studies have indicated that acoustic information present during vowel segments contributes significantly to Mandarin and English sentence intelligibility [1-3].
However, it is not currently clear whether the acoustic contributions of vowels are the same in the two languages, or whether listeners of one language weight some acoustic features more heavily than listeners of the other. Such a comparison will help define the language-specific and language-general processes that govern how speech information useful for sentence intelligibility is distributed across the complex acoustic parameters of speech.

The comparison between English and Mandarin Chinese is informative because of a number of acoustic-phonetic differences between the languages. First, because Mandarin is a tone language, lexical information is conveyed by the fundamental frequency (F0). This may result in vowel contributions to intelligibility that differ from those in English, where F0 is also important [e.g., 4] but does not directly convey lexical meaning. In addition, Mandarin has a sparse vowel system compared to English. This difference, combined with the phonological structure of the language, is likely related to the larger vowel inherent spectral changes (VISC) that have been observed for Mandarin vowels compared to English vowels [5]. VISC reflects the slowly varying changes in vowel formant frequencies that play an important role in vowel perception [6].

The current study was designed to investigate language differences arising from the relative contributions of vowel F0 and VISC to sentence intelligibility. To this end, two experiments were conducted to assess these two acoustic features independently. Experiment 1 tested the intelligibility of F0-flattened sentences as a way of indexing how the vowel F0 contour contributes to overall sentence intelligibility in the two languages.
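To make concrete what flattening F0 "to the mean sentence level" does to the pitch contour itself, the sketch below replaces every voiced frame of a frame-based F0 track with the mean F0 of that sentence. It is a minimal illustration under stated assumptions, not the study's processing: the track format (one value in Hz per analysis frame, with 0.0 marking unvoiced frames), the use of an arithmetic mean in Hz, and the function name `flatten_f0` are all hypothetical. A real manipulation would also resynthesize the waveform from the modified contour with an analysis-resynthesis method, which is not modeled here.

```python
from typing import List


def flatten_f0(f0_track: List[float]) -> List[float]:
    """Return a copy of a frame-based F0 track (Hz) in which every voiced
    frame is set to the sentence-mean F0.

    Hypothetical format: unvoiced frames are coded as 0.0 and stay unvoiced.
    Assumption: the mean is an arithmetic mean in Hz over voiced frames only.
    """
    voiced = [f for f in f0_track if f > 0]
    if not voiced:
        # No voiced frames, so there is no contour to flatten.
        return list(f0_track)
    mean_f0 = sum(voiced) / len(voiced)
    # Keep voicing decisions and timing; remove all pitch movement.
    return [mean_f0 if f > 0 else 0.0 for f in f0_track]


# A rising-falling contour with two unvoiced frames.
contour = [0.0, 180.0, 200.0, 220.0, 0.0, 200.0, 180.0]
print(flatten_f0(contour))  # → [0.0, 196.0, 196.0, 196.0, 0.0, 196.0, 196.0]
```

Because every voiced frame ends up with the same value, any contour shape (rising, falling, or dipping) is removed while voicing and timing are left intact. For Mandarin, that is exactly the kind of F0 movement associated with lexical tone.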
Experiment 2 was designed to assess differences between the two languages in the contribution of amplitude and frequency variation in the first three formants, using sine-wave speech to represent speech coarsely by only these acoustic features. In this way, differences in overall intelligibility between Mandarin and English could be attributed to how well listeners were able to extract meaning from the preserved acoustic cues. This study extends the literature on cross-linguistic vowel differences by examining how specific acoustic differences determine sentence intelligibility.

In addition to differences in F0 and VISC, Mandarin and English also have different syllable structures. Mandarin has a consonant-vowel syllable structure that differs markedly from the complex syllable structure of English, which allows consonant clusters. As a result, vowel acoustics account for a greater proportion of the sentence in Mandarin than in English. To control for this durational difference, the total proportion of speech presented was equated between the English and Mandarin conditions by examining performance at different preserved proportions of the vowel. Data from the initial testing of the Mandarin listeners were previously reported [7] and are included here for the cross-language comparison with new data from English-speaking listeners.

2. Experiment 1: Vowel F0
Experiment 1 was designed to investigate the contribution of the F0 contour to English and Mandarin sentence intelligibility. Vowel contributions were isolated by interrupting sentences so that primarily vowel cues were preserved, with F0 contours flattened to the mean F0 of the sentence. Consonant segments were replaced with low-level speech-shaped noise.

2.1. Listeners
Two groups of listeners participated in Experiment 1. The first group (N=18) consisted of native speakers of American English, who were tested with the English sentences.
Testing for this group was completed at the University of South Carolina. The second group (N=20) consisted of native speakers of Mandarin Chinese, who were tested with the Mandarin sentences. Testing for this group was completed at the University of Hong Kong. All listeners had normal hearing, with pure-tone thresholds ≤ 20 dB HL at octave frequencies.

Copyright © 2016 ISCA. INTERSPEECH 2016, September 8–12, 2016, San Francisco, USA. http://dx.doi.org/10.21437/Interspeech.2016-28 1382
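The interruption procedure of Experiment 1 (retaining primarily vowel segments and replacing consonant segments with low-level speech-shaped noise) can be sketched at the level of segment timing. The sketch also shows one way of computing the proportion of each sentence that is actually presented, the quantity that was equated between the English and Mandarin conditions. This is a hypothetical illustration, not the study's code. The segment labels, the rule of keeping a centered portion of each vowel, and the names `Segment`, `interrupt`, and `proportion_preserved` are assumptions. Noise generation, presentation level, and any onset/offset ramps are not modeled; intervals are only tagged as "speech" or "noise".

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[float, float, str]  # (start s, end s, "speech" | "noise")


@dataclass
class Segment:
    label: str    # hypothetical labels: "V" = vowel, "C" = consonant
    start: float  # seconds
    end: float    # seconds


def interrupt(segments: List[Segment], vowel_prop: float) -> List[Interval]:
    """Tag each stretch of the sentence as kept speech or noise-filled.

    Assumption: the central `vowel_prop` fraction of every vowel is kept;
    consonants and the trimmed vowel edges are marked "noise", meaning they
    would be replaced by low-level speech-shaped noise.
    """
    if not 0.0 <= vowel_prop <= 1.0:
        raise ValueError("vowel_prop must be between 0 and 1")
    out: List[Interval] = []
    for seg in segments:
        if seg.label == "V" and vowel_prop > 0:
            trim = (seg.end - seg.start) * (1.0 - vowel_prop) / 2.0
            keep_start, keep_end = seg.start + trim, seg.end - trim
            if trim > 0:
                out.append((seg.start, keep_start, "noise"))
            out.append((keep_start, keep_end, "speech"))
            if trim > 0:
                out.append((keep_end, seg.end, "noise"))
        else:
            out.append((seg.start, seg.end, "noise"))
    return out


def proportion_preserved(intervals: List[Interval]) -> float:
    """Fraction of total sentence duration that is presented as speech."""
    total = sum(e - s for s, e, _ in intervals)
    speech = sum(e - s for s, e, src in intervals if src == "speech")
    return speech / total if total > 0 else 0.0


# Toy sentence: C V C V, with vowels covering 0.5 s of 0.75 s.
sentence = [Segment("C", 0.0, 0.125), Segment("V", 0.125, 0.375),
            Segment("C", 0.375, 0.5), Segment("V", 0.5, 0.75)]
print(round(proportion_preserved(interrupt(sentence, 0.5)), 3))  # → 0.333
```

The same `vowel_prop` yields a larger `proportion_preserved` for sentences in which vowels take up more of the total duration, as described above for Mandarin. Comparing conditions on the proportion of the sentence presented, rather than on the vowel proportion alone, is what allows the amount of speech to be matched across the two languages.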