ACOUSTIC DESCRIPTION OF A SOPRANO’S VOWELS BASED ON PERCEPTUAL LINEAR PREDICTION

Thomas Millhouse¹ & Frantz Clermont²

¹ Sydney Conservatorium of Music, University of Sydney, Australia
² JP French Associates and University of York, United Kingdom
¹ thomas.millhouse@bigpond.com
² akustikfonetiks@yahoo.com.au

ABSTRACT

A perceptually motivated model known as Perceptual Linear Prediction (PLP, [6]) is employed to parameterise and interpret the cardinal vowels sung by a professional soprano at pitches ranging from 220 to 880 Hz. The PLP model yields two perceptual formants (F1′ and F2′), which encode the low and high spectral regions, respectively. These formants are found to be tractable and robust, thereby facilitating a more complete description of the sung-vowel space.

1. INTRODUCTION

A major problem inherent in the acoustic analysis of sung vowels is the lack of a parameterisation method that can resolve phonetic and timbral information whilst remaining robust to rising pitch. Traditional formant analysis has long been the primary focus, owing to the information that formant frequencies readily carry about vocal-tract shape, phonetic distinctiveness and speaker specificity. However, traditional formant techniques cannot provide a complete characterisation of sung vowels across all singing voice types. Previous studies clearly reflect this limitation.

1.1. Background

Acoustic formant analysis of sung vowels has been successful for low-pitched voice registers. In high-pitched singing, however, the wide spacing of harmonics makes the estimation of acoustic formant frequencies problematic and unreliable. Spectrographic parameterisation and componential description of the acoustic formant structure (i.e., formant by formant and vowel by vowel) were pioneered by [11], whilst [3] proposed a systemic approach to sung vowels in the phonetic space spanned by the three lowest formants.
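The harmonic-spacing problem noted above can be made concrete with simple arithmetic: a periodic voice source samples the vocal-tract transfer function only at integer multiples of the fundamental frequency f0. The short Python sketch below counts the harmonics lying at or below an illustrative 1 kHz limit (our choice, not a value taken from this study) for pitches spanning the range sung here. The function name `harmonics_below` is hypothetical.

```python
def harmonics_below(f0_hz, limit_hz):
    """Return the harmonic frequencies k*f0 (k >= 1) that do not exceed limit_hz."""
    n = int(limit_hz // f0_hz)  # number of whole harmonics up to the limit
    return [k * f0_hz for k in range(1, n + 1)]


# Compare the lowest, middle and highest pitches of the 220-880 Hz range.
for f0 in (220, 440, 880):
    h = harmonics_below(f0, 1000)
    print(f"f0 = {f0} Hz: {len(h)} harmonic(s) up to 1 kHz -> {h}")
# f0 = 220 Hz: 4 harmonic(s) up to 1 kHz -> [220, 440, 660, 880]
# f0 = 440 Hz: 2 harmonic(s) up to 1 kHz -> [440, 880]
# f0 = 880 Hz: 1 harmonic(s) up to 1 kHz -> [880]
```

At 880 Hz only one harmonic falls below 1 kHz, so any estimate of the spectral envelope in that region rests on a single spectral sample.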
The results of the studies by [11] and [3] provided valuable phonetic and timbral information about sung vowels but remained limited with regard to high-pitched voices.

In an attempt to overcome the problem inherent in high-pitched singing, a number of perceptually motivated acoustic studies of the singing voice have been undertaken. Principal Component Analysis of 1/3-octave filter-bank outputs was used by [1] to study the differences between spoken and sung vowels. This dimensionality-reduction approach yielded a spatial representation of sung vowels that affords discrimination in a phonetic-like space spanned by the two major dimensions. However, the approach depends on the availability of a statistically significant sample of sung vowels, since each vowel is defined as a function of its displacement from the other vowels; this makes cross-speaker or single-vowel comparison problematic.

A more promising approach, known as Perceptual Linear Prediction (PLP), originates from [6]. It makes it possible to extract, automatically from the acoustic signal, spectral features that are related to formant frequencies while being perceptually motivated. The PLP model, initially developed for spoken sounds, was only recently exploited in [8], a study of sung vowels. The results reported therein provide evidence of the interpretive power of PLP-derived formants for spoken as well as for sung vowels.

1.2. Objectives and outline of this study

The work reported in this paper seeks to extend that of [8] by examining the behaviour of PLP-derived formants for vowels sung by a soprano through her full pitch range. The body of the paper consists of three major sections. In Section 2, the sung-vowel material and the PLP procedure are outlined, together with a brief evaluation of the PLP-derived formants. Section 3 gives a componential description of each of the sung vowels, while Section 4 provides a systemic perspective on the sung-vowel space.
Section 5 summarises our findings.

ICPhS XVI ID 1458 Saarbrücken, 6-10 August 2007 www.icphs2007.de 901
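The PLP analysis on which this study builds ([6]) chains a few standard steps: a short-term power spectrum, critical-band integration on a Bark scale, equal-loudness pre-emphasis, cube-root intensity-to-loudness compression, an inverse DFT to obtain an autocorrelation, and a low-order all-pole fit whose spectral peaks serve as perceptual formants. The Python/NumPy sketch below is our own illustration, not the authors' implementation. The Bark warping, masking curve and equal-loudness formula follow our reading of the published PLP recipe; the Hamming window, 1024-point FFT, roughly 1-Bark band spacing, model order 5 (which allows at most two peaks, matching F1′ and F2′) and root-based peak picking are assumptions, and all function names are hypothetical.

```python
import numpy as np


def hz_to_bark(f_hz):
    """Bark warping used in PLP: 6*asinh(f/600)."""
    return 6.0 * np.arcsinh(np.asarray(f_hz, dtype=float) / 600.0)


def bark_to_hz(z):
    """Inverse of hz_to_bark."""
    return 600.0 * np.sinh(np.asarray(z, dtype=float) / 6.0)


def masking_curve(dz):
    """Asymmetric critical-band curve psi(dz) in Bark; zero outside [-1.3, 2.5]."""
    dz = np.asarray(dz, dtype=float)
    psi = np.zeros_like(dz)
    rise = (dz >= -1.3) & (dz <= -0.5)
    flat = (dz > -0.5) & (dz < 0.5)
    fall = (dz >= 0.5) & (dz <= 2.5)
    psi[rise] = 10.0 ** (2.5 * (dz[rise] + 0.5))
    psi[flat] = 1.0
    psi[fall] = 10.0 ** (-(dz[fall] - 0.5))
    return psi


def equal_loudness(f_hz):
    """Equal-loudness pre-emphasis E(w), with w = 2*pi*f, as in the PLP recipe."""
    w2 = (2.0 * np.pi * np.asarray(f_hz, dtype=float)) ** 2
    return (w2 + 56.8e6) * w2 ** 2 / ((w2 + 6.3e6) ** 2 * (w2 + 0.38e9))


def levinson(r, order):
    """Levinson-Durbin: autocorrelation r -> A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err  # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k  # prediction-error power after step i
    return a, err


def plp_perceptual_formants(frame, fs, order=5, n_fft=1024):
    """Peak frequencies (Hz) of a low-order PLP all-pole model of one frame."""
    frame = np.asarray(frame, dtype=float)
    # Power spectrum of the windowed frame (frames longer than n_fft are truncated).
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft)) ** 2
    z_bins = hz_to_bark(np.fft.rfftfreq(n_fft, 1.0 / fs))

    # Critical-band integration on roughly 1-Bark spaced centres, 0 Hz to Nyquist.
    # Band i takes bin b with weight psi(z_i - z_b), so a component feeds bands
    # up to 2.5 Bark above it but only 1.3 Bark below it.
    z_max = hz_to_bark(fs / 2.0)
    z_c = np.linspace(0.0, z_max, int(np.ceil(z_max)) + 1)
    bands = masking_curve(z_c[:, None] - z_bins[None, :]) @ spec

    # Equal-loudness pre-emphasis, then cube-root intensity-to-loudness law.
    bands = np.cbrt(bands * equal_loudness(bark_to_hz(z_c)))
    bands[0], bands[-1] = bands[1], bands[-2]  # copy poorly defined edge bands

    # Auditory spectrum -> autocorrelation (inverse DFT) -> all-pole model.
    r = np.fft.irfft(bands)[: order + 1]
    a, _ = levinson(r, order)

    # Peaks from the angles of the complex poles; angle pi maps to z_max (Nyquist).
    poles = np.roots(a)
    poles = poles[np.imag(poles) > 1e-9]
    z_peaks = np.sort(np.angle(poles)) / np.pi * z_max
    return bark_to_hz(z_peaks)
```

For example, `plp_perceptual_formants(frame, fs=16000)` applied to a short voiced frame returns at most two frequencies in Hz, the illustrative counterparts of F1′ and F2′. Because the all-pole fit is made to the smoothed auditory spectrum rather than to individual harmonics, this kind of model is expected to be less sensitive to wide harmonic spacing, although the sketch makes no claim about the values the authors obtained.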