Formation of the Actor’s/Speaker’s Formant: A Study Applying Spectrum Analysis and Computer Modeling

*Timo Leino, *Anne-Maria Laukkanen, and †Vojtěch Radolf, *Tampere, Finland, †Prague, Czech Republic

Summary: Hypothesis. A strong peak between 3 and 4 kHz in the long-term average spectrum (LTAS) of speech has been found to be one correlate of a good male speaking voice, for example, among actors. The actor’s or speaker’s formant (resembling the singer’s formant) can be established by certain vocal training. This study investigates the origin of the speaker’s formant.

Study Design and Setting. The immediate effects of a vocal exercise series on speaking voice were studied in a Finnish male actor, who is an experienced teacher of the exercises. The exercises consist of nasal vowel syllable strings and words containing nasals. Before and after 30 minutes of exercising, the subject (1) read aloud at three loudness levels and (2) phonated the Finnish vowels at habitual level.

Methods. Formant frequencies were estimated from spectra of the vowel samples. LTAS was computed and equivalent sound level (Leq) was measured for the text samples. Formant frequencies were used as the input for a one-dimensional (1D) mathematical model.

Results. After the exercise, the peak at 3.5 kHz in the LTAS of the reading samples was stronger, although Leq was the same as before, suggesting a level-independent resonance change. Reading samples recorded after exercising were judged to sound better in voice quality than those recorded before exercising. The strong peak at 3.5 kHz was present in all vowels, and it was mainly formed by clustering of F4 and F5.

Conclusions. A 1D model-based optimization suggested that this kind of formant cluster could best be established by simultaneously narrowing the epilaryngeal tube, widening the pharynx, and narrowing the front of the oral cavity.

Key Words: Vocal exercising–Voice quality–Spectrum analysis–Mathematical modeling–Optimization.
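The two measurements named in the Methods above, LTAS and Leq, can be illustrated with a minimal sketch. It is not the authors' analysis setup: the frame length, Hann window, sampling rate, function names, and the synthetic two-tone test signal are all assumptions chosen for illustration, and the levels are uncalibrated (dB relative to full scale, not sound pressure level). The sketch shows how a peak in the 3 to 4 kHz band can become stronger while the overall level stays the same.

```python
import numpy as np

def ltas_db(signal, fs, frame_len=1024, hop=512):
    """Average the power spectra of Hann-windowed frames; return (freqs, dB)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    power = np.zeros(frame_len // 2 + 1)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        power += np.abs(np.fft.rfft(frame)) ** 2
    power /= n_frames
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    return freqs, 10.0 * np.log10(power + 1e-20)

def leq_db(signal):
    """Equivalent level as 10*log10 of the mean square (uncalibrated dB)."""
    return 10.0 * np.log10(np.mean(signal ** 2) + 1e-20)

def band_peak(freqs, spectrum_db, lo_hz, hi_hz):
    """Return (frequency, level) of the strongest bin between lo_hz and hi_hz."""
    mask = (freqs >= lo_hz) & (freqs <= hi_hz)
    idx = np.argmax(spectrum_db[mask])
    return freqs[mask][idx], spectrum_db[mask][idx]

def synthetic_voice(fs, seconds, high_amp, target_rms=0.1):
    """Hypothetical stand-in for speech: a 110 Hz tone plus a 3.5 kHz tone,
    scaled so that every signal has the same RMS (and hence the same Leq)."""
    t = np.arange(int(fs * seconds)) / fs
    x = np.sin(2 * np.pi * 110 * t) + high_amp * np.sin(2 * np.pi * 3500 * t)
    return x * target_rms / np.sqrt(np.mean(x ** 2))

fs = 16000  # assumed sampling rate in Hz
before = synthetic_voice(fs, 2.0, high_amp=0.02)  # weak 3.5 kHz component
after = synthetic_voice(fs, 2.0, high_amp=0.10)   # stronger 3.5 kHz component

freqs, ltas_before = ltas_db(before, fs)
_, ltas_after = ltas_db(after, fs)
peak_f_before, peak_db_before = band_peak(freqs, ltas_before, 3000, 4000)
peak_f_after, peak_db_after = band_peak(freqs, ltas_after, 3000, 4000)

print("Leq before/after (dB):", leq_db(before), leq_db(after))
print("3-4 kHz peak before:", peak_f_before, "Hz,", peak_db_before, "dB")
print("3-4 kHz peak after: ", peak_f_after, "Hz,", peak_db_after, "dB")
```

Comparing the band peak at equal Leq mirrors the reasoning in the Results: a higher peak at unchanged overall level points to a change in spectral shape rather than in loudness. For real recordings, a calibrated level measurement and a long, phonetically balanced sample would be needed.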
INTRODUCTION

The long-term average spectrum (LTAS) provides information on the spectral distribution of the speech signal over a period of time. 1 If the signal is sufficiently long (eg, 1 minute) and phonetically balanced, LTAS gives information on the average voice quality. 1 The method is attractive because it is easy to use in the vocologist’s daily routine. It has been found to distinguish between normal and pathological speaking voices and between different degrees of hoarseness. 2,3 It also reveals differences between phonation types. 4–7 It has, moreover, been used to study singing voice in different styles and song genres 8–10 and the effects of singing voice training. 11

Leino 12 applied LTAS to study the speaking-voice quality of professional male actors. In these results, samples evaluated as representing poor voice quality were distinguished from those with rather poor, fairly good, and good voice qualities by having the steepest spectral slope. 12 The best voices, in turn, were characterized especially by a prominent peak between 3 and 4 kHz. 12 The concept of “good voice quality” is naturally very difficult to define exhaustively and most likely consists of various elements. However, Leino’s 12 results suggest that the shape of LTAS has a certain perceptual relevance in the evaluation of voice quality. A gentle spectral slope and a prominent peak at 3.5 kHz seem to be some of the features often characterizing a good male speaking voice.

These spectral characteristics were taken as goals in a special 8-month intensive voice training course for student actors. 13 Real-time spectrum analysis was used as an aid in the training sessions. According to the LTAS results of speaking samples recorded before and after the training period, the goals of the training were reached. The samples were also judged to sound better after training. The listeners consisted of theater and speech professionals, student actors, and university students.
The results confirm the earlier finding that the spectral slope and the prominence of the peak between 3 and 4 kHz are characteristics of a good voice quality in speech.

A relatively strong peak at 3.5 kHz can also be seen in the LTAS of speakers of languages other than Finnish: for example, in the article by Dejonckere, it can be seen for a French-speaking male subject 14; in the article by Frøkjær-Jensen and Prytz, for a Danish speaker 15; and in the book by Nolan, for an English speaker. 6 Nawka et al have reported it in good voices of German speakers 16; Bele 17 observed it in Norwegian male professional speakers (actors and teachers); and Master et al 18 observed it in Brazilian Portuguese-speaking male actors. Cleveland et al report it in country singers’ singing voices. 9

The prominent peak between 3 and 4 kHz in the LTAS of a good male speaking voice seems to resemble the singer’s formant, a strong energy concentration between 2 and 3 kHz in the male operatic singing voice 19 (Figure 1). Although the singer’s formant lies lower in frequency and is stronger than the “actor’s formant,” both seem to be correlates of good voice quality, and both can be achieved through training. However, an actor’s formant can also be seen in the LTAS of untrained good male voices, whereas the singer’s formant is mainly achieved through classical singing training.

Accepted for publication October 8, 2009.
From the *Department of Speech Communication and Voice Research, University of Tampere, Tampere, Finland; and the †Department of Dynamics and Vibrations, Institute of Thermomechanics, the Academy of Sciences of the Czech Republic, Prague, Czech Republic.
Address correspondence and reprint requests to Timo Leino, Department of Speech Communication and Voice Research, FIN-33014, University of Tampere, Tampere, Finland. E-mail: Timo.Leino@uta.fi
Journal of Voice, Vol. 25, No. 2, pp. 150-158
0892-1997/$36.00
© 2011 The Voice Foundation
doi:10.1016/j.jvoice.2009.10.002