Stylization of glottal-flow spectra produced by a mechanical vocal-fold model

Denisse Sciamarella, Christophe D'Alessandro
LIMSI-CNRS, Université de Paris-Sud, 91403 Orsay (France)
sciamarella@limsi.fr

Abstract

A method is proposed to extract glottal-flow spectra from numerical simulations of vocal-fold behavior with a two-mass model that includes dynamic flow separation. The numerical spectrum, whose general form agrees with that of glottal-flow signal models, can be stylized with three linear segments. The slope of the first segment remains relatively constant when the source control parameters are varied, whereas the slope of the last segment (i.e. the spectral tilt) is highly sensitive to the vibrating vocal-fold mass, tension and stiffness. The phase of the glottal cycle during which the flow separation point is mobile may, if long enough, introduce a dip in the glottal-flow spectrum.

1. Introduction

The voice source is the origin of all the features of speech related to voice quality, vocal effort and prosodic variations. The possibility of characterizing glottal activity through time-domain physiological analyses (such as electroglottography, ultra-rapid photography or electromyography) strongly encouraged time-domain modeling of the glottal pulse or waveform, also called the flow glottogram [1, 2, 3]. However, when it comes to describing voice quality, the glottal-flow spectrum is believed to be more suitable than the glottogram itself, as has been thoroughly noted in most applications involving speech analysis and synthesis. According to [4], for instance, one of the main spectral parameters for synthesizing voices with different qualities is the rate of decay of the voice-source spectrum, known as the spectral slope or spectral tilt. A number of studies focus on this issue [5, 6, 7, 8]. A unified framework for studying the time- and frequency-domain properties of glottal-flow models has been proposed in [9].
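The rate of decay of a spectrum is conventionally quoted in dB per octave, that is, as the slope of the log-magnitude spectrum plotted against log-frequency. The Python/NumPy sketch below estimates that slope by an ordinary least-squares line fit over a chosen frequency band. It is an illustration under stated assumptions, not the measurement procedure of any of the cited works: the function name `spectral_slope_db_per_octave`, the use of a plain least-squares fit, the band limits and the synthetic 1/f² test spectrum are all hypothetical choices made here.

```python
import numpy as np


def spectral_slope_db_per_octave(freqs_hz, magnitude, f_lo, f_hi):
    """Least-squares slope of a magnitude spectrum, in dB per octave.

    Hypothetical helper: fits a straight line to 20*log10|X(f)| against
    log2(f) for the bins with f_lo <= f <= f_hi. Bins with zero frequency
    or zero magnitude are skipped, since their logarithm is undefined.
    A negative result means the spectrum decays with frequency.
    """
    f = np.asarray(freqs_hz, dtype=float)
    mag = np.asarray(magnitude, dtype=float)
    band = (f >= f_lo) & (f <= f_hi) & (f > 0) & (mag > 0)
    octaves = np.log2(f[band])              # one unit = one octave
    level_db = 20.0 * np.log10(mag[band])   # magnitude in dB
    slope, _intercept = np.polyfit(octaves, level_db, 1)
    return float(slope)


# Synthetic spectrum decaying as 1/f^2 (an illustrative assumption, not data
# from the paper): doubling f divides |X| by 4, so the level falls by
# 20*log10(4), about 12.04 dB per octave.
freqs = np.linspace(100.0, 5000.0, 200)
print(round(spectral_slope_db_per_octave(freqs, freqs ** -2.0, 200.0, 4000.0), 2))
# prints -12.04
```

Fitting against `log2(f)` rather than `log10(f)` is what makes the slope come out directly in dB per octave. On a real spectrum the band limits matter, because the estimate is only meaningful over a range where the log-log spectrum is close to a straight line.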
In that work, the authors show that glottal-flow spectra can be stylized by three straight lines in a log-magnitude/log-frequency representation, characterized by five frequency-domain parameters: the fundamental frequency, the amplitude and frequency of the spectral peak, the quality factor of the spectral peak, and the spectral-tilt cut-off frequency. The aim of this paper is to examine, by means of numerical simulations, how adjusting the mechanical properties of the vocal folds and the subglottal pressure influences the glottal-flow spectrum (see [10] for an exhaustive study of the glottal-flow waveform). The numerical method we present consists of computing the Fourier transform of the glottal-pulse derivative obtained by simulating vocal-fold behavior with a recent two-mass model. The procedure is completed by measuring the three slopes that stylize the spectrum. The results of the numerical experiments give some insight into the dynamic role of vocal-source behavior in voice quality.

FIG. 1 – Schematic diagram of the vocal fold 2-mass model.

2. The method

The production model used in the simulations is the so-called Niels Lous two-mass model, whose main features are the inclusion of dynamic flow separation within the glottal channel and the assumption of a symmetrical glottal structure [11] (i.e. the point masses m1 and m2 in Figure 1 are assigned the same value m). This model is chosen for its conceptual simplicity and its well-known rich behavior.

The main speaker control parameters in our model are the subglottal pressure Ps, the vocal-fold tension k, the tension kc coupling the lower and upper parts of the vocal folds (also called vocal-fold stiffness), the vibrating mass m of the vocal folds and their length Lg. Typical values for these parameters are: m ≈ 0.1 g, k ≈ 40 N/m, kc ≈ 25 N/m, Lg ≈ 1.4 cm and Ps ≈ 8 cmH2O. These typical values are certainly not static in speech: they are among the active control parameters that the speaker can change or adjust.
For instance, Lg can be stretched by 3 or 4 mm during phonation. We will hereafter focus on the repercussion of each of these parameters on the shape of the glottal-flow spectrum. To this end, each control parameter will be set to a number of values p within a physiologically meaningful range, in order to compute the glottal flow $U_g^p(t)$ (as well as its derivative, $y_1^p(t)$, $y_2^p(t)$, etc.) for each value p. An algorithmic procedure will extract a glottal-flow pulse sample from each time series $U_g^p(t)$ (after excluding transients) and analyse $U_g'^p(t)$ to compute the time-domain and frequency-domain parameters. The magnitude of the discrete Fourier transform of the glottal-flow derivative, $|\mathcal{F}(U_g'^p)(s)|$, is numerically computed and as follows:

10.21437/Interspeech.2005-435