STYLISATION AND SYMBOLIC CODING OF F0: A QUANTITATIVE MODEL

Estelle Campione, Emmanuel Flachaire, Daniel Hirst, Jean Véronis
Laboratoire Parole et Langage, Université de Provence & CNRS
29 Av. Robert Schuman, 13621 Aix-en-Provence Cedex 1, France
Tel.: +33 4 42 95 36 33, Fax: +33 4 42 59 50 96, E-mail: Jean.Veronis@lpl.univ-aix.fr

ABSTRACT

This paper presents a reversible model for the stylisation and the symbolic coding of macroprosodic fundamental frequency patterns. Prosodic labels are generated automatically from the speech signal and can be used to regenerate a synthetic F0 curve which is as close as possible to the original curve. The model has been tested successfully for 20 speakers in French and Italian.

1. INTRODUCTION

F0 is often considered as the combination of a macroprosodic component reflecting the speaker's choice of intonation pattern, and a microprosodic component [5] which is entirely dependent on the choice of phonemes in the utterance (lowering of F0 for voiced obstruents etc.). Numerous studies since the 1960's have attempted to factor out these two components and to extract automatically the relevant macroprosodic information from the speech signal. This extraction can be broken down into two stages:

• stylisation, i.e. the replacement of the F0 curve by a simpler numerical function conserving the original macroprosodic information;
• symbolic coding, i.e. the representation by means of an alphabet of symbols, reducing the stylised curve to a sequence of discrete categories.

The first stage is often referred to as close-copy stylisation [4] which aims to replace the original F0 curve by a stylised curve perceptually indistinguishable from the original. The discrete categories of the second stage can be used to regenerate a curve which may be distinguishable from the original one but which is considered by listeners as linguistically equivalent (which De Pijper op. cit. calls "standardised perceptual equivalence").
The results presented in this paper suggest that it may be possible to extend the notion of close-copy to the symbolic representation, ensuring that no relevant prosodic information is lost in the process of stylisation and symbolic coding: the curve generated from the symbolic coding would then be perceptually indistinguishable both from the stylised curve and from the original curve. This would result in a totally reversible system of analysis, an extremely valuable tool for the automatic coding of large speech corpora.

Stylisation has been the object of a great number of studies ([3], [7], [14], [16]) and it can be said that the technique has been mastered fairly satisfactorily. A number of systems of symbolic coding have also been proposed, but automatisation and reversibility (close copy) are far less advanced for symbolic coding than for stylisation. In this paper we present a system of stylisation and of symbolic coding which allows the generation of an F0 curve which is very close to the original curve. The system has been applied successfully to French and to Italian.

2. STYLISATION

The method of stylisation used in this study, MOMEL (MOdélisation de MELodie), was originally proposed in [7] (see also [8]). Unlike most other methods of stylisation, which use a sequence of straight-line segments, MOMEL uses a quadratic spline function (a sequence of parabolic segments), resulting in a continuous, smooth curve without the angles which occur when using straight lines. Unvoiced segments are interpolated so that the resulting curve presents no discontinuities at all. These characteristics of the quadratic spline function are also shared by the more complex stylisation functions used by Fujisaki and colleagues [6]. It has been argued [13] that stylisation by curvilinear functions is not perceptually distinguishable from that using straight lines.
We note, however, that:

• stylisation by quadratic splines produces a curve which is closer to the original F0 curve and hence introduces less noise into quantitative studies, in particular in the evaluation of models as in this paper;
• stylisation by quadratic splines produces a macroprosodic contour which is practically identical to the F0 curves produced on utterances consisting entirely of sonorant segments, curves which are themselves both continuous and smooth.

The quadratic spline functions used for synthesis can be defined by a sequence of target points corresponding to the significant changes of the F0 curve (zero-crossings of the first derivative).
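
As an illustration of how a curve can be regenerated from such target points, the following Python fragment gives a minimal sketch. It is not the MOMEL implementation. It assumes that the slope is zero at every target (in line with the zero-crossings of the first derivative mentioned above) and, as a further assumption not stated in the text, that the two parabolic segments joining neighbouring targets meet at the midpoint between them. The function name `quadratic_spline` and the target values are hypothetical.

```python
from bisect import bisect_right


def quadratic_spline(targets, t):
    """Evaluate a quadratic spline through (time, F0) targets at time t.

    Assumptions for this sketch (not taken from the paper):
    - targets are sorted by strictly increasing time;
    - the slope is zero at every target;
    - the two parabolas between neighbouring targets meet at the midpoint.
    """
    times = [time for time, _ in targets]
    # Outside the covered range, hold the first or last target value.
    if t <= times[0]:
        return targets[0][1]
    if t >= times[-1]:
        return targets[-1][1]
    # Index of the target at or immediately before t.
    i = bisect_right(times, t) - 1
    (t1, h1), (t2, h2) = targets[i], targets[i + 1]
    x = (t - t1) / (t2 - t1)  # normalised position within the interval
    if x < 0.5:
        # First parabola: flat at the left-hand target.
        return h1 + 2.0 * (h2 - h1) * x * x
    # Second parabola: flat at the right-hand target.
    return h2 - 2.0 * (h2 - h1) * (1.0 - x) ** 2


# Hypothetical targets: (time in seconds, F0 in Hz).
targets = [(0.0, 120.0), (0.4, 180.0), (1.0, 100.0)]
curve = [quadratic_spline(targets, k / 100) for k in range(101)]
```

Under these assumptions the two parabolas share the same value and the same slope at the midpoint, so the regenerated curve is continuous and has no angles, and every target is a local turning point of the curve.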