Voice qualities and speech expressiveness
Izabel Cristina Viola 1, Sandra Madureira 1
1 Pontifícia Universidade Católica de São Paulo, Brazil
izabelviola@uol.com.br, madusali@pucsp.br

Abstract
The objective of this work is to investigate correlations between the use of voice quality settings and the expression of acted emotions and attitudes in speech. Voice qualities were described with the phonetic model of voice quality settings developed by [1]. The corpus consists of the poem "I-Juca Pirama", interpreted by a professional actor. The perceptual analysis of the data relates the permanent settings to physical state and age, while the temporary settings are related to the expression of anger, sadness, indignation, happiness and anxiety in order to create suspense and drama. The temporary settings follow the covariance patterns derived from the activation of the organism: the more tense it is, the more activated it becomes.
Index Terms: voice quality, speech expressiveness, emotions, attitudes, perceptual analysis.

Introduction
Voice is a symbolic index of personality. It can be seen as a gesture that takes part in a total mimetic game, results from both social and individual elements, and is in great part an unconscious symbolization of the person's general attitude. In verbal expression, the roles of both society and the individual come across at different levels [1]: voice quality and dynamics, pronunciation, vocabulary and style. At each of these levels, social variation (that of the language and of the linguistic habits of a particular group) and individual variation are considered separately. Regarding the dynamics of voice, [1] discusses intonation, rhythm, fluency and speed; here, too, individual and social manifestations have to be evaluated separately.
According to [1], voice quality is the “lower” and fundamental element because it is the first means of expression of the psychophysical organism.
There are two main approaches to describing voice qualities. One of them focuses on phonatory and articulatory phenomena separately [2,3], whereas the other considers the two phenomena simultaneously [5,6]. In the first approach, it is necessary to specify the glottal setting (the type of co-adaptation of the vocal folds), the resonating structures (laryngeal, pharyngeal, oral and nasal) and the form of articulation, which refers not only to the places of articulation (precise or imprecise) but also to the degree of jaw opening (stopped, closed, adequate, open and hyper-articulated). The sum of those elements is the global impression that the voice makes on the listener; this impression characterizes the voice quality, which is described according to its most prominent aspect. Because the description is based on the most acoustically salient cue, it conveys not only the quality of the glottal source or of the resonators (for example, whispering, strident or nasal voices), but may also describe a physical or emotional characteristic of the speaker (such as a childish, masculine, feminine or husky voice) [2]. Such a description of types of voice quality is rather limited in scope.
A vast nomenclature describing types of voice quality has been used in clinical speech pathology assessment and in voice counseling and training, as a point of departure for pinpointing coincidences and divergences between the personal and social vocal images of the individual. One example is the set of descriptors proposed by [3].
They refer to color (light, dark, golden, no color, white, bright), form (open, rounded, pointed, cut), speed (slow/fast), weight (light/heavy), sex (feminine, masculine), temperature (hot/cold), strength (forced, compressed, relaxed, soft, weak, strong), texture (rough, soft, hard), humidity (dry/wet), thickness (thick/thin), size (tall, big, short), quality (ugly, beautiful, good, bad, delicious), movement (floating, sparkling, oscillating, trembling), age (old, childlike, young), attitude (threatening, cruel, false, convincing, seductive, unsympathetic, uncontrolled), emotion (happy, sad, frightening) and other categories that imply opposition, such as rich/poor and dirty/clean. In spite of the high level of subjectivity that these terms bear, they reflect sound symbolism and the impressions that the voice can make on the listener. There is no nomenclature capable of precisely describing the broad and complex spectrum of vocal phenomena. Comprehensive descriptors have been used to describe the countless variations that characterize both short- and long-term qualities of voices, as [1] points out.
The second approach treats voice quality as modifications of the vocal tract that occur in different temporal perspectives, conveying information that is expressed either permanently (long term) or temporarily (as an almost permanent characteristic) [4] [5] [6]. In a more permanent way, and outside the control of the speaker, voice qualities convey innate characteristics, sex, age and physical state (such as respiratory diseases); temporarily, whether controlled or not, they convey information of a communicative and/or emotional nature.
Understanding voice quality as the final product of the integration of the source (glottis) and the filters (resonators and articulators), according to the source-filter model [7], makes it possible to locate the parts of the vocal system that actually contribute to the quality of the generated sound, considering the influence of the organic and phonetic levels (momentary variations related to speech segments). In a first attempt to provide a less subjective description of voice qualities, [5,6] proposed an atomistic model that describes identifiable and controlled voice quality settings. Voice quality emerges as a cumulative abstraction, over a period of time, of the speaker's characteristic quality, from which the sporadic, momentary fluctuations of the segments are discounted, although responding to the psychological and