Automatic recognition of emotions in spoken Finnish: preliminary results and applications

Juhani Toivanen*, Tapio Seppänen**, Eero Väyrynen***

Abstract

In this paper, research on the automatic recognition of basic emotions in spoken Finnish is reported. The investigation was carried out utilizing the MediaTeam Emotional Speech corpus, which is currently the largest emotional speech database for Finnish. Three experiments were carried out. In the first two experiments, mainly speaker-dependent automatic classification of emotions was tested. In the third experiment, a scenario involving speaker-independent classification was used. The first results seem promising, and they can be applied when developing, for example, methods for content-based information retrieval.

1. Introduction

Human social communication rests to a great extent on exchanges of non-verbal signals, including the (non-lexical) expression of emotion through speech. Emotions play a significant role in social interaction, both displaying and regulating patterns of behavior and maintaining the homeostatic balance in the organism. In everyday communication, certain emotional states, for example, boredom and nervousness, are probably expressed mainly non-verbally, since socio-cultural conventions usually demand that patently negative emotions be concealed (a face-saving strategy in conversation). Today, the significance of emotions is widely acknowledged across scientific disciplines: “Descartes’ error” (i.e. the view that emotions are “intruders in the bastion of reason”) is being corrected, and the importance of emotions and affect for rational decision-making is better understood (Damasio, 1994). Though there is a large literature on the vocal correlates of emotion, no definitive theoretical model of the vocal expression of emotion has been proposed.
To date, the research has concentrated on such major languages as English and German; very little is currently known about the way in which emotion is vocally expressed in Finnish. It should be pointed out that, intonationally, Finnish is substantially different from English, for example: one important difference is that rising utterance-final tones are rare in non-emphatic and non-dialectal spoken Finnish (though they are now becoming more common, especially in the Finnish spoken in the Helsinki area).

The role of vocal cues in the communication of affect can be investigated at the signal level and at the symbolic level. Such perceptual features of voice quality as “tense”, “lax”, “metallic” or “soft” voice (Laver, 1994) can be traced back to a number of continuously variable acoustic/prosodic features of the speech signal. These are F0-related, intensity-related, temporal and spectral features of the signal, including, for example, average F0 range, average RMS intensity, average speech/articulation rate and the proportion of spectral energy below 1,000 Hz. At the symbolic level, the distribution of tone types and focus structure in different syntactic structures can convey emotional content.

The vocal parameters of emotion may be partially language-independent at the signal level, although this is still very much an open question. For example, according to the “universal frequency code” proposed by Ohala (1983), high pitch universally depicts supplication, uncertainty and defenselessness, while low pitch conveys dominance, power and confidence. Similarly, high pitch is common when the speaker is fearful, such an emotion being typical of a “defenseless” state (Bolinger, 1989:13). The symbolic vocal expression of emotion is typically based on “categorical” contrasts essential to the phonological prosodic structure of language.
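As an illustration of the signal-level features mentioned above, two of them (RMS intensity and the proportion of spectral energy below 1,000 Hz) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the feature extraction used in this study: the function names, the naive discrete Fourier transform and the synthetic two-tone test signal are all illustrative choices, and F0 range and speech rate are not covered.

```python
import cmath
import math


def rms_intensity(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def low_band_energy_ratio(samples, sample_rate, cutoff_hz=1000.0):
    """Fraction of spectral energy below cutoff_hz (naive DFT, O(n^2))."""
    n = len(samples)
    low = total = 0.0
    # For a real-valued signal, bins 0..n//2 carry all the information.
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        # Interior bins have a mirror image at n - k, so count them twice;
        # the DC bin and (for even n) the Nyquist bin occur only once.
        weight = 1.0 if k == 0 or (n % 2 == 0 and k == n // 2) else 2.0
        energy = weight * abs(coeff) ** 2
        total += energy
        if k * sample_rate / n < cutoff_hz:
            low += energy
    return low / total if total else 0.0


# Hypothetical test signal (not speech): a 200 Hz sinusoid plus a weaker
# 2,000 Hz sinusoid, sampled at 8 kHz for 400 samples.
SAMPLE_RATE = 8000
N = 400
signal = [math.sin(2 * math.pi * 200 * i / SAMPLE_RATE)
          + 0.5 * math.sin(2 * math.pi * 2000 * i / SAMPLE_RATE)
          for i in range(N)]

rms = rms_intensity(signal)
ratio = low_band_energy_ratio(signal, SAMPLE_RATE)
```

For real recordings, a fast Fourier transform applied to short windowed frames would normally replace the naive transform; the direct form is used here only to keep the sketch self-contained.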
The distribution and types of accents and tones, in addition to signaling the syntactic and informational structure of spoken language, can convey emotional meaning. These features usually represent discrete categories, in contrast to non-discrete vocal correlates of

* MediaTeam, University of Oulu, juhani.toivanen@ee.oulu.fi
** MediaTeam, University of Oulu, tapio.seppanen@ee.oulu.fi
*** MediaTeam, University of Oulu, eero.vayrynen@ee.oulu.fi