ACOUSTIC CORRELATES OF TASK LOAD AND STRESS

K. R. Scherer, D. Grandjean, T. Johnstone, G. Klasmeyer, and T. Bänziger

Department of Psychology
University of Geneva, Switzerland
Klaus.Scherer@pse.unige.ch

ABSTRACT

It is argued that reliable acoustic profiles of speech under stress can only be found if different types of stress are clearly distinguished and experimentally induced. We report first results of a study with 100 speakers from three language groups, using a computer-based induction procedure that allows distinguishing cognitive load due to task engagement from psychological stress. Findings show significant effects of load, and partly of stress, for speech rate, energy contour, F0, and spectral parameters. It is further suggested that the mean results for the complete sample of speakers do not reflect the amplitude of stress effects on the voice. Future research should isolate and focus on speakers for whom the psychological stress induction has been successful.

1. INTRODUCTION

Since antiquity the voice has been considered a reliable readout of the speaker's affective arousal, particularly suited for the communication of emotion. Beginning in the early 20th century, researchers have attempted to identify the acoustic profiles of the major emotions (see [1] for a review), a quest that is starting to yield reliable results [2]. When the notion of "stress" became a fashionable concept in the fifties and sixties, researchers became interested in identifying the acoustic correlates of stress (see [3] for a review). Deviation of fundamental frequency (F0) parameters (mean and variability) from baseline has been found to be the most reliable indicator of stress, as shown by a large majority of studies in the field [3, 4]. Similar effects have been found in studies of the amplitude and duration of spoken utterances. Mean envelope amplitude is generally greater, and mean utterance duration lower, for highly stressed subjects [3, 5].
In addition, formant structure seems to change under stress [5]. While some of these results seem relatively stable, allowing replication across studies, there is little evidence for a general "acoustic stress profile." As Protopapas and Lieberman [4, p. 2267] state: "…in all of the aforementioned studies it was evident that the acoustic correlates of emotion in the human voice are subject to large individual differences (i.e., among speakers). Streeter et al. (1983) concluded that there are no 'reliable and valid indicators of psychological stress' (p. 1359)." Murray et al. [6, p. 12] summarize the insights gained during a workshop on "Speech under stress" as follows: "In conclusion: Stress and its effect on speech seems to be a very complicated area of study which is very poorly understood at present."

This state of affairs is hardly surprising since this area of study is beset by two major problems: 1) the existence of many different kinds of stress and the absence of clear conceptual and operational distinctions between them, and 2) the difficulty of experimentally inducing stress in a reliable fashion in the laboratory.

As to 1), the existence of the general concept of "stress" and its rather indiscriminate usage in everyday life and in scientific research seems to suggest that the enormous panoply of states loosely referred to by this term have something in common. However, this common factor has yet to be identified. For example, Ruiz et al. [7] compare a laboratory stress situation (a female subject performing a cognitively challenging task) with a real stressor (two pilots discussing technical problems just before an airplane crash). While both situations imply a higher-than-normal cognitive load, the danger component that looms large in the real situation is absent from the lab task. Most likely, then, the crash situation implies emotional stress, whereas the lab situation should at most produce mild cognitive stress.
As one might expect, the authors find important quantitative and qualitative differences between the situations and the speakers involved. Unfortunately, in this study, as in many other studies in this area, single cases, involving only one or very few speakers, are examined. This procedure seems defensible in cases in which there are unique records of extreme stress or emotion in real-life settings (such as cockpit voice recordings). Acoustic analyses of such special cases may provide important clues as to the nature of affective voice changes and may inform the formulation of hypotheses. However, the analysis of single speakers is less appropriate when hypotheses are to be tested or when parametric estimates (e.g., of degree of F0 change) are intended. The single-case approach precludes the use of inferential statistics and thus the examination of the significance and effect size of observed differences as well as the calculation of confidence intervals. In consequence, there is little justification for using only one speaker (or a very small number of speakers) in laboratory studies.

As part of a large-scale study on the psychological, physiological, and behavioral effects of different kinds of stress on different types of individuals, Scherer and his collaborators (see [5]) systematically compared the effects of inductions of cognitive stress (difficult arithmetic tasks) and emotional stress (slides showing major injuries) in 60 subjects. Tolkmitt and Scherer [5] showed strong, statistically significant differences in the way different individuals reacted to the two types of