Beyond the right hemisphere: brain mechanisms mediating vocal emotional processing

Annett Schirmer 1 and Sonja A. Kotz 2
1 Department of Psychology, University of Georgia, Athens, Georgia, USA
2 Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
Corresponding author: Schirmer, A. (schirmer@uga.edu).

Vocal perception is particularly important for understanding a speaker's emotional state and intentions because, unlike facial perception, it is relatively independent of speaker distance and viewing conditions. The idea, derived from brain lesion studies, that vocal emotional comprehension is a special domain of the right hemisphere has failed to receive consistent support from neuroimaging. This conflict can be reconciled if vocal emotional comprehension is viewed as a multi-step process with individual neural representations. This view reveals a processing chain that proceeds from the ventral auditory pathway to brain structures implicated in cognition and emotion. Thus, vocal emotional comprehension appears to be mediated by bilateral mechanisms anchored within sensory, cognitive and emotional processing systems.

Introduction

It is a long-established [1] and persistent notion [2] that the right hemisphere is specialized for processing the emotional information conveyed in a speaker's voice. This notion derives from research demonstrating that damage to the right hemisphere is more detrimental to an individual's ability to recognize vocal emotional expressions than is damage to the left hemisphere [3]. Despite its significance for current views of brain function, the right-hemisphere model is not unchallenged. For example, evidence exists that emotionally relevant acoustic cues such as frequency and temporal information are differently lateralized in the brain [4,5]. Furthermore, some studies implicate subcortical structures such as the basal ganglia [6] and the amygdala [7]. However, rather than leading to a unified model of vocal emotional processing, these findings nourish opposing views that divide the field of vocal emotion research.

One approach to integrating the seemingly conflicting findings is to consider vocal emotional comprehension as a multi-step process with individual sub-processes that are differentially represented in the brain. These sub-processes can be described as (i) analyzing the acoustic cues of vocalizations, (ii) deriving emotional significance from a set of acoustic cues, and (iii) applying emotional significance to higher order cognition. The work reviewed here addresses these sub-processes and elucidates their neuroanatomical and temporal underpinnings. Moreover, the findings are integrated into a working model of vocal emotional processing.

The sounds of emotion

Whether we think someone is scared or annoyed depends greatly on the sound of his or her voice. That the voice can betray these feelings is the result of vocal production being modulated by physiological parameters that change with emotional state. Arousal-mediated changes in heart rate, blood flow and muscle tension, among other things, modulate the shape, functionality and sound of the vocal production system. For example, increased emotional arousal is accompanied by greater laryngeal tension and increased subglottal pressure, which together increase a speaker's vocal intensity. Additionally, vocal emotional expressions reflect communicative intentions. For example, Darwin observed that angry utterances sound harsh and unpleasant because they are meant to strike terror into an enemy [8].
Together, physiological modulations of the vocal production system and communicative intentions shape the way we speak, making us sound frightened or frightening.

The acoustic cues that convey emotions comprise amplitude, timing and fundamental frequency (F0), the last of these being perceived as pitch. An additional cue to emotion is voice quality: the percept derived from the energy distribution of a speaker's frequency spectrum, which can be described using adjectives such as shrill, harsh or soft. As some emotions are believed to have a unique physiological 'imprint', they are proposed to be expressed in a unique manner. For example, happiness is characterized by a fast speech rate together with high intensity, high mean F0 and high F0 variability, sounding both melodic and energetic. By contrast, sad vocalizations are characterized by a slow speech rate, low intensity, low mean F0 and low F0 variability, but high spectral noise, resulting in the impression of a 'broken' voice [9] (Figure 1). Thus, understanding a vocal emotional message requires the analysis and integration of a variety of acoustic cues.
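As an illustrative sketch, not drawn from the studies reviewed here, the following Python code shows one way these cues could be quantified from a recording. It assumes the librosa audio library; the file name 'utterance.wav' and the function name 'emotion_cues' are hypothetical.

import numpy as np
import librosa

def emotion_cues(path):
    """Extract simple proxies for the cues described above: intensity
    (amplitude), mean F0 and F0 variability (pitch), spectral centroid as
    a rough voice-quality measure, and a coarse timing measure."""
    y, sr = librosa.load(path, sr=None)

    # Amplitude: root-mean-square energy per frame, a proxy for vocal intensity.
    rms = librosa.feature.rms(y=y)[0]

    # Fundamental frequency (F0), perceived as pitch, via the pYIN tracker;
    # unvoiced frames come back as NaN and are discarded.
    f0, voiced, _ = librosa.pyin(y, fmin=75, fmax=400, sr=sr)
    f0 = f0[~np.isnan(f0)]

    # Voice quality proxy: centre of mass of the frequency spectrum.
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]

    # Timing proxy: fraction of frames carrying voiced speech.
    voiced_fraction = float(np.mean(voiced))

    return {
        "mean_intensity": float(np.mean(rms)),
        "mean_f0_hz": float(np.mean(f0)) if f0.size else None,
        "f0_variability_hz": float(np.std(f0)) if f0.size else None,
        "mean_spectral_centroid_hz": float(np.mean(centroid)),
        "voiced_fraction": voiced_fraction,
    }

print(emotion_cues("utterance.wav"))

Under the profiles reported above, a happy utterance from a given speaker would be expected to yield higher mean intensity, mean F0 and F0 variability than a sad one from the same speaker, whereas the sad utterance should show more spectral noise.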