DISCRIMINATION OF EMOTIONAL AND LINGUISTIC PROSODY WITH COCHLEAR IMPLANT SIMULATIONS

Daan J. van de Velde a,d, Arian Khoshchin b, Linda ter Beek b, Niels O. Schiller a, Johan H. M. Frijns c,d, Jeroen J. Briaire c

a Leiden University Centre for Linguistics, b Leiden University, c Leiden University Medical Center, d Leiden Institute for Brain and Cognition
d.j.van.de.velde@hum.leidenuniv.nl

ABSTRACT

In cochlear implants (CI), temporal sound features are transmitted more successfully than spectral sound features. This could have consequences for the perception of prosody. This was tested for emotion vs. focus perception in simulated CI hearing, starting from the assumption that spectral (F0) features are more important for emotional prosody than for focus prosody. Sets of short Dutch phrases were recorded with neutral, emotional (happy and sad) and focused (e.g., 'a BLUE ball' vs. 'a blue BALL') prosody. Temporal or spectral prosody, or both, were cross-spliced from the non-neutral to the neutral utterances, thus controlling for the usable phonetic cues. Seventeen Dutch subjects identified intended emotions and focus for vocoded (CI-simulated) and unvocoded versions of the phrases. A benefit of F0 vs. temporal information was found for emotional, but not for focus prosody. This could imply that CI users have more trouble hearing emotional than linguistic (focus) prosody.

Keywords: cochlear implants, prosody, vocoders, pitch, temporal information

1. INTRODUCTION

Cochlear implants (CI) can provide children and adults suffering from sensorineural hearing loss with a sense of hearing. Most users achieve good speech understanding [10]. Nevertheless, hearing is far from normal and problems remain, such as hearing in noise, hearing music and hearing prosody. These problems are partly due to differences in how well different acoustic parameters, such as temporal, intensity and spectral information, are transmitted [7].
In the case of prosody, the difference in transmission quality of acoustic parameters is expected to result in greater or lesser perceptual difficulty with different types or aspects of prosody, since (in a given language) those types can be conveyed by different acoustic parameters. One distinction between prosody types where this could play a role is that between emotional and linguistic prosody. Emotional prosody is the non-segmental information that reflects the emotional state of the speaker; linguistic prosody is the non-segmental information that conveys (certain) pragmatic information about an utterance. Whereas the acoustic realization and paralinguistic meaning of emotional prosody can be of a gradient nature, those of linguistic prosody are discrete. Furthermore, differences have been found on the neural level [9]. For both the prosody of emotions [5] and that of focus (accentuation) in Dutch [8], it has been reported that F0 and temporal (durational and rhythmic) information both play a role. The first goal of the present study was therefore to find out whether the cue weightings of F0 and temporal information differ between the two types of prosody.

The emotional vs. linguistic prosody distinction has (almost) never been investigated in the literature on CI perception. The second goal of this study was therefore to find out whether, under the degraded acoustic circumstances of CI hearing, emotional and linguistic prosody differ in discriminability in the presence of F0 vs. temporal cues.

2. METHODS

In order to find out whether emotions and focus were discriminable with CI simulations, two tests were developed: the EMOTION TEST and the FOCUS TEST. In each trial, participants were asked to choose which of two emotions (EMOTION TEST) or which of two focus positions (FOCUS TEST) they perceived for a given stimulus sound.
All stimuli were repeated in three variants: with only F0 information, with only temporal information, or with both types of information.

2.1. Participants

Seventeen native speakers of Dutch participated, for credits or as volunteers, as part of a larger study. Thirteen of them were right-handed, 15 were men, and their mean age was 20 years (SD = 3.4 years). None had a hearing loss greater than 40 dB at any of the frequencies 125 Hz, 250 Hz, 500 Hz, 1 kHz, 2 kHz, 4 kHz or 8 kHz, as tested with the Oscilla AudioConsole 3.3.2 (InMedico, Denmark).
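
The hearing criterion in Section 2.1 can be written as a simple rule. The following is a minimal sketch, not the procedure actually used with the audiometer. The function name `passes_screening`, the dictionary input format and the example listeners are hypothetical. Treating a threshold of exactly 40 dB as acceptable is an assumption based on the wording "larger than 40 dB".

```python
# Screened frequencies (Hz) and the exclusion limit (dB) stated in Section 2.1.
SCREEN_FREQS_HZ = (125, 250, 500, 1000, 2000, 4000, 8000)
LIMIT_DB = 40


def passes_screening(thresholds_db, limit_db=LIMIT_DB):
    """Return True if no screened frequency shows a loss larger than limit_db.

    thresholds_db maps a frequency in Hz to a measured threshold in dB.
    This input format is hypothetical; the paper does not describe one.
    """
    missing = [f for f in SCREEN_FREQS_HZ if f not in thresholds_db]
    if missing:
        raise ValueError(f"missing thresholds for: {missing}")
    # A threshold equal to the limit is accepted (assumption, see lead-in).
    return all(thresholds_db[f] <= limit_db for f in SCREEN_FREQS_HZ)


# Hypothetical listeners for illustration only, not data from the study.
listener_ok = {f: 10 for f in SCREEN_FREQS_HZ}
listener_excluded = {**listener_ok, 4000: 45}

print(passes_screening(listener_ok))
print(passes_screening(listener_excluded))
```

Under this rule a single threshold above the limit at any of the seven frequencies is enough to exclude a listener.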