Impact of different speech types on listening effort

Olympia Simantiraki 1, Martin Cooke 1, Simon King 2
1 Language and Speech Laboratory, Universidad del País Vasco, Vitoria, Spain
2 Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK
olympia.simantiraki@ehu.eus, m.cooke@ikerbasque.org, Simon.King@ed.ac.uk

Abstract

Listeners are exposed to different types of speech in everyday life, from natural speech to speech that has undergone modifications or has been generated synthetically. While many studies have focused on measuring the intelligibility of these distinct speech types, their impact on listening effort is not known. The current study combined an objective measure of intelligibility, a physiological measure of listening effort (pupil size) and listeners' subjective judgements, to examine the impact of four speech types: plain (natural) speech, speech produced in noise (Lombard speech), speech enhanced to promote intelligibility, and synthetic speech. For each speech type, listeners responded to sentences presented in one of three levels of speech-shaped noise. Subjective effort ratings and intelligibility scores showed an inverse ranking across speech types, with synthetic speech being the most demanding and enhanced speech the least. Pupil size measures indicated an increase in listening effort with decreasing signal-to-noise ratio for all speech types apart from synthetic speech, which required significantly more effort at the most favourable noise level. Naturally and artificially modified speech were less effortful than plain speech at the more adverse noise levels. These outcomes indicate a clear impact of speech type on the cognitive demands required for comprehension.

Index Terms: listening effort, pupil response, speech perception, synthetic speech

1. Introduction

In our everyday life we are exposed to a variety of speech types, both naturally and artificially produced.
Talkers modify their speech when exposed to noise. Live and recorded public address announcements may involve modifications designed to enhance intelligibility. Synthetically-generated speech is commonplace in mobile devices and telephone enquiry systems. Correct message reception is critical in many situations, and consequently a great deal of effort has been devoted to understanding the effect of differing speech styles on intelligibility [1]. However, far less emphasis has been placed on investigating the effort required to understand distinct speech types. Exposure to conditions that require a listener to exert substantial effort and engage additional cognitive resources may lead to long-term fatigue. Such conditions include degraded source signals, interference during sound transmission, or limitations of the receiver (see review in [2]). The current study examines listening effort for distinct speech types under conditions of additive noise.

Listening effort has been estimated using subjective measures such as questionnaires, behavioural metrics (e.g. response time), and physiological measures such as pupillometry [3]. For instance, [4] obtained psychophysiological recordings (heart rate, skin conductance, skin temperature and electromyographic activity) during speech perception tasks with intelligibility close to ceiling but with varying task demands, involving digit presentation to one or both ears. Increased mean skin conductance and electromyographic activity were observed when task demand increased. Multi-task paradigms are typically employed in studies measuring behavioural responses. In [5], a dual-task paradigm was used to assess listening effort over a wide range of signal-to-noise ratios (SNRs). Reaction times, in line with subjective effort measures, showed less effort exertion for lower SNRs.
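For readers unfamiliar with how stimuli are presented "at a given SNR" in such experiments, the following is a minimal sketch of the standard mixing procedure: the masker is rescaled so that the speech-to-noise power ratio matches a target value in dB. The function name and parameters are illustrative assumptions, not the pipeline used in any of the cited studies.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Return speech + noise, with the noise rescaled so that the
    long-term speech-to-noise power ratio equals snr_db (in dB).
    Illustrative sketch; not the actual stimulus pipeline of the paper."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[:len(speech)]
    p_speech = np.mean(speech ** 2)            # mean speech power
    p_noise = np.mean(noise ** 2)              # mean noise power
    # Gain that brings the noise power to p_speech / 10^(snr_db / 10)
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise
```

Lowering `snr_db` raises the noise gain, which is how the "more adverse noise levels" discussed throughout this paper are produced.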
In [6] the benefit of a digital noise reduction algorithm was tested using dual-task paradigms, either in quiet or in the presence of a 4-talker babble masker at various SNRs. Noise reduction was found both to reduce effort and to benefit performance in simultaneous tasks.

Behavioural measures alone cannot systematically capture changes in effort, but such changes can be revealed through variations in pupil size [7]. Several studies have used the pupillary response as an objective indicator of listening effort while measuring speech perception under noisy conditions. Features typically used to estimate effort are the mean pupil dilation, the peak pupil dilation (PPD), and the delay to reach the peak (latency). These features show an increasing trend with decreasing intelligibility [8]. PPD has been shown to reflect listening effort in speech performance tasks involving sentences presented under conditions of informational or energetic masking [8, 9], with more effort observed for a competing-talker masker than for stationary or fluctuating maskers [10, 11]. Listening effort is typically maximised in speech-in-noise tasks at intelligibility levels of around 50% [9, 12, 5]. Pupillometric measures of effort have also been obtained as a function of syntactic complexity [13], attention to location [14] and spectral resolution [15].

The current study uses pupillometry, subjective judgements and intelligibility scores to investigate the effect of four distinct speech types on listening effort. In addition to plain natural speech, we examine one naturally-modified form (Lombard speech), one algorithmically-modified form designed to enhance intelligibility, and synthetic speech, using the same set of sentences in each case. Listeners heard sentences presented in one of three levels of speech-shaped noise.

2. Methods

2.1. Participants

Twenty-six normal-hearing, native British English participants (6 males and 20 females, aged 18-24 years, mean age 20.5, SD = 1.8) were recruited. Participants were asked not to wear glasses or eye makeup [8]. Participants underwent pure-tone hearing screening; all had hearing levels less than or equal to 25 dB in both ears. Data from two participants were excluded from the analysis due to technical problems during recording.

Interspeech 2018, 2-6 September 2018, Hyderabad. DOI: 10.21437/Interspeech.2018-1358
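The three pupillometric effort indices named in the Introduction (mean pupil dilation, peak pupil dilation, and peak latency) are straightforward to compute once a trial's pupil trace has been baseline-corrected. The sketch below illustrates one common way to do this; the function name, parameter names, and the choice of a pre-onset baseline window are assumptions for illustration, not the paper's analysis code.

```python
import numpy as np

def pupil_features(trace, times, baseline_end=0.0):
    """Compute three effort indices from a single-trial pupil trace:
    mean dilation, peak pupil dilation (PPD), and peak latency.

    `times` is in seconds relative to sentence onset; samples with
    times < baseline_end form the baseline window. Illustrative sketch,
    not the analysis pipeline of the paper."""
    trace = np.asarray(trace, dtype=float)
    times = np.asarray(times, dtype=float)
    baseline = trace[times < baseline_end].mean()   # pre-onset pupil size
    corrected = trace - baseline                    # baseline-corrected dilation
    analysis = times >= baseline_end                # post-onset analysis window
    mean_dilation = corrected[analysis].mean()
    peak_idx = np.argmax(corrected[analysis])
    ppd = corrected[analysis][peak_idx]             # peak pupil dilation
    latency = times[analysis][peak_idx]             # time to reach the peak
    return mean_dilation, ppd, latency
```

Under the findings reviewed above, all three indices would be expected to grow as intelligibility drops, up to the effort maximum near 50% intelligibility.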