Effect of Various Spatial Auditory Cues on the Perception of Threat in a First-Person Shooter Video Game

Konstantin Semionov, contact@konstantinsemionov.com, Edinburgh Napier University, Edinburgh, Scotland
Dr Iain McGregor, i.mcgregor@napier.ac.uk, Edinburgh Napier University, Edinburgh, Scotland

ABSTRACT
This study interviewed game audio professionals to establish the implementation requirements for an experiment to ascertain the effect of different spatial audio localisation systems on the perception of threat in a first-person shooter. In addition, a listening study was carried out involving 35 members of the public, using three scenes made in Unreal Engine 4 with custom-designed sound, authored in Wwise. This established that spatial audio does, indeed, have a noticeable effect on players' perception of threat. Each spatial audio system, however, had different effects on the perception of threat, stealth, realism and position estimation across all three visual scenes. Meanwhile, the absence of spatial audio can confuse people and contribute towards inaccurate enemy localisation. With this in mind, a tailored approach to the game design requirements for each project is recommended. Rather than a single, spatialised design for the entire game, each scene should have its own design solution.

CCS CONCEPTS
• Applied computing → Sound and music computing; • Human-centered computing → Human computer interaction (HCI).

KEYWORDS
Sound design, first-person shooter, threat perception, audio implementation, spatial audio

ACM Reference Format:
Konstantin Semionov and Dr Iain McGregor. 2020. Effect of Various Spatial Auditory Cues on the Perception of Threat in a First-Person Shooter Video Game. In Proceedings of Audio Mostly (AM'20), September 15–17, 2020, Graz, Austria. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3411109.3411119

1 INTRODUCTION
Game developers are implementing more advanced audio options in their games [18].
However, there is not enough available research to suggest whether those options enhance gameplay, or whether they cause players to experience the game differently. Existing research cannot tell us to what extent spatial localisation techniques benefit players of FPS games, or whether they make games feel more realistic.

Linking a character's generated sound to an expectation of threat, without the threat being visible, is one sound design technique that can alert a player to a potential threat and prompt caution in subsequent gameplay [16]. Audio spatialisation techniques are said to provide a way to identify an opponent's location accurately [42]. Discovering whether these contribute towards the perception of threat within a competitive game would be helpful – as would identifying which are most effective. Localisation and reverberation have been identified as crucial aspects of a first-person shooter experience when enemies are present in the game, due to the "hunter and hunted" nature of the genre [24]. The role of occlusion in spatial localisation, however, is less clear. Spatial localisation could also be achieved with the use of loudspeaker-based surround systems.
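Horizontal-plane localisation via interaural time differences, discussed below, can be illustrated with Woodworth's classic spherical-head approximation. This is a minimal background sketch only; the default head radius, speed of sound and function name are illustrative assumptions and not part of this study:

```python
import math

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Approximate interaural time difference (ITD) for a distant source.

    Uses Woodworth's spherical-head formula ITD = (r/c) * (theta + sin(theta)),
    valid for azimuths in the frontal quadrant (0-90 degrees). The head radius
    (~8.75 cm) and speed of sound (343 m/s) are common illustrative defaults.
    """
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (theta + math.sin(theta))

# A source directly ahead produces no time difference between the ears...
print(f"{woodworth_itd(0) * 1e6:.0f} us")    # 0 us
# ...while a source at 90 degrees yields roughly the maximum ITD (~656 us).
print(f"{woodworth_itd(90) * 1e6:.0f} us")
# A 2-degree offset already produces a difference of tens of microseconds,
# which is within the resolution of the human auditory system.
print(f"{(woodworth_itd(2) - woodworth_itd(0)) * 1e6:.1f} us")
```

Note that this approximation only models the timing cue; it says nothing about intensity differences or the spectral filtering that HRTF-based renderers simulate.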
Two separate experiments, however, found no significant improvement when moving from stereo to surround systems in a third-person computer game, on either speakers or headphones [42, 43]. Binaural reproduction, in contrast, has been found to convey better spatial attributes than stereo for headphone users [51].

A sound source's position in three-dimensional space is determined by a listener using azimuth in the horizontal plane, elevation in the vertical plane, and distance (depth). Horizontally, the azimuth is calculated from information arriving at both ears, using interaural time differences (ITD) and interaural intensity differences (IID) [44]. The interaural time difference (ITD) is a human mechanism that localises a sound source on the horizontal plane, based on the difference in arrival time between the ear nearer to and the ear farther from the source [35]. Studies have found this system can discriminate angles down to 2° in the 20 Hz to 2 kHz range [10]. Similar to the ITD, the interaural intensity difference (IID) is a mechanism that helps identify a sound source's position; however, it relies on the difference in sound source volume (intensity) between the ears [35]. Compared to the ITD, the IID provides better localisation for frequencies above 3 kHz [44].

Vertical localisation is a result of absorption, reflection and diffraction effects caused by the pinna of the ear, as well as the head, shoulders and chest [35, 44]. These spectral cues are called head-related transfer functions (HRTFs) and can vary greatly between individuals due to differences in the shape and size of the outer ear [44]. HRTF-based vertical sound localisation has been found to function monaurally in humans for high-frequency (above 7 kHz) content [14]. HRTF-based 3D audio renderers are used in video games to deliver a mixdown of the various sound sources within a game. However, it has been suggested that more