Proceedings of ICAD 05 - Eleventh Meeting of the International Conference on Auditory Display, Limerick, Ireland, July 6-9, 2005

Real-time, Head-tracked 3D Audio with Unlimited Simultaneous Sounds

Craig Jin, Teewoon Tan, Alan Kan, Dennis Lin, André van Schaik
Computing and Audio Research Laboratory, School of Electrical and Information Engineering, University of Sydney, Sydney, Australia
craig@ee.usyd.edu.au, teewoon@ee.usyd.edu.au, akan@ee.usyd.edu.au, dlin@ee.usyd.edu.au, andre@ee.usyd.edu.au

Keir Smith, Matthew McGinity
iCinema Centre for Interactive Cinema Research, University of New South Wales, Sydney, Australia
keirs@cse.unsw.edu.au, mmcginity@cse.unsw.edu.au

ABSTRACT

This research presents a novel 3D audio playback method in which real-time head-tracking is maintained with an unlimited number of simultaneous sound sources. The method relies on a 500-900 MByte sound buffer containing binaural data for 385 head orientations, together with a processing platform whose two hard disks in a RAID 0 configuration can stream data at 80-100 MBytes/s. We discuss how the number of head orientations influences smooth presentation, how the window length influences smooth transitions between different head orientations, and the file format used for storing the sounds. The new 3D audio playback method was incorporated into a 3D audio playback engine (3DApe) which can: play a 3D audio soundtrack consisting of an unlimited number of simultaneous sound sources; switch between different 3D audio soundtracks; play back up to 8 simultaneous, instantaneous sound sources on command; use a head-tracker interface via the Virtual Reality Peripheral Network (VRPN); supply 3D audio communication using voice over IP; and interface with the Virtools graphical software engine.
3DApe was demonstrated as part of an interactive 3D cinematic artwork, entitled Conversations, that was on display at the Powerhouse Museum in Sydney in December 2004 [1].

1. INTRODUCTION

For many years, interest in 3D binaural systems has centred on scientific research, simulation and entertainment [2]. More recently, there has been growing interest in applying 3D binaural systems in virtual reality and augmented reality settings [3]. In virtual reality applications, sound sources are often presented over headphones to give precise control over the virtual auditory environment. Head-related transfer functions (HRTFs), commonly recorded on humans and manikins, are used to create virtual auditory space and to allow externalization and spatialisation of the sound sources presented in a binaural audio display. An HRTF is an acoustic transfer function that describes the sound pressure transformation from a location in the free field to the listener's eardrum [4]. HRTFs contain the acoustic cues necessary for spatial hearing: the interaural time difference (ITD), the interaural level difference (ILD) and the spectral transformations applied by the outer ear. More details about the acoustic cues for spatial hearing can be found in [5]. Head-orientation sensors have also been used with binaural audio systems to heighten the user's experience by allowing head movement. However, head movement increases the complexity of the audio rendering engine, which must present sound sources in the correct locations relative to the user's head orientation. In current real-time, head-tracked 3D audio systems there is a trade-off between keeping the system running in real time and increasing the number of simultaneous sound sources that are rendered spatially. At one extreme, the number of simultaneous sound sources is limited and the HRTF filtering for spatialisation is performed on the fly.
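As a minimal sketch of this first extreme: on-the-fly spatialisation amounts to convolving each source signal with the left/right head-related impulse response (HRIR) pair, the time-domain form of the HRTF, measured for the source's direction. The function and variable names below are illustrative assumptions, not part of 3DApe.

```python
import numpy as np

def spatialise(mono, hrir_left, hrir_right):
    """Render a mono source binaurally by convolving it with the
    left/right head-related impulse responses (time-domain HRTFs)
    for the source's direction. Illustrative sketch only."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    # Output shape: (2, len(mono) + len(hrir) - 1)
    return np.stack([left, right])

# Multiple simultaneous sources are mixed by summing their binaural
# renderings; this per-source filtering cost is what limits how many
# sources an on-the-fly system can sustain in real time.
```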
At the other extreme, the number of simultaneous sound sources can be unlimited, but the HRTF filtering is performed offline with no head-tracking during playback. Hence the real conflict is between real-time head-tracking and the number of simultaneous sound sources. Dedicated hardware can be used to achieve 3D audio with head-tracking, but this can prove costly, e.g., Lake Technology's Huron processor uses one DSP chip per sound source [6]. This paper presents a method, incorporated into a 3D audio playback engine (3DApe), that runs on standard PC hardware and allows real-time head-tracking to be maintained with an unlimited number of simultaneous sound sources. 3DApe was developed to meet the audio demands of Conversations, an interactive 3D cinematic artwork designed by the iCinema Centre for Interactive Cinema Research at the University of New South Wales [1]. A distributed, multi-user virtual environment, Conversations revolves around the escape of Ronald Ryan and Peter Walker from a Melbourne prison on December 19, 1965. During the escape, a prison guard was shot and killed, a crime for which Ryan was subsequently tried. Despite somewhat incongruous witness accounts (were one or two shots fired?), Ryan was found guilty and sentenced to death. During the experience, the user witnesses a reenactment of the prison break as a 2-minute spherical stereo film. Using a head-tracker and head-mounted display, the user is placed at the scene of the crime and is free to rotate his or her head to choose any point of view (Figure 1). Given the relatively narrow field of view provided by contemporary head-mounted displays (in this case, a Daeyang i-visor DH-4400VPD