Real-time Spatial Mixing Using Binaural Processing

Christos Tsakostas 1, Andreas Floros 2 and Yannis Deliyiannis 3

1 Holistiks Engineering Systems, Athens, Greece, tsakostas@holistiks.com
2 Dept. of Audiovisual Arts, Ionian University, Corfu, Greece, floros@ionio.gr
3 Dept. of Audiovisual Arts, Ionian University, Corfu, Greece, yiannis@ionio.gr

Abstract: In this work, a professional audio mastering / mixing software platform is presented which employs state-of-the-art binaural technology algorithms for efficient and accurate sound source positioning. The proposed mixing platform supports high-quality audio (typically 96kHz/24bit) for an arbitrary number of sound sources, and also incorporates room acoustic analysis and simulation models. All binaural calculations and audio signal processing/mixing are performed in real time by an optimized binaural 3D Audio Engine developed by the authors. Moreover, all user operations are performed through a user-friendly graphical interface that allows efficient control of a large number of binaural mixing parameters. It is shown that the proposed mixing platform achieves a subjectively high spatial impression, rendering it suitable for high-quality professional audio applications.

I. INTRODUCTION

In computer music synthesis and production, accurate spatial positioning of sound sources is very commonly achieved with multiple-loudspeaker systems. Amplitude panning is the most frequently used technique for positioning sound sources in such setups, and nearly all audio mixing devices offer controls for manipulating the level of a specific sound source. In two-dimensional setups (where all loudspeakers lie in the same plane as the listener), panning is usually performed using pair-wise methods [1].
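As a concrete illustration of pair-wise amplitude panning, the following minimal sketch applies the widely used constant-power (sine/cosine) gain law to a single loudspeaker pair. It is not taken from [1] and is not necessarily the method used by any particular mixing device; the function name `constant_power_pan` and the pan range [-1, 1] are assumptions made for this example.

```python
import math


def constant_power_pan(sample, pan):
    """Distribute one mono sample over a left/right loudspeaker pair.

    pan is a hypothetical control value in [-1, 1]:
    -1 = fully left, 0 = centre, +1 = fully right.
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must lie in [-1, 1]")
    # Map the pan position to an angle in [0, pi/2].
    theta = (pan + 1.0) * math.pi / 4.0
    # cos^2 + sin^2 = 1, so the summed power of the two gains is constant.
    left_gain = math.cos(theta)
    right_gain = math.sin(theta)
    return sample * left_gain, sample * right_gain


# Usage: a centred source receives the same gain, cos(pi/4) (about 0.707),
# in both loudspeakers.
left, right = constant_power_pan(1.0, 0.0)
```

Only the relative level of the two loudspeakers changes with the pan position, which is why the perceived source location depends on where the loudspeakers actually stand.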
For the simple stereophonic playback case, a number of panning laws have been proposed [2] that result in the perception of a virtual sound source between the loudspeakers. A problem common to all panning cases is that loudspeaker placement inside the playback enclosure usually varies between setups, whereas an ideal panning system should be capable of creating identical spatial auditory scenes using any loudspeaker configuration. Towards this aim, enhanced panning and surround sound techniques have been proposed in the literature (such as Vector Base Amplitude Panning, VBAP [3], and the Ambisonics matrixing sound reproduction technique [4]), which render the sound source positioning independent of the number of loudspeakers employed for playback.

Lately, there has been a significant proliferation of three-dimensional (3D) audio technologies intended mainly for multimedia and portable consumer electronics applications [5]. Binaural source localization [6] is a highly accurate technique for recreating a 3D audio environment: a two-channel audio signal is synthesized using the well-known Head Related Transfer Functions (HRTFs) [7] between the sound source and each of the listener's ears. Hence, only two loudspeakers or a pair of headphones are required for binaural audio playback. The simple setup of a binaural reproduction system makes it convenient for a number of state-of-the-art applications, including mobile applications and communications, especially when headphones are employed.

In this work, a spatial mixing and audio mastering application (termed Amphiotik Synthesis) is presented, which employs binaural technology for effectively producing 3D audio recordings. Based on a powerful binaural audio engine recently developed by the authors [8], the Amphiotik Synthesis application allows the real-time production of high-quality (24bit/96kHz) binaural signals for a large number of virtual sources.
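The two-channel synthesis mentioned above can be sketched in the time domain, where each HRTF corresponds to a head-related impulse response (HRIR) and the filtering becomes a convolution. The sketch below is illustrative only and does not reproduce the authors' engine [8]: the function names are hypothetical, and `toy_hrir` is an assumed stand-in (a pure delay plus a gain) for a measured impulse response.

```python
def convolve(signal, kernel):
    """Direct-form linear convolution of two non-empty sample lists."""
    if not signal or not kernel:
        raise ValueError("signal and kernel must be non-empty")
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def binaural_render(mono, hrir_left, hrir_right):
    """Filter one mono source with an HRIR pair; returns (left, right)."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


def toy_hrir(delay_samples, gain):
    """ASSUMPTION: crude placeholder for a measured HRIR (delay + gain only)."""
    return [0.0] * delay_samples + [gain]


# Usage: a source on the listener's left reaches the left ear first and
# louder; the right-ear signal is delayed by 3 samples and attenuated.
mono = [1.0, 0.5, -0.25]
left, right = binaural_render(mono, toy_hrir(0, 1.0), toy_hrir(3, 0.5))
```

A real-time implementation would normally use measured HRIRs and block-based FFT convolution instead of this direct double loop, whose cost grows with the product of the signal and filter lengths.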
Moreover, as will be presented in the following paragraphs, transaural playback, room acoustics modeling and an enhanced HRTF equalization algorithm are supported for efficiently positioning the active sound sources within typical enclosures.

The rest of the paper is organized as follows: Section II presents a general overview of binaural technology, Section III describes the Amphiotik Synthesis application, and Section IV presents typical test cases that demonstrate its usage and effectiveness. Finally, Section V concludes this work.

II. BINAURAL HEARING OVERVIEW

Binaural hearing is based on two basic cues that are responsible for human sound localization: a) the interaural time difference (ITD), caused by the different propagation times of the sound wave to the left and right ears, and b) the interaural level difference (ILD), introduced by the shadowing effect of the head. Together, these cues mean that the two ears receive different sound waves, which perceptually provide information on the direction of an active sound source [9].

In binaural modeling, the effect of the above basic cues is incorporated into direction-dependent HRTFs: convolving the mono sound source signal with the appropriate pair of HRTFs yields the sound waves that correspond to each of the listener's ears. The binaural left and right signals can be reproduced directly using headphones or a pair of conventional loudspeakers.

Proceedings SMC'07, 4th Sound and Music Computing Conference, 11-13 July 2007, Lefkada, Greece

In the