International Journal on Information Technologies & Security, SP2 (vol. 11), 2019, p. 49

CAPTURE, REPRESENTATION, AND RENDERING OF 3D AUDIO FOR VIRTUAL AND AUGMENTED REALITY

Ivan J. Tashev
Microsoft Research, One Microsoft Way, Redmond WA 98052, USA
e-mail: ivantash@microsoft.com

Abstract: Devices for augmented and virtual reality (AR/VR) find applications in gaming, entertainment, education, design and science. An AR/VR headset consists of a head-mounted display, headphones or loudspeakers, and a computing platform. The spatial audio system plays an integral part in achieving realism. In this article we present an overview of our work on 3D audio technologies targeting AR/VR scenarios.

Key words: spatial audio, augmented reality, virtual reality.

1. INTRODUCTION

Virtual reality (VR) is a set of technologies that create in the user the perception of being in another place, both visually and acoustically [1]. With first attempts dating back to the 1950s, today it is a well-developed area of computer science and technology. In augmented reality (AR) the user remains in their own environment, to which audio-visual objects are added (augmented) [2]. If we consider AR and VR as the two extremes of the reality-virtuality continuum, then the commonly used term for this continuum is mixed reality (MR). MR devices consist of a head-mounted display that creates the visual portion of MR, headphones for reproducing the spatial audio, a system for tracking the user's posture, user interface components, and a computing platform. In this article we give an overview of technologies for creating the spatial audio experience in MR systems. These technologies need input from the head position and orientation tracking system, with requirements usually much lower than those of the visual part.

Spatial audio is a set of devices and signal processing algorithms that can create in the listener the perception that the audio comes from any desired position (direction, elevation, and distance).
It is also called 3D audio. Work on spatial audio reproduction started with stereophonic (2 channels) audio rendering and continued through quadraphonic (4 channels) to today's surround sound systems (5, 7, and more channels). These approaches are from the group of channel-based representation of the