Recording and Reproducing High Order Surround Auditory Scenes for Mixed and Augmented Reality

Zhiyun Li, Ramani Duraiswami, Larry S. Davis
Perceptual Interfaces and Reality Laboratory, UMIACS
University of Maryland, College Park, MD 20742
zli@cs.umd.edu; {ramani;lsd}@umiacs.umd.edu

Abstract

Virtual reality systems are largely based on computer graphics and vision technologies. However, sound also plays an important role in human interaction with the surrounding environment, especially for visually impaired people. In this paper, we develop the theory of recording and reproducing real-world surround auditory scenes at high orders using specially designed microphone and loudspeaker arrays. The approach is complementary to vision-based technologies in creating mixed and augmented realities. Design examples and simulations are presented.

1 Introduction

Virtual reality systems are finding increasing application in design, simulation, remote control, teleconferencing, and training. Since sight accounts for more than 90 percent of all perceived information, most such systems are based on computer graphics and vision technologies. Examples include head-mounted displays, CAVE [3], and tiled displays [9][4][7][6]. In addition, to augment human-computer interaction, they are usually equipped with joysticks, ultrasonic pointers, data gloves, and hand-held wands.

Beyond sight, sound also provides rich information about the surrounding world. With advances in microphone, loudspeaker, and digital signal processing technologies, it is possible and desirable to build a sound-augmented immersive reality system that records and reproduces real-world surround auditory scenes accurately.

1.1 Previous Work

While algorithms exist to create virtual sound sources using loudspeaker arrays, such as in [11], our concern is to recreate realistic auditory scenes from real-world recordings. Several schemes have been proposed to build a microphone-loudspeaker array system. In [1], based on the Kirchhoff-Helmholtz integral on a plane, the sound field is captured by a directive microphone array and recreated by a loudspeaker array; this is called the Wave Field Synthesis (WFS) method (a reference form of the integral is sketched at the end of this subsection). While the system in [1] works well in an auditorium environment, where the listening area can be separated from the primary source area by a plane, it is hard pressed to render an immersive perception of a surround sound field. In [10], a general framework was proposed that uses linear microphone arrays to identify, localize, and track the sound sources, then uses a loudspeaker array to recreate them with the correct spatial cues via WFS. To work properly for complex auditory scenes with multiple moving sound sources, however, it requires a robust and accurate localization and tracking system and a highly directive beamformer, which are usually very expensive, if available at all. In [5], we developed a unified and simple approach for capturing and recreating high order 3D auditory scenes using the reciprocity principle that holds between the two processes. It uses a spherical microphone array mounted on a rigid sphere to capture 3D auditory scenes and a spherical loudspeaker array in free space to recreate the recorded scenes. This approach is designed to be independent of the sound sources.
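For reference, one common statement of the Kirchhoff-Helmholtz integral that WFS builds on is given below; the sign convention, the free-field Green's function, and the implied time dependence are our assumptions for this sketch, not taken from [1]:

\[
p(\mathbf{r}) = \oint_{S} \left[ G(\mathbf{r}\,|\,\mathbf{r}_S)\,\frac{\partial p(\mathbf{r}_S)}{\partial n} - p(\mathbf{r}_S)\,\frac{\partial G(\mathbf{r}\,|\,\mathbf{r}_S)}{\partial n} \right] dS, \qquad G(\mathbf{r}\,|\,\mathbf{r}_S) = \frac{e^{-jk\,|\mathbf{r}-\mathbf{r}_S|}}{4\pi\,|\mathbf{r}-\mathbf{r}_S|},
\]

valid for \(\mathbf{r}\) inside the source-free region bounded by \(S\), with \(\partial/\partial n\) the derivative along the surface normal (orientation fixed by the chosen sign convention). Read this way, recording the pressure and its normal gradient on \(S\) (driving dipole and monopole secondary sources, respectively) suffices in principle to reconstruct the field throughout the enclosed listening region.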
1.2 Present Work

Capturing and recreating full 3D auditory scenes with spherical microphone-loudspeaker arrays is redundant and uneconomical for surround auditory scenes, which are the usual case in real life. In such scenes the sound sources lie roughly in a plane, as in a cocktail party, a roundtable conference, a surround music recording, or an auditory traffic scene. In this paper, we parallel the design methodology of [5] and develop a recording and reproducing system for high order surround auditory scenes. We first design a circular microphone array mounted on a sound-rigid cylinder, with its axis perpendicular
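As a rough illustration of the acoustics behind such a design, the sketch below evaluates the total pressure on the surface of an infinite rigid cylinder for an incident plane wave, using the standard cylindrical-harmonic scattering series (rigid boundary: zero normal velocity, so each mode's scattering coefficient is -Jn'(ka)/Hn'(ka)). The function name, truncation order, and plane-wave assumption are ours for illustration; the paper's actual array design and scattering formulation may differ in detail.

```python
import numpy as np
from scipy.special import jv, jvp, hankel1, h1vp

def pressure_on_rigid_cylinder(ka, phi, phi0=0.0, order=30):
    """Total surface pressure on an infinite rigid cylinder of radius a
    (ka = wavenumber * radius) for a unit plane wave from azimuth phi0,
    summed over cylindrical harmonics up to +/- order."""
    phi = np.asarray(phi, dtype=float)
    p = np.zeros_like(phi, dtype=complex)
    for n in range(-order, order + 1):
        # incident Bessel term plus rigid-boundary scattered term at r = a
        bn = jv(n, ka) - (jvp(n, ka) / h1vp(n, ka)) * hankel1(n, ka)
        p += (1j ** n) * bn * np.exp(1j * n * (phi - phi0))
    return p

# Example: sample the surface pressure at 8 equispaced microphone
# positions around the cylinder, for ka = 2 and a source at 45 degrees.
phi_mics = 2 * np.pi * np.arange(8) / 8
print(np.abs(pressure_on_rigid_cylinder(2.0, phi_mics, phi0=np.pi / 4)))
```

Sampling this surface pressure at discrete microphone positions is what bounds the order of the circular harmonics the array can resolve, which is why the truncation order and the number of microphones must be chosen together.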