Recording and Reproducing High Order Surround Auditory Scenes for Mixed
and Augmented Reality
Zhiyun Li, Ramani Duraiswami, Larry S. Davis
Perceptual Interfaces and Reality Laboratory, UMIACS
University of Maryland, College Park, MD 20742
zli@cs.umd.edu; {ramani,lsd}@umiacs.umd.edu
Abstract
Virtual reality systems are largely based on computer graphics and vision technologies. However, sound also plays an important role in human interaction with the surrounding environment, especially for visually impaired people. In this paper, we develop the theory of recording and reproducing real-world surround auditory scenes at high orders using specially designed microphone and loudspeaker arrays. This approach complements vision-based technologies in creating mixed and augmented realities. Design examples and simulations are presented.
1 Introduction
Virtual reality systems are finding increasing applications in design, simulation, remote control, teleconferencing and training. Since human sight accounts for more than 90 percent of all perceived information, most such systems are based on computer graphics and vision technologies. Examples include head-mounted displays, CAVE [3], tiled displays [9][4][7][6], etc. In addition, to augment human-computer interaction, they are usually equipped with joysticks, ultrasonic pointers, data gloves, and handheld wands.
Beyond vision, sound also provides rich information about the surrounding world. With advances in microphone, loudspeaker and digital signal processing technologies, it is both possible and desirable to build a sound-augmented immersive reality system that records and reproduces real-world surround auditory scenes accurately.
1.1 Previous Work
While there exist algorithms for creating virtual sound sources with loudspeaker arrays, such as in [11], our concern is to recreate realistic auditory scenes from real-world recordings. Several schemes have been proposed to build microphone-loudspeaker array systems. In [1], based on the Kirchhoff-Helmholtz integral evaluated on a plane, the sound field is captured by a directive microphone array and recreated by a loudspeaker array; this is called the Wave Field Synthesis (WFS) method. While the system in [1] works well in an auditorium environment, where the listening area can be separated from the primary source area by a plane, it is hard for it to render an immersive perception of a surround sound field. In [10], a general framework was proposed that uses linear microphone arrays to identify, localize and track the sound sources, and then uses a loudspeaker array to recreate them with the correct spatial cues via WFS. To work properly for complex auditory scenes with multiple moving sound sources, however, it requires a robust and accurate localization and tracking system and a highly directive beamformer, which are usually very expensive, if available at all. In [5], we developed a unified and simple approach for capturing and recreating high order 3D auditory scenes, exploiting the reciprocity principle that holds between the two processes. It uses a spherical microphone array mounted on a rigid sphere to capture 3D auditory scenes and a spherical loudspeaker array in free space to recreate the recorded scenes. This approach is designed to be independent of the sound sources.
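For reference, WFS as in [1] builds on the Kirchhoff-Helmholtz integral. In its standard form (the notation below is ours, not necessarily that of [1], and the overall sign depends on the orientation chosen for the surface normal n), the pressure at a point r inside a source-free region bounded by a surface S is determined entirely by the pressure and its normal derivative on S:
\[
p(\mathbf{r}) \;=\; \oint_{S}\left[\,G(\mathbf{r}\,|\,\mathbf{r}_S)\,\frac{\partial p(\mathbf{r}_S)}{\partial n}\;-\;p(\mathbf{r}_S)\,\frac{\partial G(\mathbf{r}\,|\,\mathbf{r}_S)}{\partial n}\,\right]\mathrm{d}S,
\qquad
G(\mathbf{r}\,|\,\mathbf{r}_S)\;=\;\frac{e^{-jk\,|\mathbf{r}-\mathbf{r}_S|}}{4\pi\,|\mathbf{r}-\mathbf{r}_S|},
\]
where k is the wavenumber and G is the free-space Green's function. Conceptually, driving a secondary-source (loudspeaker) distribution on S with signals derived from the recorded pressure and its normal derivative reproduces the interior field, which is also why the approach of [1] relies on a plane separating the primary sources from the listening area.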
1.2 Present Work
However, capturing and recreating full 3D auditory scenes with spherical microphone-loudspeaker arrays is redundant and uneconomical for surround auditory scenes, which are the usual case in real life. In such cases, the sound sources lie roughly on a plane, as in a cocktail party, a roundtable conference, a surround music recording, an auditory traffic scene, etc. In this paper, we parallel the design methodology of [5] and develop a recording and reproducing system for high order surround auditory scenes. We first design a circular microphone array
mounted on a sound-rigid cylinder, with its axis perpendic-