What Virtual Audio Synthesis Could Do for Visually Disabled Humans in the New Era?

György Wersényi
Széchenyi István University, Department of Telecommunications, Egyetem tér 1., H-9026, Győr, Hungary

Proceedings of the AES 12th Regional Convention, Tokyo, Japan, June 12-14, 2005, pp. 180-183.

ABSTRACT

Listening tests were carried out to investigate the localization performance of 42 untrained subjects using noise stimuli in a 2D virtual acoustic display (VAD). Measurements were made on the basis of the former GUIB project (Graphical User Interface for Blind Persons). Results of the evaluation of the average spatial resolution are presented. Suggestions for optimal partitioning of the virtual space are made on a rectangular 2D VAD in front of the listener, focusing on vertical localization.

0 Introduction

The GUIB (Graphical User Interface for Blind Persons) project was launched to create a suitable virtual environment that helps blind persons use personal computers [1, 2]. Because blind users cannot make use of graphical user interfaces, events on the screen have to be replaced or complemented by sound events [3-6]. A simple, low-cost method is sought that enables visually disabled users to orient themselves, e.g., on a typical PC running MS Windows.

1 Measurement method

The playback system includes the Beachtron DSP card, which renders virtual sound events on a rectangular 2D virtual acoustic display in front of the listener. In the first step, 40 subjects determined the average, best-case and worst-case individual spatial resolution in the horizontal and median planes, respectively [7, 8]. We used white noise and filtered noise stimuli (Fig. 1). The test signals were selected to model real sound events generically in length and loudness, while at the same time allowing localization to be tested as a function of spectral content.
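A minimal sketch of how such broadband and band-limited noise bursts could be generated is given below. The sampling rate, the exact 1 kHz / 4 kHz cut-offs, the brick-wall FFT filtering, and all function names are illustrative assumptions, not the signal-generation chain actually used in the study.

```python
import numpy as np

def noise_burst(fs=44100, dur=0.3, band=None, seed=0):
    """Generate a 300 ms noise burst; optionally brick-wall band-limit it.

    band = (f_lo, f_hi) in Hz; None yields broadband white noise.
    Filtering is done by zeroing FFT bins outside the band (a crude
    stand-in for whatever analog/digital filters were used originally).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(int(fs * dur))
    if band is not None:
        X = np.fft.rfft(x)
        f = np.fft.rfftfreq(len(x), 1.0 / fs)
        f_lo, f_hi = band
        X[(f < f_lo) | (f > f_hi)] = 0.0
        x = np.fft.irfft(X, n=len(x))
    return x / np.max(np.abs(x))  # normalize peak amplitude to 1

# Broadband "Signal A" plus low-passed and high-passed variants whose
# cut-offs are deliberately far apart in frequency.
sig_a  = noise_burst()
sig_lp = noise_burst(band=(0, 1000))      # energy below 1 kHz
sig_hp = noise_burst(band=(4000, 22050))  # energy above 4 kHz
```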
Cut-off frequencies for the filtering were chosen to be far apart in frequency in order to ensure good spectral separation between the broadband Signal A and the filtered signals derived from it. Listeners responded in a three-category forced-choice Minimum Audible Angle (MAA) measurement and determined a direction-independent average spatial resolution along the horizontal and vertical axes (Fig. 2). Pairs of 300 ms noise bursts were used, and subjects had to judge whether the second burst moved away from or toward the first (reference) burst in 1° steps. The MAA was found to be best for signals below 1000 Hz and/or above 4000 Hz. Real-time HRTF filtering, based on the HRTFs of a "good localizer", together with proper headphone equalization is performed by the DSP card; both are necessary for virtual sound field simulation [9-14]. For a user-friendly mapping between visual and sound events on the screen (e.g. when using the mouse), a rectangular 2D "screen-like surface" is simulated as an extension or replacement of the display.

Fig. 1. Spectra of the noise signal excitation.

1.1 Results of the current investigation

The results of the 40 subjects delivered an average resolution as shown in Fig. 2. Black filled dots correspond to virtual source locations on the 2D VAD, averaged over all subjects and test signals [8, 15]. Based on Fig. 2, this average resolution was simulated using the same system, measurement method and stimuli. The goal was to test this resolution and determine how many subjects could actually use a resolution of 13x5. We assumed that 13 sources horizontally (a resolution of about 7-10°) and 5 vertically (a resolution of about 15°) would be "too much" and unusable for a real application.
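To make the 13x5 partitioning concrete, the sketch below maps grid cells of such a screen-like surface to azimuth/elevation angles. The 8° horizontal and 15° vertical step sizes (picked from within the 7-10° and ~15° ranges reported above), the centering convention, and all names are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical mapping of a 13x5 VAD grid to source directions.
N_AZ, N_EL = 13, 5          # 13 columns (azimuth), 5 rows (elevation)
AZ_STEP, EL_STEP = 8.0, 15.0  # assumed step sizes in degrees

def grid_to_angles(col, row):
    """col in 0..12, row in 0..4 (row 0 = top).

    The grid center (col 6, row 2) maps to straight ahead (0 deg, 0 deg);
    columns increase toward the right, rows downward.
    """
    az = (col - (N_AZ - 1) / 2) * AZ_STEP
    el = ((N_EL - 1) / 2 - row) * EL_STEP
    return az, el

# All 65 virtual source directions of the display
positions = [grid_to_angles(c, r) for r in range(N_EL) for c in range(N_AZ)]
```

With these assumed steps the display spans ±48° in azimuth and ±30° in elevation, which illustrates why a full 13x5 grid was expected to be demanding for untrained listeners.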