International Journal of Computer and Information Technology (ISSN: 2279-0764), Volume 03, Issue 03, May 2014, www.ijcit.com

Evaluation of the Sonification Protocol of an Artificial Vision System for the Visually Impaired

Pablo Revuelta Sanz, Electronic Technology, Carlos III University of Madrid, Leganés, Spain. Email: prevuelt {at} ing.uc3m.es
Belén Ruiz Mezcua, Computer Science, Carlos III University of Madrid, Leganés, Spain
José M. Sánchez Pena, Electronic Technology, Carlos III University of Madrid, Leganés, Spain
Bruce N. Walker, Sonification Lab, Georgia Tech, Atlanta, U.S.A.

Abstract— In this study we present the results of evaluating the sonification protocol of a new assistive product that aims to help the visually impaired perceive their surroundings through sounds organized in different cognitive profiles. The evaluation was carried out with 17 sighted and 11 visually impaired participants. The experiment covered both virtual and real environments and was divided into four virtual-reality-based tests and one real-life test. Finally, four participants became experts through longer and deeper training, and then took part in a focus group at the end of the process. Both quantitative and qualitative results showed that the proposed system can effectively represent the spatial configuration of objects through sounds. However, important limitations were found: some important demographic characteristics of the sample are intercorrelated, impeding segregated analysis; the most complex profile raised usability concerns; and totally blind participants faced special difficulties relative to the sighted and low-vision ones.

I. INTRODUCTION

We call "artificial vision" the processing of visual information and its transmission in non-visual formats.
This processing is very useful for people who temporarily or permanently cannot receive visual information from their surroundings. Due to the large bandwidth of the auditory system, sound is one of the most common ways to represent the visual world, and there is strong evidence of the benefits of auditory displays for transmitting visual information to the blind [1]. "Sonification" is the translation of data into sounds. There are many types of sonification, such as text-to-speech programs (text into audible speech), color readers (color into synthetic voice), Geiger counters (radioactivity into clicks), acoustic radars, and MIDI synthesizers. Sonification has also been widely used in the assistive technology field to substitute for visual information, and is thus especially oriented toward the visually impaired. Technology has been applied to mobility since the 1960s and 1970s [2;3]. Focusing specifically on image-processing-based Assistive Products (APs), we can find, among others, the Sonic Pathfinder [4], Tyflos [5], Echolocation [6], the vOICe [7], the FIU Project [8], the 3-D Space Perceptor [9], NAVI [10], SVETA [8;11;12], AudioMan [13], CASBliP [14], EAV [15;16], the 3-D Support System [17], the Brigham Project [18], the Optophone [19], and the Cross-Modal ETA [20]. Some of the latest advances in this field can be found in [21-24]. These systems use different strategies, mainly tactile and auditory, to provide the relevant information to the user; for a review, see [25;26]. Among these proposals we find an important problem: the sonification uses non-redundant transformations of spatial information into sounds, which makes them harder to understand.

II. MATERIALS AND METHODS

In this paper, we discuss the evaluation of a redundant sonification protocol, described in [27], for its utility in an artificial vision system for the blind.
The sonification used is a variation of the point mapping described in [28]: height is coded as frequency, horizontal position as binaural loudness, and volume, again, is related to brightness. Another example can be found in [29]. Figure 1 shows the block diagram of the system in which these sonification rules run. The cameras used were a pair of low-cost USB webcams [30] with a resolution of 320×240 pixels at 30 fps. Note that the visual cone is 90° wide in both the vertical and horizontal axes. The image processing was implemented in ANSI C using the OpenCV library, and the sonification was implemented with the MIDI (GM2) protocol. Although the complete system offers seven different sonification profiles, we tested only the four most complex ones (those useful for artificial vision). The complete set of sounds used in this evaluation, in relation to the three dimensions of space from the user's point of view, is summarized in Table I.
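To make the point-mapping rules concrete, the following C sketch shows one plausible translation of a pixel into MIDI parameters: image row to note number (pitch), column to pan position (binaural loudness), and brightness to velocity (volume). The image dimensions match the webcams described above, but the specific MIDI ranges and the linear mappings are illustrative assumptions, not the exact parameters of the protocol in [27].

```c
#include <assert.h>

/* Illustrative point-mapping sonification (assumed parameters):
   - row (vertical position)    -> MIDI note number: higher in the
                                   image means higher pitch
   - column (horizontal pos.)   -> MIDI pan, 0 = hard left, 127 = hard right
   - brightness (0..255)        -> MIDI velocity, 0..127 */

#define IMG_W 320
#define IMG_H 240

/* Map row 0 (top of image) .. IMG_H-1 (bottom) onto notes 96 down to 36. */
int row_to_midi_note(int row) {
    const int note_hi = 96, note_lo = 36;
    return note_hi - (row * (note_hi - note_lo)) / (IMG_H - 1);
}

/* Map column 0 .. IMG_W-1 onto the MIDI pan controller range 0..127. */
int col_to_pan(int col) {
    return (col * 127) / (IMG_W - 1);
}

/* Map 8-bit pixel brightness onto MIDI velocity 0..127. */
int brightness_to_velocity(unsigned char brightness) {
    return brightness / 2;
}
```

With these mappings, a bright pixel in the top-left corner would sound as a loud, high-pitched note panned fully left, giving the redundant spatial cues (pitch plus interaural loudness difference) that the protocol relies on.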