The 25th International Conference on Auditory Display (ICAD 2019) 23–27 June 2019, Northumbria University

HEARING ARTIFICIAL INTELLIGENCE: SONIFICATION GUIDELINES & RESULTS FROM A CASE-STUDY IN MELANOMA DIAGNOSIS

R. Michael Winters, GT Center for Music Technology, Georgia Institute of Technology, Atlanta, GA, mikewinters@gatech.edu
Ankur Kalra, Hop Labs, Atlanta, GA, ankur@hoplabs.com
Bruce N. Walker, GT Sonification Lab, Georgia Institute of Technology, Atlanta, GA, bruce.walker@gatech.edu

ABSTRACT

The applications of artificial intelligence are becoming more and more prevalent in everyday life. Although many AI systems can operate autonomously, their goal is often to assist humans, so knowledge from the AI system must somehow be perceptualized. Toward this goal, we present a case-study in the application of data-driven non-speech audio for melanoma diagnosis. A physician photographs a suspicious skin lesion, triggering a sonification of the system's penultimate classification layer. We iterated on sonification strategies and coalesced around designs representing three general approaches. We tested each in a group of novice listeners (n=7) for mean sensitivity, specificity, and learning effects. Mean accuracy was greatest for a simple model, but a trained dermatologist preferred a perceptually compressed model of the full classification layer. We discovered that training the AI on sonifications from this model improved accuracy further. We argue for perceptual compression as a general technique and for a comprehensible number of simultaneous streams.

1. INTRODUCTION

Artificial Intelligence (AI) algorithms are becoming an increasingly important part of interacting with computers [1]. Today, almost every major content provider uses machine learning, deep learning, or artificial intelligence more generally to produce their final product.
In spite of the complexity and sophistication required to produce a well-functioning AI system, the information often needs to be displayed to a human recipient. In these contexts, an important layer of the AI system is the perceptualization of the machine knowledge. This perceptualization can take many sensory, linguistic, or cognitive forms, and the best way to communicate will depend upon human factors such as the context, expertise, and task goals.

In this paper, we describe a context where an AI system assists a human in the diagnosis of skin cancer from photographs of suspicious skin areas (lesions). A doctor takes a photograph of a suspicious area on their patient's skin, triggering an analysis phase by the AI system. Once the image has been processed, the system generates a sonification that represents what has been sensed/classified in the image—good and bad. The doctor then uses this sound, in addition to other factors such as the patient's medical history, to determine whether further tests (e.g., a biopsy) or treatment are indicated.

We describe our design process for creating sounds for this AI system, which included three sonification designs and a user study with novice listeners. After describing the context around the work, we present the three designs in the order that they were created. We describe the study that we administered and our results, then finish with general design guidelines for working with AI systems that may prove useful in similar contexts.

This work is licensed under a Creative Commons Attribution Non Commercial 4.0 International License. The full terms of the License are available at http://creativecommons.org/licenses/by-nc/4.0

2. BACKGROUND CONTEXT

Listening has long formed a vital component of medical practice. Indeed, auscultation has been considered the first "imaging" technology [2], and the stethoscope is still routinely used by general practitioners. Doctors are trained listeners.
We worked with an algorithm that had been developed to identify melanomas from photos of skin lesions [3]. The algorithm was a deep convolutional neural network, trained on thousands of images and designed to produce a binary classification output: benign or malignant.

A simple auditory display strategy would be to read out a "benign" or "malignant" diagnosis for a given input image. However, we sought to use a more sophisticated sonification to provide additional information and context. We reasoned that if the sonification targeted the more subtle information behind the coarse benign/malignant classification, a listener might be able to understand more of the nuance behind the given classification. For example, each image might produce a unique aural signature that helps convey why the algorithm decided on its final classification.

For the purposes of design, we targeted the penultimate layer in the AI system. While the final layer of the network produced a binary classification, the layer before it had 1024 nodes, each with an associated weight and image-dependent activation. Although the full system contained hundreds of layers and loops, we chose the penultimate layer because it was the most direct and information-rich layer available. This layer also made it easy to use the final classification output.

3. DESIGN PROCESS

In the process of designing the sonification algorithm, we went through several design iterations, which manifested in three distinct design strategies. The three designs all used the penultimate