The 25th International Conference on Auditory Display (ICAD 2019) 23–27 June 2019, Northumbria University

HEARING ARTIFICIAL INTELLIGENCE: SONIFICATION GUIDELINES & RESULTS FROM A CASE-STUDY IN MELANOMA DIAGNOSIS

R. Michael Winters, GT Center for Music Technology, Georgia Institute of Technology, Atlanta, GA, mikewinters@gatech.edu
Ankur Kalra, Hop Labs, Atlanta, GA, ankur@hoplabs.com
Bruce N. Walker, GT Sonification Lab, Georgia Institute of Technology, Atlanta, GA, bruce.walker@gatech.edu

ABSTRACT

The applications of artificial intelligence are becoming more and more prevalent in everyday life. Although many AI systems can operate autonomously, their goal is often to assist humans, so knowledge from the AI system must somehow be perceptualized. Toward this goal, we present a case-study in the application of data-driven non-speech audio for melanoma diagnosis. A physician photographs a suspicious skin lesion, triggering a sonification of the system's penultimate classification layer. We iterated on sonification strategies and coalesced around designs representing three general approaches. We tested each in a group of novice listeners (n=7) for mean sensitivity, specificity, and learning effects. Mean accuracy was greatest for a simple model, but a trained dermatologist preferred a perceptually compressed model of the full classification layer. We discovered that training the AI on sonifications from this model improved accuracy further. We argue for perceptual compression as a general technique and for a comprehensible number of simultaneous streams.

1. INTRODUCTION

Artificial Intelligence (AI) algorithms are becoming an increasingly important part of interacting with computers [1]. Today, almost every major content provider uses machine learning, deep learning, or artificial intelligence more generally to produce their final product.
In spite of the complexity and sophistication required to produce a well-functioning AI system, the information often needs to be displayed to a human recipient. In these contexts, an important layer of the AI system is the perceptualization of the machine knowledge. This perceptualization can take many sensory, linguistic, or cognitive forms, and the best way to communicate will depend upon human factors such as the context, expertise, and task goals.

In this paper, we describe a context where an AI system assists a human in the diagnosis of skin cancer from photographs of suspicious skin areas (lesions). A doctor takes a photograph of a suspicious area on their patient's skin, triggering an analysis phase by the AI system. Once the image has been processed, the system generates a sonification that represents what has been sensed/classified in the image—good and bad. The doctor then uses this sound, in addition to other factors such as the patient's medical history, to determine whether further tests (e.g., a biopsy) or treatment are indicated.

We describe our design process for creating sounds for this AI system, which included three sonification designs and a user study with novice listeners. After describing the context around the work, we present the three designs in the order that they were created. We describe the study that we administered and our results, then finish with general design guidelines for working with AI systems that may prove useful in similar contexts.

This work is licensed under a Creative Commons Attribution Non Commercial 4.0 International License. The full terms of the License are available at http://creativecommons.org/licenses/by-nc/4.0

2. BACKGROUND CONTEXT

Listening has long formed a vital component of medical practice. Indeed, auscultation has been considered the first "imaging" technology [2], and the stethoscope is still routinely used by general practitioners. Doctors are trained listeners.
We worked with an algorithm that had been developed to identify melanomas from photos of skin lesions [3]. The algorithm was a deep convolutional neural network, trained on thousands of images and designed to produce a binary classification output: benign or malignant.

A simple auditory display strategy would be to read out a "benign" or "malignant" diagnosis for a given input image. However, we sought to use a more sophisticated sonification to provide additional information and context. We reasoned that if the sonification targeted the more subtle information behind the coarse benign/malignant classification, a listener might be able to understand more of the nuance behind the given classification. For example, each image might produce a unique aural signature that helps convey why the algorithm decided on its final classification.

For the purposes of design, we targeted the penultimate layer in the AI system. While the final layer of the network produced a binary classification, the layer before it had 1024 nodes, each with an associated weight and image-dependent activation. Although the full system contained hundreds of layers and loops, we chose the penultimate layer because it was the most direct and information-rich layer available. This layer also made it easy to use the final classification output.

3. DESIGN PROCESS

In the process of designing the sonification algorithm, we went through several design iterations, which manifested in three distinct design strategies. The three designs all used the penultimate