Eurographics Workshop on Visual Computing for Biology and Medicine (2020)
B. Kozlíková, M. Krone, and N. N. Smit (Editors)

GLANCE: Visual Analytics for Monitoring Glaucoma Progression

Astrid van den Brandt¹, Mark Christopher⁴, Linda M. Zangwill⁴, Jasmin Rezapour²,⁴, Christopher Bowd⁴, Sally L. Baxter³,⁴, Derek S. Welsbie⁴, Andrew Camp⁴, Sasan Moghimi⁴, Jiun L. Do⁴, Robert N. Weinreb⁴, Chris C.P. Snijders¹ and Michel A. Westenberg¹

¹ Eindhoven University of Technology, The Netherlands
² University Medical Center Mainz, Department of Ophthalmology, Mainz, Germany
³ Division of Biomedical Informatics, Department of Medicine, University of California San Diego, La Jolla, CA
⁴ Hamilton Glaucoma Center, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, La Jolla, CA

Abstract

Deep learning is increasingly used in the field of glaucoma research. Although deep learning models can achieve high accuracy, issues with trust, interpretability, and practical utility form barriers to adoption in clinical practice. In this study, we explore whether and how visualizations of deep learning-based measurements can be used for glaucoma management in the clinic. Through iterative design sessions with ophthalmologists, vision researchers, and manufacturers of optical coherence tomography (OCT) instruments, we distilled four main tasks and designed a visualization tool that incorporates a visual field (VF) prediction model to provide clinical decision support in managing glaucoma progression. The tasks are: (1) assess the reliability of a prediction, (2) understand why the model made a prediction, (3) alert to relevant features, and (4) guide future scheduling of VF tests. Our approach is novel in that it considers the utility of the system in a clinical context where time is limited. With use cases and a pilot user study, we demonstrate that our approach can aid clinicians in clinical management decisions and help them establish appropriate trust in the system. Taken together, our work shows how visual explanations of automated methods can augment clinicians’ knowledge and calibrate their trust in DL-based measurements during clinical decision making.

1. Introduction

Glaucoma is one of the leading causes of irreversible blindness worldwide, and its prevalence will likely continue to rise due to aging populations worldwide [TLW*14, WAM14]. Glaucoma is a progressive eye disease characterized by loss of nerve fibers, resulting in visual field defects [AGS10]. Treatment can slow or even stop progression of the disease, which makes early detection crucial [SYC14]. However, timely detection of disease progression is challenging because glaucoma often remains asymptomatic until there is considerable visual field loss [WAM14].

The current standard of care for monitoring glaucoma involves both structural and functional measurements. Structural changes in the eye can be assessed by optical coherence tomography (OCT) and clinical examination of the optic disc. Visual function is analyzed using a visual field (VF) test, which includes global summary statistics such as mean deviation (MD) and visual field index (VFI) [ZDF*17]. The VF test is essential to detecting and monitoring the disease, because it measures the patient’s peripheral and central visual function. However, VF testing has limitations: VFs are subjective and variable [WDZ*13], and some patients have difficulty taking the test.
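For readers unfamiliar with these summary statistics, MD quantifies the overall depression of a field relative to age-matched normal values. A minimal sketch, assuming the commonly cited Humphrey Field Analyzer-style definition (the exact weighting varies by instrument and is not specified in this paper):

% Sketch only: assumed HFA-style mean deviation, not taken from this paper.
\[
  \mathrm{MD} \;=\; \frac{\sum_{i=1}^{n} (x_i - N_i)\,/\,s_i^{2}}{\sum_{i=1}^{n} 1\,/\,s_i^{2}}
\]

where $x_i$ is the measured sensitivity at test location $i$, $N_i$ the age-corrected normal sensitivity, and $s_i^{2}$ the inter-subject variance of normal sensitivities at that location. The differences $(x_i - N_i)$ are the pointwise total-deviation values, so MD is their variance-weighted average, expressed in decibels.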
In contrast to VFs, structural measurements performed by OCT are objective and have good reproducibility [PGI*12]. Moreover, glaucomatous structural damage found on OCT measurements often precedes VF defects [KZZ*15]. This, together with their lower variability, makes structural measurements powerful for detecting and monitoring glaucoma progression [ZDF*17].

Because the causes of glaucoma are multi-factorial and various measurement modalities exist (fundus photographs, visual fields, OCT of the optic nerve and macula), deep learning (DL) methods have been introduced to analyze the disease [CBB*18, CBB*20, LHK*18]. Although many DL approaches show excellent performance on a variety of tasks, understanding these models and implementing them in clinical practice remains a challenge. Deep learning models are often regarded as black boxes because it is hard to grasp the rationale behind the nonlinear operations used for predictions. This is a problem for deployment in real-world applications, especially in the clinical domain, where trust and interpretability are crucial [MWW*17]. Insufficient interpretability and trust are major barriers to the adoption of DL models in clinical practice. Moreover, many experts warn that DL models may not be able to incorporate “outside” or “contextual” factors that are important for decision-making. Further, algorithmically determined features may not always be clinically familiar. In cases where there is a strong disconnect between the two, clinicians may lose trust in the system, especially when no explanations are given [CRH*19].

Issues with trust and interpretability also affect other application domains, which has made explainable artificial intelligence (XAI) an important area of research. In XAI, visualization techniques are developed to enhance collaboration between humans and AI.