How certain are your uncertainties?*

Luke Whitbread¹ (ORCID 0000-0001-9960-5592) and Mark Jenkinson¹,² (ORCID 0000-0001-6043-0166)

¹ School of Computer Science, The University of Adelaide, Australia
² Wellcome Centre for Integrative Neuroimaging, University of Oxford, United Kingdom

* Supported by The University of Adelaide.

arXiv:2203.00238v1 [cs.LG] 1 Mar 2022

Abstract. Having a measure of uncertainty in the output of a deep learning method is useful in several ways: it assists with interpreting the outputs, helps build confidence among end users, and can improve the training and performance of the networks. Several methods have therefore been proposed to capture different types of uncertainty, including epistemic (relating to the model) and aleatoric (relating to the data) sources. The most commonly used methods for estimating these are test-time dropout for epistemic uncertainty and test-time augmentation for aleatoric uncertainty. However, these methods are parameterised (e.g. by the amount of dropout, or by the type and level of augmentation), so a whole range of possible uncertainties could be calculated, even with a fixed network and dataset. This work investigates the stability of these uncertainty measurements, in terms of both magnitude and spatial pattern. In experiments using the well-characterised BraTS challenge, we demonstrate substantial variability in the magnitude and spatial pattern of these uncertainties, and discuss the implications for interpretability, repeatability and confidence in results.

Keywords: Uncertainties · Stability · Repeatability · Confidence.

1 Introduction

Magnetic resonance imaging (MRI) is often used to acquire structural brain images in both clinical neurology and research settings. It is common practice to produce anatomical segmentations from these images, using manual or automated methods, to support a range of clinical and research tasks.

While optimising overall measures of success (e.g. Dice/F1 scores) is fundamental to any segmentation task, the uncertainties associated with image segmentations have become a salient field of enquiry for researchers seeking to (i) improve the quality and interpretability of structural delineations, and (ii) improve trust when applying automated techniques in clinical practice [1]. To this end, a thorough treatment of uncertainty is essential, both to understand the range of possible types of uncertainty and to establish how stable their estimates are.
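The point that test-time dropout is parameterised can be illustrated with a minimal sketch. This is not the authors' code or network: `WEIGHTS`, `BIAS`, `forward_with_dropout` and `mc_dropout_uncertainty` are hypothetical, and a single linear layer with dropout on its inputs stands in for a trained segmentation model. The network and input are held fixed, and only the dropout rate `p` changes. The standard deviation over repeated stochastic passes, used here as the epistemic uncertainty estimate, still changes with `p`.

```python
import random
import statistics

# Hypothetical fixed "network": one linear layer with hand-picked weights.
# It is an illustrative stand-in, not the model used in the paper.
WEIGHTS = [0.8, -0.5, 1.2, 0.3, -0.9, 0.6]
BIAS = 0.1


def forward_with_dropout(x, p, rng):
    """One stochastic forward pass with inverted dropout on the inputs.

    Each input unit is kept with probability 1 - p; kept units are scaled
    by 1 / (1 - p) so the expected output matches the no-dropout pass.
    """
    keep = 1.0 - p
    total = BIAS
    for w, xi in zip(WEIGHTS, x):
        if rng.random() < keep:
            total += w * xi / keep
    return total


def mc_dropout_uncertainty(x, p, n_samples=2000, seed=0):
    """Test-time (Monte Carlo) dropout: keep dropout active at inference,
    run n_samples passes and return (mean, standard deviation).
    The standard deviation serves as the epistemic uncertainty estimate."""
    rng = random.Random(seed)
    preds = [forward_with_dropout(x, p, rng) for _ in range(n_samples)]
    return statistics.fmean(preds), statistics.stdev(preds)


x = [1.0, 0.5, -0.2, 0.7, 0.4, -1.0]  # one fixed, illustrative input
for p in (0.1, 0.3, 0.5):
    mean, std = mc_dropout_uncertainty(x, p)
    print(f"p={p:.1f}  mean={mean:+.3f}  std={std:.3f}")
```

For this toy layer, the variance has a closed form: p / (1 - p) times the sum of the squared products `w * xi`. The reported uncertainty therefore grows with `p`, while the mean prediction stays roughly constant. A real segmentation network is non-linear, so no such formula applies. This is why the stability of these estimates has to be examined empirically.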