Estimating the Robustness of Classification Models by the Structure of the Learned Feature-Space

Kalun Ho, Franz-Josef Pfreundt
CC-HPC, Fraunhofer ITWM, Fraunhofer-Platz 1, 67663 Kaiserslautern, Germany
kalun.ho@itwm.fhg.de

Janis Keuper
Institute for Machine Learning and Analytics, Offenburg University, Germany

Margret Keuper
Data and Web Science Group, University of Mannheim, Germany

Abstract

Over the last decade, the development of deep image classification networks has mostly been driven by the search for the best performance in terms of classification accuracy on standardized benchmarks like ImageNet. More recently, this focus has been expanded by the notion of model robustness, i.e., the generalization abilities of models towards previously unseen changes in the data distribution. While new benchmarks, like ImageNet-C, have been introduced to measure robustness properties, we argue that fixed test sets can only capture a small portion of possible data variations and are thus limited and prone to generating new overfitted solutions. To overcome these drawbacks, we suggest estimating the robustness of a model directly from the structure of its learned feature space. We introduce robustness indicators which are obtained via unsupervised clustering of latent representations from a trained classifier and show very high correlations to the model performance on corrupted test data.

1. Introduction

Deep learning approaches have shown rapid progress on computer vision tasks. Much work has been dedicated to training ever deeper models with improved validation and test accuracies and efficient training schemes [53, 19, 33, 20]. Recently, this progress has been accompanied by discussions on the robustness of the resulting models [9]. Specifically, the focus has shifted towards the following two questions:

1. How can we train models that are robust with respect to specific kinds of perturbations?
2. How can we assess the robustness of a given model?
These two questions represent fundamentally different perspectives on the same problem. While the first question assumes that the expected set of perturbations is known during model training, the second question rather aims at estimating a model's behavior in unforeseen cases and predicting its robustness without explicitly testing on specific kinds of corrupted data.

[Figure 1: scatter plot of model robustness vs. the proposed cluster purity indicator for ResNet50, PolyNet, DeiT-t, AlexNet, DeiT-s, NASNet-A-mobile, DenseNet121, InceptionResNetV2, VGG16, VGG11, ResNet101, and BN-Inception, with clusterings obtained via k-means and multicuts.]
Figure 1. Predicting the robustness of models using our proposed cluster purity indicator (p_purity): the correlation between p_purity of models trained on the original ImageNet and the measured test accuracy on ImageNet-C is R² = 0.87.

In this paper, we address the second research question. We argue that the clustering performance in a model's latent space can be an indicator of its robustness. For this purpose, we introduce cluster purity as a robustness measure in order to predict the behavior of models against data corruption and adversarial attacks. Specifically, we evaluate various classification models [29, 53, 21, 15, 44, 52, 22, 47] on the ImageNet-C [16] dataset of corrupted ImageNet images, where we measure the robustness of a model as the ratio between its accuracy on corrupted data and on clean data. The key result of this paper is illustrated in Figure 1: it shows that the model robustness is strongly correlated with the relative clustering performance on the models' latent spaces, i.e., the ratio between the cluster purity and the classification accuracy, both evaluated on clean data. The clusterability of a model's feature space can therefore be considered an easily accessible indicator for model robustness.

arXiv:2106.12303v2 [cs.CV] 19 Aug 2021
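To make the indicator concrete, the following sketch computes cluster purity on latent features and the relative clustering indicator described above. This is a minimal illustration with a plain k-means implementation and synthetic stand-in data; the function names, the number of restarts, and the placeholder accuracy value are our own assumptions, not the paper's exact experimental setup (which also uses multicut clustering).

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means (Lloyd's algorithm); returns cluster assignments
    and the final inertia (sum of squared distances to nearest center)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dist.argmin(axis=1)
        for c in range(k):
            members = X[assign == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    inertia = (dist.min(axis=1) ** 2).sum()
    return assign, inertia

def cluster_purity(features, labels, k, restarts=5):
    """Cluster features, keep the lowest-inertia run, assign each cluster
    its majority class, and return the fraction of correctly grouped samples."""
    best_assign, best_inertia = None, np.inf
    for seed in range(restarts):
        assign, inertia = kmeans(features, k, seed=seed)
        if inertia < best_inertia:
            best_assign, best_inertia = assign, inertia
    correct = sum(np.bincount(labels[best_assign == c]).max()
                  for c in range(k) if np.any(best_assign == c))
    return correct / len(labels)

# Synthetic stand-in for penultimate-layer activations: three
# well-separated classes in a 3-d latent space (hypothetical data).
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=300)
features = np.eye(3)[labels] + 0.05 * rng.standard_normal((300, 3))

purity = cluster_purity(features, labels, k=3)
clean_accuracy = 0.95               # placeholder: top-1 accuracy on clean data
p_purity = purity / clean_accuracy  # relative clustering indicator
```

In practice, `features` would be the latent representations of clean validation images extracted from a trained classifier, and `clean_accuracy` the model's measured top-1 accuracy; the indicator p_purity is then correlated against the robustness ratio measured on corrupted data.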