Estimating the Robustness of Classification Models by the Structure of the Learned Feature-Space

Kalun Ho, Franz-Josef Pfreundt
CC-HPC, Fraunhofer ITWM, Fraunhofer-Platz 1, 67663 Kaiserslautern, Germany
kalun.ho@itwm.fhg.de

Janis Keuper
Institute for Machine Learning and Analytics, Offenburg University, Germany

Margret Keuper
Data and Web Science Group, University of Mannheim, Germany

Abstract

Over the last decade, the development of deep image classification networks has mostly been driven by the search for the best performance in terms of classification accuracy on standardized benchmarks like ImageNet. More recently, this focus has been expanded by the notion of model robustness, i.e., the generalization abilities of models towards previously unseen changes in the data distribution. While new benchmarks, like ImageNet-C, have been introduced to measure robustness properties, we argue that fixed test sets can only capture a small portion of possible data variations and are thus limited and prone to generating new overfitted solutions. To overcome these drawbacks, we suggest estimating the robustness of a model directly from the structure of its learned feature space. We introduce robustness indicators which are obtained via unsupervised clustering of latent representations from a trained classifier and show very high correlations to the model performance on corrupted test data.

1. Introduction

Deep learning approaches have shown rapid progress on computer vision tasks. Much work has been dedicated to training ever deeper models with improved validation and test accuracies and efficient training schemes [53, 19, 33, 20]. Recently, this progress has been accompanied by discussions on the robustness of the resulting models [9]. Specifically, the focus has shifted towards the following two questions:

1. How can we train models that are robust with respect to specific kinds of perturbations?
2. How can we assess the robustness of a given model?
These two questions represent fundamentally different perspectives on the same problem. While the first question assumes that the expected set of perturbations is known during model training, the second question rather aims at estimating a model's behavior in unforeseen cases and predicting its robustness without explicitly testing on specific kinds of corrupted data.

[Figure 1: scatter plot of model robustness vs. the proposed cluster purity indicator for ResNet50, PolyNet, DeiT-t, AlexNet, DeiT-s, NASNet-A-mobile, DenseNet121, InceptionResNetV2, VGG16, VGG11, ResNet101, and BN-Inception, with clusterings obtained via k-means and multicuts.]
Figure 1. Predicting the robustness of models using our proposed cluster purity indicator (p_purity): the correlation between p_purity of models trained on the original ImageNet and the measured test accuracy on ImageNet-C is R² = 0.87.

In this paper, we address the second research question. We argue that the clustering performance in a model's latent space can be an indicator of its robustness. For this purpose, we introduce cluster purity as a robustness measure in order to predict the behavior of models against data corruption and adversarial attacks. Specifically, we evaluate various classification models [29, 53, 21, 15, 44, 52, 22, 47] on the ImageNet-C [16] dataset of corrupted ImageNet images, where we measure the robustness of a model as the ratio between its accuracy on corrupted data and on clean data. The key result of this paper is illustrated in Figure 1: it shows that the model robustness is strongly correlated with the relative clustering performance on the models' latent spaces, i.e., the ratio between the cluster purity and the classification accuracy, both evaluated on clean data. The clusterability of a model's feature space can therefore be considered an easily accessible indicator for model robustness.

arXiv:2106.12303v2 [cs.CV] 19 Aug 2021
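To make the indicator concrete, the following sketch computes cluster purity on latent features and the relative clustering indicator described above. This is a minimal illustration with a plain k-means implementation and synthetic stand-in data; the function names, the number of restarts, and the placeholder accuracy value are our own assumptions, not the paper's exact experimental setup (which also uses multicut clustering).

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means (Lloyd's algorithm); returns cluster assignments
    and the final inertia (sum of squared distances to nearest center)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dist.argmin(axis=1)
        for c in range(k):
            members = X[assign == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    inertia = (dist.min(axis=1) ** 2).sum()
    return assign, inertia

def cluster_purity(features, labels, k, restarts=5):
    """Cluster features, keep the lowest-inertia run, assign each cluster
    its majority class, and return the fraction of correctly grouped samples."""
    best_assign, best_inertia = None, np.inf
    for seed in range(restarts):
        assign, inertia = kmeans(features, k, seed=seed)
        if inertia < best_inertia:
            best_assign, best_inertia = assign, inertia
    correct = sum(np.bincount(labels[best_assign == c]).max()
                  for c in range(k) if np.any(best_assign == c))
    return correct / len(labels)

# Synthetic stand-in for penultimate-layer activations: three
# well-separated classes in a 3-d latent space (hypothetical data).
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=300)
features = np.eye(3)[labels] + 0.05 * rng.standard_normal((300, 3))

purity = cluster_purity(features, labels, k=3)
clean_accuracy = 0.95               # placeholder: top-1 accuracy on clean data
p_purity = purity / clean_accuracy  # relative clustering indicator
```

In practice, `features` would be the latent representations of clean validation images extracted from a trained classifier, and `clean_accuracy` the model's measured top-1 accuracy; the indicator p_purity is then correlated against the robustness ratio measured on corrupted data.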