Layer-wise Relevance Propagation based Sample Condensation for Kernel Machines

Daniel Winter¹, Ang Bian²[0000-0002-7667-9780], and Xiaoyi Jiang¹[0000-0001-7678-9528]

¹ Faculty of Mathematics and Computer Science, University of Münster, Münster, Germany
² College of Computer Science, Sichuan University, Chengdu, China

Abstract. Kernel machines are a powerful class of methods for classification and regression. Making kernel machines fast and scalable to large data, however, remains a challenging problem due to the need to store and operate on the Gram matrix. In this paper we propose a novel approach to sample condensation for kernel machines, preferably without impairing the classification performance. To the best of our knowledge, no previous work with the same goal has been reported in the literature. For this purpose we make use of the neural network interpretation of kernel machines. Explainable AI techniques, in particular the Layer-wise Relevance Propagation method, are used to measure the relevance (importance) of training samples. Given this relevance measure, a decremental strategy is proposed for sample condensation. Experimental results on three data sets show that our approach achieves a substantial reduction of the number of training samples.

1 Introduction

A fundamental result of learning theory is the family of representer theorems [7], which give rise to the powerful class of kernel machines. Although trained to have zero classification error, kernel machines generalize well to unseen test data [4]. In contrast to deep neural networks (DNNs), kernel machines can be interpreted as simple two-layer NNs. Despite this simplicity, kernel machines have turned out to be a good alternative to DNNs, capable of matching and even surpassing their performance while using fewer computational resources for training [8,9].

Making kernel machines fast and scalable to large data is still a challenging problem. A major limiting factor is the need to store all training samples, compute the corresponding Gram matrix, and solve the related linear equation system (see Section 2). In this paper we therefore consider the problem of condensing the training samples, preferably without impairing the classification performance. Based on the interpretation of kernel machines as two-layer neural networks, we make use of explainable AI techniques [15], in particular Layer-wise Relevance Propagation (LRP) [14], as a means to measure the relevance (importance) of training samples. A decremental strategy is proposed that uses this measure for sample condensation; both the computational bottleneck and the condensation loop are sketched below.
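To make the scalability bottleneck concrete, the following is a minimal sketch of a least-squares kernel machine; the RBF kernel, the ridge parameter lam, and the function names are illustrative assumptions, not the paper's exact formulation. Training materializes the full n × n Gram matrix (O(n²) memory) and solves a linear system (O(n³) time), and prediction requires retaining every training sample.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise squared Euclidean distances, then the Gaussian (RBF) kernel.
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def fit(X, y, lam=1e-3, gamma=1.0):
    # n x n Gram matrix: O(n^2) memory -- the main scalability bottleneck.
    K = rbf_kernel(X, X, gamma)
    # Regularized linear system (K + lam*I) alpha = y: O(n^3) solve time.
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict(X_train, alpha, X_test, gamma=1.0):
    # f(x) = sum_i alpha_i k(x, x_i): all training samples must be stored.
    return rbf_kernel(X_test, X_train, gamma) @ alpha
```

For n in the tens of thousands, the Gram matrix alone occupies tens of gigabytes (e.g., 50,000² double-precision entries are 20 GB), which motivates condensing the training set.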
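Building on the two-layer NN view (hidden unit i computes k(x, xᵢ) with output weight αᵢ), the sketch below illustrates one plausible reading of LRP-based sample relevance combined with a decremental condensation loop. The aggregation over a validation set, the drop fraction drop_frac, and the accuracy tolerance tol are hypothetical choices; the paper's exact LRP rule and stopping criterion may differ.

```python
def sample_relevance(X_train, alpha, X_ref, gamma=1.0):
    # In the two-layer NN view, training sample i contributes
    # z_i(x) = alpha_i * k(x, x_i) to the output f(x) = sum_i z_i(x).
    # LRP on the linear output layer distributes relevance proportionally
    # to these contributions; aggregating |z_i| over a reference set
    # yields a per-sample relevance (importance) score.
    Z = rbf_kernel(X_ref, X_train, gamma) * alpha[None, :]
    return np.abs(Z).mean(axis=0)

def condense(X, y, X_val, y_val, gamma=1.0, lam=1e-3,
             drop_frac=0.1, tol=0.01):
    # Decremental strategy: repeatedly discard the least relevant samples,
    # refit, and stop once validation accuracy degrades by more than tol.
    # Assumes binary labels y, y_val in {-1, +1}.
    keep = np.arange(len(X))
    alpha = fit(X, y, lam, gamma)
    base = np.mean(np.sign(predict(X, alpha, X_val, gamma)) == y_val)
    while True:
        n_drop = max(1, int(drop_frac * len(keep)))
        if n_drop >= len(keep):
            break
        rel = sample_relevance(X[keep], alpha, X_val, gamma)
        trial = keep[np.argsort(rel)[n_drop:]]   # keep the most relevant
        a = fit(X[trial], y[trial], lam, gamma)
        acc = np.mean(np.sign(predict(X[trial], a, X_val, gamma)) == y_val)
        if acc < base - tol:
            break
        keep, alpha = trial, a
    return keep, alpha
```

Aggregating relevance over a held-out set rather than the training set is only one of several reasonable design choices here; the key point is that low-relevance hidden units in the NN interpretation correspond directly to expendable training samples.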