Combining Cluster Validation Indices for Detecting Label Noise

Veselka Boeva, Jan Kohstall, Lars Lundberg and Milena Angelova

Veselka Boeva · Jan Kohstall † · Lars Lundberg
Blekinge Institute of Technology, SE-371 79, Karlskrona, Sweden
veselka.boeva@bth.se
lars.lundberg@bth.se
jan.kohstall@bth.se
† Jan Kohstall is also affiliated with Hasso Plattner Institute, University of Potsdam (Germany).

Milena Angelova
Technical University of Sofia, Plovdiv, Bulgaria
mangelova@tu-plovdiv.bg

Archives of Data Science, Series A (Online First), KIT Scientific Publishing, Vol. 5, No. 1, 2018
DOI: 10.5445/KSP/1000087327/18
ISSN 2363-9881

Abstract In this paper, we show that cluster validation indices can be used to filter mislabeled instances, or class outliers, prior to training in supervised learning problems. We propose a technique, entitled Cluster Validation Index (CVI)-based Outlier Filtering, in which mislabeled instances are identified and eliminated from the training set, and a classification hypothesis is then built from the remaining instances. The proposed approach assigns each instance several cluster validation scores, each representing its potential for being an outlier with respect to the clustering property assessed by the corresponding validation measure. We examine CVI-based Outlier Filtering and compare it against the Local Outlier Factor (LOF) detection method on ten data sets from the UCI data repository, using five well-known learning algorithms and three different cluster validation indices. In addition, we study and compare three different approaches