METRICS FOR MULTI-CLASS CLASSIFICATION: AN OVERVIEW
A WHITE PAPER

Margherita Grandini
CRIF S.p.A.*
m.grandini@crif.com

Enrico Bagli
CRIF S.p.A.*

Giorgio Visani
CRIF S.p.A.*
Department of Computer Science†, University of Bologna

August 14, 2020

ABSTRACT

Classification tasks in machine learning that involve more than two classes are known as "multi-class classification". Performance indicators are very useful when the aim is to evaluate and compare different classification models or machine learning techniques. Many metrics are available to test the ability of a multi-class classifier. These metrics are useful at different stages of the development process, e.g. when comparing the performance of two different models or when analysing the behaviour of the same model under different parameter settings. In this white paper we review a list of the most promising multi-class metrics, highlight their advantages and disadvantages, and show their possible uses during the development of a classification model.

1 Introduction

In the vast field of Machine Learning, the general focus is to predict an outcome using the available data. The prediction task is called a "classification problem" when the outcome represents different classes, and a "regression problem" when the outcome is a numeric measurement. In classification, the most common setting involves only two classes, although there may be more than two. In the latter case the problem is called "multi-class classification".

From an algorithmic standpoint, the prediction task is addressed using state-of-the-art mathematical techniques. There are many different solutions, but they all share a common factor: they use the available data (X variables) to obtain the best prediction Ŷ of the outcome variable Y.
In multi-class classification, we may regard the response variable Y and the prediction Ŷ as two discrete random variables: they take values in {1, ..., K}, and each number represents a different class.

The algorithm produces, for each unit, the probability that the unit belongs to each possible class; a classification rule is then employed to assign a single class to each unit. The rule is generally very simple: the most common one assigns a unit to the class with the highest probability.

In the two-class classification problem, a threshold is usually applied to the probability assigned by the model to decide which class is predicted for each unit. In the multi-class case there are various possibilities; among them, the highest probability value and the softmax are the most commonly employed techniques.

Performance indicators are very useful when the aim is to evaluate and compare different classification models or machine learning techniques.

* CRIF S.p.A., via Mario Fantin 1-3, 40131 Bologna (BO), Italy
† Università degli Studi di Bologna, Dipartimento di Ingegneria e Scienze Informatiche, viale Risorgimento 2, 40136 Bologna (BO), Italy

arXiv:2008.05756v1 [stat.ML] 13 Aug 2020
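The classification rules described in the Introduction can be illustrated with a minimal Python sketch. It assumes NumPy is available; the function names, the example scores and the default threshold of 0.5 are hypothetical choices made for illustration and are not taken from the paper. The sketch converts raw model scores into class probabilities with a softmax, assigns each unit to the class with the highest probability (classes labelled 1, ..., K), and shows the thresholding rule used in the two-class case.

```python
import numpy as np


def softmax(scores):
    """Turn raw scores (one row per unit) into class probabilities."""
    # Subtracting the row maximum avoids overflow and leaves the result unchanged.
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)


def predict_multiclass(probabilities):
    """Assign each unit to the class with the highest probability.

    Classes are labelled 1..K, so 1 is added to the 0-based column index.
    """
    return np.argmax(probabilities, axis=1) + 1


def predict_binary(prob_positive, threshold=0.5):
    """Two-class rule: predict 1 when the probability reaches the threshold, else 0."""
    return (prob_positive >= threshold).astype(int)


# Hypothetical raw scores for 3 units and K = 3 classes (illustrative values only).
scores = np.array([
    [2.0, 1.0, 0.1],
    [0.5, 2.5, 0.3],
    [0.2, 0.1, 3.0],
])

probabilities = softmax(scores)             # each row sums to 1
y_hat = predict_multiclass(probabilities)   # array([1, 2, 3])

# Hypothetical positive-class probabilities for the two-class case.
binary_predictions = predict_binary(np.array([0.2, 0.5, 0.9]))  # array([0, 1, 1])
```

Because the softmax preserves the ordering of the scores, taking the highest probability after the softmax selects the same class as taking the highest raw score; the softmax matters when the probabilities themselves are needed.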