METRICS FOR MULTI-CLASS CLASSIFICATION: AN OVERVIEW

A WHITE PAPER

Margherita Grandini
CRIF S.p.A. *
m.grandini@crif.com

Enrico Bagli
CRIF S.p.A. *

Giorgio Visani
CRIF S.p.A. *
Department of Computer Science †, University of Bologna

August 14, 2020

ABSTRACT

Classification tasks in machine learning involving more than two classes are known as "multi-class classification". Performance indicators are very useful for evaluating and comparing different classification models or machine learning techniques. Many metrics are available to test the ability of a multi-class classifier. These metrics prove useful at different stages of the development process, e.g. comparing the performance of two different models or analysing the behaviour of the same model under different parameter settings. In this white paper we review a list of the most promising multi-class metrics, highlight their advantages and disadvantages, and show their possible uses during the development of a classification model.

1 Introduction

In the vast field of Machine Learning, the general aim is to predict an outcome using the available data. The prediction task is called a "classification problem" when the outcome represents different classes, and a "regression problem" when the outcome is a numeric measurement. As regards classification, the most common setting involves only two classes, although there may be more than two. In the latter case the problem is called "multi-class classification".

From an algorithmic standpoint, the prediction task is addressed using state-of-the-art mathematical techniques. There are many different solutions, but they all share a common factor: they use the available data (the $X$ variables) to obtain the best prediction $\hat{Y}$ of the outcome variable $Y$.
In multi-class classification, we may regard the response variable $Y$ and the prediction $\hat{Y}$ as two discrete random variables: they take values in $\{1, \dots, K\}$ and each number represents a different class.

The algorithm produces, for each unit, the probability that it belongs to each possible class; a classification rule is then employed to assign a single class to each unit. The rule is generally very simple: the most common one assigns a unit to the class with the highest probability.

In other words, a classification model gives us, for each unit, the probability of belonging to each specific class. Starting from the probabilities assigned by the model, in the two-class classification problem a threshold is usually applied to decide which class is predicted for each unit. In the multi-class case there are various possibilities; among them, the highest probability value and the softmax are the most commonly employed techniques.

Performance indicators are very useful when the aim is to evaluate and compare different classification models or machine learning techniques.

* CRIF S.p.A., via Mario Fantin 1-3, 40131 Bologna (BO), Italy
† Università degli Studi di Bologna, Dipartimento di Ingegneria e Scienze Informatiche, viale Risorgimento 2, 40136 Bologna (BO), Italy

arXiv:2008.05756v1 [stat.ML] 13 Aug 2020
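The classification rules described in the Introduction can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names (`softmax`, `predict_class`, `predict_binary`), the example scores, and the default threshold of 0.5 are all assumptions made for illustration. Here the softmax is used to turn raw model scores into probabilities, and the highest-probability rule then assigns each unit to a class in $\{1, \dots, K\}$.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Map raw scores to probabilities that are positive and sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(probs: Sequence[float]) -> int:
    """Multi-class rule: return the class in {1, ..., K} with the highest probability.

    Ties are broken in favour of the lowest class number (an arbitrary choice).
    """
    best_index = max(range(len(probs)), key=lambda k: probs[k])
    return best_index + 1  # classes are numbered from 1, as in the text


def predict_binary(prob_class_1: float, threshold: float = 0.5) -> int:
    """Two-class rule: predict class 1 if its probability reaches the threshold, else class 2.

    The default threshold of 0.5 is an assumption, not a value given in the paper.
    """
    return 1 if prob_class_1 >= threshold else 2


# Hypothetical raw scores for three units and K = 3 classes.
scores_per_unit = [
    [2.0, 1.0, 0.1],
    [0.2, 0.1, 3.0],
    [0.0, 1.5, 1.0],
]
predictions = [predict_class(softmax(scores)) for scores in scores_per_unit]
print(predictions)  # [1, 3, 2]
```

Because the softmax preserves the ordering of the scores, applying the highest-probability rule to the softmax output selects the same class as the largest raw score; the probabilities themselves matter when a threshold is applied, as in the two-class case.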