Model Selection for Multi-class SVMs

Yann Guermeur (1), Myriam Maumy (2), and Frédéric Sur (1)

(1) LORIA-CNRS, Campus Scientifique, BP 239, 54506 Vandœuvre-lès-Nancy Cedex, France
(e-mail: Yann.Guermeur@loria.fr, Frederic.Sur@loria.fr)
(2) IRMA-ULP, 7 rue René Descartes, 67084 Strasbourg Cedex, France
(e-mail: mmaumy@math.u-strasbg.fr)

Abstract. In the framework of statistical learning, fitting a model to a given problem is usually done in two steps. First, model selection is performed to set the values of the hyperparameters. Second, for this set of values, training selects a function that performs satisfactorily on the problem. Choosing the values of the hyperparameters remains a difficult task, which has so far only been addressed in the case of two-class SVMs. We derive here a solution dedicated to M-SVMs. It is based on a new bound on the risk of large margin classifiers.

Keywords: Multi-class SVMs, hyperparameters, soft margin parameter.

1 Introduction

When support vector machines (SVMs) [Vapnik, 1998] were introduced in the early nineties, they were seen by some as off-the-shelf tools. This idealistic picture soon proved too optimistic. Not only does their training raise technical difficulties, but the tuning of the kernel parameters and the soft margin parameter C also remains a difficult task. In the literature, this question is addressed for (two-class) pattern recognition and function estimation SVMs. The methods proposed often rest on estimates of the true risk of the machine [Chapelle et al., 2002]. The case of multi-class discriminant analysis has only been considered in the framework of decomposition schemes [Passerini et al., 2004]. The case of multi-class SVMs (M-SVMs) calls for specific solutions. Indeed, the implementation of the structural risk minimization (SRM) inductive principle [Vapnik, 1982] rests crucially on the availability of tight error bounds, and the standard uniform convergence results do not carry over directly to the case of multi-category large margin classifiers.

In this paper, we derive a new bound on the generalization performance of M-SVMs in terms of constraints on the hyperplanes. This bound, interesting in its own right, makes central use of a result relating covering problems and the degree of compactness of operators. It serves as an objective function to tune the value of the soft margin parameter. This way, the value of C and the dual variables α can be determined simultaneously, at a cost of the same order
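To make this model selection scheme concrete, the sketch below tunes C by minimizing a risk bound recomputed after each training run. Since the bound derived in this paper is not stated in this section, Vapnik's classical leave-one-out bound (expected error at most the expected fraction of support vectors) stands in for it; the synthetic dataset, the grid of C values, and the use of scikit-learn's SVC (which handles multi-class problems by one-vs-one decomposition rather than as a genuine M-SVM) are all illustrative assumptions, not the method of this paper.

```python
# Minimal sketch: bound-driven selection of the soft margin parameter C.
# The criterion #SV / n is Vapnik's leave-one-out bound, used here as an
# illustrative stand-in for the bound derived in this paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Hypothetical 3-class toy problem (assumption for the example).
X, y = make_classification(n_samples=300, n_features=10, n_classes=3,
                           n_informative=5, random_state=0)

best_C, best_bound = None, np.inf
for C in [0.01, 0.1, 1.0, 10.0, 100.0]:       # illustrative grid over C
    clf = SVC(kernel="rbf", C=C).fit(X, y)    # training step for this C
    bound = clf.n_support_.sum() / len(y)     # #SV / n: LOO error bound
    if bound < best_bound:                    # model selection step
        best_C, best_bound = C, bound

print(f"selected C = {best_C} (#SV/n = {best_bound:.3f})")
```

The point of the paper is to replace such a generic surrogate with a bound tailored to M-SVMs, so that the soft margin parameter and the dual variables can be optimized jointly rather than by retraining over a grid.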