How to Make AdaBoost.M1 Work for Weak Base Classifiers by Changing Only One Line of the Code

Günther Eibl and Karl Peter Pfeiffer

Institute of Biostatistics, Innsbruck, Austria
guenther.eibl@uibk.ac.at

Abstract. If one has a multiclass classification problem and wants to boost a multiclass base classifier, AdaBoost.M1 is a well-known and widely applied boosting algorithm. However, AdaBoost.M1 does not work if the base classifier is too weak. We show that with a modification of only one line of AdaBoost.M1 one can make it usable for weak base classifiers, too. The resulting classifier, AdaBoost.M1W, is guaranteed to minimize an upper bound for a performance measure, called the guessing error, as long as the base classifier is better than random guessing. The usability of AdaBoost.M1W could be clearly demonstrated experimentally.

1 Introduction

A weak classifier is a map h : X → G (with G = {1, ..., |G|}), which assigns an object with measurements x ∈ X to one of |G| prespecified groups with a high error rate. The task of a boosting algorithm is to turn a weak classifier into a strong classifier, which has a low error rate. To simplify notation we define arg max_{g∈G} u(g) to be the group g which maximizes the function u.

Most papers about boosting theory consider two-class classification problems (|G| = 2). Multiclass problems can then be reduced to two-class problems using, for example, error-correcting codes [1,2,4,5]. However, if one has a multiclass problem and also a base classifier for multiclass problems, such as decision trees, one would prefer a more direct boosting method.

Freund and Schapire [3] proposed the algorithm AdaBoost.M1 (Fig. 1), which is a straightforward generalization of AdaBoost for two groups to the multiclass problem using multiclass base classifiers. One of the main ideas of the algorithm is to maintain a distribution D of weights over the learning set L = {(x_1, g_1), ..., (x_N, g_N); x_i ∈ X, g_i ∈ G}. The weight of this distribution on training example i on round t is denoted by D_t(i).
On each round the weights of incorrectly classified examples are increased, so that the weak learner h is forced to focus on the "hard" examples in the training set. The goal of the weak learner is to find a hypothesis h_t appropriate for the distribution D_t. The goodness of h_t

T. Elomaa et al. (Eds.): ECML, LNAI 2430, pp. 72–83, 2002.
© Springer-Verlag Berlin Heidelberg 2002
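The reweighting scheme described above can be sketched in a few lines. The following Python sketch follows the standard form of Freund and Schapire's AdaBoost.M1 (weighted error ε_t, β_t = ε_t/(1−ε_t), multiplicative down-weighting of correctly classified examples, final weighted vote); the decision-stump base learner `stump_fit` and all helper names are illustrative assumptions, not code from the paper.

```python
import numpy as np

def adaboost_m1(X, y, n_classes, base_fit, T=10):
    """AdaBoost.M1 sketch: maintain a distribution D over the training
    set, down-weight correctly classified examples each round, and
    combine the round hypotheses by a log(1/beta_t)-weighted vote."""
    N = len(y)
    D = np.full(N, 1.0 / N)                # uniform initial distribution D_1
    hypotheses, alphas = [], []
    for _ in range(T):
        h = base_fit(X, y, D)              # weak hypothesis h_t for D_t
        miss = h(X) != y
        eps = D[miss].sum()                # weighted training error eps_t
        if eps <= 0.0 or eps >= 0.5:
            break                          # M1 requires 0 < eps_t < 1/2
        beta = eps / (1.0 - eps)
        alphas.append(np.log(1.0 / beta))
        hypotheses.append(h)
        D = D * np.where(miss, 1.0, beta)  # shrink weights of correct examples
        D /= D.sum()                       # renormalize to a distribution
    def H(X):                              # final classifier: weighted vote
        votes = np.zeros((len(X), n_classes))
        for a, h in zip(alphas, hypotheses):
            votes[np.arange(len(X)), h(X)] += a
        return votes.argmax(axis=1)
    return H

def stump_fit(X, y, D):
    """Hypothetical weak learner: one-split decision stump predicting
    the D-weighted majority class on each side of the split."""
    best = (np.inf, 0, 0.0, 0, 0)          # (error, feature, threshold, cl, cr)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            left = X[:, j] <= thr
            if left.all():
                continue                   # degenerate split
            cl = np.bincount(y[left], weights=D[left]).argmax()
            cr = np.bincount(y[~left], weights=D[~left]).argmax()
            err = D[np.where(left, cl, cr) != y].sum()
            if err < best[0]:
                best = (err, j, thr, cl, cr)
    _, j, thr, cl, cr = best
    return lambda X: np.where(X[:, j] <= thr, cl, cr)
```

On a toy three-class problem in one dimension (six points, labels 0, 0, 1, 1, 2, 2), a single stump can separate at most two of the three classes, but three boosting rounds already drive the training error to zero, which illustrates why the reweighting forces later hypotheses onto the "hard" examples.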