How to Make AdaBoost.M1 Work for Weak Base Classifiers by Changing Only One Line of the Code

Günther Eibl and Karl Peter Pfeiffer
Institute of Biostatistics, Innsbruck, Austria
guenther.eibl@uibk.ac.at

Abstract. If one has a multiclass classification problem and wants to boost a multiclass base classifier, AdaBoost.M1 is a well-known and widely applied boosting algorithm. However, AdaBoost.M1 does not work if the base classifier is too weak. We show that by modifying only one line of AdaBoost.M1 one can make it usable for weak base classifiers, too. The resulting classifier, AdaBoost.M1W, is guaranteed to minimize an upper bound for a performance measure called the guessing error, as long as the base classifier is better than random guessing. The usability of AdaBoost.M1W was clearly demonstrated experimentally.

1 Introduction

A weak classifier is a map h : X → G (with G = {1, ..., |G|}) which assigns an object with measurements x ∈ X to one of |G| prespecified groups, but with a high error rate. The task of a boosting algorithm is to turn a weak classifier into a strong classifier that has a low error rate. To simplify notation, we define arg max_{g ∈ G} u(g) to be the group g which maximizes the function u.

Most papers about boosting theory consider two-class classification problems (|G| = 2). Multiclass problems can then be reduced to two-class problems using, for example, error-correcting codes [1,2,4,5]. However, if one has a multiclass problem and also a base classifier for multiclass problems, such as decision trees, one would prefer a more direct boosting method.

Freund and Schapire [3] proposed the algorithm AdaBoost.M1 (Fig. 1), which is a straightforward generalization of AdaBoost for 2 groups to the multiclass problem using multiclass base classifiers. One of the main ideas of the algorithm is to maintain a distribution D of weights over the learning set L = {(x_1, g_1), ..., (x_N, g_N); x_i ∈ X, g_i ∈ G}. The weight of this distribution on training example i on round t is denoted by D_t(i).
On each round the weights of incorrectly classified examples are increased, so that the weak learner h is forced to focus on the "hard" examples in the training set. The goal of the weak learner is to find a hypothesis h_t appropriate for the distribution D_t. The goodness of h_t

T. Elomaa et al. (Eds.): ECML, LNAI 2430, pp. 72–83, 2002.
© Springer-Verlag Berlin Heidelberg 2002
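The weight-update loop described above can be sketched in Python. This is a minimal sketch, not the paper's pseudocode: the toy base-learner interface and, in particular, the AdaBoost.M1W hypothesis weight used here, ln((K-1)(1-eps)/eps) with K = |G| in place of AdaBoost.M1's ln((1-eps)/eps), are assumptions made for illustration.

```python
import math

def adaboost_m1(X, y, n_classes, base_learn, T=10, m1w=True):
    """Sketch of the AdaBoost.M1 loop; m1w=True switches to an assumed
    AdaBoost.M1W hypothesis weight (the 'one-line change')."""
    N = len(X)
    D = [1.0 / N] * N                     # uniform initial distribution D_1
    hypotheses, alphas = [], []
    for _ in range(T):
        h = base_learn(X, y, D)           # fit weak learner to weighted data
        preds = [h(x) for x in X]
        # weighted training error eps_t of h_t under D_t
        eps = sum(D[i] for i in range(N) if preds[i] != y[i])
        if eps <= 0:                      # perfect hypothesis: stop early
            hypotheses.append(h)
            alphas.append(1.0)
            break
        if m1w:
            # assumed M1W weight: positive whenever eps < 1 - 1/K,
            # i.e. whenever h_t is better than random guessing
            if eps >= 1.0 - 1.0 / n_classes:
                break
            alpha = math.log((n_classes - 1) * (1.0 - eps) / eps)
        else:
            # original M1 weight: requires eps < 1/2
            if eps >= 0.5:
                break
            alpha = math.log((1.0 - eps) / eps)
        # increase weights of misclassified examples, then renormalize
        D = [D[i] * (math.exp(alpha) if preds[i] != y[i] else 1.0)
             for i in range(N)]
        Z = sum(D)
        D = [d / Z for d in D]
        hypotheses.append(h)
        alphas.append(alpha)

    def H(x):
        """Final classifier: weighted vote over the classes 1..K."""
        votes = [0.0] * (n_classes + 1)
        for a, h_t in zip(alphas, hypotheses):
            votes[h_t(x)] += a
        return max(range(1, n_classes + 1), key=lambda g: votes[g])

    return H, alphas
```

Note that the two variants differ only in the single line computing alpha, together with the matching stopping condition: M1 gives up once the weighted error reaches 1/2, while the M1W weight stays positive as long as the base classifier beats random guessing (eps < 1 - 1/K).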