How to Make AdaBoost.M1 Work for Weak Base Classifiers by Changing Only One Line of the Code

Günther Eibl and Karl Peter Pfeiffer

Institute of Biostatistics, Innsbruck, Austria
guenther.eibl@uibk.ac.at

Abstract. If one has a multiclass classification problem and wants to boost a multiclass base classifier, AdaBoost.M1 is a well-known and widely applied boosting algorithm. However, AdaBoost.M1 does not work if the base classifier is too weak. We show that with a modification of only one line of AdaBoost.M1 one can make it usable for weak base classifiers, too. The resulting classifier, AdaBoost.M1W, is guaranteed to minimize an upper bound for a performance measure, called the guessing error, as long as the base classifier is better than random guessing. The usability of AdaBoost.M1W could be clearly demonstrated experimentally.

1 Introduction

A weak classifier is a map h : X → G (with G = {1, ..., |G|}), which assigns an object with measurements x ∈ X to one of |G| prespecified groups with a high error rate. The task of a boosting algorithm is to turn a weak classifier into a strong classifier, which has a low error rate. To simplify notation we define arg max_{g∈G} u(g) to be the group g which maximizes the function u.

Most papers about boosting theory consider two-class classification problems (|G| = 2). Multiclass problems can then be reduced to two-class problems using, for example, error-correcting codes [1,2,4,5]. However, if one has a multiclass problem and also a base classifier for multiclass problems, such as decision trees, one would prefer a more direct boosting method.

Freund and Schapire [3] proposed the algorithm AdaBoost.M1 (Fig. 1), which is a straightforward generalization of AdaBoost for two groups to the multiclass problem using multiclass base classifiers. One of the main ideas of the algorithm is to maintain a distribution D of weights over the learning set L = {(x_1, g_1), ..., (x_N, g_N); x_i ∈ X, g_i ∈ G}. The weight of this distribution on training example i on round t is denoted by D_t(i).
On each round the weights of incorrectly classified examples are increased, so that the weak learner h is forced to focus on the "hard" examples in the training set. The goal of the weak learner is to find a hypothesis h_t appropriate for the distribution D_t. The goodness of h_t

T. Elomaa et al. (Eds.): ECML, LNAI 2430, pp. 72–83, 2002.
© Springer-Verlag Berlin Heidelberg 2002
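The reweighting scheme described above can be sketched in a few lines. The following Python sketch follows the standard form of Freund and Schapire's AdaBoost.M1 (weighted error ε_t, β_t = ε_t/(1−ε_t), multiplicative down-weighting of correctly classified examples, final weighted vote); the decision-stump base learner `stump_fit` and all helper names are illustrative assumptions, not code from the paper.

```python
import numpy as np

def adaboost_m1(X, y, n_classes, base_fit, T=10):
    """AdaBoost.M1 sketch: maintain a distribution D over the training
    set, down-weight correctly classified examples each round, and
    combine the round hypotheses by a log(1/beta_t)-weighted vote."""
    N = len(y)
    D = np.full(N, 1.0 / N)                # uniform initial distribution D_1
    hypotheses, alphas = [], []
    for _ in range(T):
        h = base_fit(X, y, D)              # weak hypothesis h_t for D_t
        miss = h(X) != y
        eps = D[miss].sum()                # weighted training error eps_t
        if eps <= 0.0 or eps >= 0.5:
            break                          # M1 requires 0 < eps_t < 1/2
        beta = eps / (1.0 - eps)
        alphas.append(np.log(1.0 / beta))
        hypotheses.append(h)
        D = D * np.where(miss, 1.0, beta)  # shrink weights of correct examples
        D /= D.sum()                       # renormalize to a distribution
    def H(X):                              # final classifier: weighted vote
        votes = np.zeros((len(X), n_classes))
        for a, h in zip(alphas, hypotheses):
            votes[np.arange(len(X)), h(X)] += a
        return votes.argmax(axis=1)
    return H

def stump_fit(X, y, D):
    """Hypothetical weak learner: one-split decision stump predicting
    the D-weighted majority class on each side of the split."""
    best = (np.inf, 0, 0.0, 0, 0)          # (error, feature, threshold, cl, cr)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            left = X[:, j] <= thr
            if left.all():
                continue                   # degenerate split
            cl = np.bincount(y[left], weights=D[left]).argmax()
            cr = np.bincount(y[~left], weights=D[~left]).argmax()
            err = D[np.where(left, cl, cr) != y].sum()
            if err < best[0]:
                best = (err, j, thr, cl, cr)
    _, j, thr, cl, cr = best
    return lambda X: np.where(X[:, j] <= thr, cl, cr)
```

On a toy three-class problem in one dimension (six points, labels 0, 0, 1, 1, 2, 2), a single stump can separate at most two of the three classes, but three boosting rounds already drive the training error to zero, which illustrates why the reweighting forces later hypotheses onto the "hard" examples.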