Balanced Boosting with Parallel Perceptrons

Iván Cantador and José R. Dorronsoro

Dpto. de Ingeniería Informática and Instituto de Ingeniería del Conocimiento, Universidad Autónoma de Madrid, 28049 Madrid, Spain

Abstract. Boosting constructs a weighted classifier out of possibly weak learners by successively concentrating on those patterns harder to classify. While giving excellent results in many problems, its performance can deteriorate in the presence of patterns with incorrect labels. In this work we shall use parallel perceptrons (PP), a novel approach to the classical committee machines, to detect whether a pattern's label may not be correct and also whether it is redundant in the sense of being well represented in the training sample by many other similar patterns. Among other things, PPs allow one to naturally define margins for hidden unit activations, which we shall use to define the above pattern types. This pattern type classification allows a more nuanced approach to boosting. In particular, the procedure we shall propose, balanced boosting, uses it to modify the boosting distribution updates. As we shall illustrate numerically, balanced boosting gives very good results on relatively hard classification problems, particularly on some that present a marked imbalance between class sizes.

1 Introduction

As is well known, boosting constructs a weighted classifier out of possibly weak learners by successively concentrating on those patterns harder to classify. More precisely, it keeps on each iteration a distribution $d_t(X)$ over the underlying patterns $X$, and after a new hypothesis $h_t$ has been constructed in the $t$-th iteration, $d_t(X)$ is updated to

$$d_{t+1}(X) = \frac{1}{Z_t}\, d_t(X)\, e^{-\alpha_t y_X h_t(X)}, \qquad (1)$$

where $y_X = \pm 1$ is the class label associated to $X$, $Z_t$ is a probability normalization constant and $\alpha_t$ is related to the training error $\epsilon_t$ of $h_t$ (more details in the third section).
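Update (1) can be sketched in a few lines of code. Note the sketch assumes the standard AdaBoost choice $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$ (the text only specifies that $\alpha_t$ is related to $\epsilon_t$, with details deferred to the third section) and $\pm 1$-valued weak hypotheses:

```python
import numpy as np

def boosting_update(d, y, h, eps):
    """One distribution update as in Eq. (1).

    d   : current distribution d_t over patterns (sums to 1)
    y   : true labels y_X in {-1, +1}
    h   : weak-hypothesis outputs h_t(X) in {-1, +1}
    eps : training error eps_t of h_t under d
    """
    alpha = 0.5 * np.log((1.0 - eps) / eps)   # assumed AdaBoost weight
    d_new = d * np.exp(-alpha * y * h)        # misclassified patterns grow
    return d_new / d_new.sum(), alpha         # division implements Z_t

# Toy check: pattern 2 is misclassified (y*h = -1) and gains weight.
d = np.full(4, 0.25)
y = np.array([1, -1, 1, 1])
h = np.array([1, -1, -1, 1])
d_new, alpha = boosting_update(d, y, h, eps=0.25)
```

Since $\alpha_t > 0$ whenever $\epsilon_t < 1/2$, the factor $e^{-\alpha_t y_X h_t(X)}$ exceeds 1 exactly on the misclassified patterns, which is how the next round concentrates on them.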
Therefore, after each iteration boosting concentrates on the patterns harder to classify, as we have $e^{-\alpha_t y_X h_t(X)} > 1$ if $y_X h_t(X) < 0$, i.e., if $X$ has been incorrectly classified; as a consequence, the training error $\epsilon_t$ will tend to 0 under mild hypotheses on the weak learner [6]. The final hypothesis is the weighted average $h(X) = \sum_t \alpha_t h_t(X)$ of the successively built weak hypotheses $h_t$. Boosting has been used with great success in several applications and over various data sets [2]. However, it has also been shown that it may not yield

With partial support of Spain's CICyT, TIC 01–572.