Ensemble Methods

Martin Sewell

2007

1 Introduction

The idea of ensemble learning is to employ multiple learners and combine their predictions. There is no definitive taxonomy. Jain, Duin and Mao (2000) list eighteen classifier combination schemes; Witten and Frank (2000) detail four methods of combining multiple models: bagging, boosting, stacking and error-correcting output codes; whilst Alpaydin (2004) covers seven methods of combining multiple learners: voting, error-correcting output codes, bagging, boosting, mixtures of experts, stacked generalization and cascading. We focus on four methods, then review the literature in general, with, where possible, an emphasis on both theory and practical advice.

2 Bagging

Bagging (Breiman 1996), a name derived from “bootstrap aggregation”, was the first effective method of ensemble learning and is one of the simplest methods of arching [1]. The meta-algorithm, which is a special case of model averaging, was originally designed for classification and is usually applied to decision tree models, but it can be used with any type of model for classification or regression. The method creates multiple versions of a training set using the bootstrap, i.e. sampling with replacement. Each of these data sets is used to train a different model. The outputs of the models are combined by averaging (in the case of regression) or voting (in the case of classification) to create a single output. Bagging is only effective when using unstable nonlinear models, i.e. those for which a small change in the training set can cause a significant change in the model.

3 Boosting (including AdaBoost)

Boosting (Schapire 1990) is a meta-algorithm which can be viewed as a model averaging method. It is the most widely used ensemble method and one of

[1] Arching (adaptive reweighting and combining) is a generic term that refers to reusing or selecting data in order to improve classification.
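
The bagging procedure described in Section 2 — bootstrap resampling, training one model per resample, and combining by vote — can be sketched as follows. This is a minimal illustration, not part of the original text: the decision-stump base learner, the toy one-dimensional data, and all function names are assumptions chosen for brevity.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    # The bootstrap: draw len(data) points with replacement.
    return [rng.choice(data) for _ in data]

def train_stump(sample):
    # Toy base learner: a 1-D decision stump that thresholds x
    # at the midpoint between the two class means.
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    if not pos or not neg:
        # Degenerate resample (one class only): predict that class.
        label = sample[0][1]
        return lambda x, label=label: label
    t = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x, t=t: 1 if x > t else 0

def bagged_predict(models, x):
    # Classification: combine the models' outputs by majority vote.
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

# Illustrative data: class 0 clustered near 1.0, class 1 near 3.0.
rng = random.Random(0)
data = ([(x, 0) for x in (1.0, 1.2, 0.8, 1.1)]
        + [(x, 1) for x in (3.0, 2.8, 3.2, 2.9)])
models = [train_stump(bootstrap_sample(data, rng)) for _ in range(25)]
```

For regression the only change is the combination step: replace the majority vote with the mean of the models' outputs.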
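
Since Section 3 names AdaBoost, a minimal sketch of the discrete AdaBoost reweighting loop may be useful. This is an illustrative assumption, not the paper's own presentation: it uses exhaustive threshold stumps as the base learner, labels in {-1, +1}, and toy data, all chosen here for brevity.

```python
import math

def best_stump(data, w):
    # Pick the threshold/sign stump minimising weighted error.
    best_stump, best_err = None, float('inf')
    for t in sorted({x for x, _ in data}):
        for sign in (1, -1):
            stump = lambda x, t=t, s=sign: s if x > t else -s
            err = sum(wi for wi, (x, y) in zip(w, data) if stump(x) != y)
            if err < best_err:
                best_stump, best_err = stump, err
    return best_stump, best_err

def adaboost(data, rounds):
    # data: list of (x, y) pairs with y in {-1, +1}.
    n = len(data)
    w = [1.0 / n] * n                 # start with uniform weights
    ensemble = []                     # list of (alpha, stump) pairs
    for _ in range(rounds):
        stump, err = best_stump(data, w)
        err = max(err, 1e-10)         # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        # Reweight: misclassified points gain weight, so the next
        # round's stump concentrates on the hard examples.
        w = [wi * math.exp(-alpha * y * stump(x))
             for wi, (x, y) in zip(w, data)]
        s = sum(w)
        w = [wi / s for wi in w]
        ensemble.append((alpha, stump))
    return ensemble

def predict(ensemble, x):
    # Final output: sign of the alpha-weighted vote.
    return 1 if sum(a * h(x) for a, h in ensemble) > 0 else -1
```

The quantity alpha weights each round's stump by its accuracy, so later averaging favours the stronger base learners; this is what makes boosting a (weighted) model averaging method.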