Ensemble Methods

Martin Sewell

2007

1 Introduction

The idea of ensemble learning is to employ multiple learners and combine their predictions. There is no definitive taxonomy. Jain, Duin and Mao (2000) list eighteen classifier combination schemes; Witten and Frank (2000) detail four methods of combining multiple models: bagging, boosting, stacking and error-correcting output codes; whilst Alpaydin (2004) covers seven methods of combining multiple learners: voting, error-correcting output codes, bagging, boosting, mixtures of experts, stacked generalization and cascading. We focus on four methods, then review the literature in general, with, where possible, an emphasis on both theory and practical advice.

2 Bagging

Bagging (Breiman 1996), a name derived from “bootstrap aggregation”, was the first effective method of ensemble learning and is one of the simplest methods of arching [1]. The meta-algorithm, which is a special case of model averaging, was originally designed for classification and is usually applied to decision tree models, but it can be used with any type of model for classification or regression. The method creates multiple versions of a training set using the bootstrap, i.e. sampling with replacement. Each of these data sets is used to train a different model. The outputs of the models are combined by averaging (in the case of regression) or voting (in the case of classification) to create a single output. Bagging is only effective when using unstable nonlinear models, i.e. those for which a small change in the training set can cause a significant change in the model.

3 Boosting (including AdaBoost)

Boosting (Schapire 1990) is a meta-algorithm which can be viewed as a model averaging method. It is the most widely used ensemble method and one of

[1] Arching (adaptive reweighting and combining) is a generic term that refers to reusing or selecting data in order to improve classification.
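
The bagging procedure described in Section 2 — bootstrap resampling, training one model per resample, and combining by vote — can be sketched as follows. This is a minimal illustration, not part of the original text: the decision-stump base learner, the toy one-dimensional data, and all function names are assumptions chosen for brevity.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    # The bootstrap: draw len(data) points with replacement.
    return [rng.choice(data) for _ in data]

def train_stump(sample):
    # Toy base learner: a 1-D decision stump that thresholds x
    # at the midpoint between the two class means.
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    if not pos or not neg:
        # Degenerate resample (one class only): predict that class.
        label = sample[0][1]
        return lambda x, label=label: label
    t = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x, t=t: 1 if x > t else 0

def bagged_predict(models, x):
    # Classification: combine the models' outputs by majority vote.
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

# Illustrative data: class 0 clustered near 1.0, class 1 near 3.0.
rng = random.Random(0)
data = ([(x, 0) for x in (1.0, 1.2, 0.8, 1.1)]
        + [(x, 1) for x in (3.0, 2.8, 3.2, 2.9)])
models = [train_stump(bootstrap_sample(data, rng)) for _ in range(25)]
```

For regression the only change is the combination step: replace the majority vote with the mean of the models' outputs.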
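
Since Section 3 names AdaBoost, a minimal sketch of the discrete AdaBoost reweighting loop may be useful. This is an illustrative assumption, not the paper's own presentation: it uses exhaustive threshold stumps as the base learner, labels in {-1, +1}, and toy data, all chosen here for brevity.

```python
import math

def best_stump(data, w):
    # Pick the threshold/sign stump minimising weighted error.
    best_stump, best_err = None, float('inf')
    for t in sorted({x for x, _ in data}):
        for sign in (1, -1):
            stump = lambda x, t=t, s=sign: s if x > t else -s
            err = sum(wi for wi, (x, y) in zip(w, data) if stump(x) != y)
            if err < best_err:
                best_stump, best_err = stump, err
    return best_stump, best_err

def adaboost(data, rounds):
    # data: list of (x, y) pairs with y in {-1, +1}.
    n = len(data)
    w = [1.0 / n] * n                 # start with uniform weights
    ensemble = []                     # list of (alpha, stump) pairs
    for _ in range(rounds):
        stump, err = best_stump(data, w)
        err = max(err, 1e-10)         # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        # Reweight: misclassified points gain weight, so the next
        # round's stump concentrates on the hard examples.
        w = [wi * math.exp(-alpha * y * stump(x))
             for wi, (x, y) in zip(w, data)]
        s = sum(w)
        w = [wi / s for wi in w]
        ensemble.append((alpha, stump))
    return ensemble

def predict(ensemble, x):
    # Final output: sign of the alpha-weighted vote.
    return 1 if sum(a * h(x) for a, h in ensemble) > 0 else -1
```

The quantity alpha weights each round's stump by its accuracy, so later averaging favours the stronger base learners; this is what makes boosting a (weighted) model averaging method.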