Between Two Extremes: Examining Decompositions of the Ensemble Objective Function

Gavin Brown 1, Jeremy Wyatt 2, and Ping Sun 2

1 School of Computer Science, University of Manchester, Kilburn Building, Oxford Road, Manchester, M13 9PL
gavin.brown@manchester.ac.uk
http://www.cs.man.ac.uk/~gbrown/

2 School of Computer Science, University of Birmingham, Edgbaston Park Road, Birmingham, B15 2TT
{j.l.wyatt,p.sun}@cs.bham.ac.uk
http://www.cs.bham.ac.uk/~jlw/

Abstract. We study how the error of an ensemble regression estimator can be decomposed into two components: one accounting for the individual errors and the other accounting for the correlations within the ensemble. This is the well-known Ambiguity decomposition; we present an alternative way to decompose the error, and show how both decompositions have been exploited in a learning scheme. Using a scaling parameter in the decomposition we can blend the gradient (and therefore the learning process) smoothly between two extremes: from concentrating on individual accuracies and ignoring diversity, up to a full non-linear optimization of all parameters, treating the ensemble as a single learning unit. We demonstrate how this also applies to ensembles using a soft combination of posterior probability estimates, and so can be utilised for classifier ensembles.

1 Introduction

It is well recognised that, for best performance, an ensemble of estimators should exhibit some kind of disagreement on certain datapoints. When estimators produce class labels and are combined by a majority vote, this is the often-cited but little-understood notion of “diversity”. In a regression framework, using estimators combined by a simple averaging operation, the notion of disagreement between estimators is rigorously defined: with a single estimator, we have the well-known bias-variance trade-off [5], and with an ensemble of estimators, we have a bias-variance-covariance trade-off [10].
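To make the averaging case concrete, the Ambiguity decomposition states that for an ensemble prediction f̄ formed by simple averaging of member predictions f_i, the ensemble squared error equals the average member error minus the average spread of the members around f̄. The following NumPy sketch (our own illustration, not code from the paper; the member predictions and target are arbitrary values) checks this identity numerically:

```python
import numpy as np

# Ambiguity decomposition for a simple averaging ensemble:
#   (fbar - d)^2 = avg_i (f_i - d)^2  -  avg_i (f_i - fbar)^2
# where fbar is the mean of the member predictions f_i and d is the target.
rng = np.random.default_rng(0)
f = rng.normal(size=5)   # hypothetical member predictions
d = 0.3                  # hypothetical target value
fbar = f.mean()

ensemble_err = (fbar - d) ** 2            # error of the combined estimator
avg_member_err = np.mean((f - d) ** 2)    # average individual error
ambiguity = np.mean((f - fbar) ** 2)      # spread ("diversity") term

# The identity holds exactly, for any predictions and any target.
assert np.isclose(ensemble_err, avg_member_err - ambiguity)
```

Since the ambiguity term is non-negative, the ensemble error never exceeds the average member error, which is why encouraging disagreement (without sacrificing individual accuracy) can only help under the averaging combiner.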
The regression diversity issue can now be understood quite simply: in a single estimator we have a two-way trade-off, and in a regression ensemble the optimal “diversity” is that which optimally balances the three-way bias-variance-covariance trade-off. The understanding of regression ensembles is therefore quite mature. The understanding of classification ensembles, using a majority vote combiner, is