Measuring Diversity in Regression Ensembles

Haimonti Dutta
The Center for Computational Learning Systems, Columbia University, New York, USA.
haimonti@ccls.columbia.edu

Abstract. The problem of combining predictors to increase accuracy (often called ensemble learning) has been studied broadly in the machine learning community for both classification and regression tasks. The design of an ensemble is based on the individual accuracy of the predictors and also on how different they are from one another. There is a significant body of literature on how to design and measure diversity in classification ensembles. Most of these metrics are not directly applicable to regression ensembles, since the regression task inherently deals with continuous-valued labels. For regression ensembles, Krogh and Vedelsby show that the quadratic error of an ensemble estimator is guaranteed to be less than or equal to the average quadratic error of its components. However, this does not give a way to measure or create diverse regression ensembles. This paper presents metrics (correlation coefficient, covariance, dissimilarity measure, chi-square and mutual information) that can be used for measuring diversity in regression ensembles. Careful selection of diverse models can be used to reduce the overall ensemble size without substantial loss in performance. We present extensive empirical results to show the performance of diverse regression ensembles formed by Bagging and Random Forest techniques.

1 Introduction

Ensemble-based learning algorithms have been used for many machine learning applications ([Die02], [OT08], [BWT05]). They were first used for classification as early as the 1960's [Nil65], and more recently techniques such as Stacking ([Wol92], [Bre93]), Bagging [Bre96], Boosting ([Sch99], [DC96]) and Random Forest [Bre01] have been developed.
Ensembles are groups of learners (such as decision trees, neural networks, and Support Vector Machines (SVMs)) in which each learner provides an estimate of the target variable, which can be categorical or continuous; these estimates are combined by some technique (such as majority voting or averaging), thereby reducing the generalization error produced by a single learner. Thus, the central idea in ensemble learning is to exploit the information provided by the weak learners for improved performance.

The success of an ensemble of learners relies upon the diversity among the individual learners [RP06]. Diversity is the degree of disagreement [BWT05] among the individual learners. The concept had its origin in software engineering, where the aim was to increase the reliability of solutions by combining programs whose failures were uncorrelated [SS97]. In the context of supervised machine learning, diversity measures have been studied widely for classification problems from different perspectives ([OS99],
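The averaging combiner and the Krogh–Vedelsby guarantee mentioned above can be checked with a small numeric sketch. This is an illustration, not code from the paper: the three "regressors" below are hypothetical stand-ins (the true target plus noise of differing scale), and the correlation of residuals is used as one simple diversity measure of the kind the paper proposes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic targets and three hypothetical regressors: each member's
# prediction is the target corrupted by noise of a different scale.
y = rng.normal(size=200)
preds = np.stack([y + rng.normal(scale=s, size=200) for s in (0.3, 0.5, 0.7)])

# Simple averaging combiner for a regression ensemble.
ensemble = preds.mean(axis=0)

# Quadratic (squared) error of each member and of the ensemble.
mse_members = ((preds - y) ** 2).mean(axis=1)
mse_ensemble = ((ensemble - y) ** 2).mean()

# Krogh & Vedelsby: the ensemble's quadratic error never exceeds the
# average quadratic error of the components (equality only when all
# members agree everywhere).
assert mse_ensemble <= mse_members.mean()

# One candidate diversity metric: pairwise correlation of member
# residuals; lower off-diagonal correlation indicates more diversity.
residuals = preds - y
corr = np.corrcoef(residuals)
print(corr.round(2))
```

The assertion holds for any data by convexity of the squared error, which is exactly why disagreement (diversity) among members, not just their individual accuracy, drives the ensemble's advantage.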