On The Size of Training Set and The Benefit from Ensemble

Zhi-Hua Zhou 1, Dan Wei 1, Gang Li 2, and Honghua Dai 2

1 National Laboratory for Novel Software Technology,
  Nanjing University, Nanjing 210093, China
  zhouzh@nju.edu.cn, dwei@ai.nju.edu.cn
2 School of Information Technology,
  Deakin University, Burwood, Vic 3125, Australia
  {gangli, hdai}@deakin.edu.au

Abstract. In this paper, the impact of the size of the training set on the benefit from ensemble, i.e. the gains obtained by employing ensemble learning paradigms, is empirically studied. Experiments on Bagged/Boosted J4.8 decision trees with/without pruning show that enlarging the training set tends to improve the benefit from Boosting but does not significantly impact the benefit from Bagging. This phenomenon is then explained from the view of bias-variance reduction. Moreover, it is shown that even for Boosting, the benefit does not always increase consistently with the size of the training set, since single learners may sometimes learn relatively more than ensembles do from additional training data that are randomly provided. Furthermore, it is observed that the benefit from an ensemble of unpruned decision trees is usually bigger than that from an ensemble of pruned decision trees. This phenomenon is then explained from the view of the error-ambiguity balance.

1 Introduction

Ensemble learning paradigms train a collection of learners to solve a problem. Since the generalization ability of an ensemble is usually better than that of a single learner, studying paradigms for constructing good ensembles has been one of the most active areas of research in supervised learning [5].

This paper does not attempt to propose any new ensemble algorithm. Instead, it explores how a change in the training set size affects the benefit from ensemble, i.e. the gains obtained by employing ensemble learning paradigms. Insight into this question may help better exploit the potential of ensemble learning paradigms. This goal is pursued with an empirical study on ensembles of pruned or unpruned J4.8 decision trees [9] generated by two popular ensemble algorithms, i.e. Bagging [3] and Boosting (in fact, Boosting is a family of ensemble algorithms, but here the term is used to refer to the most famous member of this family, i.e. AdaBoost [6]). Experimental results show that enlarging the training set does not necessarily enlarge the benefit from ensemble. Moreover, interesting issues on the benefit from ensemble, which are related to the
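To make the measured quantity concrete, the following is a minimal sketch of how the "benefit from ensemble" (the error of a single learner minus the error of its ensemble) could be traced against a growing training set. It is not the paper's protocol: scikit-learn's CART trees stand in for J4.8 (WEKA's C4.5 re-implementation), the synthetic dataset, the ensemble size of 20, the leaf-size cap, and the grid of training set sizes are all illustrative assumptions.

```python
# Sketch: benefit from ensemble vs. training set size (assumed setup,
# not the experimental protocol of the paper).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data; a held-out test set estimates generalization error.
X, y = make_classification(n_samples=12000, n_features=20, random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(
    X, y, test_size=2000, random_state=0)

# A mild complexity cap keeps the base tree from fitting the training data
# perfectly, which would make AdaBoost terminate after a single round.
base = DecisionTreeClassifier(min_samples_leaf=5, random_state=0)

for n in [250, 500, 1000, 2000, 4000]:        # increasing training set sizes
    X_tr, y_tr = X_pool[:n], y_pool[:n]
    single = DecisionTreeClassifier(
        min_samples_leaf=5, random_state=0).fit(X_tr, y_tr)
    err_single = 1 - single.score(X_test, y_test)
    for name, Ens in [("Bagging", BaggingClassifier),
                      ("Boosting", AdaBoostClassifier)]:
        # `estimator` is the scikit-learn >= 1.2 parameter name.
        ens = Ens(estimator=base, n_estimators=20,
                  random_state=0).fit(X_tr, y_tr)
        err_ens = 1 - ens.score(X_test, y_test)
        # Benefit from ensemble: error reduction over the single tree.
        print(f"n={n:5d}  {name:8s}  benefit = {err_single - err_ens:+.4f}")
```

Under this kind of setup, the paper's observation corresponds to the printed Boosting benefit tending to grow with n while the Bagging benefit stays roughly flat, though not necessarily monotonically.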