Bagging Is A Small-Data-Set Phenomenon

Nitesh Chawla, Thomas E. Moore, Jr., Kevin W. Bowyer, Lawrence O. Hall, Clayton Springer, and Philip Kegelmeyer

Department of Computer Science and Engineering, University of South Florida, Tampa, Florida 33620 USA
Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, Indiana 46556 USA
Sandia National Laboratories, Biosystems Research Department, P.O. Box 969, MS 9951, Livermore, CA 94551-0969 USA

{chawla, tmoore4}@csee.usf.edu, kwb@cse.nd.edu, hall@csee.usf.edu, {csprin, wpk}@ca.sandia.gov

Abstract

Bagging forms a committee of classifiers by bootstrap aggregation of training sets from a pool of training data. A simple alternative to bagging is to partition the data into disjoint subsets. Experiments on various datasets show that, given partitions and bags of the same size, the use of disjoint partitions results in better performance than the use of bags. Many applications (e.g., protein structure prediction) involve datasets that are too large to handle in the memory of the typical computer. Our results indicate that, in such applications, the simple approach of creating a committee of classifiers from disjoint partitions is to be preferred over the more complex approach of bagging.

1. Introduction

Many data mining applications use data sets that are too large to be handled in the memory of the typical computer. One possible approach is to sub-sample the data in some manner [1, 2]. However, it can be difficult to know a priori how to sub-sample so that accuracy is not affected. Another possible approach is to partition the original data into smaller subsets and form a committee of classifiers [3, 4]. One advantage of this approach is that the partition size can simply be set to whatever amount of the original data can be conveniently handled on the available system.
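The disjoint-partition alternative can be sketched as follows. This is a minimal illustration in Python (not the authors' implementation), using an index array to stand in for the training data; each subset would then be used to train one member of the committee:

```python
import numpy as np

def disjoint_partitions(n_samples, n_parts, seed=0):
    """Shuffle the sample indices, then split them into disjoint subsets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    return np.array_split(idx, n_parts)

parts = disjoint_partitions(1000, 4)
# Every sample appears in exactly one subset: the subsets cover the
# full data set with no overlap, unlike bootstrap bags.
assert sum(len(p) for p in parts) == 1000
assert len(np.unique(np.concatenate(parts))) == 1000
```

Because the subsets are disjoint, the number of partitions directly controls how much data each classifier must hold in memory.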
Another advantage is that the committee potentially has better accuracy than a single classifier constructed on all the data.

In its typical form, bagging involves random sampling with replacement from the original pool of training data to create "bags" of data for a committee of thirty to one hundred classifiers. Bagging has been shown to result in improved performance over a single classifier created on all of the original data [5, 6, 7]. The success of bagging suggests that it might be a useful approach to creating a committee of classifiers for large data sets. We define large data sets as those which do not fit in the memory of a typical scientific computer. However, experience with bagging has primarily been in the context of "small" data sets. If the original data set is too large to handle conveniently, then creating and processing thirty or more bags will of course present even greater problems. This raises the question of which particulars of the bagging approach are essential in the context of large data sets. In this work, we show that simple partitioning of a large original data set into disjoint subsets results in better performance than creating bags of the same size.

2. Literature Review

Breiman's bagging [5] has been shown to improve classifier accuracy. Bagging combines models learned on different samplings of a given dataset. According to Breiman, bagging exploits instability in the classifiers, since perturbing the training set produces different classifiers from the same learning algorithm. Quinlan experimented with bagging on various datasets and found that it substantially improved accuracy [6]. However, those experiments were performed on "small" datasets, the largest containing 20,000 examples.
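The bootstrap sampling that underlies bagging can be sketched in the same style; again, this is a minimal illustration rather than the code used in the experiments. Because sampling is with replacement, a bag of size n drawn from n examples contains, in expectation, only about 1 - 1/e (roughly 63.2%) of the distinct examples:

```python
import numpy as np

def make_bags(n_samples, n_bags, bag_size, seed=0):
    """Draw index sets with replacement to form bootstrap bags."""
    rng = np.random.default_rng(seed)
    return [rng.integers(0, n_samples, size=bag_size) for _ in range(n_bags)]

bags = make_bags(1000, 30, 1000)
# Unlike disjoint partitions, bags overlap and each one omits samples:
# on average a bag holds about 63.2% of the distinct training examples.
coverage = np.mean([len(np.unique(b)) / 1000 for b in bags])
```

The overlap is the practical sticking point for large data: thirty bags, each the size of the original data set, multiply rather than divide the storage and processing burden.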
Domingos empirically tested two alternative theories of why bagging works: (1) bagging works because it approximates Bayesian model averaging, or (2) it works because it shifts the priors to a more appropriate region of the decision space [8]. The empirical results suggested that bagging works because it counteracts the inherent simplicity bias of decision trees. That is, with M different bags, M different classifiers are learned, and together their