Maximizing Tree Diversity by Building Complete-Random Decision Trees

Fei Tony Liu¹, Kai Ming Ting¹, and Wei Fan²

¹ School of Computing and Information Technology, Monash University, Churchill, Victoria 3842, Australia
{Tony.Liu, KaiMing.Ting}@infotech.monash.edu.au
² IBM T.J. Watson Research, Hawthorne, NY 10532
weifan@us.ibm.com

Abstract. One way to lower the generalization error of a decision tree ensemble is to maximize tree diversity. Building complete-random trees forgoes the strength obtained from a test selection criterion, but it achieves higher tree diversity. We provide a taxonomy of randomization methods and find that complete-random test selection produces diverse trees, whereas other randomization methods such as bootstrap sampling may impair tree growth and limit tree diversity. The well-accepted practice in constructing decision trees is to apply bootstrap sampling and voting. To challenge this practice, we explore eight variants of complete-random trees using three parameters: ensemble methods, tree height restriction and sample randomization. Surprisingly, the most accurate variant is very simple and performs comparably to Bagging and Random Forests. It achieves good results by maximizing tree diversity and is called Max-diverse Ensemble.

1 Introduction

Random tree ensembles introduce different random elements to construct diversified decision trees. For classification problems, the results from these trees are combined by an ensemble method to produce the final prediction. Random Decision Trees [8] is one such ensemble; it is constructed without conventional test selection criteria, which calls into question the utility of these heuristics, widely employed in many decision tree learning algorithms. The underlying argument is that such criteria are effective for building accurate single trees, but offer no guarantee on the final accuracy of a tree ensemble.
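To make the idea of a tree built without a test selection criterion concrete, the following is a minimal sketch, not the construction used in [8] or in this paper. It assumes numeric features, a split threshold drawn at random from the values observed at the node, trees grown on the full training sample, and majority voting as the ensemble method; the names `grow_random_tree`, `predict_tree` and `predict_ensemble` are hypothetical.

```python
import random
from collections import Counter


def grow_random_tree(rows, labels, rng):
    """Grow one tree whose tests are picked at random (no gain or Gini criterion)."""
    # Stop when every example in the node carries the same label.
    if len(set(labels)) == 1:
        return {"label_counts": Counter(labels)}
    # Only features that still take more than one value can split the node.
    usable = [f for f in range(len(rows[0])) if len({r[f] for r in rows}) > 1]
    if not usable:
        return {"label_counts": Counter(labels)}
    # The test feature is chosen uniformly at random, ignoring the labels.
    feature = rng.choice(usable)
    # Assumption: the threshold is one of the observed values, excluding the
    # largest, so that both children are guaranteed to be non-empty.
    values = sorted({r[feature] for r in rows})
    threshold = rng.choice(values[:-1])
    left = [i for i, r in enumerate(rows) if r[feature] <= threshold]
    right = [i for i, r in enumerate(rows) if r[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": grow_random_tree([rows[i] for i in left],
                                 [labels[i] for i in left], rng),
        "right": grow_random_tree([rows[i] for i in right],
                                  [labels[i] for i in right], rng),
    }


def predict_tree(tree, x):
    """Follow the random tests down to a leaf and return its majority label."""
    while "label_counts" not in tree:
        branch = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["label_counts"].most_common(1)[0][0]


def predict_ensemble(trees, x):
    """Combine the individual tree predictions by majority vote."""
    votes = Counter(predict_tree(t, x) for t in trees)
    return votes.most_common(1)[0][0]


# Toy data, invented for illustration only.
rng = random.Random(0)
rows = [[0.0, 5.0], [1.0, 3.0], [2.0, 4.0], [10.0, 4.0], [11.0, 5.0], [12.0, 3.0]]
labels = ["a", "a", "a", "b", "b", "b"]
trees = [grow_random_tree(rows, labels, rng) for _ in range(25)]
print(predict_ensemble(trees, [1.5, 4.0]))
```

Because the labels influence only when growth stops and what each leaf predicts, two trees grown on the same data generally differ in structure, which is the source of diversity discussed above.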
T.B. Ho, D. Cheung, and H. Liu (Eds.): PAKDD 2005, LNAI 3518, pp. 605–610, 2005. © Springer-Verlag Berlin Heidelberg 2005

As it stands, there is no credible report known to us that extensively analyses and compares complete-random trees with other decision tree ensembles. This paper aims to explore complete-random trees and compare them with Bagging [3] and Random Forests [5], which are widely accepted and use techniques such as randomized feature selection, bootstrap sampling and voting. The fundamental objective of randomization in tree construction is to create diversity. After all, there is no point in combining a forest of identical trees. Section 2 of