Evaluation of Decision Tree Pruning with Subadditive Penalties

Sergio García-Moratilla, Gonzalo Martínez-Muñoz and Alberto Suárez
Universidad Autónoma de Madrid, Avenida Francisco Tomás y Valiente, 11, Madrid 28049, Spain
sergio.garciamoratilla@estudiante.uam.es, gonzalo.martinez@uam.es, alberto.suarez@uam.es

Abstract. Recent work on decision tree pruning [1] has brought to the attention of the machine learning community the fact that, in classification problems, the use of subadditive penalties in cost-complexity pruning has a stronger theoretical basis than the usual additive penalty terms. We implement cost-complexity pruning algorithms with general size-dependent penalties to confirm the results of [1], namely that the family of pruned subtrees selected by pruning with a subadditive penalty of increasing strength is a subset of the family selected using additive penalties. Consequently, this family of pruned trees is unique, nested, and efficiently computable. However, despite the better theoretical grounding of cost-complexity pruning with subadditive penalties, we found no systematic improvement in the generalization performance of the final classification tree selected by cross-validation using subadditive penalties instead of the commonly used additive ones.

1 Introduction

Decision trees are among the most widely used classifiers. The reasons for their popularity are the availability of efficient algorithms for the automatic induction of decision trees from labeled data (CART [2], C4.5 [3]), the high processing speed and accuracy that can be obtained in many classification problems of practical interest, and the interpretability of the classification models generated. A decision tree is a hierarchical questionnaire that partitions the data into disjoint subsets according to the results of tests associated with the non-terminal nodes of the tree.
By applying the sequence of tests at the internal nodes, an example is assigned to a single leaf node on the fringe of the decision tree. The classification given by the tree is the class label of the leaf node to which the example is assigned. Assuming that only Boolean tests are used, as in CART, the decision tree is a rooted binary tree. The root node has all the training examples associated with it and yields as its classification the majority class of the whole training set. The binary decision tree is grown from the root node by performing a test that splits the data into two disjoint subsets.
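To make the distinction drawn in the abstract concrete, the cost-complexity criterion penalizes the empirical error of a pruned subtree by a function of its size. The square-root penalty below is one illustrative subadditive choice; the notation is a sketch and may differ from that of [1].

```latex
% Cost-complexity criterion for a pruned subtree T:
%   R(T)          empirical misclassification cost of T
%   |\tilde{T}|   number of leaves of T
%   \alpha > 0    regularization (pruning strength)
R_\alpha(T) = R(T) + \alpha\,\Phi\bigl(|\tilde{T}|\bigr)
% Usual additive penalty (CART):
\Phi(n) = n
% An illustrative subadditive penalty, i.e. \Phi(a+b) \le \Phi(a) + \Phi(b):
\Phi(n) = \sqrt{n}
```

Pruning selects, for each value of \(\alpha\), the subtree minimizing \(R_\alpha(T)\); sweeping \(\alpha\) from 0 upward generates the family of pruned subtrees discussed above.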
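The routing of an example through a CART-style binary tree, as described above, can be sketched as follows. This is an illustrative toy implementation, not the authors' code; the node structure, the `classify` helper, and the petal-length test are assumptions made for the example.

```python
# Minimal sketch of classification with a binary decision tree:
# each internal node applies a Boolean test that routes the example
# left or right; a leaf returns its class label (the majority class
# of the training examples that reached it).
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Node:
    label: Optional[str] = None                       # set only at leaves
    test: Optional[Callable[[dict], bool]] = None     # set only at internal nodes
    left: Optional["Node"] = None                     # branch taken when test(x) is True
    right: Optional["Node"] = None                    # branch taken when test(x) is False

def classify(node: Node, x: dict) -> str:
    """Follow the sequence of Boolean tests from the root to a leaf."""
    while node.label is None:
        node = node.left if node.test(x) else node.right
    return node.label

# Toy tree with a single split (hypothetical threshold for illustration).
tree = Node(test=lambda x: x["petal_length"] < 2.5,
            left=Node(label="setosa"),
            right=Node(label="other"))

print(classify(tree, {"petal_length": 1.4}))  # setosa
print(classify(tree, {"petal_length": 4.0}))  # other
```

Growing a tree amounts to repeatedly replacing a leaf by an internal node whose test splits the leaf's examples into two disjoint subsets; pruning reverses this process.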