Pattern Recognition, Vol. 26, No. 6, pp. 953-961, 1993
Printed in Great Britain
0031-3203/93 $6.00+.00
Pergamon Press Ltd
© 1993 Pattern Recognition Society

A COMPARISON OF DECISION TREE CLASSIFIERS WITH BACKPROPAGATION NEURAL NETWORKS FOR MULTIMODAL CLASSIFICATION PROBLEMS

DONALD E. BROWN, VINCENT CORRUBLE and CLARENCE LOUIS PITTARD
Institute for Parallel Computation and Department of Systems Engineering, University of Virginia, Charlottesville, VA 22901, U.S.A.

(Received 1 April 1992; in revised form 5 October 1992; received for publication 9 December 1992)

Abstract--Multi-modal classification problems involve the recognition of patterns where the patterns associated with each class can come from disjoint regions in feature space. Traditional linear discriminant methods cannot cope with these problems. While a number of approaches exist for classifying patterns with multiple modes, decision trees and backpropagation neural networks represent leading algorithms with special capabilities for dealing with this problem class. This paper provides a comparison of decision trees with backpropagation neural networks for three distinct multi-modal problems: two from emitter classification and one from digit recognition. These real-world problems provide an interesting range of problem characteristics for our comparison: one emitter classification problem has few features and a large data set, and the other has many features and a small data set. Additionally, both emitter classification problems have real-valued features, while the digit recognition problem has binary-valued features. The results show that both methods produce comparable error rates but that direct application of either method will not necessarily produce the lowest error rate. In particular, we improve decision tree results with multi-variable splits and we improve backpropagation neural networks with feature selection and mode identification.
Classification trees; Backpropagation neural networks; Emitter identification; Digit recognition

1. MULTI-MODAL CLASSIFICATION

Multi-modal classification problems involve the recognition of patterns where the patterns associated with each class can come from disjoint regions in feature space. The classic XOR problem that doomed the perceptron(1) is an example of a simple (two-dimensional), deterministic multi-modal problem. These problems are beyond the reach of linear discriminant functions and present additional problems when observations are stochastic. This paper compares two approaches to multi-modal classification: backpropagation neural networks (BNNs) and decision trees (DTs).

The backpropagation algorithm for training multi-layer neural networks(2) provided the machinery for neural networks to learn functions like XOR. In theory(3) these networks can compute decision surfaces for arbitrary multi-modal problems. In particular, three layer networks can represent arbitrary patterns because the first layer defines hyperplane decision boundaries, the second layer intersects these boundaries to define polygonal regions, and the final layer provides for unions of these regions.(4) However, practical problems remain, such as the number of elements in the hidden layer, the learning rate, and the appropriate optimization algorithm for minimizing classification error. Despite these practical challenges, BNNs represent a very promising approach to multi-modal classification problems (see Lippmann(4) and Gold et al.(5) for examples).

DTs(6) provide another approach to multi-modal problems. Users can easily interpret and understand decision tree classifiers and developers can easily implement them in either hardware or software. The procedure which generates the tree, called the recursive partitioning algorithm, consists of a series of local searches for good partitions or splits.
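The local search for a split can be illustrated with a minimal Python sketch. It is not the implementation used in this paper: the Gini impurity criterion, the midpoint thresholds, and the function names (`gini`, `best_split`, `grow`, `predict`) are assumptions made for illustration, class labels are assumed to be plain integers or strings, and the sketch grows the tree without any pruning step.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Local search over axis-parallel thresholds (assumed criterion: Gini).

    Returns (feature_index, threshold, weighted_impurity), or None when no
    threshold separates the observations.
    """
    n = len(y)
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best


def grow(X, y):
    """Recursively partition until each leaf is pure or cannot be split."""
    split = best_split(X, y) if gini(y) > 0.0 else None
    if split is None:
        return Counter(y).most_common(1)[0][0]  # leaf: majority class
    j, t, _ = split
    left = [i for i in range(len(y)) if X[i][j] <= t]
    right = [i for i in range(len(y)) if X[i][j] > t]
    return (j, t,
            grow([X[i] for i in left], [y[i] for i in left]),
            grow([X[i] for i in right], [y[i] for i in right]))


def predict(tree, x):
    """Follow the splits from the root down to a leaf label."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree
```

Applied to the four XOR points, no single axis-parallel split reduces the impurity, yet the recursion still separates the two classes after a second level of splits, which is the sense in which a series of purely local searches can handle a multi-modal problem.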
The recursive partitioning algorithm first grows a tree so that each leaf node contains very few observations. Once it has a fully grown tree, the algorithm then prunes the tree back with a heuristic procedure designed to guard against overfitting. In following these steps to build the decision tree, the recursive partitioning algorithm produces unions of piecewise linear decision surfaces, which makes it appropriate for multi-modal problems. Note also that in principle DTs and multi-layer neural networks produce similar decision surfaces. Later we note differences that arise because of implementation details.

No one has provided a theoretical reason to judge one of these methods as superior to the other. Hence, this paper undertakes an empirical comparison using data from problems in emitter classification. Several previous studies have reported on empirical comparisons of the two approaches. Atlas et al.(7) compared the approaches on three problems, two in power systems and one in vowel identification, and found BNNs to have lower error rates in all problems although they found a statistically significant difference for only one data set. Lee and Lippmann(8) compared DTs, BNNs and other approaches on four problems