Proceedings of the International Symposium on Neuroinformatics and Neurocomputers, pp. 238-45, 1995

Using Multiple Statistical Prototypes to Classify Continuously Valued Data

Dan Ventura and Tony R. Martinez
Computer Science Department, Brigham Young University, Provo, Utah 84602
e-mail: dan@axon.cs.byu.edu, martinez@cs.byu.edu

Multiple Statistical Prototypes (MSP) is a modification of a standard minimum distance classification scheme that generates multiple prototypes per class using a modified greedy heuristic. Empirical comparison of MSP with other well-known learning algorithms shows MSP to be a robust algorithm that uses a very simple premise to produce good generalization and a parsimonious hypothesis representation.

1. Introduction

The idea of using prototypes to represent classes has proven to be a powerful mechanism for learning [1][2][14][12][9]. It is a simple and natural approach to the problem of dealing with continuously valued attributes. The basic assumption is that, given an m-dimensional space defined by the input variables, there exist one or more representative points in that space for each output class. These representative points are termed prototypes. The Multiple Statistical Prototypes (MSP) algorithm is a simple variation on this idea. It assumes that all input variables are continuously valued and that each output class can be represented by one or more Gaussian functions over these input variables. This assumption is not unreasonable, since any function may be approximated by one or more Gaussian bases, the worst case being the degenerate one. The idea of using statistical information obtained from a training set in the formation of prototypes has also been used in other models. Two examples of similar systems are radial basis function (RBF) networks [8] and CLASSIT [6]. MSP differs from CLASSIT in its supervised approach to learning, its utility measure (distance metric), and in the fact that MSP does not use a merge-type operation.
MSP and RBF networks both use prototypes to perform a non-linear mapping from the input space to the output space. However, they differ in their manner of calculating prototypes and in their mapping function.

Section two presents the basic statistical prototypes (SP) approach, which employs a single prototype per class. Section three extends this to multiple prototypes per class (MSP). Both sections include empirical results and comparisons with other algorithms. Section four provides further empirical results and analysis, and section five concludes the paper.

2. Creating Statistical Prototypes

Initially, each output class is assumed to be represented by a single m-dimensional Gaussian base over the input space. Therefore, by assumption, each output class is represented with a single prototype. Define:

T as a set of training instances;
n as the number of instances in T, that is n = |T|;
c as the number of output classes represented in T;
v as the number of input variables represented in T;
i as an index that ranges 0 ≤ i < c and indicates output class;
j as an index that ranges 0 ≤ j < v and indicates input variable;
T_i as the ith sub-training set obtained by partitioning T by output class;
o_i as the ith output class of T;
m_ij as the mean of the jth input variable for T_i;
σ_ij as the standard deviation of the jth input variable for T_i;
p_i as the prototype for class i;
x as a vector of inputs representing an instance;
x_j as the jth input of an instance;
d_i as the normal distance between a point x and p_i.
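The single-prototype scheme defined above can be sketched in a few lines of code: partition T by output class, record the per-variable means m_ij and standard deviations σ_ij as the prototype p_i, and classify a new instance by the nearest prototype. This is a minimal illustration, not the paper's implementation; in particular, the exact form of the "normal distance" d_i is defined later in the paper, and the standard-deviation-normalized Euclidean distance used here is an assumption.

```python
import numpy as np

def build_prototypes(X, y):
    """Compute one statistical prototype per class: the per-variable
    means (m_ij) and standard deviations (sigma_ij) of the sub-training
    set T_i obtained by partitioning the data by output class."""
    prototypes = {}
    for cls in np.unique(y):
        T_i = X[y == cls]            # T_i: instances of class i
        m_i = T_i.mean(axis=0)       # m_ij for all j
        s_i = T_i.std(axis=0)        # sigma_ij for all j
        s_i[s_i == 0] = 1e-9         # guard against zero variance
        prototypes[cls] = (m_i, s_i)
    return prototypes

def classify(x, prototypes):
    """Minimum distance classification: assign x to the class whose
    prototype p_i minimizes d_i. The normalized Euclidean form of d_i
    below is an illustrative assumption."""
    best_cls, best_d = None, float("inf")
    for cls, (m_i, s_i) in prototypes.items():
        d_i = np.sqrt(np.sum(((x - m_i) / s_i) ** 2))
        if d_i < best_d:
            best_cls, best_d = cls, d_i
    return best_cls
```

Normalizing each dimension by σ_ij makes the distance scale-invariant across input variables, which matters when attributes are measured in different units.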