Pattern Recognition Pergamon Press 1971. Vol. 3, pp. 225-234. Printed in Great Britain

On Dimensionality and Sample Size in Statistical Pattern Classification

LAVEEN KANAL
The Computer Science Center, University of Maryland, College Park, Maryland, U.S.A.

and

B. CHANDRASEKARAN
Department of Computer and Information Science, The Ohio State University, Columbus, Ohio, U.S.A.

(Received 26 June 1970)

Abstract--The basic question of how to optimally make use of a finite number of available samples in designing pattern recognition systems is considered. This has several components: optimal use of the samples for design and testing; and the relationship between the number of measurements and the number of samples for various probability structural constraints. A spectrum of possibilities has been demonstrated, placing several apparently conflicting recent results in perspective.

I. INTRODUCTION

Some questions on dimensionality and sample size which arise in the statistical approach to the design of pattern classification systems are: what is the best way to use a fixed size sample to design a classification system and evaluate its performance? When a certain finite number of samples is available what should be the dimensionality of the pattern vector, i.e. how many variables should be used, and if one can get as many samples as one wants, can the probability of error be made arbitrarily small by increasing the number of variables?

Surprising as it may seem now, in much of the earlier work in pattern classification, especially that based on adaptive algorithms, the entire set of available samples was first used for design and future performance was then predicted to be that achieved on this design set. By now it is well known that this procedure is biased, resulting in too optimistic an estimate of performance. The choice between competing design procedures can only be based on predicted performance.
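
The optimistic bias of estimating performance on the design set can be illustrated with a small simulation. The sketch below is not taken from the paper. It assumes a nearest-mean classifier and two classes drawn from the same Gaussian distribution, so that any fixed decision rule has a true error rate of 0.5. The function names (such as `compare_estimates`), the sample sizes, and the dimensionality are all hypothetical choices made for illustration.

```python
import random


def sample(n, d, rng):
    """Draw n vectors of d independent standard-normal features."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]


def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(x, m):
    """Squared Euclidean distance between vectors x and m."""
    return sum((a - b) ** 2 for a, b in zip(x, m))


def error_rate(class0, class1, m0, m1):
    """Fraction of samples the nearest-mean rule assigns to the wrong class."""
    wrong = sum(sq_dist(x, m0) >= sq_dist(x, m1) for x in class0)
    wrong += sum(sq_dist(x, m1) > sq_dist(x, m0) for x in class1)
    return wrong / (len(class0) + len(class1))


def compare_estimates(n_design=10, n_test=1000, d=50, seed=0):
    """Return (design-set error, independent-test error) for one design.

    Assumption for illustration: both classes share the same distribution,
    so the true error of any fixed rule is exactly 0.5.
    """
    rng = random.Random(seed)

    # Design the classifier: estimate one mean per class from a few samples.
    design0, design1 = sample(n_design, d, rng), sample(n_design, d, rng)
    m0, m1 = mean(design0), mean(design1)

    # Estimate 1: re-use the design samples for testing.
    resub = error_rate(design0, design1, m0, m1)

    # Estimate 2: test on samples that played no part in the design.
    test0, test1 = sample(n_test, d, rng), sample(n_test, d, rng)
    holdout = error_rate(test0, test1, m0, m1)
    return resub, holdout


resub, holdout = compare_estimates()
print(f"error estimated on the design set:   {resub:.3f}")
print(f"error estimated on independent data: {holdout:.3f}")
```

Because the two classes are indistinguishable by construction, an honest estimate should come out near 0.5. The design-set estimate is typically far lower, since each design sample pulls its own class mean toward itself.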
We would like the ranking of procedures based on performance estimated from a fixed size sample to correspond to the ranking that would occur given actual performance. Moreover we want the estimated performance of the system finally selected to be a "good" predictor of its actual performance. Both the sizes of the design and test sample sets influence the accuracy of these estimates. We are then faced with the problem of the optimum use of a fixed size sample for maximizing the accuracy of these estimates. This can be considered without reference to the specific competing design procedures.

Pattern vector dimensionality enters into the effectiveness of the design based on finite samples. In statistical classification, estimation, and prediction, it has often been noted that, with finite samples, performance does not always improve as the number of variables is arbitrarily increased. Sometimes it may even deteriorate. This, added to the increase in