REMOTE SENS. ENVIRON. 47:362-368 (1994)

Assessing the Classification Accuracy of Multisource Remote Sensing Data

R. W. Fitzgerald* and B. G. Lees*

*Geography Department, Australian National University, Canberra. Address correspondence to R. W. Fitzgerald, Geography Dept., Australian National Univ., Canberra ACT 0200, Australia. Received 29 April 1992; revised 8 June 1993.

Classification accuracy has traditionally been expressed by the overall accuracy percentage computed from the sum of the diagonal elements of the error, confusion, or misclassification matrix resulting from the application of a classifier. This article assesses the adequacy of the overall accuracy measure and demonstrates that it can give misleading and contradictory results. The Kappa test statistic assesses interclassifier agreement and is applied here to assess the classification accuracy of two classifiers, a neural network and a decision tree model, on the same data set. The Kappa statistic is shown to be a more discerning statistical tool for assessing the classification accuracy of different classifiers and has the added advantage of being statistically testable against the standard normal distribution. It gives the analyst better interclass discrimination than the overall accuracy measure. The authors recommend that the Kappa statistic be used in preference to the overall accuracy as a means of assessing classification accuracy.

INTRODUCTION

In his timely article on assessing classification accuracy, Congalton (1991) reviewed a number of interrelated aspects of classification accuracy. The aspects reviewed by Congalton include site- and non-site-specific accuracy, statistical techniques (the Kappa test statistic) for comparing classifiers, ground truthing errors, the discreteness of classification schemes, spatial autocorrelation, and sampling issues. Like Congalton, we believe that the assessment of errors in the classification of remote sensing and GIS data has been poorly examined and deserves intensive scrutiny.

The article examines two of the issues raised by Congalton (1991): the assessment of site-specific accuracy by the application of a statistical technique which was designed to test interclassifier agreement, and the Kappa test statistic itself. In doing so, we demonstrate that the accepted method of assessing classification accuracy, the overall accuracy percentage, is misleading, especially so when applied at the class comparison level. The Kappa statistic is shown to be a statistically more sophisticated measure of interclassifier agreement than the overall accuracy, and it gives better interclass discrimination.

The article advances the application of the Kappa statistic by using it to compare the classification results of two supervised classifiers, a neural network and a decision tree classifier, applied to the same input data set. Each classifier successfully fused multisource remote sensing and GIS images into a single classified floristic land cover image. These two fused images were compared to highlight the relative utility of the Kappa statistic over the overall accuracy percentage, especially at the interclass level.

A backpropagation neural network (Fitzgerald and Lees, 1992) and a decision tree model (Lees and Ritman, 1991) were applied to the task of floristic land cover classification. Both classification techniques come from the artificial intelligence field and have been applied in many disciplines where traditional statistical classifiers have been found wanting. Decision trees are rule-based classifiers: they apply top-down induction to the input data, using user-defined decision rules to partition each input record into a class at the end of a branch.
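The rule-based partitioning described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the decision tree model of Lees and Ritman (1991): the variable names (`geology`, `band4`), the threshold, and the class labels are hypothetical, and the rules are written by hand rather than induced from training data.

```python
from dataclasses import dataclass
from typing import Callable, Union

Record = dict  # one input record (e.g., one pixel): variable name -> value


@dataclass
class Leaf:
    label: str  # class assigned at the end of a branch


@dataclass
class Node:
    rule: Callable[[Record], bool]   # decision rule tested at this node
    if_true: Union["Node", Leaf]     # branch followed when the rule holds
    if_false: Union["Node", Leaf]    # branch followed otherwise


def classify(tree: Union[Node, Leaf], record: Record) -> str:
    """Walk from the root downwards, one rule per node, until a leaf is reached."""
    node = tree
    while isinstance(node, Node):
        node = node.if_true if node.rule(record) else node.if_false
    return node.label


# Hypothetical two-level tree: a categorical rule at the root,
# then a continuous threshold rule on one branch.
TREE = Node(
    rule=lambda r: r["geology"] == "sandstone",
    if_true=Node(
        rule=lambda r: r["band4"] > 60.0,
        if_true=Leaf("class_A"),
        if_false=Leaf("class_B"),
    ),
    if_false=Leaf("class_C"),
)

# Hypothetical input records.
records = [
    {"geology": "sandstone", "band4": 75.0},
    {"geology": "sandstone", "band4": 40.0},
    {"geology": "shale", "band4": 90.0},
]
labels = [classify(TREE, r) for r in records]
```

Each record follows exactly one path from the root to a leaf, so every record receives exactly one class label.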
The resulting classification is robust against the underlying statistical distributions of the input variables and easily mixes continuous variables (remote sensing data) and categorical variables (GIS attribute data). It does, however, require a large number of training sites, which grows as the square of the number of decision nodes.

0034-4257 / 94 / $7.00 ©Elsevier Science Inc., 1994, 655 Avenue of the Americas, New York, NY 10010

The most popular and robust neural network is