Classifier Performance Measures in Multi-Fault Diagnosis for Aircraft Engines

Weizhong Yan, Kai Goebel
Information & Decision Technology Lab
GE Global Research Center
One Research Circle, Niskayuna, NY 12309
{yan, goebelk}@crd.ge.com

James C. Li
Department of Mechanical Engineering, Aeronautical Engineering & Mechanics
Rensselaer Polytechnic Institute
Troy, NY 12181
lic3@rpi.edu

ABSTRACT

Classifier performance evaluation is an important step in designing diagnostic systems. Its purposes include: 1) selecting the best classifier from several candidates, 2) verifying that the designed classifier meets the design requirements, and 3) identifying needed improvements in the classifier components. To evaluate classifier performance effectively, a performance measure must be defined that quantifies the goodness of the classifiers under consideration. This paper first argues that in fault diagnostic system design, commonly used performance measures, such as accuracy and ROC analysis, are not always appropriate. The paper then proposes using misclassification cost as a general performance measure that is suitable for binary as well as multi-class classifiers and, most importantly, for classifiers whose classes carry unequal cost consequences. The paper also provides strategies for estimating the cost matrix by taking advantage of fault criticality information obtained from FMECA. By evaluating the performance of different classifiers considered during the design of an engine fault diagnostic system, the paper demonstrates that misclassification cost is an effective performance measure for evaluating multi-class classifiers with unequal cost consequences across classes.

Keywords: Classifier performance, performance measures, aircraft engine, fault diagnosis, ROC

1.
INTRODUCTION

Typical diagnostic system design involves several steps, including data preprocessing, feature extraction and selection, classifier design, and classifier performance evaluation, as shown in Figure 1. Classifier performance evaluation is an indispensable step because the same classifier performs differently from application to application, i.e., classifier performance is problem specific [1]. Since no single classifier is superior to all others for every application, common practice in designing a classifier for a given problem is to experiment with many different classifiers, compare their performance, and select the classifier (individual or combined) with the best performance. In this design practice, classifier performance evaluation is clearly essential. It is required not only for selecting the best classifier from several candidates, but also for verifying that the designed classifier meets the design requirements and for identifying needed improvements in the classifier components.

Classifier performance generally refers to both computational performance and classification performance; in this paper, however, we limit our study to classification performance only. To evaluate classifier performance effectively, a classifier performance measure has to be defined: a single index that quantifies the goodness of the classifiers under consideration. Depending on the design or application requirements, different problems may call for different performance measures to ensure that the candidate classifiers can be properly compared and selected. Given a
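As a minimal sketch of the misclassification-cost measure the paper advocates, the snippet below computes the average cost per classified sample from a confusion matrix and a cost matrix. The class labels, counts, and cost values here are purely illustrative assumptions, not data from the paper; in practice the cost matrix would be estimated from fault criticality information such as FMECA.

```python
import numpy as np

# Hypothetical 3-class example (values are illustrative only):
# rows = true class, columns = predicted class.
confusion = np.array([
    [90,  5,  5],   # true: no-fault
    [ 4, 88,  8],   # true: fault A
    [ 2,  6, 92],   # true: fault B
])

# cost[i, j] = cost of predicting class j when class i is true.
# Correct decisions (diagonal) cost nothing; missing a critical fault
# (e.g., declaring "no-fault" when fault B is present) costs the most.
cost = np.array([
    [ 0.0, 1.0, 1.0],
    [ 5.0, 0.0, 2.0],
    [10.0, 4.0, 0.0],
])

def expected_misclassification_cost(confusion, cost):
    """Average cost per sample: sum of (count * cost) over all cells,
    divided by the total number of classified samples."""
    return float((confusion * cost).sum() / confusion.sum())

print(expected_misclassification_cost(confusion, cost))  # 0.3
```

Unlike plain accuracy, this single index lets a classifier that makes a few expensive errors score worse than one that makes more, but cheaper, errors.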