2011 3rd Conference on Data Mining and Optimization (DMO), 28-29 June 2011, Selangor, Malaysia
978-1-61284-212-7/11/$26.00 ©2011 IEEE

A Hybrid Evaluation Metric for Optimizing Classifier

Hossin M.*,#,1, Sulaiman M.N.*,2, Mustapha A.*,3, Mustapha N.*,4, Rahmat R.W.*,5

* Faculty of Computer Science and Information Technology, Universiti Putra Malaysia (UPM), 43400 UPM Serdang, Selangor, Malaysia
# Faculty of Cognitive Sciences and Human Development, Universiti Malaysia Sarawak (UNIMAS), 93400 Kota Samarahan, Sarawak, Malaysia
1 mhossin78@gmail.com, {2 nasir, 3 aida, 4 norwati, 5 rahmita}@fsktm.upm.edu

Abstract - The accuracy metric has been widely used for discriminating and selecting an optimal solution when constructing an optimized classifier. However, the accuracy metric can lead the search process to sub-optimal solutions because of its limited capability to discriminate between values. In this study, we propose a hybrid evaluation metric that combines the accuracy metric with the precision and recall metrics. We call this new performance metric Optimized Accuracy with Recall-Precision (OARP). This paper demonstrates, using two counter-examples, that the OARP metric is more discriminating than the accuracy metric. To verify this advantage, we conduct an empirical verification using a statistical discriminative analysis, which shows that the OARP metric is statistically more discriminating than the accuracy metric. We also empirically demonstrate that a naive stochastic classification algorithm trained with the OARP metric obtains better predictive results than one trained with the conventional accuracy metric. The experiments show that the OARP metric is a better evaluator and optimizer for constructing an optimized classifier.

Keywords - Hybrid evaluation metric, Accuracy metric, Precision metric, Recall metric, Classifier optimization

I. INTRODUCTION

The definition of classification can vary.
In simple terms, classification is the process of assigning instances to one of a fixed number of pre-specified classes, where the class of each instance is predicted from a number of measured attributes. Technically, the classification process involves two main phases. First, an optimized classifier is built from a training data set. Second, the optimized classifier is tested by using it to predict unseen data. In this study, we investigate the use of a new evaluation metric in building the optimized classifier during the training phase.

To produce an optimized classifier, certain metrics are used to evaluate the generated solutions. The most common evaluation metric for evaluating and selecting the optimal solution is accuracy, or equivalently the error rate (1 - accuracy). However, using the accuracy metric as the benchmark measurement has limitations. Consider two solutions s = {A, B} for a two-class problem, whose confusion matrices are shown in Table I. Both A and B produce the same total number of correctly predicted instances, and therefore both exhibit the same accuracy, which is 95%. In this case, the accuracy metric cannot discriminate whether A or B is better. This indicates that the accuracy metric has poor discriminating power for selecting the optimal solution, because it carries little information about the overall results. Nonetheless, intuitively, we can conclude that solution B is better than A. This is because the numbers of instances correctly predicted by B are approximately balanced across the two classes (48:47). In contrast, the numbers of instances correctly predicted by A are unbalanced (50:45).
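The comparison of solutions A and B can be checked with a short sketch. It assumes that the rows of Table I are predicted classes and the columns are actual classes, and it uses the standard textbook definitions of accuracy, precision, and recall, which may be worded differently from the definitions given later in the paper. The names `SOLUTIONS` and `summarize` are illustrative only, and the OARP metric itself is not computed here.

```python
# Counts taken from Table I, stored as (TP, FP, FN, TN).
# Assumption: rows of Table I are predicted classes, columns are actual classes.
SOLUTIONS = {
    "A": (50, 5, 0, 45),
    "B": (48, 3, 2, 47),
}


def accuracy(tp, fp, fn, tn):
    """Fraction of all instances that are predicted correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp, fn, tn):
    """Fraction of predicted positives that are actually positive."""
    return tp / (tp + fp)


def recall(tp, fp, fn, tn):
    """Fraction of actual positives that are predicted as positive."""
    return tp / (tp + fn)


def summarize(counts):
    """Return the three metrics for one confusion matrix."""
    return {
        "accuracy": accuracy(*counts),
        "precision": precision(*counts),
        "recall": recall(*counts),
    }


for name, counts in SOLUTIONS.items():
    metrics = summarize(counts)
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

Under these assumptions the two solutions have identical accuracy but different precision and recall, which is the extra information that accuracy alone does not expose.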
TABLE I
THE PROBLEM WITH ACCURACY FOR BALANCED CLASS DISTRIBUTION (50:50)

                     Solution A        Solution B
                     Actual class      Actual class
Predicted class      Pos     Neg       Pos     Neg
Pos                  50      5         48      3
Neg                  0       45        2       47

On top of that, several studies have reported that the simplicity of the accuracy metric can lead the selection and discrimination processes to sub-optimal solutions, especially when dealing with imbalanced class instances [1][2][3][4]. This is because a small class of instances has very little impact on the accuracy compared to a large class of instances. This indicates that the accuracy metric is not robust and can be drastically affected by changes in the data proportion.

On the other hand, precision and recall are two evaluation metrics that are commonly used as alternatives for measuring two different aspects of the performance of binary classifiers [5]. Basically, precision measures the fraction of instances predicted as positive that are actually positive, while recall measures the