Rule Evaluation Measures: A Unifying View

Nada Lavrač¹, Peter Flach², and Blaž Zupan³,¹

¹ Department of Intelligent Systems, Jožef Stefan Institute, Ljubljana, Slovenia
² Department of Computer Science, University of Bristol, United Kingdom
³ Faculty of Computer and Information Sciences, University of Ljubljana, Slovenia

Abstract. Numerous measures are used for performance evaluation in machine learning. In predictive knowledge discovery, the most frequently used measure is classification accuracy. With new tasks being addressed in knowledge discovery, new measures appear. In descriptive knowledge discovery, where induced rules are not primarily intended for classification, new measures used are novelty in clausal and subgroup discovery, and support and confidence in association rule learning. Additional measures are needed as many descriptive knowledge discovery tasks involve the induction of a large set of redundant rules and the problem is the ranking and filtering of the induced rule set. In this paper we develop a unifying view on some of the existing measures for predictive and descriptive induction. We provide a common terminology and notation by means of contingency tables. We demonstrate how to trade off these measures, by using what we call weighted relative accuracy. The paper furthermore demonstrates that many rule evaluation measures developed for predictive knowledge discovery can be adapted to descriptive knowledge discovery tasks.

1 Introduction

Numerous measures are used for performance evaluation in machine learning and knowledge discovery. In classification-oriented predictive induction, the most frequently used measure is classification accuracy. Other standard measures include precision and recall in information retrieval, and sensitivity and specificity in medical data analysis.
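The standard measures named above can all be read off a two-by-two table of counts. The following is a minimal sketch of the textbook definitions, not the notation developed in this paper; the class name `ConfusionCounts`, its field names, and the example counts are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a two-class problem (hypothetical names, for illustration)."""
    tp: int  # positive examples predicted positive
    fp: int  # negative examples predicted positive
    fn: int  # positive examples predicted negative
    tn: int  # negative examples predicted negative

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


# Each function assumes its denominator is non-zero.
def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all examples that are classified correctly."""
    return (c.tp + c.tn) / c.total


def precision(c: ConfusionCounts) -> float:
    """Fraction of predicted positives that are truly positive."""
    return c.tp / (c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    """Fraction of true positives that are predicted positive."""
    return c.tp / (c.tp + c.fn)


# Sensitivity in medical data analysis is the same quantity as recall.
sensitivity = recall


def specificity(c: ConfusionCounts) -> float:
    """Fraction of true negatives that are predicted negative."""
    return c.tn / (c.tn + c.fp)


if __name__ == "__main__":
    example = ConfusionCounts(tp=40, fp=10, fn=20, tn=30)  # made-up counts
    for measure in (accuracy, precision, recall, specificity):
        print(f"{measure.__name__}: {measure(example):.3f}")
```

Recall and sensitivity coincide, which is one small instance of measures from different communities sharing a definition once they are written over the same table of counts.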
With new tasks being addressed in knowledge discovery, new measures need to be defined, such as novelty in clausal and subgroup discovery, and support and confidence in association rule learning. These new knowledge discovery tasks belong to what is called descriptive induction. Descriptive induction also includes other knowledge discovery tasks, such as learning of properties, integrity constraints, and attribute dependencies.

S. Džeroski and P. Flach (Eds.): ILP-99, LNAI 1634, pp. 174–185, 1999. © Springer-Verlag Berlin Heidelberg 1999

This paper provides an analysis of selected rule evaluation measures. The analysis applies to cases where single rules have to be ranked according to how well they are supported by the data. It also applies to both predictive and descriptive induction. As we argue in this paper, the right way to use standard rule