Int. J. Human-Computer Studies (2002) 56, 445–474
doi:10.1006/ijhc.1002, available online at http://www.idealibrary.com

FAN: Finding Accurate iNductions

JOSÉ RANILLA AND ANTONIO BAHAMONDE

Centro de Inteligencia Artificial, Universidad de Oviedo, Campus de Viesques, 33271 Gijón, Spain. emails: ranilla@aic.uniovi.es; antonio@aic.uniovi.es

(Received 24 May 2001 and accepted in revised form 6 March 2002)

In this paper we present a machine-learning algorithm that computes a small set of accurate and interpretable rules. The decisions of these rules can be straightforwardly explained as the conclusions drawn by a case-based reasoner. Our system is named FAN, an acronym for Finding Accurate iNductions. It starts from a collection of training examples and produces propositional rules able to classify unseen cases following a minimum-distance criterion in their evaluation procedure. In this way, we combine the advantages of instance-based algorithms with the conciseness of rule (or decision-tree) inducers. The algorithm followed by FAN can be seen as the result of successive steps of pruning heuristics. The main tool employed is the impurity level, a measure of the classification quality of a rule, inspired by a similar measure used in IB3. Finally, a number of experiments were conducted with standard benchmark datasets from the UCI repository to test the performance of our system, successfully comparing FAN with a wide collection of machine-learning algorithms.
© 2002 Elsevier Science Ltd. All rights reserved.

KEYWORDS: machine learning; classification rules; minimum distance; induction from examples.

1. Introduction

Instance-based or lazy machine-learning algorithms share a nearest-neighbour philosophy.
Given a dataset of examples, one has to select a subset of representative elements and a metric or similarity function; then, when the goal is to classify a new case, the class of the nearest (or most similar) previously selected element will be offered as the required class (Cover & Hart, 1967). Moreover, the answers of these algorithms can easily be endowed with a straightforward explanation: things are likely to happen as they did in the most similar case.

In other words, these algorithms induce a case-based decision mechanism. This is, in fact, a natural approach to learning: recall past experiences for future actions. However, the advantage of the naturalness of the solutions provided decreases as the number of representative examples selected grows. Additionally, the quality of the explanations is poorer if we have to communicate the similarity function when it is an essential part of what was learned (Cost & Salzberg, 1993; Wilson & Martínez, 1997).

On the other hand, pure rule-based algorithms manipulate training examples to explicitly build a partition of the space of all known possible examples. In this case, classification accuracy is often higher than that of instance-based systems; however, the quality of the explanations attached to their classification mechanisms is usually lower.
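The nearest-neighbour classification step described above can be sketched in a few lines. This is a minimal, illustrative 1-NN classifier under a Euclidean metric, not the FAN system itself; the function names and the choice of metric are assumptions for the example.

```python
# Illustrative 1-nearest-neighbour classifier (hypothetical sketch,
# not FAN): a new case receives the class of the closest stored
# representative example under Euclidean distance.

def euclidean(a, b):
    # Straight-line distance between two numeric attribute vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def classify_1nn(training, new_case):
    # training: list of (attribute_vector, class_label) pairs,
    # i.e. the previously selected representative elements.
    nearest = min(training, key=lambda ex: euclidean(ex[0], new_case))
    return nearest[1]

examples = [((0.0, 0.0), "negative"), ((1.0, 1.0), "positive")]
print(classify_1nn(examples, (0.9, 0.8)))  # nearest to (1.0, 1.0)
```

The explanation attached to each answer is exactly the one the text describes: the new case is classified like its most similar stored example. Note that the similarity function itself (here, Euclidean distance) is part of what must be communicated to justify the decision.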