Chipper – A Novel Algorithm for Concept Description

Ulf JOHANSSON a,1, Cecilia SÖNSTRÖD a, Tuve LÖFSTRÖM a,b, Henrik BOSTRÖM b

a University of Borås, School of Business and Informatics, Borås, Sweden
b University of Skövde, School of Humanities and Informatics, Skövde, Sweden

Abstract. In this paper, several demands placed on concept description algorithms are identified and discussed. The most important criterion is the ability to produce compact rule sets that, in a natural and accurate way, describe the most important relationships in the underlying domain. An algorithm based on the identified criteria is presented and evaluated. The algorithm, named Chipper, produces decision lists, where each rule covers a maximum number of remaining instances while meeting requested accuracy requirements. In the experiments, Chipper is evaluated on nine UCI data sets. The main result is that Chipper produces compact and understandable rule sets, clearly fulfilling the overall goal of concept description. Chipper's accuracy is similar to that of standard decision tree and rule induction algorithms, while its rule sets have superior comprehensibility.

1. Introduction

In most cases, a data mining project has its origin in a business problem, where a decision-maker or an executive requests improved support for their decisions. Depending on the type of business problem, different data mining tasks or problem types can be identified. Several taxonomies of data mining problems exist, and they agree on the most important problem types. The problem type concept description does not, however, appear in all taxonomies, and where it is included, the definitions differ.

The CRISP-DM [1] framework identifies six basic problem types in data mining:

• Data description and summarization, aimed at concise description of data characteristics, typically in elementary and aggregated form.
• Segmentation, aimed at separating data into interesting and meaningful subgroups or classes.
• Concept descriptions, aimed at understandable descriptions of concepts or classes.
• Classification, aimed at building models which assign correct class labels to previously unseen and unlabeled data items.
• Prediction, which differs from classification only in that the target attribute or class is continuous. Prediction is normally referred to as regression.
• Dependency analysis, aimed at finding a model that describes significant dependencies or associations between data items or events.

1 Corresponding author: Ulf Johansson and Cecilia Sönströd are equal contributors to this work. Email: {ulf.johansson, cecilia.sonstrod}@hb.se.