Abstract: The main goal of data mining is to extract accurate, comprehensible and interesting knowledge from databases that may be considered as large search spaces. In this paper, a new, efficient type of genetic algorithm (GA), called the uniform two-level GA, is proposed as a search strategy for discovering truly interesting, high-level prediction rules, a difficult and relatively little-researched problem, rather than discovering classification knowledge as is usual in the literature. The proposed method exploits the advantages of the uniform population method and addresses the task of generalized rule induction, which can be regarded as a generalization of the task of classification. Although generalized rule induction requires a large amount of computation, a demand that conventional algorithms usually cannot meet, it was demonstrated that this method increased the performance of GAs and rapidly found interesting rules.

Keywords: Classification Rule Mining, Data Mining, Genetic Algorithms.

I. INTRODUCTION

Data mining (DM) consists of the discovery of highly accurate, comprehensible and interesting (novel) knowledge from large databases. There are several kinds of DM tasks, depending mainly on the application domain and the user's interest. The classification task consists of supervised learning methods that induce a classification model from a database. However, in many classification algorithms the emphasis is on the discovery of accurate knowledge, as measured, e.g., by the classification error rate. In this paper, knowledge accuracy is a secondary concern and the emphasis is on discovering novel, interesting (surprising), comprehensible knowledge.

As demonstrated in various application domains, GAs have proved to be an appealing alternative to classical search algorithms for exploring a large search space. Besides being robust and less likely to get stuck in local optima, they tend to cope better with attribute interaction.
Moreover, they are highly parallel in nature and therefore attractive for parallel and distributed implementations.

This paper proposes a new, efficient type of GA, called the uniform two-level GA, to discover interesting rules for the task of generalized rule induction, where different rules can predict different goal attributes. This task can be regarded as a generalization of the very well-known classification task, where all rules predict the same goal attribute. The two key issues in the proposed approach are the use of a uniform population [11-12], which distributes the initial population uniformly over the feasible region, and the new type of GA, the two-level GA, which uses an island model for the initial population and distributes it methodically across different islands.

This paper is organized as follows. Section 2 briefly describes related work on rule interestingness. Section 3 describes the basic characteristics of the task of generalized rule induction and the advantages of using GAs for this task from a DM viewpoint. Section 4 gives a detailed description of the proposed method. Section 5 briefly describes the data set used in the experiments. Section 6 discusses the experimental results. Finally, Section 7 concludes the paper.

B. Alatas is with the Department of Computer Engineering, Fırat University, 23119, Elazığ / Turkey (phone: +90 424 237 00 00 / 5293; fax: +90 424 218 19 07; e-mail: balatas@firat.edu.tr).
A. Arslan is with the Department of Computer Engineering, Selcuk University, Konya / Turkey (e-mail: ahmetarslan@selcuk.edu.tr).

II. RELATED WORKS ABOUT RULE INTERESTINGNESS

Recently, several researchers have presented different viewpoints on rule interestingness. In [1], the need for a better grasp of the concept of interestingness for DM is demonstrated with an example from marketing.
Applying a traditional Apriori association algorithm to the analysis of 87,437 records of consumer purchase data generated over 40,000 association rules, “many of which were irrelevant or obvious.” Identifying the important and actionable discoveries from amongst these 40,000 “nuggets” is itself a key task for DM.

The concept of interestingness is difficult to formalize and varies considerably across different domains. A growing literature in DM is beginning to address the question. Early work attempted to identify objective measures of interestingness; the confidence and support measures used in association algorithms are examples of such measures. One of the earliest efforts to address the explosion of discoveries by identifying interestingness was through the use of rule templates with attribute hierarchies and visualization [2]. In [3], interestingness measures are partitioned into objective and subjective measures, and subjective measures are further partitioned into those that capture unexpectedness and those that capture actionability. Many authors have focused on capturing unexpectedness as a useful measure, particularly in the context of discovering associations [4] and classifications [5].

Mining of Interesting Prediction Rules with Uniform Two-Level Genetic Algorithm. Bilal Alatas and Ahmet Arslan. World Academy of Science, Engineering and Technology, International Journal of Computer and Information Engineering, Vol:1, No:7, 2007. waset.org/Publication/8147

In [6], an