Intelligent Data Mining Assistance via CBR and Ontologies

Michel Charest, Sylvain Delisle, Ofelia Cervantes 1, Yanfen Shen
Département de mathématiques et d’informatique
Université du Québec à Trois-Rivières
Québec, Canada, G9A 5H7
{michel.charest, sylvain.delisle, yanfen.shen}@uqtr.ca, ofelia.cervantes@udlap.mx

1 On leave from Depto. de Ingeniería en Sistemas Computacionales, Universidad de las Américas, Puebla, México.

Abstract

Most commercial data mining products provide a large number of models and tools for performing various data mining tasks, but few provide intelligent assistance for the many important decisions that must be made during the mining process. In this paper, we propose a hybrid data mining assistant, based on the CBR paradigm and the use of an ontology, to empower the user during the various phases of the data mining process.

1. Introduction

This paper continues previous work that established a strong potential for better integrating data mining (DM) and decision support (DS) through computational intelligence [1]. In order to remain competitive in the business world, decision-makers have begun to turn to DM technology to cope with the information deluge and meet their informational needs. Although data mining does promise to uncover potentially valuable, useful and implicit knowledge from one’s abundant data repositories, its effective application still faces some very serious challenges:

• Data mining research seems to be based on highly specialized techniques (statistics, machine learning, information theory, database technology, etc.), whereas research on the strategic, methodological, and even epistemological aspects of DM is rare.
• Current DM processes make very little use of existing corporate knowledge. Consequently, DM is more tedious than necessary and tends to produce already known information.
• Existing DM methodologies only provide general directives; however, what a non-specialist really needs are explanations, heuristics and recommendations on how to effectively carry out the particular steps of the methodology.
• Existing methods for evolving domain ontologies rely heavily on manually driven knowledge solicitation efforts from domain experts.

In this paper, Section 2 presents some key challenges associated with providing intelligent DM assistance. Section 3 summarizes related work in the fields of DM assistance, Case-Based Reasoning (CBR) and ontologies. In Section 4, we provide a system overview of our proposed intelligent DM assistant. Lastly, Section 5 provides a brief discussion and Section 6 presents a conclusion and future work.

2. The Challenges of Intelligent Assistance

2.1 Support the Non-Expert Data Miner

Most commercial data mining products either do not offer any intelligent assistance (i.e. decision support) or tend to do so in the form of rudimentary “wizard-like” interfaces. These wizard-like interfaces (e.g. Oracle Data Miner, SAS Enterprise Miner) make strong assumptions about the user's level of background knowledge. This observation is further supported by [2]. For instance, the following important decisions must be considered during a DM process:

• How to successfully specify business and DM objectives?
• How to effectively perform data quality verification?
• How to efficiently perform the data preparation phase (e.g. normalization, discretization)?
• Which statistical or machine learning algorithm is most appropriate for the task at hand?
• Which training parameters are most appropriate for applying a DM algorithm?
• How to evaluate the results of the data mining effort?

Over the past several decades research and applications in the fields of statistics and machine