Inductive Databases in the Relational Model: The Data as the Bridge

Stefan Kramer, Volker Aufschild, Andreas Hapfelmeier, Alexander Jarasch, Kristina Kessler, Stefan Reckow, Jörg Wicker, Lothar Richter

Technische Universität München, Institut für Informatik
Boltzmannstr. 3, 85748 Garching bei München, Germany
kramer@in.tum.de

Abstract. We present a new and comprehensive approach to inductive databases in the relational model. The main contribution is a new inductive query language extending SQL, with the goal of supporting the whole knowledge discovery process, from pre-processing via data mining to post-processing. A prototype system supporting the query language was developed in the SINDBAD (structured inductive database development) project. Setting aside models and focusing on distance-based and instance-based methods, closure can easily be achieved. An example scenario from the area of gene expression data analysis demonstrates the power and simplicity of the concept. We hope that this preliminary work will help to bring the fundamental issues, such as the integration of various pattern domains and data mining techniques, to the attention of the inductive database community.

1 Introduction

Many of the recent proposals for inductive databases and constraint-based data mining focus on single pattern domains (such as itemsets or molecular fragments) or single tasks, such as pattern discovery or decision tree induction [15, 2, 6, 13, 7]. Although the closure property is fulfilled by many of those approaches, the possibilities of combining various techniques in multi-step and compositional data mining are rather limited. In this paper, we report the first results of a project that explores a different avenue.
The SINDBAD (structured inductive database development) project aims at the development of a prototype of an inductive database system that supports the most basic preprocessing and data mining operations such that they can be combined more or less arbitrarily. (Here, "structured" is meant in the sense of SQL, the structured query language.) One explicit goal of the project is to support the complete knowledge discovery process, from pre-processing to post-processing. Since it is at the moment far from clear what the requirements of a full-fledged inductive database will be, it is our belief that we can only find out by building prototype systems. The research described in this paper follows ideas worked out at the Dagstuhl perspectives workshop “Data Mining: The Next Generation” [1], where a system