Discretization Numbers for Multiple-Instances Problem in Relational Database

Rayner Alfred¹,², Dimitar Kazakov¹

¹ University of York, Computer Science Department, Heslington, YO10 5DD York, United Kingdom
{ralfred, kazakov}@cs.york.ac.uk
http://www-users.cs.york.ac.uk/~ralfred
² On Study Leave from Universiti Malaysia Sabah, School of Engineering and Information Technology, 88999, Kota Kinabalu, Sabah, Malaysia
ralfred@ums.edu.my

Abstract. Handling numerical data stored in a relational database differs from handling numerical data stored in a single table, because an individual record may occur multiple times in a non-target table and the relations between tables may be non-determinate. Most traditional data mining methods deal only with a single table and discretize columns that contain continuous numbers into nominal values. In a relational database, multiple records with numerical attributes are stored separately from the target table, and these records are usually associated with a single structured individual stored in the target table. Numbers in multi-relational data mining (MRDM) are often discretized, after considering the schema of the relational database, in order to reduce the continuous domains to more manageable symbolic domains of low cardinality; the resulting loss of precision is assumed to be acceptable. In this paper, we consider different alternatives for dealing with continuous attributes in MRDM. The discretization procedures considered include algorithms that do not depend on the multi-relational structure of the data as well as algorithms that are sensitive to this structure. In our experiments, we study the effect of taking the one-to-many association into consideration when discretizing continuous numbers.
We implement a new discretization method, called the entropy-instance-based discretization method, and we evaluate it with respect to C4.5 on three variants of a well-known multi-relational database (Mutagenesis), in which numeric attributes play an important role. The empirical results show that entropy-based discretization can be improved by taking the multiple-instance problem into consideration.

Keywords: Discretization, Entropy-based, Semi-supervised clustering, Genetic Algorithm, Multiple Instance.

1. Introduction

Most multi-relational data mining deals with nominal or symbolic values, often in the context of structural or graph-based mining (e.g. ILP) [1]. Much less attention has been given to the discretization of continuous attributes in a relational