FINDING MINIMAL REDUCT WITH BINARY INTEGER PROGRAMMING IN DATA MINING

Azuraliza Abu Bakar, Md Nasir Sulaiman, Mohamed Othman, Mohd Hasan Selamat
Faculty of Computer Science and Information Technology
University Putra Malaysia
43400 Serdang, Selangor, Malaysia.
azul328@webmail.uum.edu.my, {nasir,mothman,hasan}@fsktm.upm.edu.my

Abstract: The search for the minimum size of reduct is based on the assumption that within the dataset, there are attributes that are more important than the rest. In this paper we present an algorithm for finding minimum size reducts which is based on the rough set approach and a dedicated decision-related binary integer programming (BIP) algorithm. The algorithm transforms an equivalence class obtained from a decision system into a BIP model. An algorithm for solving the BIP is given. The presented work has links to rough set theory, data mining and non-monotonic reasoning.

Keywords: Rough Set, Prime Implicant, Reduct, Binary Integer Programming (BIP), CNF formula.

I. INTRODUCTION

The major process of discovering knowledge in databases is the extraction of rules from classes of data. The rules obtained can be definite or indefinite. The process of extracting the indefinite rules is called non-monotonic reasoning, which is concerned with finding a selection of attributes that is significant for developing the rules in uncertain situations without any loss of knowledge. Rough set theory became a research interest of data mining researchers in the 90s. According to Pawlak [6], the rough set approach was designed as a tool to deal with uncertain or vague knowledge in artificial intelligence. It gave rise to new formal approaches to approximate reasoning, digital logic analysis and reduction, control algorithm acquisition, machine learning algorithms and pattern recognition. Finding rules using the rough set approach is concerned with calculating reducts from a given information system.
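As a rough illustration of what calculating a reduct from a decision system involves, the following Python sketch builds one clause per pair of objects with different decisions (the attributes that tell the two objects apart) and then looks for the smallest 0/1 selection of attributes that satisfies every clause. The toy table, the names `TABLE`, `discernibility_clauses` and `minimum_reducts`, and the exhaustive search by increasing size are all assumptions made for illustration; the exhaustive search stands in for, and is not, the branch and bound procedure of this paper.

```python
from itertools import combinations

# Hypothetical toy decision table (not from the paper):
# each row is (values of condition attributes a, b, c), decision d.
ATTRS = ["a", "b", "c"]
TABLE = [
    ((0, 0, 1), 0),
    ((0, 1, 1), 1),
    ((1, 0, 0), 1),
    ((1, 1, 0), 0),
]


def discernibility_clauses(table):
    """Return one clause per pair of objects with different decisions.

    A clause is the set of attribute indices on which the two objects
    differ; at least one of them must be kept to tell the objects apart.
    """
    clauses = set()
    for (x, dx), (y, dy) in combinations(table, 2):
        if dx != dy:
            clause = frozenset(i for i in range(len(x)) if x[i] != y[i])
            if clause:  # identical conditions cannot be discerned at all
                clauses.add(clause)
    return clauses


def minimum_reducts(table, n_attrs):
    """Exhaustively find all smallest attribute sets hitting every clause.

    Equivalent 0/1 program: minimise sum(x_i) subject to
    sum(x_i for i in clause) >= 1 for every clause, x_i in {0, 1}.
    """
    clauses = discernibility_clauses(table)
    for size in range(n_attrs + 1):
        found = [
            set(subset)
            for subset in combinations(range(n_attrs), size)
            if all(clause & set(subset) for clause in clauses)
        ]
        if found:
            return found
    return []


if __name__ == "__main__":
    for reduct in minimum_reducts(TABLE, len(ATTRS)):
        print(sorted(ATTRS[i] for i in reduct))
```

On this toy table the clauses are (b) and (a or c), so the smallest selections are {a, b} and {b, c}. In BIP terms this is: minimise x_a + x_b + x_c subject to x_b >= 1 and x_a + x_c >= 1, with each variable in {0, 1}.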
A reduct is a minimal selection of attributes that preserves the dissimilarity of all objects from one another. The resulting reduct is a minimal set of attributes that enables us to introduce the same similarity relation on the universe as the whole set of attributes does. The reducts obtained are used to determine the attributes to be considered when generating the rules from databases. It is important that the reduct is minimal [1][10][11]. The search for the minimum size of reduct is based on the assumption that within the dataset, there are attributes that are more important than the rest. The minimal reduct will reduce the number of conditional attributes used to generate rules. The decreasing size of rules gives a higher chance of their applicability to new cases and means that we need to gather less information about them. Statistically, smaller rules are expected to classify new cases more properly because of their larger support in the data, and in some sense the most stable and frequently appearing reducts give the best decision rules. The set of all decision rules generated from all conditional attributes can be too large and can contain many chaotic rules that are not appropriate for unseen object classification. Several studies on finding minimal reducts have been discussed in [1][11].

0-7803-6355-8/00/$10.00 ©2000 IEEE

The main idea of BIP is based on exploring the structure of the conjunctive normal form (CNF) which is obtained from the information system. In this paper we present an algorithm based on the rough set approach and a binary integer programming (BIP) algorithm. The algorithm is built on an existing Propositional Satisfiability (SAT) model for finding minimal size prime implicants, which was introduced in [7], [8]. This paper is organised as follows. In Section 2,
we give a brief definition of rough set theory in decision systems. Our method will incorporate the rough set approach and Binary Integer Programming (BIP) to find minimum size reducts. An algorithm for computing minimal reducts is presented in Section 3. The algorithm is based on creating a BIP model that simplifies the existing formulation. The branch and bound algorithm that was used for solving the BIP, together with several search techniques, is also given. In Section 4, two pruning techniques are proposed. Some results obtained from this preliminary work are presented in Section 5. Finally, further work and suggestions conclude this paper in Section 6.

II. BASIC NOTION

The theory of rough sets is a relatively new research direction concerned with the analysis and modeling of classification