J. Ind. Soc. Agril. Statist. 59(2), 2005 : 161-168

Dynamic RDT Model for Mining Rules from Real Data

Rajni Jain and Sonajharia Minz(1)
National Centre for Agricultural Economics and Policy Research, New Delhi
(1) School of Computers and Systems Sciences, Jawaharlal Nehru University, New Delhi-110067

(Received: July, 2005)

SUMMARY

A dynamic Rough Set based Decision Tree Induction (RDT) model is proposed to deal with the noise present in the real time dataset. The paper explores the variants of RDT models for learning classification rules. The required set of classification rules is intended to help identify the households that are vulnerable to food shortage. The classification rules should be as simple as possible. In this paper, the classical rough set method, the C4.5 algorithm, the hybrid algorithm RDT and its variants, and the dynamic RDT model are used for mining rules from a real dataset. The experimental results are compared graphically with those of the base algorithms on four performance parameters of the resulting classifier: classification accuracy, complexity, number of rules and the CS score. The accuracy obtained using Linear Discriminant Analysis (LDA) is used as a benchmark for the accuracy of the proposed model, called dynamic RDT. The performance of the proposed model is observed to be better for the real dataset.

Key words : Rough set, Data mining, Dynamic reduct, Classification, Dynamic RDT model, Rule extraction, LDA.

1. INTRODUCTION

The aim of the paper is to present a basic framework of the dynamic Rough Set based Decision Tree Induction (RDT) model for learning classification rules from real data. The RDT model is an integration of the classical Rough Set (RS) approach and Decision Tree (DT) induction (Minz and Jain (2003a)). The set of decision rules obtained using all the conditional attributes can be too large to be suitable for the classification of unseen object(s). It is also difficult for human users to comprehend and use such rules directly without the use of a computer. The problem of learning simpler classification rules is analyzed in this paper using the RDT model. The suitability of the RDT model for learning simple and accurate classification rules from small datasets and from the repository of datasets from the University of California (UCI), available at http://www.ics.uci.edu/~mlearn, is examined in Jain and Minz (2003a) and Minz and Jain (2003b) based on experimental results. In this paper, we propose a novel variant of RDT, namely dynamic RDT, which involves computation of dynamic reducts (Bazan et al. (1994)). Some other variations of the basic RDT approach are also presented for the sake of completeness. All these approaches are compared by applying each to a real dataset, the Nutrition dataset. A study of the nutritional security of rural households in India using other methods has already been carried out in Adhiguru and Ramasamy (2003).

2. REVIEW OF BASIC CONCEPTS

2.1 Noise in Real Dataset

Real world data is almost always characterized by incomplete data as well as imperfect values of the attributes, which require due attention in designing a learning algorithm. Data is incomplete because attributes of interest may not always be available, or the data may not be included because it was not considered important at the time of entry. The data is imperfect if it is noisy and inconsistent (e.g., containing discrepancies in the codes used to categorize items or counter examples in the training data) and/or contains only aggregated values. Noise is a random error or variance in a measured variable (Han and Kamber (2001)). There may be many possible reasons for noisy data. The data collection instruments used may be faulty. There may have been human or computer errors occurring at the time of data entry. Zhu et al. (2003) define two types of noise: (a) attribute noise; and (b) class noise. The former is the