On elimination of redundant attributes in decision tables

Long Giang Nguyen
Institute of Information Technology, VAST, Vietnam
Email: nlgiang@ioit.ac.vn

Hung Son Nguyen
Institute of Mathematics, Warsaw University
Banach 2, 02-097, Warsaw, Poland
Email: son@mimuw.edu.pl

Abstract: Most decision support systems based on rough set theory are related to the minimal reduct calculation problem, which is NP-hard. This paper investigates the problem of searching for the set of useful attributes, i.e. the attributes that occur in at least one reduct. By complementation, this problem is equivalent to searching for the set of redundant attributes, i.e. the attributes that do not occur in any reduct of the given decision table. We show that the considered problem is equivalent to a Sperner system problem for relational database systems and prove that it can be solved in polynomial time. On the basis of these theoretical results, we also propose some algorithms for the elimination of redundant attributes in decision tables.

Index Terms: rough sets, reducts, relational database, minimal keys, Sperner system

I. INTRODUCTION

FEATURE selection is one of the crucial problems in machine learning and data mining. The accuracy of many classification algorithms depends on the quality of the selected attributes. The rough set approach to the feature selection problem is based on reducts, which are the minimal (with respect to inclusion) sets of attributes that preserve some necessary amount of information. Unfortunately, the number of all reducts of a given decision table can be exponential in the number of attributes. Therefore we are forced to search either for minimal-length reducts or for core attributes, i.e. the attributes that occur in all reducts. The minimal reduct problem is NP-hard, whereas the problem of finding the core attributes can be solved in polynomial time.

This paper investigates the problem of identifying the set of attributes that are present in at least one reduct.
Such attributes are called reductive attributes. Attributes that are not reductive are called redundant attributes because they do not play any role in object classification. For a given decision table, the problem of searching for all reductive attributes becomes the problem of determining the union of all reducts of the table or, equivalently, determining the set of all its redundant attributes.

In this paper we present two approaches to the investigated problem. Firstly, we present a fundamental analysis of the problem of searching for reductive attributes. Using the Boolean reasoning approach, we prove that the problem can be solved in polynomial time. Secondly, we consider the decision table as a relation over the set of attributes and apply results from relational database theory to solve the mentioned problems. We propose an algorithm that determines the set of all reductive attributes of a consistent decision table, based on methods of searching for keys, antikeys and prime attributes in decision tables (see [1], [2]).

The structure of this paper is as follows. Sections II and III present some basic concepts of rough set theory as well as the computational complexity of the reduct calculation problems. Section IV presents the concept of reducts in a decision table from the viewpoint of relational database theory. We also propose an algorithm to determine the set of all reductive attributes of a consistent decision table. Section V reports some experiments with the proposed algorithm. The conclusions and remarks on future work are presented in the last section.

II. BASIC CONCEPTS

An information system is a pair $\mathbb{A} = (U, A)$, where the set $U$ denotes the universe of objects and $A$ is the set of attributes, i.e. mappings of the form $a : U \to V_a$. The set $V_a$ is called the domain or the value set of attribute $a$.
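As an informal illustration of this definition, the following Python sketch stores a small information system as a table and recovers the universe, the attribute set and the value set of each attribute. The objects, attribute names and values are hypothetical toy data chosen for this sketch only; they are not taken from the paper, and the helper names (universe, attributes, value_set) are likewise assumptions.

```python
from typing import Dict, Hashable, List, Set

# Hypothetical toy information system (not data from the paper).
# Each object u in U is a row; attribute a is the mapping u -> TABLE[u][a].
TABLE: Dict[str, Dict[str, Hashable]] = {
    "u1": {"colour": "red", "size": 1},
    "u2": {"colour": "blue", "size": 2},
    "u3": {"colour": "red", "size": 2},
}


def universe(table: Dict[str, Dict[str, Hashable]]) -> List[str]:
    """Return the universe U as a sorted list of object names."""
    return sorted(table)


def attributes(table: Dict[str, Dict[str, Hashable]]) -> List[str]:
    """Return the attribute set A as a sorted list of attribute names."""
    names: Set[str] = set()
    for row in table.values():
        names.update(row)
    return sorted(names)


def value_set(table: Dict[str, Dict[str, Hashable]], a: str) -> Set[Hashable]:
    """Return the values that attribute a actually takes on U.

    This is the image of a, which is contained in the domain V_a.
    """
    return {row[a] for row in table.values()}


if __name__ == "__main__":
    print("U =", universe(TABLE))
    print("A =", attributes(TABLE))
    for name in attributes(TABLE):
        print("values of", name, "=", value_set(TABLE, name))
```

Representing each attribute as a column of a dictionary-based table keeps the mapping view of the definition explicit: evaluating an attribute on an object is a plain lookup.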
A decision system is an information system $\mathbb{D} = (U, A \cup \{dec\})$, where $dec$ is a distinguished attribute called the decision attribute or briefly the decision. The remaining attributes are called conditional attributes or briefly conditions. For convenience, we assume that the domain of the decision attribute consists of two or very few values. For any $k \in V_{dec}$, the set $CLASS_k = \{u \in U : dec(u) = k\}$ is called a decision class of $\mathbb{D}$.

As an example, let us consider the decision system below (Table I). The attributes Diploma, Experience, French and Reference are conditional attributes, whereas Decision is the decision attribute. We will refer to the decision attribute Decision as $dec$, and to the conditional attributes Diploma, Experience, French and Reference as $a_1, \ldots, a_4$, in this order. In this example there are two decision classes, related to the values Accept and Reject of the decision attribute domain. These decision classes are as follows:

$CLASS_{Accept} = \{x_1, x_4, x_6, x_8\}$
$CLASS_{Reject} = \{x_2, x_3, x_5, x_7\}$

Proceedings of the Federated Conference on Computer Science and Information Systems, pp. 317–322. ISBN 978-83-60810-51-4. 978-83-60810-51-4/$25.00 © 2012 IEEE

Rough set theory was introduced by Professor Z. Pawlak [6] as a tool for concept approximation under