International Journal of Computer Applications (0975 – 8887) Volume 78 – No.15, September 2013 34

Mining Functional Dependency in Relational Databases using FUN and Dep-Miner: A Comparative Study

Anupama A Chavan, M.Tech (2nd year), Department of Computer Science and Engineering, Lord Krishna College of Technology
Vijay Kumar Verma, Asst. Professor, M.Tech (CSE), Department of Computer Science and Engineering, Lord Krishna College of Technology

ABSTRACT
A database is a collection of tables of data items; if it is organized according to the relational model, it is called a relational database. In a relational database, a logical and efficient design is critical. A poorly designed database may provide erroneous information, may be difficult to use, or may even fail to work properly. Most of these problems result from two bad design features: redundant data and anomalies. Database normalization is the process of designing a database that satisfies a set of integrity constraints efficiently, so as to avoid inconsistencies when the database is manipulated. Much of the research work in this area has been devoted to functional dependencies. Several algorithms have been developed in past years, such as TANE, FD_Mine, FD_Discover, Dep-Miner, FUN, FD analysis using rough sets, and FD discovery by Bayes net. In this paper we present a comparative study of Dep-Miner and FUN. We compare the working process of Dep-Miner and FUN using a simple example.

Keywords
Functional dependencies, closure of set, redundancy, normalization.

1. INTRODUCTION
Discovering the dependencies that hold in an instance of a relation has received considerable interest because it allows automatic database analysis. Knowledge discovery and data mining, database management, reverse engineering, and query optimization are among the main applications benefiting from efficient dependency discovery algorithms [6]. Redundancy is often caused by functional dependency.
A functional dependency is a link between two sets of attributes in a relation. We can normalize a relation by removing unwanted FDs. Normalization transforms an unstructured relation into separate relations, called normalized ones. The main purpose of this separation is to eliminate redundant data and reduce data anomalies. Data becomes inconsistent because of insert, update, and delete operations and the repetition of information. There are many levels of normalization, such as 1NF, 2NF, 3NF, BCNF, 4NF, and 5NF, and the database designer chooses among them according to purpose in order to free the database from anomalies. Most database applications are designed to be in either the third or the Boyce-Codd normal form, in which the dependency relations are sufficient for most organizational requirements [2, 8].

2. BASIC CONCEPTS

2.1 Functional Dependency
Given a relation R, attribute Y of R is functionally dependent on attribute X of R if each X value in R is associated with precisely one Y value in R. A functional dependency is a statement X → Y requiring that X functionally determines Y. For example, city → state means that the state value depends on the city value [7, 8].

2.2 Free Set
A free set is a minimal set X of attributes in schema R such that for any proper subset Y of X, $|r[Y]| < |r[X]|$. Thus every single attribute is a free set, because it has no non-empty proper subset. If X is a free set, $A \in (R - X)$, $|r[X]| < |r[XA]|$, and $|r[A]| < |r[XA]|$, then XA is another free set. The left-hand side (lhs) of any minimal FD is necessarily a free set. The set of all free sets on relation r is denoted by Fr(r).

2.3 Closure of Set
The closure of a set X is calculated using cardinalities as $X^{+} = X \cup \{A \mid A \in (R - X) \wedge |r[X]| = |r[XA]|\}$. That is, $X^{+}$ contains an attribute A on a node at the next level if X → A.

2.4 Quasi-closure of Set
The quasi-closure of $X = \{A_1, \ldots, A_k\}$ is $X^{\circ} = X \cup (X - A_1)^{+} \cup \cdots \cup (X - A_k)^{+}$. In effect, $X^{\circ}$ contains the attributes on all the parent nodes of X and all the dependent nodes of those parent nodes [1].
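The cardinality-based definitions in Sections 2.1 to 2.4 can be illustrated with a small sketch. The relation below (`ATTRS`, `ROWS`) and all function names are hypothetical and chosen only for illustration; this is a naive reading of the definitions, assuming X → A holds exactly when |r[X]| = |r[XA]|, and it is not the FUN algorithm itself.

```python
from itertools import combinations

# Hypothetical toy relation; attribute names and rows are illustrative only.
ATTRS = ("A", "B", "C")
ROWS = [
    (1, "x", 10),
    (1, "x", 20),
    (2, "y", 10),
    (3, "y", 20),
]


def card(attrs):
    """|r[X]|: number of distinct projections of the rows onto attrs."""
    idx = [ATTRS.index(a) for a in attrs]
    return len({tuple(row[i] for i in idx) for row in ROWS})


def holds(lhs, a):
    """X -> A is taken to hold when |r[X]| == |r[XA]|."""
    return card(lhs) == card(set(lhs) | {a})


def closure(x):
    """X+ : X plus every attribute A outside X with X -> A."""
    x = frozenset(x)
    return x | {a for a in ATTRS if a not in x and holds(x, a)}


def quasi_closure(x):
    """X joined with the closures of all subsets obtained by dropping one attribute."""
    x = frozenset(x)
    result = set(x)
    for a in x:
        result |= closure(x - {a})
    return frozenset(result)


def is_free(x):
    """True when every non-empty proper subset Y of X has |r[Y]| < |r[X]|."""
    x = frozenset(x)
    return all(
        card(y) < card(x)
        for k in range(1, len(x))
        for y in combinations(x, k)
    )
```

On this toy relation, A determines B, so {A, B} is not a free set, while {A, C} is free and its quasi-closure already contains B through the closure of {A}.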

2.5 Maximal Equivalence Class
Let r be a stripped partition database. The set MC of maximal equivalence classes of r is defined as $MC = \max_{\subseteq}\{c \in \pi \mid \pi \in r\}$, that is, the equivalence classes of the stripped partitions π in r that are maximal with respect to set inclusion.

2.6 Agree Set
Let $t_i$ and $t_j$ be tuples and X an attribute set. The tuples $t_i$ and $t_j$ agree on X if $t_i[X] = t_j[X]$. The agree set of $t_i$ and $t_j$ is defined as $ag(t_i, t_j) = \{A \in R \mid t_i[A] = t_j[A]\}$. If r is a relation, $ag(r) = \{ag(t_i, t_j) \mid t_i, t_j \in r,\ t_i \neq t_j\}$.

2.7 Maximal Set
A maximal set is an attribute set X which, for some attribute A, is the largest possible set not determining A. We denote by max(dep(r), A) the set of maximal sets for A [4].
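The definitions in Sections 2.5 to 2.7 can be sketched on a small example. The relation (`ATTRS`, `ROWS`) and every function name below are hypothetical. The sketch assumes that the maximal sets for A can be read off as the inclusion-maximal non-empty agree sets that do not contain A; that is our reading of the definitions rather than something stated in the text, and the code is a naive illustration, not the Dep-Miner algorithm.

```python
from itertools import combinations

# Hypothetical toy relation; attribute names and rows are illustrative only.
ATTRS = ("A", "B", "C")
ROWS = [
    (1, "x", 10),
    (1, "x", 20),
    (2, "y", 10),
    (3, "y", 20),
]


def stripped_partition(i):
    """Group tuple ids by their value in column i, dropping singleton classes."""
    groups = {}
    for tid, row in enumerate(ROWS):
        groups.setdefault(row[i], set()).add(tid)
    return [frozenset(g) for g in groups.values() if len(g) > 1]


def maximal_classes():
    """MC: equivalence classes that are maximal with respect to set inclusion."""
    classes = {c for i in range(len(ATTRS)) for c in stripped_partition(i)}
    return {c for c in classes if not any(c < d for d in classes)}


def agree_set(ti, tj):
    """ag(ti, tj): attributes on which the two tuples have equal values."""
    return frozenset(a for a, u, v in zip(ATTRS, ROWS[ti], ROWS[tj]) if u == v)


def agree_sets():
    """Non-empty agree sets; only pairs sharing a maximal class can agree on anything."""
    return {
        agree_set(ti, tj)
        for c in maximal_classes()
        for ti, tj in combinations(sorted(c), 2)
    }


def max_sets(a):
    """Assumed reading of max(dep(r), A): maximal agree sets that exclude a."""
    candidates = {s for s in agree_sets() if a not in s}
    return {s for s in candidates if not any(s < t for t in candidates)}
```

Restricting the pairwise comparison to tuples that share a maximal equivalence class avoids comparing tuples that agree on no attribute at all, which is the motivation for introducing MC before agree sets.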