© The British Computer Society 2014. All rights reserved. For Permissions, please email: journals.permissions@oup.com
doi:10.1093/comjnl/bxu047

A Method to Find Functional Dependencies Through Refutations and Duality of Hypergraphs

Joel Fuentes 1,*, Pablo Sáez 1, Gilberto Gutiérrez 1 and Isaac D. Scherson 2

1 Department of Computer Science and Information Technologies, Universidad del Bío-Bío, Chillán, Chile
2 Department of Computer Science, University of California, Irvine, CA, USA
* Corresponding author: jfuentes@ubiobio.cl

One of the most important steps in obtaining a relational model from legacy systems is the extraction of functional dependencies (FDs) through data mining techniques. Several methods have been proposed for this purpose and most use direct search methods that traverse the search space in exponential time in the number of attributes of the relation. As it is not uncommon to find in practice relations with tens of attributes, a need exists to further develop more efficient techniques to find FDs. The method studied here finds the minimal set of minimal FDs using algorithms that solve the hypergraph duality problem applied on the complement of the refutation hypergraph of the relation without going through the exponential search space. After showing that the extraction of FDs can be reduced to the hypergraph duality problem, experimental results are given as verification and characterization of the correctness and time complexity of the proposed tool.

Keywords: functional dependencies; duality of hypergraphs; minimal transversals

Received 4 December 2013; revised 1 May 2014
Handling editor: Rada Chirkova

1. INTRODUCTION

The extraction of functional dependencies (FDs from now on) from an instance of a relation is an important data mining technique, used in database design, query optimization and reverse engineering among others [1].
Studies such as [2] propose methods and strategies to obtain the relational database model from legacy systems, where one of the steps is the automatic extraction of the FDs. This shows the importance of having efficient tools that perform this task. A number of tools and algorithms have indeed been proposed for this purpose, but most of them take time exponential in the number of attributes of the relation. In real situations, however, it is common to have relations with a large number of attributes (for instance, more than 20 or 30). This motivates the search for more efficient techniques than those used by these tools.

We will show in the following sections that the problem of finding FDs can be efficiently reduced, by making use of FD refutations, to the well-known hypergraph duality problem [3], for which quasi-polynomial algorithms, for instance with running time $O(n^{\log n})$, are known [4]. The idea of using hypergraph transversals for inferring FDs was independently proposed in [5, 6], where refutations are referred to as antikeys.

Our main contributions in this paper can be summarized as follows:

1. Implementation of an efficient computational method to obtain the set of refutations for FDs, given an instance r of a relation R.
2. A method to store and process these refutations, represented as hyperedges of a hypergraph.
3. The computation of all the minimal FDs that are valid in an instance of a relation, by means of the computation of the set of minimal transversals of this hypergraph.
4. A complete tool to compute the set of minimal FDs, together with an analysis of the time spent by the tool on the two main processes.

The present article is divided into six sections. In Section 2, the problem is stated and related work is reviewed. In Section 3, we briefly recall the hypergraph duality problem and the known algorithms that solve it. In Section 4, we present the proposed approach to the problem.
Finally, Sections 5 and 6 give some experimental results and conclusions, respectively.

Section A: Computer Science Theory, Methods and Tools. The Computer Journal, 2014. Advance Access published June 9, 2014.
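
To make the reduction outlined in the introduction concrete, the following Python fragment is a minimal sketch, not the tool described in this article. It assumes the standard formulation in which every pair of tuples that disagree on an attribute A refutes each FD X → A whose left-hand side X lies inside the attributes on which the pair agrees; the remaining attributes on which the pair disagrees form one hyperedge, and the minimal left-hand sides for A are the minimal transversals of these hyperedges. The function names (difference_sets, minimal_transversals, minimal_fds) and the sample relation are hypothetical, and the transversal step uses a naive Berge-style incremental procedure as a stand-in, not the quasi-polynomial algorithms of [4].

```python
from itertools import combinations


def difference_sets(rows, target):
    """Hyperedges for one right-hand side attribute (illustrative helper).

    For every pair of rows that disagree on `target`, collect the other
    attributes on which the pair also disagrees.
    """
    n_attrs = len(rows[0])
    edges = set()
    for t, u in combinations(rows, 2):
        if t[target] != u[target]:
            edges.add(frozenset(a for a in range(n_attrs)
                                if a != target and t[a] != u[a]))
    return edges


def minimal_transversals(edges):
    """All minimal hitting sets, built edge by edge (Berge-style, naive)."""
    transversals = {frozenset()}
    for edge in edges:
        candidates = set()
        for t in transversals:
            if t & edge:
                candidates.add(t)            # t already hits this edge
            else:
                candidates.update(t | {v} for v in edge)
        # Keep only the inclusion-minimal candidates.
        transversals = {c for c in candidates
                        if not any(o < c for o in candidates)}
    return transversals


def minimal_fds(rows, names):
    """Minimal non-trivial FDs as (sorted left-hand side, right-hand side)."""
    fds = set()
    for target in range(len(names)):
        for lhs in minimal_transversals(difference_sets(rows, target)):
            fds.add((tuple(sorted(names[a] for a in lhs)), names[target]))
    return fds


# Hypothetical four-tuple instance over attributes A, B, C.
rows = [(1, "x", 10), (1, "x", 20), (2, "y", 10), (3, "y", 20)]
fds = minimal_fds(rows, ["A", "B", "C"])
```

In this sample instance the first two tuples differ only on C, which yields an empty hyperedge for C; no transversal exists, so no non-trivial FD has C as its right-hand side. The pairwise comparison of tuples is quadratic in the number of tuples and is meant only to illustrate the construction.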