Data Min Knowl Disc (2007) 14:367–407
DOI 10.1007/s10618-006-0056-4

A fast and effective method to find correlations among attributes in databases

Elaine P. M. de Sousa · Caetano Traina Jr. · Agma J. M. Traina · Leejay Wu · Christos Faloutsos

Received: 7 July 2005 / Accepted: 19 July 2006 / Published online: 10 February 2007
© Springer Science+Business Media, LLC 2007

Abstract The problem of identifying meaningful patterns in a database lies at the very heart of data mining. A core objective of data mining processes is the recognition of inter-attribute correlations. Not only are correlations necessary for predictions and classifications, since rules would fail in the absence of pattern, but also the identification of groups of mutually correlated attributes expedites the selection of a representative subset of attributes, from which existing mappings allow others to be derived. In this paper, we describe a scalable, effective algorithm to identify groups of correlated attributes. This algorithm can handle non-linear correlations between attributes, and is not restricted to a specific family of mapping functions, such as the set of polynomials. We show the results of our evaluation of the algorithm applied to synthetic and real world datasets, and demonstrate that it is able to spot the correlated attributes.

Responsible editor: E. Keogh.

E. P. M. de Sousa (✉) · C. Traina Jr. · A. J. M. Traina
Department of Computer Science, University of São Paulo at São Carlos, São Carlos, Brazil
e-mail: parros@icmc.usp.br

C. Traina Jr.
e-mail: caetano@icmc.usp.br

A. J. M. Traina
e-mail: agma@icmc.usp.br

L. Wu · C. Faloutsos
Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
e-mail: lw2j@cs.cmu.edu

C. Faloutsos
e-mail: christos@cs.cmu.edu