Feature Construction and δ-Free Sets in 0/1 Samples

Nazha Selmaoui¹, Claire Leschi², Dominique Gay¹, and Jean-François Boulicaut²

¹ ERIM, University of New Caledonia
  {selmaoui, gay}@univ-nc.nc
² INSA Lyon, LIRIS CNRS UMR 5205
  {claire.leschi, jean-francois.boulicaut}@insa-lyon.fr

Abstract. Given the recent breakthrough in constraint-based mining of local patterns, we decided to investigate its impact on feature construction for classification tasks. We discuss preliminary results concerning the use of the so-called δ-free sets. Our guess is that their minimality might help to collect important features. Once these sets are computed, we propose to select as new features the essential ones w.r.t. class separation and generalization. Our experiments have given encouraging results.

1 Introduction

We would like to support difficult classification tasks (e.g., on large noisy data) by designing well-founded processes for building new features and then using available techniques. This is challenging, and our thesis is that the recent breakthrough in constraint-based mining of local patterns might provide some results. Considering the case of 0/1 data in which some attributes denote class values¹, many efficient techniques are now available for computing complete collections of patterns that satisfy user-defined constraints (e.g., minimal frequency, freeness, closeness). Our goal is not only to consider such patterns as features but also to be able to predict (part of) the classification behavior based on these pattern properties. In this paper, we discuss preliminary results concerning the so-called frequent δ-free sets in 0/1 samples. When δ = 0, these sets have been studied as minimal generators for the popular (frequent) closed sets. Otherwise (δ > 0), they provide a "near equivalence" perspective and have been studied as an approximate condensed representation for frequent sets [1].
Furthermore, the minimality of δ-free sets has been exploited for class characterization (see, e.g., [2]) and non-redundant association rule mining (see, e.g., [3]). Our guess is that this minimality, in the spirit of the MDL principle, might help to collect relevant features. This is suggested in [4] as a future direction of work, and we provide some results in that direction. Section 2 introduces δ-freeness and our feature construction process. Section 3 reports on classification tasks on both UCI data sets [5] and a real-world medical data set. Section 4 concludes.

¹ It is trivial to derive Boolean data from categorical data, and discretization operators can be used to transform continuous attributes into Boolean ones.

N. Lavrač, L. Todorovski, and K.P. Jantke (Eds.): DS 2006, LNAI 4265, pp. 363–367, 2006.
© Springer-Verlag Berlin Heidelberg 2006
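To make the notion of δ-freeness concrete, here is a brute-force sketch on a toy 0/1 dataset of our own (the data and names are illustrative, not from the paper): an itemset X is δ-free iff every rule Y ⇒ z, with Y a proper subset of X and z ∈ X \ Y, has strictly more than δ exceptions, i.e. support(Y) − support(Y ∪ {z}) > δ. This exhaustive check is not the levelwise mining algorithm of the condensed-representation work cited above; it only spells out the definition.

```python
from itertools import combinations

# Toy 0/1 dataset (hypothetical): each row is the set of items
# that take value 1 in that sample.
rows = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c", "d"},
    {"b", "c"},
    {"a", "b", "c", "d"},
]

def support(itemset):
    """Number of rows that contain every item of `itemset`."""
    return sum(itemset <= row for row in rows)

def is_delta_free(X, delta):
    """X is delta-free iff support(Y) - support(Y | {z}) > delta
    for every proper subset Y of X and every z in X \\ Y."""
    X = frozenset(X)
    for r in range(len(X)):
        for Y in map(frozenset, combinations(X, r)):
            for z in X - Y:
                if support(Y) - support(Y | {z}) <= delta:
                    return False
    return True

# With delta = 0, the delta-free sets are the minimal generators of the
# closed sets: here {"a", "d"} is not 0-free because the rule d => a
# holds exactly (every row containing d also contains a).
```

In a real miner, the anti-monotonicity of δ-freeness enables a levelwise search that prunes supersets of non-δ-free sets instead of enumerating the whole lattice as this sketch does.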