kDCI++: a Multi-Strategy Algorithm for Discovering Frequent Sets in Large Databases

Salvatore Orlando 1, Paolo Palmerini 1,2, Raffaele Perego 2, Claudio Lucchese 1, Fabrizio Silvestri 2,3

1 Dipartimento di Informatica, Università Ca' Foscari di Venezia, Via Torino 155, 30172 Venezia, Italy - email: orlando@dsi.unive.it
2 Istituto ISTI, Consiglio Nazionale delle Ricerche (CNR), Via Moruzzi, 1, 56100, Pisa, Italy - email: {raffaele.perego,paolo.palmerini}@isti.cnr.it
3 Dipartimento di Informatica, Università di Pisa, Corso Italia, 56100 Pisa, Italy - email: silvestri@di.unipi.it

25th September 2003

Abstract

This paper presents the implementation of kDCI++, an enhancement of DCI, a scalable algorithm for discovering frequent sets in large databases. The main contribution of kDCI++ lies in a novel counting inference strategy, based on a previously known result by Bastide et al. In addition, kDCI++ uses multiple heuristics and efficient data structures to adapt the algorithm's behavior to the features of the specific dataset mined and of the computing platform used. kDCI++ turns out to be effective in mining both short and long patterns across a variety of cases. We conducted a wide range of experiments on synthetic and real-world datasets, both in-core and out-of-core. The results allow us to state that the performance of kDCI++ is not over-fitted to a special case, and that its high performance is maintained on datasets with different characteristics.

1 Introduction

Despite the considerable number of algorithms proposed in the last decade for solving the problem of finding frequent patterns in transactional databases (among the many we mention