A Novel Incremental Algorithm for Frequent Itemsets Mining in Dynamic Datasets

Raudel Hernández-León 1,2, José Hernández-Palancar 1, J.A. Carrasco-Ochoa 2, and J. Fco. Martínez-Trinidad 2

1 Advanced Technologies Application Center (CENATAV), 7a 21812 e/ 218 and 222, Rpto. Siboney, Playa, C.P. 12200, La Habana, Cuba
2 Computer Science Department, National Institute of Astrophysics, Optics and Electronics, Luis Enrique Erro No. 1, Sta. María Tonantzintla, Puebla, CP:72840, Mexico
{rhernandez,jpalancar}@cenatav.co.cu, {ariel,fmartine}@ccc.inaoep.mx

Abstract. Frequent Itemsets (FI) mining is one of the most researched areas of data mining. When new transactions are appended to, deleted from, or modified in a dataset, updating the FI is a nontrivial task, because such updates may invalidate existing FI or introduce new ones. In this paper, a novel algorithm for FI mining in dynamic datasets, named Incremental Compressed Arrays, is presented. In the experiments, our algorithm was compared against algorithms such as Eclat, PatriciaMine and FP-growth when new transactions are added or deleted.

Keywords: Data mining, Frequent itemsets, Dynamic datasets.

1 Introduction

Mining FI in transaction datasets is useful and technically feasible in several application areas, particularly in retail sales [1]. Traditional data mining methods typically assume that the dataset is static, so any dataset update requires recomputing all the itemsets by scanning the updated dataset. Using prior knowledge to find the new itemsets of an updated dataset gives rise to three kinds of problems: (1) discovering FI under a new support threshold without a dataset update; (2) discovering FI when the dataset is updated but the support threshold is unchanged; (3) discovering FI when both the dataset and the support threshold change.
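A small worked example makes the abstract's point concrete. The sketch below is a hypothetical illustration and is not ICA. It uses a made-up four-transaction dataset, a relative support threshold of 50%, and a deliberately naive brute-force miner (`frequent_itemsets` is an illustrative helper). Like the traditional methods described above, it recomputes everything from scratch. Appending three `{b, d}` transactions changes the supports, and it also raises the absolute count that the 50% threshold demands, from 2 of 4 transactions to 3.5 of 7. As a result, some itemsets stop being frequent while others become frequent.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_sup):
    """Brute-force miner (illustrative only): returns {itemset: support} for every
    itemset contained in at least min_sup * len(transactions) transactions."""
    threshold = min_sup * len(transactions)
    items = sorted({i for t in transactions for i in t})
    result = {}
    for k in range(1, len(items) + 1):
        found_any = False
        for cand in combinations(items, k):
            s = sum(1 for t in transactions if set(cand) <= t)  # absolute support
            if s >= threshold:
                result[frozenset(cand)] = s
                found_any = True
        # Anti-monotonicity: if no k-itemset is frequent, no (k+1)-itemset can be.
        if not found_any:
            break
    return result

# Hypothetical original dataset D and an update that appends three transactions.
D = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}]
D_new = D + [{"b", "d"} for _ in range(3)]

before = frequent_itemsets(D, 0.5)      # needs support >= 2 of 4
after = frequent_itemsets(D_new, 0.5)   # needs support >= 3.5 of 7

invalidated = before.keys() - after.keys()   # FI that are no longer frequent
introduced = after.keys() - before.keys()    # itemsets that became frequent
print(sorted("".join(sorted(s)) for s in invalidated))  # ['a', 'ab', 'ac', 'c']
print(sorted("".join(sorted(s)) for s in introduced))   # ['bd', 'd']
```

Here `{a}`, `{c}`, `{a,b}` and `{a,c}` are invalidated, while `{d}` and `{b,d}` are introduced. An incremental algorithm aims to reach the same `after` result without rescanning the original transactions from scratch.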
In this paper, we present a novel algorithm for FI mining in dynamic datasets, named Incremental Compressed Arrays (ICA), which solves all three kinds of problems mentioned above. Our algorithm is based on a breadth-first search of FI and on the use of equivalence classes to group them. Combining equivalence classes with a compressed vertical binary representation of the dataset makes support counting very fast. The paper is organized as follows: section 2 reviews related work; section 3 gives some formal definitions; section 4 describes ICA; section 5 discusses the experimental results; and section 6 presents the conclusions.

J. Ruiz-Shulcloper and W.G. Kropatsch (Eds.): CIARP 2008, LNCS 5197, pp. 145–152, 2008. © Springer-Verlag Berlin Heidelberg 2008
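The idea of counting support on a vertical binary representation, with candidates generated inside equivalence classes, can be sketched as follows. This is a minimal, uncompressed illustration built on stated assumptions, and it does not reproduce the compressed arrays used by ICA. The sketch assumes the following:

* Each item's column is a Python `int` used as a bit vector, where bit j is set iff transaction j contains the item.
* The support of an itemset is the popcount of the bitwise AND of its items' vectors.
* An equivalence class is the set of k-itemsets that share the same first k-1 items (prefix-based classes, as in Eclat).

The helper names `vertical_bitmaps`, `support` and `mine_breadth_first` are hypothetical.

```python
from collections import defaultdict

def vertical_bitmaps(transactions):
    """Map each item to an int bit vector: bit j is set iff transaction j contains it."""
    bitmaps = defaultdict(int)
    for j, t in enumerate(transactions):
        for item in t:
            bitmaps[item] |= 1 << j
    return dict(bitmaps)

def support(mask):
    """Popcount of a bit vector (int.bit_count() would also work on Python 3.10+)."""
    return bin(mask).count("1")

def mine_breadth_first(transactions, min_count):
    """Level-wise (breadth-first) mining over prefix-based equivalence classes.
    Itemsets are sorted tuples; two k-itemsets are joined only if they share
    their first k-1 items, i.e. only if they lie in the same class."""
    bitmaps = vertical_bitmaps(transactions)
    level = {(i,): m for i, m in sorted(bitmaps.items()) if support(m) >= min_count}
    frequent = {itemset: support(m) for itemset, m in level.items()}
    while level:
        classes = defaultdict(list)           # prefix -> members of that class
        for itemset in sorted(level):
            classes[itemset[:-1]].append(itemset)
        next_level = {}
        for members in classes.values():
            for idx, x in enumerate(members):
                for y in members[idx + 1:]:
                    mask = level[x] & level[y]    # transactions containing x ∪ y
                    if support(mask) >= min_count:
                        next_level[x + (y[-1],)] = mask
        frequent.update((itemset, support(m)) for itemset, m in next_level.items())
        level = next_level
    return frequent

# Hypothetical usage: four transactions, absolute support threshold of 2.
result = mine_breadth_first([{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}], 2)
print(result)  # {('a',): 3, ('b',): 3, ('c',): 2, ('a', 'b'): 2, ('a', 'c'): 2}
```

Mining proceeds level by level. All frequent k-itemsets are found before any (k+1)-itemset is formed, and two itemsets are joined only when they belong to the same class, so each candidate is generated exactly once. Appending transaction j only sets bit j in the vectors of its items, which makes a vertical layout a natural fit for dynamic datasets. How ICA compresses and maintains these arrays is the subject of section 4.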