Mining generalized temporal patterns based on fuzzy counting

Francisco Guil a,⇑, Antonio Bailón b, José A. Álvarez a, Roque Marín c

a High School of Engineering, University of Almeria, Almería, Spain
b High Technical School of Computer Science and Telecommunications, University of Granada, Granada, Spain
c Faculty of Computer Science, University of Murcia, Murcia, Spain

Article info

Keywords: Temporal data mining; Fuzzy sets; Temporal patterns; Event-based sequences

Abstract

Event-based sequences are a kind of pattern based on temporal associations with two essential characteristics: they are syntactically simple and have great expressive power. For this reason, event-based sequence mining is an interesting solution to the problem of knowledge discovery in dynamic domains, mainly characterized by a time-varying nature. The inter-transactional model has led to the design of algorithms aimed at obtaining this sort of pattern from time-stamped datasets. These algorithms extend the well-known Apriori algorithm by explicitly adding the temporal context in which associations among frequent events occur. This makes it possible to extract a larger number of patterns of potential interest in decision making. However, their usefulness is diminished in datasets where variability and uncertainty are present, which is a common issue in real domains. This is due to the rigidity of the counting method, which uses an exact measure of distance between temporal events. As a solution, we propose a generalization of the temporal mining process, which implies a relaxation of the counting method by including the concept of approximate temporal distance between events. In particular, in this paper we present an algorithm, called TSET^fuzzy-Miner, which incorporates a fuzzy-based counting technique in order to extract general, flexible, and practical temporal patterns, taking into account the particular characteristics of real domains.
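The contrast the abstract draws between exact and approximate temporal distance can be illustrated with a minimal sketch. This is not the TSET^fuzzy-Miner algorithm itself: the triangular membership function, the `spread` parameter, the choice to score each sequence by its best-matching pair, and the toy data are all assumptions made here for illustration only.

```python
# Illustrative sketch only: the membership shape, `spread`, and the data
# below are assumptions, not definitions taken from the paper.

def membership(distance, target, spread):
    """Triangular membership: 1.0 when distance == target, falling
    linearly to 0.0 once |distance - target| >= spread."""
    return max(0.0, 1.0 - abs(distance - target) / spread)


def pair_distances(sequence, first, second):
    """All temporal distances t2 - t1 between an occurrence of `first`
    at time t1 and an occurrence of `second` at time t2."""
    starts = [t for event, t in sequence if event == first]
    ends = [t for event, t in sequence if event == second]
    return [t2 - t1 for t1 in starts for t2 in ends]


def exact_count(sequences, first, second, target):
    """Rigid counting: a sequence contributes 1 only if some pair lies
    at exactly the target distance."""
    return sum(
        1 for seq in sequences
        if target in pair_distances(seq, first, second)
    )


def fuzzy_count(sequences, first, second, target, spread):
    """Relaxed counting: each sequence contributes the best membership
    degree among its pairs, and the degrees are summed."""
    total = 0.0
    for seq in sequences:
        degrees = [
            membership(d, target, spread)
            for d in pair_distances(seq, first, second)
        ]
        total += max(degrees, default=0.0)
    return total


# Hypothetical time-stamped sequences of (event, timestamp) pairs.
SEQUENCES = [
    [("A", 0), ("B", 3)],  # distance 3: exact match
    [("A", 1), ("B", 5)],  # distance 4: near miss
    [("A", 2), ("B", 4)],  # distance 2: near miss
    [("A", 0), ("B", 9)],  # distance 9: far from the target
]

exact = exact_count(SEQUENCES, "A", "B", target=3)
fuzzy = fuzzy_count(SEQUENCES, "A", "B", target=3, spread=2)
```

On this toy data the exact count is 1, while the fuzzy count is 2.0, because the two near misses each contribute a degree of 0.5. With `spread=1` and integer timestamps the fuzzy count reduces to the exact one, which is the sense in which the relaxed counting generalizes the rigid one in this sketch.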
© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

Data mining is an essential step in the process of knowledge discovery in databases. It consists of applying data analysis and discovery algorithms that produce a particular enumeration of structures (models and patterns) over the data (Fayyad, Piatetsky-Shapiro, & Smyth, 1996). Depending on the structure, data mining can be approached from two different perspectives: global and local methods (Mannila, 2002). In this work we are interested in local methods, commonly known as frequent pattern mining. The simplest case of pattern discovery is finding association rules (Agrawal, Imielinski, & Swami, 1993), a kind of pattern used to help in the analysis of large transactional databases. Such association rules, when discovered, provide valuable knowledge for decision making. One approach is to integrate the data mining process into the development of Knowledge Based Systems (Li, Xie, & Xu, 2011; Ordonez, Santana, & de Braal, 2000, 2011).

Since the problem of mining association rules was introduced by Agrawal et al. (1993), much research has been carried out in a wide range of directions, including the improvement of the Apriori algorithm; mining generalized, multi-level, or quantitative association rules; mining weighted association rules; fuzzy association rule mining; constraint-based rule mining; efficient long-pattern mining; and maintenance of the discovered association rules. In general, temporal data mining can be seen as an extension of this work.

Temporal data mining can be defined as the activity of searching for interesting correlations or patterns in large sets of temporal data accumulated for purposes other than those originally expected (Bettini, Wang, & Jajodia, 1996). It has the ability to mine activity, inferring associations of contextual and temporal proximity, some of which may also indicate a cause-effect association.
This important kind of knowledge can be overlooked when the temporal component is ignored or treated as a simple numeric attribute (Roddick & Spiliopoulou, 2002).

Data mining is an interdisciplinary field that has received contributions from a variety of disciplines, mainly databases, machine learning, and statistics. However, in the case of temporal data mining techniques, the most influential field has been Artificial Intelligence, owing to the extensive work on temporal reasoning that gave rise to many of these techniques.

Expert Systems with Applications 40 (2013) 1296–1304
0957-4174/$ - see front matter. http://dx.doi.org/10.1016/j.eswa.2012.08.061
⇑ Corresponding author. Tel.: +34 950015787.
E-mail addresses: fguil@ual.es (F. Guil), bailon@decsai.ugr.es (A. Bailón), jaberme@ual.es (J.A. Álvarez), roquemm@um.es (R. Marín).

In non-temporal data mining techniques, there are usually two different tasks: the description of the characteristics of the database (or analysis of the data) and the prediction of the evolution of the population. However, in temporal data mining this distinction is less appropriate, because the evolution of the population is