Mining Frequent Sequential Patterns under a Similarity Constraint

Matthieu Capelle, Cyrille Masson, and Jean-François Boulicaut

Institut National des Sciences Appliquées de Lyon
Laboratoire d’Ingénierie des Systèmes d’Information
F-69621 Villeurbanne Cedex, France
{cmasson,jfboulic}@lisi.insa-lyon.fr

Abstract. Many practical applications are related to frequent sequential pattern mining, ranging from Web Usage Mining to Bioinformatics. To ensure an appropriate extraction cost for useful mining tasks, a key issue is to push the user-defined constraints deep inside the mining algorithms. In this paper, we study the search for frequent sequential patterns that are also similar to a user-defined reference pattern. While the effective processing of frequency constraints is well understood, our contribution concerns the identification of a relaxation of the similarity constraint into a convertible anti-monotone constraint. Both constraints are then used to prune the search space during a levelwise search. Preliminary experimental validations have confirmed the algorithm's efficiency.

1 Introduction

Many application domains require the analysis of sequences of events, for example the design of personalized interface agents [5]. The extraction of frequent sequential patterns from huge databases of sequences has been heavily studied since the design of apriori-like algorithms [1,6]. Recent contributions consider other criteria for the objective interestingness of the mined sequential patterns, and other kinds of user-defined constraints (e.g., enforcing a minimal gap between events) have been defined [3,10]. Given a conjunction of constraints specifying the potential interest of patterns, the algorithmic challenge is to use these constraints to prune the search space efficiently. In this paper, we are interested in the conjunction of two constraints: a frequency constraint and a similarity constraint.
Two patterns are considered similar if the similarity measure between them is smaller than some threshold. Much research has been done in this field (for a survey, see, e.g., [7]), and the similarity measure we use allows us to identify a constraint that can be used efficiently inside our levelwise mining algorithm. Indeed, by mining sequential patterns that satisfy a conjunction of an anti-monotone constraint (the frequency one) and a convertible anti-monotone constraint (the similarity one), we improve the global pruning efficiency during a levelwise exploration of the candidate patterns.

Research partially funded by the European contract cInQ IST 2000-26469.

H. Yin et al. (Eds.): IDEAL 2002, LNCS 2412, pp. 1–6, 2002.
© Springer-Verlag Berlin Heidelberg 2002
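
To make the role of the anti-monotone frequency constraint in a levelwise search concrete, the following is a minimal Python sketch. It is not the algorithm of this paper: it covers only frequency-based pruning, and it leaves out the similarity constraint, whose measure is not defined in this section. It assumes that a pattern is supported by a sequence when its events occur in that sequence in the same order, not necessarily contiguously. The names `is_subsequence`, `support`, and `levelwise_frequent_patterns`, and the parameters `min_support` and `max_length`, are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(pattern: Pattern, sequence: Pattern) -> bool:
    """True if the events of `pattern` occur in `sequence` in order
    (gaps allowed). This notion of occurrence is an assumption."""
    it = iter(sequence)
    # `event in it` consumes the iterator up to the first match,
    # so later events must be found after earlier ones.
    return all(event in it for event in pattern)


def support(pattern: Pattern, database: List[Pattern]) -> int:
    """Number of database sequences that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def levelwise_frequent_patterns(
    database: List[Pattern], min_support: int, max_length: int
) -> List[Pattern]:
    """Apriori-style levelwise search using only the frequency constraint."""
    alphabet = sorted({event for seq in database for event in seq})
    level = [(e,) for e in alphabet if support((e,), database) >= min_support]
    frequent: List[Pattern] = []
    length = 1
    while level and length <= max_length:
        frequent.extend(level)
        frequent_at_level = set(level)
        candidates = []
        for prefix in level:  # the prefix is frequent by construction
            for event in alphabet:
                candidate = prefix + (event,)
                # Anti-monotonicity: a candidate whose suffix is infrequent
                # cannot be frequent, so it is pruned without counting.
                if candidate[1:] in frequent_at_level:
                    candidates.append(candidate)
        level = [c for c in candidates if support(c, database) >= min_support]
        length += 1
    return frequent
```

Because every sub-pattern of a frequent pattern is itself frequent, candidates of length k+1 are built only from frequent patterns of length k, which is what lets the search discard large parts of the pattern space without counting their support. A further constraint with a compatible monotonicity property could, under the same scheme, be checked at the point where candidates are generated.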